Abstract
Objective
Research on healthy villages in China is currently constrained by limited evaluation criteria and a lack of systemic comprehensiveness. This study aims to develop a scientifically rigorous evaluation index system that is tailored to the regional characteristics of China.
Method
A modified Delphi method was employed to screen indicators based on literature review and expert consultation, followed by the Analytic Hierarchy Process (AHP) to determine the weights of these indicators. Innovatively, this study adopted a Human-AI synergistic approach throughout the research lifecycle; generative AI was utilized to refine indicator semantics during the Delphi phase, while an LLM-assisted comparative analysis served as a robustness check for the weighting system. Additionally, empirical validation was conducted in three pilot villages.
Results
The final system consists of 7 first-level, 31 second-level, and 61 third-level indicators. Metrics from expert consultations were satisfactory, with authority coefficients exceeding 0.80 and demonstrating strong coordination (P < 0.001). Weight analysis indicated that “Healthy Population” (0.340) and “Healthy Life” (0.228) are the most critical dimensions. The robustness check revealed a strong correlation (Pearson's r = 0.99) between human expert consensus and AI-simulated weights, thereby confirming the system's validity. Furthermore, empirical application in three pilot villages produced distinct scores (97.1, 77.3, and 39.7), which accurately reflected developmental disparities and identified specific weaknesses for targeted intervention.
Conclusion
The constructed index system integrates multi-dimensional health factors with a scientifically rigorous design validated through this Human-AI synergistic strategy. Ultimately, this approach pioneers new pathways for the deep integration of artificial intelligence and public health management, while providing a reference model for developing comprehensive evaluation systems in other developing countries.
Keywords: Healthy villages, Index system, Modified Delphi, Analytic hierarchy process, AI, China
Highlights
• Constructs a multidimensional Healthy Village evaluation system in China.
• Adopts a novel Human-AI synergistic approach integrating Delphi and AHP.
• AI aids semantic refinement and logical verification to ensure objectivity.
• Prioritizes people-centered health outcomes over simple infrastructure tasks.
• Empirical validation confirms the system's exceptional discriminatory power.
1. Introduction
As the largest developing country, China confronts numerous challenges in enhancing rural health [1], [2]. Illness-related poverty and the risk of reverting to poverty remain significant issues in rural areas, impacting a substantial number of households [3], [4], [5], [6], [7]. Disparities in access to medical resources restrict the availability of maternal and child health services, health management for key populations, and other critical healthcare services, thereby exacerbating urban-rural differences in indicators such as life expectancy and under-five mortality [8], [9]. Concurrently, the prevalence of chronic non-communicable diseases, including cardiovascular diseases and diabetes, continues to rise [10], [11]. The COVID-19 pandemic further revealed deficiencies in rural healthcare response capabilities and basic environmental hygiene conditions, such as sewage and waste management [12]. These challenges have impeded agricultural and rural development while jeopardizing the health security of rural residents [13]. In the post-pandemic era, aligning rural revitalization with health protection has emerged as an urgent priority. In this context, the establishment of a clear and systematic evaluation index system for Healthy Villages is crucial for guiding implementation and assessing progress [14], [15].
Globally, the “Healthy Villages” strategy proposed by the World Health Organization (WHO) has evolved into an integrated framework that connects public health enhancement with environmental management [16]. Empirical evaluations reveal that most international Healthy Village programs concentrate on measurable indicators, such as water sanitation, housing conditions, and the control of communicable diseases. Research conducted in Asia, Africa, and Latin America demonstrates significant improvements in village environments and basic health outcomes following these interventions [17], [18], [19], [20], [21]. In China, however, studies on Healthy Villages remain limited and are still in the early stages. Evidence indicates that existing evaluation systems predominantly focus on physical infrastructure, including toilet renovations, waste management, and village cleanliness. Conversely, there has been less emphasis on health governance, access to primary healthcare services, and the specific needs of populations, particularly older adults [22], [23], [24]. Furthermore, substantial regional disparities in economic development and health resources hinder the application of a unified and scientifically validated evaluation standard across the nation. Consequently, assessment results often vary inconsistently among regions [25], [26], [27].
To address these limitations, this study seeks to develop a multidimensional evaluation index system that accurately reflects the realities of rural China while adhering to international public health principles. The selection of appropriate indicators for such a complex system presents significant challenges. Traditional methods, such as the Delphi technique and the Analytic Hierarchy Process, heavily rely on expert judgment, which may introduce subjectivity, particularly when managing extensive qualitative information. Recent research by Vaccaro et al. (2024) indicates that the effectiveness of AI-assisted approaches varies by task. AI-assisted human collaboration demonstrates greater potential in facilitating content refinement, such as text polishing and logical verification, but tends to be less effective in direct decision-making tasks [28]. In light of these findings, this study employs a hybrid human-AI approach. Generative AI assists experts in reviewing indicator descriptions, thereby reducing ambiguity and verifying logical consistency. Additionally, it supports the comparison of weights assigned by experts. To mitigate potential risks associated with AI-assisted decisions, the final determinations regarding indicator selection and weighting are retained by human experts through the Delphi process. This approach aims to enhance transparency and reinforce the scientific and practical value of the Healthy Villages evaluation system.
2. Methods
In this study, a literature review, semi-structured interviews, and expert consultations were employed to establish the foundational framework of the Healthy Villages evaluation indicator system. Subsequently, the Delphi method was utilized for the screening and modification of indicators to finalize the entries at all levels. Finally, the Analytic Hierarchy Process (AHP) was applied to ascertain the weight coefficients of the indicators across all levels. Details regarding the questionnaire can be found in Supplementary material 1. The research spanned 7 months, with indicator screening commencing in November 2023 and concluding with two rounds of Delphi consultation and the construction of the indicator system in May 2024. The theoretical methods and background utilized are detailed below, and a flowchart illustrating the process of indicator construction and weight determination is presented in Fig. 1.
Fig. 1.
Flow diagram of the study process.
2.1. Initial establishment of evaluation index system
The World Health Organization (WHO) defines “Healthy villages” as “a rural community with low incidence of infectious diseases, universal access to basic sanitation facilities and services, a stable and peaceful social environment, and harmonious community development” [29]. Furthermore, it outlines six recommendations for developing an evaluation indicator system [30]. Guided by this definition and framework, we conducted a review of relevant literature to synthesize both domestic and international experiences regarding the definition, planning, construction, evaluation criteria, and index system development of Healthy villages. By integrating these insights with the current status of rural development in China and the specific requirements for establishing a “Healthy China,” we elucidated the connotation and core characteristics of Healthy villages tailored to the national context. We contend that Healthy villages should focus on fostering healthy individuals, with a hygienic environment as the prerequisite, optimized health services as the foundation, and the promotion of a healthy culture as the connecting element. This approach aims to create a comprehensive system that enhances population health through multidimensional and holistic village planning, construction, and management, thereby facilitating the coordinated development of ecological, production, and living systems. Core characteristics of Healthy Villages encompass at least seven dimensions: a robust health governance mechanism, a hygienic and livable environment, a stable and harmonious society, accessible and sustainable health services, continuously improving population health levels, scientific and healthy lifestyles, and an actively promoted health culture.
To ensure the scientific rigor and practical applicability of the indicator system, we integrated macro-theoretical trends identified through bibliometric analysis with micro-level empirical insights obtained from semi-structured interviews, thereby systematically constructing the initial indicator pool. The interviews involved 15 key informants, of whom 5 possessed experience in international rural health projects. These discussions aimed to identify existing gaps in the construction approaches and evaluation methods for Healthy Villages, thereby informing targeted refinements to the indicator pool based on real-world conditions. The interview protocol addressed the current overall status of Healthy Villages construction, existing evaluation methods, and the applicability and measurability of potential indicators. We also incorporated targeted inquiries regarding international experience, including: “Which indicators from foreign Healthy Villages initiatives (such as community health promotion and environmental governance) can be adapted to the actual conditions of rural China?” and “How can foreign experiences (e.g., rural waste classification models and health literacy improvement strategies) be translated into actionable evaluation indicators?” By leveraging the practical insights of informants with experience in international projects, these questions sought to identify experiences that are cost-effective, feasible, and compatible with the realities of rural China. This approach aims to avoid the uncritical replication of foreign models while ensuring that the indicator pool reflects global expertise.
2.2. Literature analysis method
Bibliometric analysis is a quantitative methodology based on mathematics and statistics that clarifies developmental trends and research hotspots within a specific subject area by examining the quantitative characteristics and underlying patterns of the literature [31]. CiteSpace software enables bibliometric analysis of literature in a designated field through co-citation analysis and path-finding network algorithms. It systematically delineates the knowledge structure, research hotspots, and emerging trends in the research domain by employing analytical techniques such as scientific knowledge mapping, keyword co-occurrence, and emergence [32]. In this study, we employed CiteSpace 6.2.R2 software to conduct a bibliometric analysis of the literature, utilizing both internationally recognized and influential Chinese and English databases as sources. The English literature was retrieved from the Web of Science Core Collection database, where we utilized the Advanced Search function to configure the Fielded Search settings. The search criteria were established by setting the “Topic” field to include “Healthy Village” OR “Healthy Countryside,” with the publication date range specified from January 1, 1990, to March 31, 2024. After executing the search, we refined the results by selecting “Article” under Document Types, yielding 1206 valid documents, which were exported in “Plain text file” format. The Chinese literature was obtained from the China National Knowledge Infrastructure (CNKI) database by selecting the Advanced Search mode. The retrieval conditions were defined by selecting “Subject” and using the original Chinese search terms: “健康农村” (Healthy rural area), “健康村” (Healthy village), “健康村镇” (Healthy towns and villages), and “健康乡村” (Healthy countryside), with the publication year range set from 1990 to 2024. Since the data retrieval was conducted and finalized on March 31, 2024, the effective coverage period corresponds with that of the Web of Science dataset. 
We concurrently applied filters for “Full text only” and “Chinese-English expansion,” limiting the source category strictly to CSSCI journals. During the screening process, we excluded publications lacking academic rigor, such as newspapers, conference proceedings, and books, resulting in a total of 487 valid documents exported in “Refworks” format. The primary objective of this bibliometric analysis was to identify macro trends, core dimensions, and emerging topics within global Healthy Villages research, thereby providing theoretical support for the development of an initial indicator pool, rather than directly extracting specific indicators. Through keyword co-occurrence, clustering, and burst analysis, we delineated the fundamental dimensions of Healthy Villages research to ensure that the indicator system aligns with the developmental patterns of the field. Furthermore, by incorporating localized characteristics of domestic research, such as rural aging and the rural revitalization strategy, the indicator framework maintains an international perspective while also reflecting the actual conditions of rural China.
2.3. Modified Delphi method
The Delphi method entails gathering expert opinions through multiple rounds of correspondence, followed by the summarization and organization of feedback from each round. This iterative process facilitates the attainment of more consistent and reliable conclusions [33]. In this study, a modified Delphi approach is employed, as the initial indicators were derived from literature and interviews rather than through open-ended expert contributions. The selection of experts adhered strictly to the principles of authority, representativeness, and professionalism, with the following specific inclusion and exclusion criteria:
Inclusion criteria:
(1) Engagement in professional fields related to Healthy Village construction, including patriotic health, environmental health, health education, disease control, maternal and child health, rural development, health economics, and social security;
(2) Possession of a bachelor's degree or higher, or an equivalent professional and technical title;
(3) A minimum of 10 years of relevant work experience, with employment in government departments (such as health, ecological environment, agriculture and rural areas, housing and urban-rural development), scientific research institutions, grassroots rural governance departments, and medical institutions;
(4) Holding a senior technical title (e.g., professor, researcher, chief physician, senior engineer) or a core management position in related fields;
(5) Voluntary participation in two rounds of expert consultation, with a commitment to provide truthful and constructive feedback;
(6) Preference is given to experts with international research or practical experience in Healthy Villages or rural health, such as involvement in WHO Healthy Villages projects or international rural health cooperation initiatives, to ensure the incorporation of global perspectives.
Exclusion criteria:
(1) Employment in fields not directly associated with rural health or the construction of Healthy Villages;
(2) Refusal to sign the informed consent form or withdrawal from the consultation process;
(3) Submission of questionnaires containing numerous missing items, logical inconsistencies, or other invalid responses.
Prior to the consultation, an electronic informed consent form was sent to experts as an attachment to the questionnaire, clearly outlining the research purpose, the consultation process (approximately 30 min per round), potential risks (negligible), research benefits (providing references for the formulation of Healthy Villages policies), and confidentiality commitments (experts' personal information would be used solely for academic analysis and published anonymously). Completion and return of the questionnaire was taken as voluntary consent to participate in this study. Experts were consulted via email in two rounds. Following the first round, the evaluation indicators were revised based on the experts' feedback to produce the second-round consultation form, and experts were asked to screen and rate the indicators in the same manner. The survey was to be closed, and the final evaluation indicators established, once expert opinions had converged by the end of the two rounds.
In this study, the importance and feasibility of evaluation indicators were assessed using a 5-point Likert scale [34]. The mean and standard deviation of the importance ratings were employed to reflect the concentration of expert opinions. Furthermore, the experts' positive coefficient, authority coefficient, and coordination coefficient were utilized to evaluate the reliability and scientific rigor of the expert consultation [35]. The experts' positive coefficient was represented by the questionnaire recovery rate, which exceeded 70%, indicating a high level of motivation among experts and their recognition of the study's value. The authority coefficient (Cr) was calculated from the experts' judgment basis (Ca) and their familiarity with the indicators (Cs), expressed as Cr = (Ca + Cs)/2. A Cr value of ≥0.7 signifies a high degree of expert authority, with values approaching 1 indicating stronger authority [36]. Both Ca and Cs were quantified using a 3-point scale. The scoring standard for Ca was defined as follows: 1 = low (based on intuitive judgment), 2 = medium (based on partial literature or work experience), and 3 = high (based on sufficient literature support and extensive practical experience). The final Ca value was calculated as the mean of all experts' scores. The scoring standard for Cs was similarly defined: 1 = unfamiliar, 2 = fairly familiar, and 3 = very familiar, with the mean of experts' scores also serving as the final Cs value. The degree of dispersion in the experts' opinions was expressed by the coefficient of variation (CV), while the degree of coordination was represented by Kendall's W (KW) coefficient. The KW value ranges from 0 to 1, with values closer to 1 indicating greater consistency among experts' opinions. 
According to conventional standards for constructing evaluation index systems in the health field, when KW ≥ 0.2 and p < 0.05 via the χ2 test, the experts' opinions can be deemed coordinated and reliable, thereby meeting the consensus requirement for indicator screening. In this study, indicators were included only if they met two criteria: a mean score of ≥4 for both importance and feasibility, and a coefficient of variation (CV) of <0.25. This approach ensured that the selected indicators were both significant and practically applicable. All data were processed and analyzed using R 4.3.2 software.
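The consultation statistics described above can be illustrated with a minimal Python sketch. It is not the authors' analysis, which was carried out in R 4.3.2; the function names are hypothetical, the CV threshold is assumed to apply to both importance and feasibility ratings (the text does not say which), and Kendall's W is computed with averaged ranks but without a tie correction. The p-value for the χ2 statistic (degrees of freedom n − 1) would still need to be looked up separately.

```python
from statistics import mean, stdev


def authority_coefficient(ca, cs):
    """Cr = (Ca + Cs) / 2, as defined in the text."""
    return (ca + cs) / 2


def coefficient_of_variation(scores):
    """CV = sample standard deviation / mean of one indicator's ratings."""
    return stdev(scores) / mean(scores)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W and its chi-square statistic for an experts x indicators matrix.

    Assumption: no tie correction is applied, so W is conservative when
    experts give many identical Likert scores.
    """
    m, n = len(ratings), len(ratings[0])
    rank_sums = [0.0] * n
    for row in ratings:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    expected = m * (n + 1) / 2
    s = sum((r - expected) ** 2 for r in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    chi2 = m * (n - 1) * w  # compare against chi-square with n - 1 degrees of freedom
    return w, chi2


def retain_indicator(importance, feasibility, min_mean=4.0, max_cv=0.25):
    """Screening rule from the text: mean >= 4 and CV < 0.25.

    Assumption: the CV criterion is checked for both rating sets.
    """
    return all(
        mean(r) >= min_mean and coefficient_of_variation(r) < max_cv
        for r in (importance, feasibility)
    )
```

For example, `retain_indicator([5, 5, 4, 5, 4], [4, 5, 4, 4, 5])` keeps the indicator, whereas ratings with a mean of 4 but widely scattered scores are rejected by the CV criterion.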
In the qualitative analysis phase of the Delphi consultation, we employed a Generative AI model (GPT-4o-mini) to assist in processing the open-ended modification suggestions provided by experts. This strategy aimed to reduce the potential subjectivity bias associated with manual summarization and to improve the semantic clarity of the indicators. As illustrated in Fig. 2, the specific operational process included the following steps: (1) Data Pre-processing: The textual comments from experts regarding the “definition,” “modifiability,” and “measurability” of the indicators were extracted and anonymized to create a textual dataset. (2) Prompt Engineering with Few-Shot Learning: Building on the methodology of Hui et al. (2026) [37], which demonstrated that large language models (LLMs) utilizing few-shot learning excel in solution clarity (RMSE = 0.66) and actionable feedback, we developed a structured prompt. This prompt instructed the AI to function as a “methodological consultant” to categorize the experts' fragmented comments into three classifications: “Delete,” “Merge,” or “Refine,” and to generate standardized definitions for indicators that lacked precision (e.g., transforming qualitative indicators into quantitative indicators). (3) Human-in-the-Loop Validation: To ensure reliability, the research team reviewed the AI-generated suggestions. We compared the AI-generated indicator refinements with the original expert comments to confirm consistency. Only those refinements that were both logically sound and true to the experts' original intent were incorporated into the final index system.
Fig. 2.
Generative AI-assisted analysis workflow in Delphi consultation.
2.4. Analytic hierarchy process (AHP)
The Analytic Hierarchy Process (AHP) is a method for assessing the relative importance of evaluation elements by decomposing a problem into hierarchical levels [38]. The importance of each element is evaluated within its level, and the experts' judgments are then synthesized to establish the overall relative importance of every element in the decision-making process. In this study, 20 experts who participated in the second round of the Delphi consultation were invited to assess the importance of the indicators within each matrix using the Saaty 1–9 scale [39], [40]. The weight vector was derived using the geometric mean method, which yields the weights corresponding to the experts' judgments on each factor. This calculation was executed using Excel 2021 with the following formula:
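$$W_i = \frac{\left(\prod_{j=1}^{n} a_{ij}\right)^{1/n}}{\sum_{k=1}^{n}\left(\prod_{j=1}^{n} a_{kj}\right)^{1/n}}, \quad i = 1, 2, \ldots, n \tag{1}$$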
Here, aij represents the relative importance of indicator i compared with indicator j in the judgment matrix A, and n is the total number of indicators in the matrix. The weight vector W is derived using the geometric mean method: the elements of each row of A are multiplied together to form a new column vector, the nth root of each component is taken, and the resulting vector is normalized so that its components sum to 1.
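As an illustration of the geometric mean method, the following Python sketch computes the weights for a single judgment matrix. The study itself used Excel 2021; the function name and the example matrix are hypothetical and are not taken from the expert data.

```python
import math


def geometric_mean_weights(matrix):
    """Weights from a pairwise comparison matrix via the geometric mean method.

    Each row is multiplied out, the n-th root is taken, and the results
    are normalized so that the weights sum to 1.
    """
    n = len(matrix)
    roots = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(roots)
    return [r / total for r in roots]


# Hypothetical 3 x 3 judgment matrix on the Saaty 1-9 scale (illustration only).
example_matrix = [
    [1, 3, 5],
    [1 / 3, 1, 2],
    [1 / 5, 1 / 2, 1],
]
example_weights = geometric_mean_weights(example_matrix)
```

For a perfectly consistent matrix such as `[[1, 2, 4], [1/2, 1, 2], [1/4, 1/2, 1]]`, the row products are 8, 1, and 1/8, their cube roots are 2, 1, and 0.5, and the normalized weights are 4/7, 2/7, and 1/7.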
To ensure the reliability and validity of the evaluation matrix, it is essential to assess the consistency of the judgment matrix. Specifically, the consistency judgment indicator CR is calculated, with the formula as follows:
$$CI = \frac{\lambda_{\max} - n}{n - 1} \tag{2}$$
$$CR = \frac{CI}{RI} \tag{3}$$
Here, λmax is the largest eigenvalue of the judgment matrix, n is the matrix order, CI is the consistency index, and RI is the random consistency index, which is introduced to offset the tendency of CI to increase with the matrix order n. The value of RI can be obtained from a reference table. A consistency ratio (CR) of ≤0.10 indicates that the judgment matrix is acceptably consistent, whereas a CR > 0.10 suggests that the judgment matrix should be adjusted based on the specific circumstances [40], [41].
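The consistency check can be sketched in Python as follows. This is a hedged illustration rather than the authors' procedure: λmax is approximated from the geometric-mean weight vector as the average of (Aw)i / wi, and the RI values are one commonly tabulated set attributed to Saaty; the source does not state which RI table was used, and published tables differ slightly.

```python
import math

# Assumption: commonly cited random index values for matrix orders 1-9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def consistency_ratio(matrix):
    """Return (lambda_max, CI, CR) for a pairwise comparison matrix."""
    n = len(matrix)
    if n < 3:
        # Matrices of order 1 or 2 are always consistent.
        return float(n), 0.0, 0.0
    # Geometric mean weights: n-th root of each row product, normalized.
    roots = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(roots)
    w = [r / total for r in roots]
    # Approximate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    cr = ci / RANDOM_INDEX[n]
    return lambda_max, ci, cr


def is_acceptable(matrix, threshold=0.10):
    """Apply the CR <= 0.10 acceptance rule described in the text."""
    return consistency_ratio(matrix)[2] <= threshold
```

A matrix that fails the check would be returned to the expert for revision before its weights are used.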
2.5. AI-assisted comparative validation protocol
To assess the robustness of the weights derived from human experts, we employed a generative AI model based on the GPT-4 architecture, functioning as a “virtual expert panel.” To ensure that the AI's decision-making was grounded in the specific context of this study rather than relying solely on general training data, we adopted a knowledge-augmented approach. As illustrated in Fig. 3, the protocol comprised the following steps: (1) Role Definition: The AI was instructed to assume the role of a senior domain expert in public health and rural development, ensuring that its reasoning adhered to professional standards and ethical considerations. (2) Contextual Knowledge Integration: To enhance the credibility and specificity of the analysis, we input a curated corpus of domain-specific materials into the model's context window prior to scoring. This corpus included: (a) over 20 relevant policy documents issued by national and provincial authorities concerning Healthy Villages construction; (b) more than 10 seminal domestic and international academic references; and (c) key transcripts from semi-structured interviews and summaries of successful practical experiences. This step ensured that the AI possessed the requisite “prior knowledge” comparable to that of the human experts. (3) Blind Independent Scoring: In accordance with a blind protocol, the AI conducted independent pairwise comparisons of the indicators utilizing Saaty's 1–9 scale, relying solely on the provided knowledge base without access to the final results of human experts. The AI concentrated on generating local weights, which represent the weight of an indicator in relation to its direct parent level. (4) Statistical Correlation Analysis: The local weights derived by the AI were compared to those derived by humans through Pearson correlation analysis (r) and Root Mean Square Error (RMSE) to evaluate the overall structural validity of the proposed system. 
(5) Adjudication of Discrepancies: A mechanism was established to address divergences. We defined a “significant discrepancy” as an absolute difference between human and AI weights exceeding 0.05 (|WHuman - WAI| > 0.05). Indicators that met this threshold underwent a theoretical re-examination by the research team. Importantly, the consensus of human experts was upheld as the “Ground Truth” for the final evaluation system. This decision aligns with the “Human-in-the-Loop” (HITL) principle, recognizing that while AI excels in data logic, human experts offer irreplaceable insights in interpreting complex social contexts, cultural nuances, and localized implementation feasibility—elements that are often implicit and difficult for AI to fully comprehend.
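Steps (4) and (5) of the protocol can be expressed as a short Python sketch. The function names, indicator labels, and weights below are hypothetical placeholders chosen for illustration; they are not the study's data, and the source does not state which software performed this comparison.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def flag_discrepancies(human, ai, threshold=0.05):
    """Indicators whose |W_human - W_AI| exceeds the threshold (0.05 in the text)."""
    return [name for name in human if abs(human[name] - ai[name]) > threshold]


# Hypothetical local weights for three sibling indicators (illustration only).
human_weights = {"indicator_a": 0.50, "indicator_b": 0.30, "indicator_c": 0.20}
ai_weights = {"indicator_a": 0.42, "indicator_b": 0.34, "indicator_c": 0.24}

names = list(human_weights)
r = pearson_r([human_weights[k] for k in names], [ai_weights[k] for k in names])
error = rmse([human_weights[k] for k in names], [ai_weights[k] for k in names])
to_review = flag_discrepancies(human_weights, ai_weights)
```

In this made-up example only `indicator_a` differs by more than 0.05, so it alone would be passed to the research team for theoretical re-examination, while the human-derived weight would still be retained.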
Fig. 3.
The comparative validation framework for AI-derived and human-derived weights.
2.6. Empirical application
To assess the applicability and scientific validity of the constructed evaluation index system, we conducted an empirical study from October to November 2024 in Fujian Province, China. The choice of Fujian as the empirical setting was intentional. As a National Comprehensive Pilot Province for Healthcare Reform and a leader in implementing the ‘Healthy China 2030’ strategy, Fujian serves as a benchmark for rural health governance. Thus, evaluating Healthy Village construction in this region provides essential insights into the future trajectory of national standards. In this innovative context, we selected three representative villages to illustrate the complexity of rural development: Village A, a southern coastal demonstration village distinguished by advanced economic development; Village B, located in the eastern inland region with a developing economy and average infrastructure; and Village C, a remote village in the northwestern mountainous area characterized by relatively weak health foundations (Fig. 4).
Fig. 4.
Location map of test evaluation villages for the comprehensive evaluation index system of healthy villages in Fujian Province, China.
The rationale for this multi-site selection is its capacity to comprehensively capture the diverse geographical contexts, socio-economic levels, and health foundations inherent to the region. By encompassing the geographic variety of southern coastal areas, the eastern inland region, and northwestern mountainous areas, while also addressing the developmental spectrum from economically advanced zones to developing villages and resource-constrained remote regions, this intentional selection ensures that the study rigorously validates the applicability and scientific robustness of the evaluation index system across different rural scenarios. Data for this study were collected through on-site investigations conducted by professionals commissioned from the local Center for Disease Control and Prevention (CDC). The final comprehensive score for each village was calculated using the linear weighting method, as detailed in Eq. (4):
$$S = \sum_{i=1}^{n} W_i P_i \tag{4}$$
Where S represents the total comprehensive evaluation score of the Healthy Villages. The variable n denotes the total number of third-level indicators. Wi signifies the combined weight of the i-th third-level indicator, which is derived solely from the human expert panel using the AHP method. Pi represents the standardized evaluation score assigned to the i-th indicator, based on the results of the on-site investigation.
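The linear weighting calculation can be sketched in Python as follows. Two points are assumptions rather than statements from the source: the combined weight of a third-level indicator is taken to be the product of the local weights along its path in the hierarchy (the usual AHP convention), and the standardized scores Pi are taken to lie on a 0 to 100 scale, which is consistent with the reported village totals. All names and numbers are hypothetical.

```python
import math


def combined_weight(local_weights_on_path):
    """Assumed AHP convention: multiply the local weights of the first-,
    second-, and third-level indicators along one branch of the hierarchy."""
    return math.prod(local_weights_on_path)


def comprehensive_score(weights, scores, tol=1e-6):
    """S = sum of W_i * P_i over all third-level indicators."""
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("combined weights should sum to 1")
    return sum(w * p for w, p in zip(weights, scores))


# Hypothetical example with three indicators (illustration only).
example_weights = [0.5, 0.3, 0.2]
example_scores = [100, 80, 50]  # assumed 0-100 standardized scores
example_total = comprehensive_score(example_weights, example_scores)
```

With these placeholder values the total is 0.5 × 100 + 0.3 × 80 + 0.2 × 50 = 84.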
3. Results
3.1. Literature analysis
Bibliometric analysis indicates a consistent upward trajectory in global research activity related to “Healthy Villages” over the past three decades, with a significant increase observed in the last ten years (Fig. 5). Keyword co-occurrence (Table 1 and Fig. 6A/B) and clustering analyses (Fig. 6C/D) reveal that, while “health” remains the central theme, research emphases vary considerably by region. In China, researchers primarily focus on health services and vulnerable populations, such as the elderly and left-behind children, often in conjunction with evaluations of the “New Rural Cooperative Medical System” and rural revitalization strategies. In contrast, international studies tend to emphasize specific risk factors, including obesity and air pollution, as well as the environmental impacts on disease prevalence. Additionally, burst detection (Fig. 6E/F) and timeline analyses (Fig. 7) illustrate a global paradigm shift marked by a transition from clinical disease treatment to public health management, health promotion, and preventive measures. This shift underscores the necessity for a comprehensive and systematically constructed indicator framework to guide the development of Healthy Villages.
Fig. 5.
Annual Distribution of CNKI and WOS Literature. The trend lines reflect a growing global interest in “Healthy Villages” over the past three decades, with a marked surge in activity during the last ten years. Domestically, despite a later inception and lower total volume compared to international studies, the annual publication frequency demonstrates a steady upward trajectory.
Table 1.
Top 10 Keywords in CNKI and WOS Literature.
| CNKI Rank | CNKI Frequency | CNKI Keyword | WOS Rank | WOS Frequency | WOS Keyword |
|---|---|---|---|---|---|
| 1 | 76 | Health | 1 | 77 | health |
| 2 | 74 | Psychological Health | 2 | 66 | prevalence |
| 3 | 42 | Rural Elderly | 3 | 58 | risk |
| 4 | 38 | Rural | 4 | 46 | risk factors |
| 5 | 38 | Rural Residents | 5 | 44 | children |
| 6 | 30 | Left-behind Children | 6 | 31 | disease |
| 7 | 17 | Health Status | 7 | 29 | community |
| 8 | 16 | Rural Revitalization | 8 | 29 | physical activity |
| 9 | 16 | Physiological Health | 9 | 29 | impact |
| 10 | 15 | Health Insurance | 10 | 28 | association |
Fig. 6.
Bibliometric Analysis of Keywords in CNKI (A, C, E) and WOS (B, D, F) Literature: Co-occurrence, Clustering, and Citation Bursts. Keyword Co-occurrence: A. Highlights the core status of “Psychological Health” and “Public Health Services”, with a specific focus on vulnerable demographics such as the “Left-behind Elderly” and “Left-behind Children”. B. Centers on epidemiology, extensively exploring “Prevalence”, “Risk Factors”, and “Mortality”, while also emphasizing “Physical Activity” and disease “Management” at the community level. Keyword Clustering: Both datasets show high cluster credibility (Silhouette >0.7). C. Reveal a focus on social and economic determinants, including “mental health of left-behind children”, “financial risks”, and “health equity”. D. Prioritize environmental and lifestyle challenges, notably “obesity”, “food safety”, “climate change”, and “air pollution”. Citation Bursts: Temporal analysis indicates distinct evolutionary paths. E. Reflect China's socio-economic transitions, moving from “Health Security” (2004–2006) to policy reforms like “NRCMS” (2013–2019) and demographic shifts involving “Outmigration” and “Left-behind Elderly”. F. Demonstrate a global paradigm shift from clinical treatment (“Coronary Heart Disease”, “Cancer”) to preventive strategies, highlighted by the recent surge in “Overweight”, “Nutrition”, and “Health Promotion”.
Fig. 7.
Timeline Display of Keywords in CNKI (A) and WOS (B) Literature. The timeline view delineates the evolutionary trajectory and inheritance relationships of research hotspots. A. CNKI: Domestic research shows a clear policy-driven evolution. The focus shifted from general “Health” strategies (2011–2014) to the evaluation of the “New Rural Cooperative Medical Scheme (NRCMS)” (2013–2019). Most recently (2020–2024), “Rural Revitalization” and “Rural Elderly” have become dominant nodes, reflecting the strategic integration of health into rural development and the urgent response to population aging. B. WOS: International research exhibits a transition from foundational concepts to lifestyle challenges. Early clusters around “Health” (1999) focused on bridging the urban-rural gap. A significant pivot occurred around 2012 with the emergence of “Overweight,” highlighting the impact of globalization and lifestyle transitions (dietary changes) on rural populations. Synthesis: Collectively, these timelines illustrate a dynamic progression from basic disease prevention to a multidimensional approach encompassing health promotion, environmental factors, and social determinants.
3.2. Development of the initial indicator pool
The bibliometric analysis indicated a significant paradigm shift in global Healthy Villages research, transitioning from a focus on specific disease treatment to a broader emphasis on comprehensive health promotion. This shift provides theoretical support for the extensive indicator framework. Keyword timeline analysis revealed that domestic research hotspots evolved from the “New Rural Cooperative Medical Scheme (NRCMS)” to “Rural Revitalization,” underscoring the critical role of governance mechanisms in the construction of Healthy Villages and directly supporting the establishment of the first-level indicator, “Health Mechanism.” High-frequency keywords identified in the co-occurrence analysis, such as “environmental factors,” “air pollution,” and “overweight,” underscored the necessity of the “Healthy Environment” and “Healthy Life” dimensions, aligning with the World Health Organization (WHO) assertion that “a healthy environment is the foundation of Healthy Villages.” Additionally, keyword clustering analysis indicated that domestic research prioritizes “health of vulnerable populations” and “rural risks,” while international research focuses on “community health” and “health promotion.” This distinction supports the establishment of dimensions such as “Healthy Population,” “Healthy Society,” and “Healthy Culture.” Finally, keyword burst analysis revealed a sustained emphasis on terms like “public health services,” “health equity,” and “nutrition,” further affirming the central importance of the “Healthy Services” dimension. Semi-structured interviews provided essential micro-level empirical grounding to translate broad concepts into specific, operable indicators. 
The interview results indicated that foreign experiences, such as “community-participatory health management” and “environmental sustainability indicators” (e.g., sewage resource utilization rate), can be adaptively localized and were ultimately incorporated as third-level indicators within the “Healthy Society” and “Healthy Environment” dimensions. Feedback from village committee staff regarding the ineffectiveness of sporadic health campaigns underscored the critical need for “Health Mechanism” indicators that emphasize long-term organizational structure and funding stability—an often-overlooked practical concern in theoretical models. Consequently, third-level indicators such as “Establishment of a Leading Group for the Construction of Healthy Villages” and “Village Regulations Related to Promoting Health” were developed. Local health providers stressed that physical infrastructure alone is inadequate without addressing “soft” barriers, such as low health literacy. This insight directly informed the specific content of the “Healthy Culture” dimension to monitor behavioral changes, including indicators like “Frequency of Health Literacy and Hygiene Behavior-Related Publicity” and “School Health Education Class Opening Rate.” Informants' concerns regarding the “last mile” of medical care accessibility specifically refined the “Healthy Services” indicators to encompass village-level resource allocation, such as “Number of Practicing (Assistant) Physicians per 1,000 Resident Population” and “30-Minute Basic Medical and Health Service Coverage Rate.”
This study integrates macro-theoretical trends with micro-empirical evidence to conclude that the concept of Healthy Villages should prioritize the cultivation of healthy individuals. This approach necessitates the creation of a healthy environment as a prerequisite, the optimization of health services as a foundation, the promotion of a healthy culture as a connecting element, and the establishment of a comprehensive system to facilitate the healthy development of the population. Such efforts should be realized through multidimensional and holistic village planning, construction, and management, ultimately fostering the coordinated development of ecological, production, and living systems. Consequently, the fundamental characteristics of Healthy Villages should encompass at least seven dimensions: a robust and comprehensive health mechanism, a hygienic and livable health environment, a stable and harmonious healthy society, continuous and accessible health services, steadily improving health levels, a scientific and healthy lifestyle, and an actively promoted healthy culture. To operationalize this framework, semi-structured interviews and expert consensus meetings were conducted to identify characteristic indicators, resulting in a pool of evaluation metrics. This pool includes seven dimensional indicators (health mechanism, healthy environment, healthy society, healthy services, healthy people, healthy life, and healthy culture), comprising 7 first-level indicators, 31 second-level indicators, and 101 third-level indicators (Supplementary2 Table S1).
3.3. Basic information on experts
The majority of the 22 experts invited to participate in this study's consultation were affiliated with government departments (63.64%). Their research areas spanned a diverse array of disciplines, including patriotic hygiene (13.64%), environmental sanitation (13.64%), health education (13.64%), and public health (9.09%). Most experts were aged between 40 and 50 years (81.82%), with work experience primarily ranging from 20 to 30 years (72.73%). The educational qualifications of the experts were predominantly at the master's degree level (77.27%), and most held senior technical positions (59.09%) (Supplementary2 Table S2).
3.4. The activity and authority coefficient of experts
In the first round of expert consultation, 22 questionnaires were distributed, and all were returned, resulting in an effective recovery rate of 100% (22/22). In the second round, 22 questionnaires were issued, with 20 returned, yielding an effective recovery rate of 90.91% (20/22). The positive coefficients for the two rounds of expert consultation were therefore 100% and 90.91%, respectively, both exceeding the threshold of 70%. During the first round, 7 experts provided suggestions regarding the elements and connotations of the indicators, while in the second round, 3 experts contributed, representing 31.82% (7/22) and 13.64% (3/22) of the total number of consulting experts, respectively. This reflects a high level of interest among the experts in the study and their enthusiasm for participating in the survey. The expert judgment basis coefficients (Ca) for the consulting experts in the two rounds of evaluation were 0.90 and 0.91, while the familiarity coefficients (Cs) were 0.71 and 0.74, and the authority coefficients (Cr) were 0.81 and 0.83, all exceeding 0.7. These results indicate that the experts were well-acquainted with the consultation content and possessed a high degree of authority, thereby ensuring the reliability of the research findings (Supplementary2 Table S3).
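The coefficients above can be checked with a few lines of arithmetic. The sketch below assumes the conventional Delphi definition of the authority coefficient, Cr = (Ca + Cs) / 2, which is not stated explicitly in this section; the function names are illustrative only. Under that assumption, the reported Ca and Cs values reproduce the reported Cr values to within rounding of the inputs.

```python
# Minimal sketch, assuming the conventional Delphi definition Cr = (Ca + Cs) / 2.
# Function names are hypothetical; the input values are those reported in the text.

def authority_coefficient(ca: float, cs: float) -> float:
    """Authority coefficient as the mean of judgment basis (Ca) and familiarity (Cs)."""
    return (ca + cs) / 2


def positive_coefficient(returned: int, issued: int) -> float:
    """Questionnaire recovery rate in percent."""
    return 100 * returned / issued


# (Ca, Cs) pairs reported for the two consultation rounds.
reported = {"round 1": (0.90, 0.71), "round 2": (0.91, 0.74)}
cr = {name: authority_coefficient(ca, cs) for name, (ca, cs) in reported.items()}

# Recovery rates for the two rounds (22 of 22 and 20 of 22 questionnaires).
rates = {
    "round 1": positive_coefficient(22, 22),
    "round 2": positive_coefficient(20, 22),
}

for name in reported:
    print(f"{name}: Cr = {cr[name]:.3f}, positive coefficient = {rates[name]:.2f}%")
```

Because Ca and Cs are themselves rounded to two decimals, the recomputed Cr can differ from the published figure in the third decimal place.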
3.5. The coordination and concentration degree of expert opinions
In the first round, the mean scores for importance and feasibility, which reflect the concentration of expert opinion, were 4.29 ± 0.79 and 4.34 ± 0.79, respectively. The coefficient of variation was 0.18 for both measures, below the threshold of 0.25. Kendall's coefficients of concordance were 0.463 and 0.388, with p-values from the chi-square (χ2) tests both less than 0.001. In the second round, the mean scores for importance and feasibility increased to 4.40 ± 0.72 and 4.47 ± 0.67, respectively, and the coefficients of variation decreased to 0.16 and 0.15. The coefficients of concordance were 0.340 and 0.351, and the p-values from the χ2 tests remained below 0.001 (Table 2 and Supplementary2 Tables S4 ∼ S5).
Table 2.
The coordination and concentration degree of expert opinions.
| Project | Importance | | Feasibility | |
|---|---|---|---|---|
| | Round 1 | Round 2 | Round 1 | Round 2 |
| N | 139 | 110 | 139 | 110 |
| Mean ± SD | 4.29 ± 0.79 | 4.40 ± 0.72 | 4.34 ± 0.79 | 4.47 ± 0.67 |
| Cv | 0.18 | 0.16 | 0.18 | 0.15 |
| Kendall's W | 0.463 | 0.340 | 0.388 | 0.351 |
| χ2 | 1404.637 | 740.920 | 1178.722 | 765.205 |
| df | 138 | 109 | 138 | 109 |
| p | <0.001 | <0.001 | <0.001 | <0.001 |
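The entries in Table 2 are linked by two standard relationships, sketched below as a consistency check. It assumes the usual definitions: the coefficient of variation is SD divided by the mean, and the chi-square statistic for Kendall's W is m(n − 1)W with n − 1 degrees of freedom, where m is the number of responding experts (22 and 20, taken from Section 3.4) and n is the number of rated items (the N row). The function names are illustrative, and small deviations from the published χ2 values are expected because W is reported to three decimals.

```python
# Minimal sketch of the Table 2 relationships, under the standard definitions
# Cv = SD / mean and chi2 = m * (n - 1) * W with df = n - 1.
# Function names are hypothetical; numeric inputs come from Table 2 and Section 3.4.

def coefficient_of_variation(mean: float, sd: float) -> float:
    """Dispersion of expert scores relative to their mean."""
    return sd / mean


def kendall_chi_square(w: float, n_raters: int, n_items: int) -> tuple[float, int]:
    """Chi-square statistic and degrees of freedom for Kendall's W."""
    return n_raters * (n_items - 1) * w, n_items - 1


# Importance, round 1: mean 4.29, SD 0.79, W = 0.463, 22 experts, 139 items.
cv_r1 = coefficient_of_variation(4.29, 0.79)
chi2_r1, df_r1 = kendall_chi_square(0.463, n_raters=22, n_items=139)

# Importance, round 2: mean 4.40, SD 0.72, W = 0.340, 20 experts, 110 items.
cv_r2 = coefficient_of_variation(4.40, 0.72)
chi2_r2, df_r2 = kendall_chi_square(0.340, n_raters=20, n_items=110)

print(f"Round 1: Cv = {cv_r1:.3f}, chi2 = {chi2_r1:.1f}, df = {df_r1}")
print(f"Round 2: Cv = {cv_r2:.3f}, chi2 = {chi2_r2:.1f}, df = {df_r2}")
```

The recomputed statistics land within about one unit of the tabulated χ2 values, which is the size of discrepancy that rounding W to three decimals would produce.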
3.6. Establishment of the index system
Two rounds of consultation led to the removal of 40 tertiary indicators, each with mean importance or feasibility scores below 4, as well as the renaming of one secondary indicator and 14 tertiary indicators. Throughout the consultation process, we utilized large language models to perform in-depth analyses of indicators identified by experts as “controversial,” “highly subjective,” or “conceptually ambiguous,” and we proposed recommendations for restructuring. All optimized content ultimately received approval from the expert panel. Specifically, in response to experts' concerns regarding difficult-to-quantify indicators, the LLM analysis determined that many indicators within the Healthy Environment dimension were excessively subjective and challenging to evaluate accurately. It recommended transforming vague qualitative descriptions into concrete, verifiable composite quantitative metrics. For instance, indicator C2.5.2, “Absence of black and odorous water,” was revised to “Rural black and odorous water elimination rate (100%) + Water quality compliance rate (≥ Class III surface water standard).” Additionally, concerning issues of conceptual ambiguity and poor logical consistency in certain indicators, it was observed that the phrasing of indicator B1.2, “Health Planning,” is ambiguous and may be misinterpreted as exclusively pertaining to plans for enhancing healthcare service levels. 
The recommended revision was to change the term to “Construction Planning” to better represent the comprehensive planning necessary for healthy village development, while also enhancing its alignment with the tertiary indicator C1.2.1, “Possession of a Healthy Village Development Plan.” To address discrepancies between certain indicators and national standards or specifications, C4.1.2, originally labeled “Number of General Practitioners,” was recommended for revision to “Number of Practicing (Assistant) Physicians per 1,000 Permanent Residents.” This change aligns with China's specific classification system for rural medical personnel and ensures that the indicator remains measurable and consistent with publicly available government data, such as the National Health Statistics Yearbook.
The finalized evaluation index system for Healthy Villages consists of 7 first-level indicators, 31 second-level indicators, and 61 third-level indicators (Supplementary2 Table S6). To address the complexities of rural health, we designed detailed metrics within each dimension: Health Mechanism (A1): This dimension serves as the governance foundation and includes organizational guarantees (B1.1), such as the “Establishment of a Leading Group” (C1.1.2) to ensure leadership accountability, as well as institutional norms like “Village Regulations” (C1.3.1) to regulate daily health behaviors at the grassroots level. Healthy Environment (A2): Beyond basic sanitation, this dimension emphasizes sustainable ecology. We established indicators for both “Domestic Waste” (B2.1) and “Production Waste” (B2.2). Notably, the third-level indicator “Sewage Resource Utilization Rate” was prioritized over simple discharge metrics, reflecting a shift toward resource recycling in rural settings. Healthy Services (A4): This dimension focuses on accessibility and equity, including the “30-minute Basic Medical Service Coverage Rate” (C4.1.2) to quantify the “last mile” of rural healthcare access, ensuring that medical resources are physically reachable for villagers. Healthy People (A5) and Healthy Life (A6): These core dimensions track outcomes and drivers. A5 emphasizes objective health outcomes such as “Life Expectancy” and “Chronic Disease Prevalence,” whereas A6 concentrates on health-related software, particularly “Health Literacy” (B6.1), which is regarded as essential for fostering long-term behavioral change. Healthy Culture (A7): Diverging from conventional evaluations that solely assess infrastructure, this dimension explicitly incorporates “Villagers' Participation Rate in Health Activities” (C7.2.1). This sub-indicator, developed through expert consensus, functions as a vital proxy for assessing the intrinsic motivation of rural residents.
3.7. Analytic hierarchy process
The maximum eigenvalue λmax of the first-level indicator matrix is 7.3120, with a consistency ratio (CR) of 0.0382. The maximum eigenvalue λmax of the second-level indicator matrix ranges from 2.0000 to 7.4221, while the consistency ratio CR varies between 0.0079 and 0.0888. For the third-level indicator matrix, the maximum eigenvalue λmax falls within the range of 2.0000 to 4.1120, and the consistency ratio CR spans from 0.0000 to 0.0419. All judgment matrices exhibit CR values less than 0.10, thereby satisfying the criteria and indicating that the weighting results are reliable (Supplementary2 Tables S7 ∼ S9). The weighting results for the first-level indicators, listed in descending order, are as follows: Healthy Population (34.04%), Healthy Life (22.82%), Healthy Service (16.00%), Healthy Environment (11.89%), Healthy Society (7.31%), Healthy Mechanism (4.43%), and Healthy Culture (3.51%). The weighting values for the second- and third-level indicators are presented in Table 3.
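The consistency check described above follows the standard AHP recipe: CI = (λmax − n) / (n − 1) and CR = CI / RI, where RI is a tabulated random index. The sketch below applies it to the first-level matrix (n = 7). The RI value is an assumption: the section does not state which table was used, and published tables differ slightly (1.32 and 1.36 are both commonly listed for n = 7). With RI = 1.36 the calculation matches the reported CR of 0.0382.

```python
# Minimal sketch of the AHP consistency check: CI = (lambda_max - n) / (n - 1),
# CR = CI / RI. The random index RI = 1.36 for n = 7 is an ASSUMPTION; the
# source does not say which RI table was used.

ASSUMED_RI_N7 = 1.36  # hypothetical choice; some tables list 1.32 instead


def consistency_index(lambda_max: float, n: int) -> float:
    """Deviation of the principal eigenvalue from the matrix order."""
    return (lambda_max - n) / (n - 1)


def consistency_ratio(lambda_max: float, n: int, random_index: float) -> float:
    """Consistency ratio; values below 0.10 are conventionally acceptable."""
    return consistency_index(lambda_max, n) / random_index


# First-level judgment matrix: lambda_max = 7.3120 with seven indicators.
ci_first = consistency_index(7.3120, 7)
cr_first = consistency_ratio(7.3120, 7, ASSUMED_RI_N7)
is_acceptable = cr_first < 0.10

print(f"CI = {ci_first:.4f}, CR = {cr_first:.4f}, acceptable = {is_acceptable}")
```

Choosing RI = 1.32 instead would give a slightly larger CR, still well under the 0.10 threshold, so the acceptability conclusion does not depend on the table used.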
Table 3.
Weight values of the three-level index system.
| First-level indexes | Weight | Second-level indexes | Weight | Third-level indexes | Weight | Combined Weight |
|---|---|---|---|---|---|---|
| A1 | 0.0443 | B1.1 | 0.1634 | C1.1.2 | 1.0000 | 0.0072 |
| | | B1.2 | 0.2970 | C1.2.1 | 1.0000 | 0.0131 |
| | | B1.3 | 0.5396 | C1.3.1 | 0.3333 | 0.0080 |
| | | | | C1.3.2 | 0.6667 | 0.0159 |
| A2 | 0.1189 | B2.1 | 0.1674 | C2.1.1 | 0.4846 | 0.0096 |
| | | | | C2.1.2 | 0.1302 | 0.0026 |
| | | | | C2.1.4 | 0.1663 | 0.0033 |
| | | | | C2.1.5 | 0.2189 | 0.0044 |
| | | B2.2 | 0.1032 | C2.2.1 | 0.0982 | 0.0012 |
| | | | | C2.2.3 | 0.2733 | 0.0034 |
| | | | | C2.2.4 | 0.4860 | 0.0060 |
| | | | | C2.2.5 | 0.1426 | 0.0017 |
| | | B2.3 | 0.3988 | C2.3.1 | 0.6667 | 0.0316 |
| | | | | C2.3.2 | 0.3333 | 0.0158 |
| | | B2.4 | 0.2751 | C2.4.1 | 0.3333 | 0.0109 |
| | | | | C2.4.2 | 0.6667 | 0.0218 |
| | | B2.5 | 0.0555 | C2.5.1 | 0.4668 | 0.0031 |
| | | | | C2.5.2 | 0.2776 | 0.0018 |
| | | | | C2.5.5 | 0.1603 | 0.0011 |
| | | | | C2.5.7 | 0.0953 | 0.0006 |
| A3 | 0.0731 | B3.1 | 0.0900 | C3.1.2 | 1.0000 | 0.0066 |
| | | B3.2 | 0.2636 | C3.2.1 | 0.4000 | 0.0077 |
| | | | | C3.2.2 | 0.4000 | 0.0077 |
| | | | | C3.2.3 | 0.2000 | 0.0039 |
| | | B3.3 | 0.3684 | C3.3.1 | 0.3333 | 0.0090 |
| | | | | C3.3.2 | 0.6667 | 0.0180 |
| | | B3.4 | 0.1951 | C3.4.1 | 0.1634 | 0.0023 |
| | | | | C3.4.2 | 0.2970 | 0.0042 |
| | | | | C3.4.4 | 0.5396 | 0.0077 |
| | | B3.5 | 0.0829 | C3.5.2 | 1.0000 | 0.0061 |
| A4 | 0.1600 | B4.1 | 0.2705 | C4.1.1 | 0.5000 | 0.0216 |
| | | | | C4.1.2 | 0.2500 | 0.0108 |
| | | | | C4.1.3 | 0.2500 | 0.0108 |
| | | B4.2 | 0.4021 | C4.2.1 | 0.5000 | 0.0322 |
| | | | | C4.2.2 | 0.2500 | 0.0161 |
| | | | | C4.2.4 | 0.2500 | 0.0161 |
| | | B4.3 | 0.0433 | C4.3.1 | 1.0000 | 0.0069 |
| | | B4.4 | 0.1245 | C4.4.2 | 0.4000 | 0.0080 |
| | | | | C4.4.3 | 0.2000 | 0.0040 |
| | | | | C4.4.4 | 0.4000 | 0.0080 |
| | | B4.5 | 0.0827 | C4.5.1 | 1.0000 | 0.0132 |
| | | B4.6 | 0.0769 | C4.6.1 | 1.0000 | 0.0123 |
| A5 | 0.3404 | B5.1 | 0.3589 | C5.1.1 | 1.0000 | 0.1222 |
| | | B5.2 | 0.2438 | C5.2.1 | 0.4000 | 0.0332 |
| | | | | C5.2.2 | 0.4000 | 0.0332 |
| | | | | C5.2.3 | 0.2000 | 0.0166 |
| | | B5.3 | 0.1629 | C5.3.1 | 1.0000 | 0.0554 |
| | | B5.4 | 0.1002 | C5.4.1 | 1.0000 | 0.0341 |
| | | B5.5 | 0.0554 | C5.5.1 | 1.0000 | 0.0189 |
| | | B5.6 | 0.0558 | C5.6.1 | 0.7500 | 0.0143 |
| | | | | C5.6.2 | 0.2500 | 0.0048 |
| | | B5.7 | 0.0230 | C5.7.5 | 1.0000 | 0.0078 |
| A6 | 0.2282 | B6.1 | 0.6667 | C6.1.1 | 1.0000 | 0.1521 |
| | | B6.2 | 0.3333 | C6.2.1 | 0.4747 | 0.0361 |
| | | | | C6.2.2 | 0.1630 | 0.0124 |
| | | | | C6.2.3 | 0.1072 | 0.0082 |
| | | | | C6.2.4 | 0.2551 | 0.0194 |
| A7 | 0.0351 | B7.1 | 0.5396 | C7.1.1 | 0.7500 | 0.0142 |
| | | | | C7.1.3 | 0.2500 | 0.0047 |
| | | B7.2 | 0.2970 | C7.2.1 | 1.0000 | 0.0104 |
| | | B7.3 | 0.1634 | C7.3.2 | 1.0000 | 0.0057 |
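The Combined Weight column in Table 3 is consistent with the usual AHP convention of multiplying the local weights along each branch of the hierarchy. The sketch below illustrates this for three rows of the table; the function name is illustrative, and small last-digit differences from the table can arise because the published local weights are rounded to four decimals.

```python
# Minimal sketch: combined (global) weight as the product of the local weights
# along a branch, first-level x second-level x third-level.
# The function name is hypothetical; all numbers are taken from Table 3.

def combined_weight(first: float, second: float, third: float) -> float:
    """Global weight of a third-level indicator."""
    return first * second * third


# (first-level, second-level, third-level) local weights for selected indicators.
branches = {
    "C6.1.1": (0.2282, 0.6667, 1.0000),  # Healthy Life, B6.1
    "C5.1.1": (0.3404, 0.3589, 1.0000),  # Healthy Population, B5.1
    "C2.3.1": (0.1189, 0.3988, 0.6667),  # Healthy Environment, B2.3
}
combined = {code: combined_weight(*weights) for code, weights in branches.items()}

# The seven first-level weights (A1 to A7) should sum to 1.
first_level = [0.0443, 0.1189, 0.0731, 0.1600, 0.3404, 0.2282, 0.0351]
first_level_total = sum(first_level)

for code, value in combined.items():
    print(f"{code}: combined weight = {value:.4f}")
print(f"Sum of first-level weights = {first_level_total:.4f}")
```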
3.8. Robustness check: comparison between human experts and AI
The AI-simulated scoring results exhibited a high degree of convergence with the consensus of human experts, as evidenced by the absolute weight difference for all indicators remaining within an acceptable threshold (|W_Human − W_AI| < 0.05) (Supplementary2 Table S10): (1) Consistency in Macro-Level Judgment: As illustrated in Fig. 8A (Radar Chart), the distribution of weights for the seven first-level indicators demonstrated significant structural similarity. Both the human panel and the AI model identified “Healthy Population (A5)” and “Healthy Lifestyle (A6)” as core dimensions. The overlapping polygons confirm that the two sources share the same fundamental orientation of the evaluation system. (2) Correlation of Micro-Level Indicators: At the third level, we compared the local weights of all 61 indicators. As shown in Fig. 8B (Scatter Plot), the data points are closely clustered around the diagonal, yielding a Pearson correlation coefficient of 0.99. This finding indicates that, for the vast majority of indicators, the AI's independent judgments align with those of human experts. (3) Divergence Analysis: Despite the overall consistency, subtle differences emerged within specific sub-groups. As depicted in Fig. 8C (Panel Chart), within the “Health Behavior (B6.2)” group, the AI assigned higher local weights to objective metrics such as “Smoking Rate (C6.2.1),” whereas human experts placed greater emphasis on participatory metrics like “Fitness Activities (C6.2.4).” This observation suggests that while AI excels at prioritizing quantifiable hard data, human experts provide unique value by placing greater weight on “soft” cultural and behavioral factors.
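The two agreement measures used in this comparison, the maximum absolute weight difference and Pearson's r, can be sketched as follows. The human weights are the first-level values from Section 3.7; the AI weights are hypothetical placeholders invented purely for illustration, since the actual AI-simulated values are reported only in the supplementary material. The result therefore says nothing about the study's real data.

```python
# Minimal sketch of the robustness check: max |W_human - W_ai| and Pearson's r.
# human_weights come from Section 3.7 (A1..A7); ai_weights are HYPOTHETICAL
# placeholders, not the study's data.
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def max_abs_difference(x: list[float], y: list[float]) -> float:
    """Largest absolute gap between paired weights."""
    return max(abs(a - b) for a, b in zip(x, y))


human_weights = [0.0443, 0.1189, 0.0731, 0.1600, 0.3404, 0.2282, 0.0351]
ai_weights = [0.05, 0.12, 0.08, 0.15, 0.33, 0.23, 0.04]  # hypothetical values

r = pearson_r(human_weights, ai_weights)
largest_gap = max_abs_difference(human_weights, ai_weights)
within_threshold = largest_gap < 0.05  # threshold stated in the text

print(f"r = {r:.3f}, largest gap = {largest_gap:.4f}, within threshold = {within_threshold}")
```

Note that the study reports r for the local weights of the 61 third-level indicators, whereas this illustration uses only the seven first-level weights to keep the example short.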
Fig. 8.
Comparative Analysis of Indicator Weighting Profiles: Human Experts vs. AI Simulation. A. Presents a radar chart that compares the weights assigned to the seven first-level indicators. The areas of overlap indicate a significant consensus between Human Experts (Blue) and the AI Model (Red) regarding the macro-level framework. B. Displays a scatter plot for the correlation analysis of local weights associated with third-level indicators. The x-axis represents weights assigned by human experts, while the y-axis depicts weights generated by the AI. The high Pearson correlation coefficient (r = 0.99) underscores the robustness of the expert scoring system. C. Compares local weights across four representative second-level indicator groups. The bar charts reveal specific divergences, highlighting the AI's inclination toward quantitative metrics (e.g., Smoking Rate) in contrast to the human preference for behavioral and participatory metrics. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
3.9. Results and verification
The comprehensive evaluation results for the three pilot villages are presented in Fig. 9. Village A achieved an impressive total score of 97.1, categorizing it as a “Level 1 Healthy Village.” As a provincial demonstration site, it demonstrated exemplary performance across all dimensions, particularly attaining near-perfect scores in Health Governance (A1) and Health Environment (A2). Village B recorded a total score of 76.1, placing it in the “Developing” stage. Although it exhibited solid foundations in hard infrastructure, the evaluation identified relative weaknesses in “soft” dimensions, such as Health Governance (A1) and Health Culture (A7), highlighting the need for enhanced management mechanisms and increased resident participation. Village C scored 39.3, significantly trailing behind the other pilot sites. The heat map underscores critical deficiencies in Health Environment (A2) and Health Services (A4), accurately reflecting its ongoing challenges with basic sanitation infrastructure and access to medical resources. Verification Conclusion: The notable score stratification, ranging from 97.1 to 39.3, corresponds closely with the actual socioeconomic status and public reputation of these villages.
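As a rough illustration of how a total score of this kind can be assembled, the sketch below assumes a simple weighted sum of dimension scores on a 0 to 100 scale, using the first-level weights from Section 3.7. Both the aggregation rule and the dimension scores of the example village are assumptions made for illustration; this section does not describe the scoring procedure in detail, and the example values are not data from the pilot villages.

```python
# Minimal sketch, ASSUMING total score = sum(weight_i * dimension_score_i)
# with each dimension scored on a 0-100 scale. The weights are the first-level
# AHP weights from Section 3.7; the example scores are HYPOTHETICAL.

FIRST_LEVEL_WEIGHTS = {
    "A1": 0.0443, "A2": 0.1189, "A3": 0.0731, "A4": 0.1600,
    "A5": 0.3404, "A6": 0.2282, "A7": 0.0351,
}


def total_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of dimension scores (each expected in the range 0-100)."""
    return sum(FIRST_LEVEL_WEIGHTS[code] * score
               for code, score in dimension_scores.items())


# Hypothetical village: strong infrastructure, weaker governance and culture.
example_village = {
    "A1": 55.0, "A2": 85.0, "A3": 80.0, "A4": 82.0,
    "A5": 78.0, "A6": 75.0, "A7": 50.0,
}
example_total = total_score(example_village)

print(f"Hypothetical total score: {example_total:.1f}")
```

Because A1 and A7 together carry under 8% of the weight, weak scores in those dimensions lower the total only modestly under this assumed rule, so dimension-level profiles are needed alongside the total to reveal such gaps.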
Fig. 9.
Heatmap visualization of comprehensive evaluation scores for the three case villages across seven dimensions.
4. Discussion
A rational evaluation system is essential for guiding the scientific establishment and management of healthy villages, thereby fostering sustainable rural development. However, the current methodologies for constructing evaluation indicators for healthy villages in China are inconsistent, lacking authoritative frameworks, standardized procedures, and scientifically sound guidance plans. To address these shortcomings and develop a rigorous assessment framework, this study employs a systematic mixed-methods approach, referencing pertinent domestic and international policy documents and literature. Initially, a preliminary pool of indicators suitable for China's healthy village rating system was developed through bibliometric analysis and semi-structured interviews. Subsequently, an enhanced Delphi method and Analytic Hierarchy Process (AHP) were utilized for indicator screening and weight assignment. Notably, to mitigate inherent limitations of traditional Delphi methods, such as semantic ambiguity that impedes consensus [42], expert cognitive heterogeneity that results in divergent scoring [43], and subjective biases in AHP [44], this study innovatively integrates Artificial Intelligence (AI) into the expert consultation process. By aiding experts in optimizing the semantic expressions of indicators, this AI-enhanced approach effectively minimizes potential cognitive friction and ensures that all judgments conform to a cohesive conceptual framework. Furthermore, AI functions as a logical validation tool. Its strong correlation (r = 0.99) with human scoring confirms that the developed system is grounded in sound public health logic rather than arbitrary subjectivity. 
In contrast to models constructed solely by humans, this novel ‘Human-AI Synergistic’ approach pioneers new pathways for the deep integration of artificial intelligence into public health management and offers a reference model for the development of comprehensive health village evaluation systems in other developing countries.
This study established a healthy village evaluation index system comprising seven first-level indicators, thirty-one second-level indicators, and sixty-one third-level indicators, while also determining the weights assigned to each indicator. A total of twenty-two experts, who have extensive experience in research across various disciplines, participated in this study. The expert authority coefficients from the two rounds of consultation were 0.81 and 0.83, respectively, with positive coefficients of 100% and 90.91%, indicating a high level of authority and a favorable attitude among the experts. Findings from the two rounds of Delphi expert consultations revealed that the scores for indicator importance and feasibility consistently remained at a relatively high level of approximately 4.3 points. Furthermore, the coefficient of variation for the scores in each round was below 0.25, and the coordination coefficient ranged from 0.340 to 0.463, with all P-values from the chi-square (χ2) test being less than 0.001. These results suggest a strong consensus among the experts regarding the indicator system, with their opinions demonstrating good concentration and coordination. Following the second round of consultation, scores increased further, and the coefficient of variation decreased slightly, indicating a gradual convergence of expert opinions. This collectively demonstrates that the indicator system developed in this study is highly scientific, rational, and based on expert consensus.
The hierarchical weight distribution of the index system illustrates a logical progression from strategic orientation to operational precision, signifying a paradigm shift toward “people-centered” health governance. At the first level (Macro-Strategic), the predominance of “Healthy Population” (A5, 34.04%) and “Healthy Lifestyle” (A6, 22.82%) indicates that the core evaluation standard has shifted from “infrastructure hardware” to “population health outcomes.” This distribution aligns seamlessly with the “Healthy China 2030” vision, wherein enhancing population quality serves as the ultimate objective. At the second level (Meso-Structural), the system underscores the structural drivers of these outcomes. Notably, within the “Healthy Lifestyle” dimension, “Health Literacy” (B6.1) commands a significant local weight of 0.6667, substantially surpassing other structural factors. This finding suggests that knowledge acquisition is regarded as the essential prerequisite for behavioral change. Similarly, under “Health Services,” “Public Health Services” (B4.2) exhibits a higher local weight (0.4021) compared to “Medical and Health Services” (0.2705), underscoring a strategic shift from “treatment-centered” to “prevention-oriented” rural healthcare. These meso-level indicators facilitate the connection between national objectives and grassroots implementation, ensuring that resources are prioritized for preventative and cognitive interventions. At the third level (Micro-Operational), the “health literacy level of residents” (C6.1.1) holds the highest rank, with a weight of 15.21%, underscoring its significant role. The Healthy China Initiative (2019–2030) identifies the enhancement of health literacy as a prerequisite for improving the overall health of the population [45]. However, the findings of the 2019 National Health Literacy Monitoring study revealed that the health literacy level among rural Chinese residents was only 19.17% [46]. 
This figure indicates a substantial gap between the health literacy of rural Chinese residents and both domestic health demands and international health standards. Consequently, it remains essential to prioritize and actively enhance health literacy levels. Moreover, the system exhibits considerable adaptability to rural contexts: “Number of General Practitioners per Hundred People” (C4.1.2) addresses the specific shortage of medical professionals in rural areas; “Collection and Treatment of Agricultural Production Waste” (C2.2.4) addresses the gap in existing systems by targeting non-point source pollution that is particularly prevalent in rural settings; and “Villagers' Participation Rate in Health Activities” (C7.2.1) underscores the importance of resident engagement, reflecting the WHO's fundamental principle of “community participation.”
The empirical application in three representative pilot villages further substantiates the diagnostic sensitivity and practical utility of the constructed index system. The significant score stratification observed between model Village A, which scored 97.1, and the lagging Village C, which scored 39.7, confirms that the evaluation framework possesses adequate discriminatory power to prevent score inflation and accurately reflect regional development disparities. More importantly, the detailed analysis of Village B illustrates the system's capacity for precise diagnosis. Although Village B achieved a moderate score of 77.3 with infrastructure metrics comparable to those of the model village, the evaluation successfully identified latent deficiencies in intangible dimensions such as health governance and culture. This finding highlights the instrument's ability to transcend superficial hardware indicators and capture the fundamental drivers of sustainable rural health. Consequently, it suggests that future resource allocation in developing areas should prioritize institutional capacity building and cultural engagement alongside infrastructural expansion.
The evaluation index system for Healthy Villages developed in this study exhibits both commonalities and notable rural-specific differences compared to the National Healthy City Evaluation Indicator System (2018 Edition) [47], highlighting the distinct priorities in urban and rural health development. Commonalities: Both frameworks identify “healthy population,” “healthy services,” and “healthy environment” as core dimensions. For instance, the “Healthy Population” (A5) in this study parallels the “Population Health” dimension in the Healthy City system, as both track essential metrics such as life expectancy and chronic disease management. This alignment reflects a shared consensus that “population health is the core of health construction.” Similarly, the “Healthy Services” dimension in both systems underscores the importance of accessible medical resources in achieving the WHO's goal of “Universal Health Coverage.” Differences: (1) Specificity of Environmental Indicators: The Healthy Villages system incorporates unique indicators tailored to agricultural production, such as “collection and treatment of agricultural production waste” (C2.2.4) and “livestock and poultry manure treatment,” which are not included in the urban framework. In contrast, the Healthy City indicators concentrate on high-density urban challenges, prioritizing metrics like “public toilet density” (Indicator 6), “per capita park green space” (Indicator 8), and “centralized sewage treatment,” which address issues arising from population agglomeration. (2) Focus of Social Dimension: Due to the relatively underdeveloped economy and higher aging population in rural areas, the Healthy Villages system places greater emphasis on “poverty alleviation” (C3.1.2) and “care for the ‘Five Guarantees’ households” (C3.2.3). 
Conversely, the Healthy City system prioritizes the stability of the urban social structure, placing greater emphasis on the “registered unemployment rate” (Indicator 14) and “health insurance reimbursement ratios” (Indicator 11). (3) Differences in the Cultural Dimension: The “Healthy Culture” (A7) of Healthy Villages underscores grassroots characteristics, including “village regulations and folk covenants” as well as traditional customs. In contrast, the Healthy City system focuses on the influence of modern media and infrastructure, specifically monitoring “media health science popularization” (Indicator 41) and “per capita sports venue area” (Indicator 12). This divergence fundamentally reflects the structural differences in resource endowments and development stages between urban and rural areas.
Despite the contributions of this study to the evaluation of Healthy Villages, several limitations merit consideration. One inherent constraint arises from the methodology itself, as reliance on the Delphi method subjects the results to the professional backgrounds and potential subjective biases of the participating experts. Regarding empirical validation, the geographical scope of the case studies poses a challenge to generalizability. Although the application to three pilot villages confirmed the system's sensitivity to developmental disparities, these villages are primarily located in eastern China, a region characterized by relatively advanced economic and infrastructural foundations. As a result, the applicability of the system to diverse regional contexts, particularly in central and western China or areas with distinct ecological features such as mountainous regions, has not been thoroughly tested. A final limitation pertains to the perspective of stakeholders. While expert consensus is foundational to the index system, it cannot fully substitute for the lived experiences of local residents; indeed, direct input from the villages would enhance the assessment system [48], [49], [50], [51], [52], [53], [54]. Moreover, research indicates that the construction of Healthy Villages is more effective and sustainable when villagers are actively engaged [51], [55]. The current lack of indicators for villager participation and satisfaction hinders the system's capacity to embody the people-oriented nature of these initiatives and to assess the actual effectiveness of planning implementation.
To address these deficiencies, future research will concentrate on two primary areas. We intend to establish a dedicated sub-evaluation system for resident satisfaction through the use of questionnaire surveys and in-depth interviews, facilitating a more detailed analysis of the underlying causes of dissatisfaction. Concurrently, we will undertake large-scale empirical validation across various provincial contexts. This comprehensive data collection will allow for regular updates to the evaluation index in response to social developments and shifts in the disease spectrum, thereby ensuring that the framework remains both scientifically rigorous and practically relevant.
5. Conclusions
This study develops a comprehensive evaluation index system for Healthy Villages, consisting of 7 first-level, 31 second-level, and 61 third-level indicators that integrate population health with environmental and social dimensions. The scientific validity and practical applicability of this framework were supported by a dual verification strategy. First, AI-assisted modeling confirmed the robustness of the expert consensus. Second, although the empirical application was limited to three pilot villages, the distinct score stratification demonstrates the system's sensitivity to developmental disparities and its ability to identify specific structural weaknesses. As a scalable diagnostic tool, this index system enables policymakers to monitor progress dynamically and optimize resource allocation. Therefore, its regular application is recommended to enhance the quality and sustainability of Healthy Village construction globally.
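The two verification steps summarized in this section, a correlation check between expert-derived and AI-simulated weights and a weighted composite score per village, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the first two expert weights (0.340 and 0.228) come from the study; the remaining five weights, all AI-simulated weights, and the dimension scores are hypothetical placeholders chosen so that each weight vector sums to 1. The function names `pearson_r` and `composite_score` are likewise assumptions introduced for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def composite_score(weights, scores):
    """Weighted sum of dimension scores (each on a 0-100 scale)."""
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


# First-level weights for the 7 dimensions.
# 0.340 ("Healthy Population") and 0.228 ("Healthy Life") are reported in the
# study; the other five values are HYPOTHETICAL placeholders.
expert_weights = [0.340, 0.228, 0.120, 0.100, 0.090, 0.070, 0.052]

# HYPOTHETICAL AI-simulated weights, used only to show the robustness check.
ai_weights = [0.330, 0.235, 0.125, 0.100, 0.088, 0.072, 0.050]

# HYPOTHETICAL dimension scores for one village (0-100 per dimension).
village_scores = [90, 80, 70, 60, 85, 75, 65]

if __name__ == "__main__":
    print("Pearson r:", round(pearson_r(expert_weights, ai_weights), 3))
    print("Composite score:", round(composite_score(expert_weights, village_scores), 1))
```

A correlation close to 1 indicates that the two weight vectors rank and scale the dimensions similarly. With only seven first-level dimensions, such a coefficient is best read as a consistency check rather than as independent validation.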
Declaration of generative AI and AI-assisted technologies in the manuscript preparation process
During the preparation of this work, the author(s) used ChatGPT to assist in the qualitative analysis of expert feedback and the semantic optimization of evaluation indicators during the Delphi consultation phase, as well as to provide methodological support for structuring the index system during the Analytic Hierarchy Process (AHP) phase. After using this tool, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the published article.
CRediT authorship contribution statement
Kai Wang: Writing – review & editing, Writing – original draft, Validation, Software, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation. Yuanhao Hong: Supervision, Resources, Methodology, Conceptualization. Yimei Pan: Visualization, Validation, Investigation, Formal analysis, Data curation. Liang Chen: Validation, Supervision, Resources, Project administration.
Funding
This work was supported by the Fujian Health Science and Technology Project (2022RKA009).
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.onehlt.2026.101352.
Appendix A. Supplementary data
Supplementary material 1
Supplementary material 2
Data availability
Data will be made available on request.
References
- 1. China TSCIOotsRo. The State Council Information Office of the People's Republic of China; Beijing: 2021. Full Text: Poverty Alleviation: China's Experience and Contribution.
- 2. Chen J., Rong S., Song M. Poverty vulnerability and poverty causes in rural China. Soc. Indic. Res. 2020;153(1):65–91. doi: 10.1007/s11205-020-02481-x.
- 3. Huang L., Wang S., Xiao M., Li Z., Shi H. Promoting healthy village construction: challenges and countermeasures. Chin. J. Eng. Sci. 2021;23(5). doi: 10.15302/j-sscae-2021.05.020.
- 4. Xu L., Guo M., Nicholas S., Sun L., Yang F., Wang J. Disease causing poverty: adapting the Onyx and Bullen social capital measurement tool for China. BMC Public Health. 2020;20(1):63. doi: 10.1186/s12889-020-8163-5.
- 5. Zhang L., Zhao Y. Breaking the vicious cycle between illness and poverty: empirical actions on land use in an oasis agricultural area. Land. 2021;10(4). doi: 10.3390/land10040335.
- 6. Zhou Y., Guo Y., Liu Y. Health, income and poverty: evidence from China’s rural household survey. Int. J. Equity Health. 2020;19(1):36. doi: 10.1186/s12939-020-1121-0.
- 7. Zheng L., Peng L. Effect of major illness insurance on vulnerability to poverty: evidence from China. Front. Public Health. 2021;9. doi: 10.3389/fpubh.2021.791817.
- 8. China NHCoPsRo. Peking Union Medical College Press; Beijing: 2020. China Health Statistical Yearbook 2020.
- 9. China NHCoPsRo. Peking Union Medical College Press; Beijing: 2021. China Health Statistical Yearbook 2021.
- 10. Wang D., Xie S., Wu J., Sun B. The trend in quality of life of Chinese population: analysis based on population health surveys from 2008 to 2020. BMC Public Health. 2023;23(1):167. doi: 10.1186/s12889-023-15075-2.
- 11. Sang S., Kang N., Liao W., Wu X., Hu Z., Liu X., Wang C., Zhang H. The influencing factors of health-related quality of life among rural hypertensive individuals: a cross-sectional study. Health Qual. Life Outcomes. 2021;19(1):244. doi: 10.1186/s12955-021-01879-6.
- 12. Wang J., Zhang R. COVID-19 in rural China: features, challenges and implications for the healthcare system. J. Multidiscip. Healthc. 2021;14:1045–1051. doi: 10.2147/jmdh.S307232.
- 13. Rong Y., Tang H., Zhang Y., Sun Y., Liu Z. Study on sewage characteristics in rural China and pollutants removal performance of biologically enhanced internal circulation treatment system. Sci. Rep. 2023;13(1):18058. doi: 10.1038/s41598-023-45085-4.
- 14. Zhao X., Xiang H., Zhao F. Measurement and spatial differentiation of farmers’ livelihood resilience under the COVID-19 epidemic outbreak in rural China. Soc. Indic. Res. 2023;166(2):239–267. doi: 10.1007/s11205-022-03057-7.
- 15. Lin J., Gong K., Chen C. Towards integrated sustainability for China’s rural revitalization: an analysis of income inequality and public health. Front. Public Health. 2023;11. doi: 10.3389/fpubh.2023.1328821.
- 16. Organization WH. World Health Organization; Geneva: 1990. Report of Informal Consultation on Urbanization and Environmental Health in Relation to the Healthy City Concept.
- 17. Allahyari M.S., Alipour H., Chbok R. Healthy village cooperative: an approach towards rural development. Sci. Res. Essays. 2010;5:2867–2874.
- 18. Organization WH. World Health Organization; Cairo, Egypt: 2007. Evaluation of the Healthy Village Programme in the Syrian Arab Republic.
- 19. Se Katsha. Environmental health interventions in Egyptian villages. Commun. Developm. J. 1994;29(3):232–238. doi: 10.1093/cdj/29.3.232.
- 20. Yuasa M., Shirayama Y., Osato K., Miranda C., Condore J., Siles R. Cross-sectional analysis of self-efficacy and social capital in a community-based healthy village project in Santa Cruz, Bolivia. BMC Int. Health Hum. Rights. 2015;15(1):15. doi: 10.1186/s12914-015-0054-y.
- 21. Kiyu A., Steinkuehler A.A., Hashim J., Hall J., Lee P.F., Taylor R. Evaluation of the healthy village program in Kapit district, Sarawak, Malaysia. Health Promot. Int. 2006;21(1):13–18. doi: 10.1093/heapro/dai034.
- 22. Hongxing L., Yong L., Hongyan M., Yong X., Hao Z., Xuejiao D., Qi Z. Research on guidelines for the construction and evaluation of healthy villages and towns in China. Chin. Health Educ. 2018;34(12). doi: 10.16168/j.cnki.issn.1002-9982.2018.12.021.
- 23. Xue-Jiao D., Hong-Xing LI., Fu-Cheng F., Qi Z. Construction and evaluation indicator system of healthy village based on modified Delphi method and consensus meeting method. J. Environ. Health. 2019;36(04):346–350. https://link.cnki.net/doi/10.16241/j.cnki.1001-5914.2019.04.015
- 24. Shen B., You L., Tian X., Ren X., Guo J., Jin F., Song Y., Su X., Liu Y. Establishment of comprehensive evaluation index system for healthy countryside (county-level) in China. Chin. J. Health Educ. 2019;35(03):203–207. doi: 10.16168/j.cnki.issn.1002-9982.2019.03.003.
- 25. Yinan C., Xiaolin W., Suhai H., Yi G., Jun S., Ning Z. Construction of a healthy village monitoring index system in Zhejiang Province based on the Delphi method. Chin. Publ. Health. 2025;41(9):1106–1111. doi: 10.11847/zgggws1147496.
- 26. Lu C., Dongxian Z., Jiajia Z., Di M., Zhixiao S. Study on evaluation index system of healthy village in Hainan Province. Chin. Health Educ. 2022;38(12):1086–1090. doi: 10.16168/j.cnki.issn.1002-9982.2022.12.006.
- 27. Maoyang Y., Zhengyu Z., Ya H., Hui C., Zhijian L. A study on the construction strategy of healthy villages and towns in Guizhou Province based on SWOT-PEST analysis. China Rural Health. 2022;5. doi: 10.3969/j.issn.1674-361X.2022.05.004.
- 28. Vaccaro M., Almaatouq A., Malone T. When combinations of humans and AI are useful: a systematic review and meta-analysis. Nat. Hum. Behav. 2024;8(12):2293–2303. doi: 10.1038/s41562-024-02024-1.
- 29. Organization WH. In: WHO | Types of Healthy Settings. World Health Organization, editor. 2007.
- 30. Howard G., Bogh C., Goldstein G., Morgan J.E., Prüss A., Shaw R., Teuton J. Vol. 2002. 2002. Healthy Villages: A Guide for Communities and Community Health Workers.
- 31. Zhu H., Wang X., Lu S., Jianqiang W., Ou K., Li N. Bibliometric analysis on the progress of immunotherapy in renal cell carcinoma from 2003-2022. Hum. Vaccin. Immunother. 2023;19(2). doi: 10.1080/21645515.2023.2243669.
- 32. Liu S., Sun Y.P., Gao X.L., Sui Y. Knowledge domain and emerging trends in Alzheimer’s disease: a scientometric review based on CiteSpace analysis. Neural Regen. Res. 2019;14(9):1643–1650. doi: 10.4103/1673-5374.255995.
- 33. Delbecq A.L., Van de Ven A.H., Gustafson D.H. Management Applications Series. xv. 1975. Group techniques for program planning: a guide to nominal group and Delphi processes; p. 174. illustrations.
- 34. Shen L., Yang J., Jin X., Hou L., Shang S., Zhang Y. Based on Delphi method and analytic hierarchy process to construct the evaluation index system of nursing simulation teaching quality. Nurse Educ. Today. 2019;79:67–73. doi: 10.1016/j.nedt.2018.09.021.
- 35. Keeney S., Hasson F., McKenna H.P., Askews, Holts. 2011. The Delphi Technique in Nursing and Health Research.
- 36. Steurer J. The Delphi method: an efficient procedure to generate knowledge. Skeletal Radiol. 2011;40(8):959–961. doi: 10.1007/s00256-011-1145-z.
- 37. Hui V., Guan S., Feng X. Development and quasi-experimental evaluation of a large language model-based automated feedback system for nursing innovation pitches. Nurse Educ. Pract. 2025;90. doi: 10.1016/j.nepr.2025.104672.
- 38. Saaty T.L. 2nd edn. RWS Publications; Pittsburgh, PA: 1990. The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation.
- 39. Behera M.D., Biradar C., Das P., Chowdary V.M. Developing quantifiable approaches for delineating suitable options for irrigating fallow areas during dry season-a case study from eastern India. Environ. Monit. Assess. 2020;191(Suppl. 3):805. doi: 10.1007/s10661-019-7697-4.
- 40. Saaty T.L. Mcgraw-Hill; New York: 1980. The Analytic Hierarchy Process.
- 41. Zhao R., Liu F., Zhu K. Establishment of an evaluation index system of competencies for graduating students in general practice medicine. Int. J. General Med. 2020;17:85–92. doi: 10.21203/rs.3.rs-54659/v2.
- 42. Keeney S., Hasson F., McKenna H.P. A critical review of the Delphi technique as a research methodology for nursing. Int. J. Nurs. Stud. 2001;38(2):195–200. doi: 10.1016/S0020-7489(00)00044-4.
- 43. Hasson F., Keeney S., McKenna H. Research guidelines for the Delphi survey technique. J. Adv. Nurs. 2000;32(4):1008–1015. doi: 10.1046/j.1365-2648.2000.t01-1-01567.x.
- 44. Ishizaka A., Labib A. Review of the main developments in the analytic hierarchy process. Expert Syst. Appl. 2011;38(11):14336–14345. doi: 10.1016/j.eswa.2011.04.143.
- 45. China NHCotPsRo. 2019. Healthy China Initiative (2019–2030) In. Beijing.
- 46. Yang J., Gao Y., Wang Z. Increasing health literacy in China to combat noncommunicable diseases. China CDC Wkly. 2020;2(51):987–991. doi: 10.46234/ccdcw2020.248.
- 47. Committee OotNPHC. 2018. Notice on Printing and Distributing the National Healthy City Evaluation Indicator System (2018 Edition) In. Beijing.
- 48. Chen S. The impact of villagers’ participation in the protection and development of traditional villages on the revitalization of traditional villages. HBEM. 2024;34:32–37. doi: 10.54097/yhp56065.
- 49. Ikhlasiah M., Mutmainnah I., Hajar B.S. Education through health programmes: efforts to strengthen community health in Sukmajaya Village, Jombang District, Cilegon City. TLA. 2024;1(2):44–53. doi: 10.61397/tla.v1i2.95.
- 50. Li Q., Lv S., Cui J., Liu Y., Chen Z. Research on the public environment renewal of traditional villages based on the social network analysis method. Sustainability. 2024;16(3):1006. doi: 10.3390/su16031006.
- 51. Abidin Z. Innovative community service programs with local participation to build independent villages. Zabags Int. J. Engag. 2024;2(1):29–38. doi: 10.61233/zijen.v2i1.17.
- 52. Rui J., Li X. Decoding vibrant neighborhoods: disparities between formal neighborhoods and urban villages in eye-level perceptions and physical environment. Sustain. Cities Soc. 2024;101. doi: 10.1016/j.scs.2023.105122.
- 53. Abhishek S., Garg S., Keshri V.R. How useful do communities find the health and wellness centres? A qualitative assessment of India’s new policy for primary health care. BMC Prim. Care. 2024;25(1). doi: 10.1186/s12875-024-02343-2.
- 54. Li J. Bayesian joint modelling of life expectancy and healthy life expectancy and valuation of retirement village contract. Scand. Actuar. J. 2023;2024(2):149–167. doi: 10.1080/03461238.2023.2232816.
- 55. Nikniaz A., Alizadeh M. vol. 13. 2007. Community Participation in Environmental Health: Eastern Azerbaijan Healthy Villages Project; p. 186.