Abstract
This study developed and validated the Spatial Methodology Appraisal of Research Tool (SMART) using group concept mapping with discipline experts. The 16-item tool comprises four domains: (1) methods preliminaries, (2) data quality, (3) spatial data problems, and (4) spatial analysis methods. Validity testing demonstrated excellent content validity and expert agreement. Future studies will assess its usability and reliability to ensure consistent results. Its application in spatial epidemiology and health geography will enable more rigorous and transparent evidence synthesis. This contribution represents a significant step forward in improving the standards of quality appraisal in spatial research.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12942-025-00401-1.
Keywords: Health geography, Spatial epidemiology, Systematic review, Quality appraisal, Critical appraisal, Methodological quality, Concept mapping
Background
Systematic reviews have been the cornerstone of evidence-based healthcare, and their popularity is growing in other fields, such as spatial epidemiology and health geography. This increased popularity is driven by the demand for high-quality, reliable evidence in a time of rapidly expanding research. Advancements in spatial analysis and technology (e.g., geographic information systems, mobile health applications, remote sensing, global positioning systems) have led to a surge in spatial health research over the past two decades [1–3]. As this body of work grows, so does the need for systematic approaches to synthesise evidence and evaluate methodological quality.
A critical component of systematic reviews is quality appraisal, which assesses the rigour and reliability of individual studies [4]. While established tools exist for appraising randomised controlled trials (RCTs) and observational studies [5], there is a notable lack of validated tools tailored to the unique methodological features of spatial research, such as those encountered in spatial epidemiology and health geography [6]. This gap hinders reviewers’ ability to effectively appraise key methodological components in spatial studies, as existing tools do not address the complexities of spatial methodologies.
While spatial methodologies have advanced our understanding of health patterns and disease distribution, they also introduce unique methodological challenges [7]. These include the modifiable areal unit problem (MAUP), ecological fallacy, and spatial dependency, all of which can introduce bias or uncertainty into analyses. Comprehensively synthesising and appraising the evidence within health geography remains difficult without a systematic approach to address these methodological challenges. A recent review [6] highlighted substantial variability in the use and adaptation of quality appraisal tools in health geography and spatial epidemiology research, with many systematic reviews employing unvalidated or inadequately justified tools. These findings point to the need for consensus-based development and validation of quality appraisal tools tailored to evaluating spatial methodologies.
Consensus-based methods, such as the Delphi process and group concept mapping (GCM), are commonly used to develop quality appraisal tools by systematically gathering expert input [8–11]. The Delphi method relies on iterative, anonymous surveys to achieve consensus. In contrast, GCM enables participants to contribute to both idea generation and the organisation of concepts, making it valuable for capturing collective thinking on complex, multidimensional issues [10, 12]. While concept mapping in educational contexts is often used to visually organise and communicate knowledge [13], in participatory research, GCM is a structured, multi-step process designed to systematically collect, analyse, and visually represent the perspectives of a group to support consensus-building or conceptual framework development [14]. Drawing on principles from participatory research, multivariate statistics, and cognitive psychology [11, 14], GCM has been applied across healthcare [15, 16], public health [17, 18], health geography [19], evaluation [20], and in the development of new instruments and scale items [21, 22]. For this study, GCM was selected for its ability to facilitate structured expert input and visual synthesis, providing a participatory and intuitive method for building consensus on the domains and criteria relevant to a health geography quality appraisal tool.
The lack of tools specifically designed to evaluate the spatial methodologies commonly used in health geography research presents a significant challenge for conducting rigorous and transparent evidence synthesis and quality appraisal. This study directly addresses this gap by utilising GCM and an online survey with discipline experts to develop and validate a dedicated appraisal tool for spatial research.
Methods
Study design
This study used a two-phase mixed-methods design. Phase One used GCM [23], and Phase Two consisted of an online validation survey (Table 1). The GCM approach offered a standardised participatory process that guided an expert reference group to sequentially (1) brainstorm, (2) sort, and (3) rate a set of ideas that informed the development of the quality appraisal tool [11, 24]. GCM also allowed cluster maps to be constructed that visually depict the composite thinking of the group [23].
Table 1.
Study phases
| Phase One: Group Concept Mapping | |
|---|---|
| Step 1: Preparation | |
| Developing the Focus | The focus question prompted the expert reference group to brainstorm ideas on what the essential methodological components of a quality appraisal tool should comprise. |
| Selecting the Participants | An expert reference group was established to generate ideas through the brainstorming activity. |
| Step 2: Generation of Items | |
| Brainstorming | The expert reference group brainstormed ideas using the groupwisdom™ software. Additional items identified in an earlier scoping review [6] conducted by the authors were added to the final list. |
| Step 3: Structuring of Items | |
| Item Sorting | The expert reference group sorted the core group of items. |
| Item Rating | The expert reference group rated the core group of items. |
| Step 4: Representation of Items | |
| Map Development | A series of maps and graphs (e.g., cluster maps, Go-Zone Plot) were developed using the groupwisdom™ software. |
| Step 5: Interpretation of Maps | |
| Map Refinement | The advisory group examined the maps and reached group consensus on the final list of items and domain names. |
| Step 6: Utilisation of Maps | |
| Map Use | The maps were used to inform the initial quality appraisal tool items and any adaptations after pilot testing of the tool. |
| Phase Two: Content Validation | |
| Step 1: Content Validity Survey | |
| Validity Survey | To analyse the content validity of the quality appraisal tool, the expert reference group completed an online survey. |
| Step 2: Content Validity Assessment | |
| Analysis | Content validity was examined, and items were refined by the advisory group. |
Recruitment
Phase One involved recruiting an expert reference group to contribute to the development of the conceptual underpinnings of the quality appraisal tool through a consensus-based approach. The literature suggests that GCM generally requires a minimum of ten participants to ensure a diversity of perspectives and robust data for analysis. However, smaller groups may be appropriate and effective when the focus is narrow and participants are selected for their specific expertise, as in this study [25]. The eligibility criteria required participants to hold a master’s or doctoral degree in health geography, geography, biostatistics, epidemiology, environmental science, or geospatial science, and to have authored at least five relevant publications within the past ten years in fields such as health or medical geography, spatial epidemiology, geographic information systems, or quality appraisal.
Eligible participants were identified through university networks and publicly available sources, such as relevant publications and journal editorial board memberships. Contact details, typically email addresses, were obtained from institutional websites or published articles. An email invitation was sent outlining the purpose of the study, expected time commitment, and participation process. This email included a link to a Qualtrics registration survey, which also provided a Plain Language Statement for participants to read before expressing interest. A purposive snowball sampling approach was used, whereby invited participants were encouraged to forward the expression of interest to other eligible colleagues.
Those who registered interest were sent a link to the first GCM activity. Electronic consent was obtained before commencing the first activity. Participants created a username using their email address on the groupwisdom™ platform, enabling the research team to track individual participation across activities. Participants could choose whether to remain anonymous or be acknowledged in study publications. All participants remained anonymous to each other throughout the process. The GCM activities were undertaken between September and December 2023. An advisory group was established, comprising the research investigators and one participant from the expert reference group. This group completed the final step of Phase One, using a structured approach to reach consensus on the number of clusters, the items within each cluster, and appropriate domain names for the draft quality appraisal tool. The outcomes of the GCM activities and advisory group consensus informed the draft tool, which was subjected to further validation in Phase Two.
Phase Two involved recruiting participants identified in Phase One to take part in the validation survey. The literature on instrument development recommends a panel of three to ten experts for content validation, with five or more often suggested to ensure sufficient control over chance agreement [26–29]. These recommendations are primarily based on research in healthcare, nursing, and psychology, where expert panels are used to establish the content validity of new instruments or tools [26, 27, 29–31]. The validation survey was conducted between June and July 2024. The advisory group also participated in Phase Two to review and refine the tool’s items for comprehensiveness and clarity.
Ethics
The Deakin University Human Ethics Committee approved this study (reference number: HEAG-H 88_2023). All participants were provided with a plain language statement before participating in any phase of this study. Electronic consent was obtained before commencing the GCM activities and the validation survey.
Phase 1. Group concept mapping
The study followed the six-step model outlined by Trochim and McLinden [11], and Anderson and Slonim [17], including Preparation, Generation, Structuring, Representation, Interpretation, and Utilisation (Table 1). All the GCM activities were conducted online using the groupwisdom™ software (Concept Systems, Inc., Ithaca, NY) [32]. This platform allowed participants to complete each activity independently within a set timeframe, reducing the potential influence of groupthink and enabling flexibility for participants to engage at their convenience.
Generation of statements (brainstorming)
Participants were asked to brainstorm responses to the focus prompt: When evaluating health geography studies, what do you think are the spatial methodological components a quality appraisal tool should assess? The focus prompt was designed to generate an exhaustive list of potential quality appraisal tool items. During the statement editing and synthesising process, the research team reviewed participant statements, separating items that contained multiple ideas and collapsing similar responses into a single, representative statement.
Addition of items
The final list of statements generated by the participants during the brainstorming activity was compared to a list of items developed by the authors during a scoping review of existing quality appraisal tools used in health geography and spatial epidemiology research [6]. Any items identified during the scoping review that were not generated as statements during the brainstorming activity were added to the final list after a discussion with the research team to ensure the items were unique and not similar to any of the brainstormed statements. This combined list formed the conceptual foundation for the subsequent sorting and rating procedures.
Structuring of statements (sorting and rating)
Sorting
Participants were asked to familiarise themselves with the statements generated in Step Two (brainstorming) and to sort them into piles based on conceptual similarity. This sorting task was designed to reveal each participant’s perception of the interrelationships among the statements. The sorting procedures were based on established methods detailed in the literature [22, 33, 34]. To ensure that items were grouped according to their conceptual meaning, rather than ordered or ranked by perceived value, four restrictions were applied: (1) participants could not place all statements in a single pile; (2) individual statements could be placed in a pile of their own, but not every statement could be placed in its own pile; (3) statements could not be grouped by any kind of value (e.g., importance, relevance, frequency); and (4) piles of unrelated items (e.g., ‘miscellaneous’, ‘other’, or ‘don’t know’ piles) were not permitted. The first two restrictions were included because a participant who grouped all items into a single pile, or placed every item in its own pile, would provide no information about the interrelationships among the statements [34]. Participants then named each pile they created to capture its underlying theme; these names informed the preliminary domain names for the quality appraisal tool.
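In GCM, the individual sorts are typically summed into a statement-by-statement similarity matrix that records how often each pair of statements was placed in the same pile. The Python sketch below illustrates that step and an automated check of the two structural restrictions, (1) and (2). The data layout (each sort as a list of piles of 0-based statement indices) and the function names are hypothetical, not the groupwisdom™ implementation; restrictions (3) and (4) depend on the meaning of piles and still require human review.

```python
from itertools import chain

# Hypothetical layout: one participant's sort is a list of piles,
# and each pile is a list of 0-based statement indices.
Sort = list[list[int]]


def is_valid_sort(piles: Sort, n_items: int) -> bool:
    """Check that a sort satisfies structural restrictions (1) and (2)."""
    piles = [p for p in piles if p]                # ignore empty piles
    placed = sorted(chain.from_iterable(piles))
    if placed != list(range(n_items)):             # each statement sorted exactly once
        return False
    if len(piles) == 1:                            # restriction (1): everything in one pile
        return False
    if all(len(p) == 1 for p in piles):            # restriction (2): every statement alone
        return False
    return True


def similarity_matrix(sorts: list[Sort], n_items: int) -> list[list[int]]:
    """Entry [i][j] counts the participants who put statements i and j in the same pile."""
    sim = [[0] * n_items for _ in range(n_items)]
    for piles in sorts:
        for pile in piles:
            for i in pile:
                for j in pile:
                    sim[i][j] += 1
    return sim


# Usage with two hypothetical participants sorting four statements.
sorts = [[[0, 1], [2, 3]], [[0, 1, 2], [3]]]
assert all(is_valid_sort(s, 4) for s in sorts)
sim = similarity_matrix(sorts, 4)
```

Statements 0 and 1 were placed together by both participants, so `sim[0][1]` is 2; the diagonal equals the number of valid sorts.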
Rating
Participants then rated each statement for its importance and feasibility of appraisal using two rating questions: (1) How important do you think the item is for assessing the spatial methodological quality of health geography studies? and (2) How feasible do you think the item is to assess the spatial methodological quality of health geography studies? A 5-point Likert scale was used to rate the level of importance (5 = very important; 1 = not important at all) and feasibility (5 = very feasible; 1 = not feasible). Participants were instructed to use the entire rating scale and to rate each item relative to all other items rather than provide an absolute rating.
Group concept mapping analysis
After the participants had completed the sorting and rating activities, the research team performed a data quality check to ensure that they had followed the instructions (e.g., not creating piles based on value and using the entire rating scale). Data were analysed using multidimensional scaling (MDS) and hierarchical cluster analysis. The MDS analysis generated a point map, a two-dimensional (x, y) visualisation of the statements in which distances reflect their perceived similarity. The stress value, a key diagnostic in MDS that ranges from 0 to 1, measures how well the point map represents the participants’ sorting data, with lower values indicating a better fit. In GCM studies, there is less than a 1% probability that a point map has no structure or a random configuration if the stress value is below an upper limit of 0.39 [35, 36].
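The stress value can be illustrated with Kruskal’s stress-1 for a given two-dimensional point map. The Python sketch below assumes pairwise dissimilarities derived from the sorting data (for example, the number of sorters minus the co-occurrence count) and compares map distances with monotone “disparities” fitted by isotonic regression (pool-adjacent-violators). It only evaluates a supplied configuration and does not perform the MDS fitting itself; whether groupwisdom™ uses exactly this formula is an assumption, and the function names are hypothetical.

```python
import math
from itertools import combinations


def isotonic_fit(values: list[float]) -> list[float]:
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks: list[list[float]] = []  # each block holds [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Pool backwards while an earlier block's mean exceeds the latest one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: list[float] = []
    for total, count in blocks:
        fitted.extend([total / count] * int(count))
    return fitted


def kruskal_stress1(coords: list[tuple[float, float]],
                    dissim: list[list[float]]) -> float:
    """Kruskal's stress-1 of a point map against pairwise dissimilarities."""
    pairs = list(combinations(range(len(coords)), 2))
    dist = [math.dist(coords[i], coords[j]) for i, j in pairs]
    # Order pairs by dissimilarity (ties broken by map distance) ...
    order = sorted(range(len(pairs)),
                   key=lambda k: (dissim[pairs[k][0]][pairs[k][1]], dist[k]))
    # ... then fit disparities that never decrease as dissimilarity grows.
    fitted = isotonic_fit([dist[k] for k in order])
    disparity = [0.0] * len(pairs)
    for rank, k in enumerate(order):
        disparity[k] = fitted[rank]
    raw = sum((d - dh) ** 2 for d, dh in zip(dist, disparity))
    return math.sqrt(raw / sum(d ** 2 for d in dist))


# Hypothetical three-statement map: distances 1, 3 and 2.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
same_order = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]   # ranks agree with the map
reversed_order = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]  # ranks contradict the map
```

A map whose distances preserve the rank order of the dissimilarities has a stress of 0; a map that reverses that order gets a stress well above 0.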
Hierarchical cluster analysis was undertaken to partition the statements on the point map into clusters of related items (cluster maps), representing conceptual groupings of the original set of statements. Mean ratings for each statement and cluster were calculated. The bridging value of an individual statement indicates whether that statement was generally sorted with nearby statements: statements sorted more frequently with nearby statements had bridging values close to 0, and those sorted more frequently with statements in other areas of the map had bridging values closer to 1. Lower bridging values for a cluster indicated more stable and narrowly focused thematic content. To identify the cluster solution that best represented the participants’ sorting data, the advisory group examined the cluster maps for the six-, five-, and four-cluster solutions, assessed whether the contents of each cluster conceptually belonged together, and reached consensus on the most conceptually appropriate solution.
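A minimal Python sketch of the clustering step follows, assuming (as GCM software is commonly described as doing) Ward’s agglomerative method applied to the two-dimensional MDS coordinates. Recording the partition at several cluster counts mirrors the advisory group’s comparison of alternative solutions; because the merges are nested, each smaller solution is formed by joining clusters of the larger one. The function names and example coordinates are hypothetical.

```python
from itertools import combinations

Point = tuple[float, float]


def _sse(points: list[Point]) -> float:
    """Within-cluster sum of squared distances to the centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)


def ward_solutions(coords: list[Point], ks: set[int]) -> dict[int, list[list[int]]]:
    """Merge statements with Ward's criterion and record the partition
    reached at each requested number of clusters."""
    clusters = [[i] for i in range(len(coords))]
    solutions: dict[int, list[list[int]]] = {}
    while True:
        if len(clusters) in ks:
            solutions[len(clusters)] = sorted(sorted(c) for c in clusters)
        if len(clusters) <= min(ks):
            break
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pa = [coords[i] for i in clusters[a]]
            pb = [coords[i] for i in clusters[b]]
            # Ward's criterion: increase in within-cluster sum of squares.
            cost = _sse(pa + pb) - _sse(pa) - _sse(pb)
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return solutions


# Hypothetical point map with three visually obvious groups.
coords = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0), (20.0, 0.0), (20.0, 1.0)]
solutions = ward_solutions(coords, {3, 2})
```

Here the three-cluster solution keeps each nearby pair together, and the two-cluster solution joins the two closest groups.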
Descriptive statistics were generated for the importance and feasibility ratings of each statement. The mean ratings for each statement were then used to create a bivariate scatter plot (Go-Zone plot), divided into quadrants by the grand mean of each rating scale to show each statement’s relative perceived importance and feasibility [35]. Dividing the plot into quadrants facilitates interpretation of the results. Statements rated above the grand mean for both importance and feasibility (i.e., Q1, the upper-right quadrant of the Go-Zone plot) were included in the final list of items for the quality appraisal tool.
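The Go-Zone classification can be sketched in a few lines of Python. This is an illustration under stated assumptions: the grand mean is taken as the mean of the statement means, a rater who skipped an item is simply absent from that item’s ratings, and quadrant numbers follow the pattern of the values in Table 2 (for example, item 54, with low importance and high feasibility, is in quadrant 2). The function name and the example ratings are hypothetical.

```python
from statistics import mean


def go_zone(importance: dict[str, list[int]],
            feasibility: dict[str, list[int]]) -> tuple[float, float, dict[str, int]]:
    """Return the grand means and each statement's Go-Zone quadrant.

    Quadrants: 1 = above both grand means; 2 = above on feasibility only;
    3 = above on importance only; 4 = at or below both.
    """
    imp = {k: mean(v) for k, v in importance.items()}
    fea = {k: mean(v) for k, v in feasibility.items()}
    # Assumption: grand mean = mean of the statement means.
    grand_imp = mean(list(imp.values()))
    grand_fea = mean(list(fea.values()))
    quadrant = {}
    for k in imp:
        high_imp, high_fea = imp[k] > grand_imp, fea[k] > grand_fea
        if high_imp and high_fea:
            quadrant[k] = 1
        elif high_fea:
            quadrant[k] = 2
        elif high_imp:
            quadrant[k] = 3
        else:
            quadrant[k] = 4
    return grand_imp, grand_fea, quadrant


# Hypothetical 1-5 ratings from two raters for four statements.
importance = {"a": [5, 4], "b": [2, 2], "c": [4, 5], "d": [1, 2]}
feasibility = {"a": [5, 5], "b": [4, 5], "c": [1, 2], "d": [2, 1]}
grand_imp, grand_fea, quadrant = go_zone(importance, feasibility)
```

Only statement "a" lands in quadrant 1 and would be carried forward to the draft tool.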
Phase 2. Content validation
Content validity refers to the degree to which the tool includes all relevant items and domains and adequately reflects the entire construct to be measured [26, 37, 38]. Aspects of content validity that should be assessed include (1) relevance, (2) comprehensiveness, and (3) clarity [39]. Content validation relies on experts’ judgment about the content of the items included in the tool [38]. The second phase of this study assessed the content validity of the quality appraisal tool items through an online validation survey.
Content validity survey
The statements in Q1 of the Go-Zone plot underwent content validation to become items of the quality appraisal tool. The validation survey assessed the tool’s relevance, clarity, and comprehensiveness [29, 40]. The survey was developed in Qualtrics and divided into four sections, one per domain, with items listed under their respective domains. Each section briefly described the domain and asked participants to rate the relevance of each item on a 4-point ordinal scale [41]. Clarity was assessed to ensure that items were written clearly and appropriately for reviewers from varying backgrounds (e.g., health geographers, epidemiologists, or clinicians) by asking: How clear is this item? Each item’s clarity was rated with a dichotomous Clear/Unclear response, and participants could comment on each item to improve its clarity. Comprehensiveness, assessed in the final survey question, explored whether the tool contained all pertinent items for appraising a study’s methodological quality by asking: Does the tool cover all the important items? Participants could also suggest items to add or remove. Any new or revised items were reviewed and validated by the advisory group.
Content validity analysis
The content validity of the quality appraisal tool was analysed using the content validity index (CVI), content validity ratio (CVR), and modified kappa [29]. Both the item-level CVI (I-CVI) and the scale-level CVI (S-CVI) were analysed [26]. The I-CVI was calculated by dichotomising the ordinal relevance scale into relevant (score ≥ 3) or not relevant (score ≤ 2) and dividing the number of experts giving a rating of 3 or 4 by the total number of experts. CVI values range from 0 to 1, and an I-CVI of 0.80 or higher is recommended for an item to be judged as having excellent content validity [29]. Items with values between 0.70 and 0.79 require revision, and items with values below 0.70 are eliminated [27]. The S-CVI summarises the content validity of the tool as a whole. There are two methods for calculating the S-CVI: universal agreement (UA) among experts (S-CVI/UA) and the average CVI (S-CVI/Ave). The S-CVI/UA is calculated as the number of items with an I-CVI of 1.00 divided by the total number of items. The S-CVI/Ave is calculated by dividing the sum of the I-CVIs by the total number of items. Both the S-CVI/UA and the S-CVI/Ave were calculated to monitor potential variability in values with six raters. The UA approach counts only items with 100% agreement and is therefore more conservative, potentially underestimating the scale’s content validity. The Average method is more liberal and is recommended [26]. For a scale to be judged as having excellent content validity, values of S-CVI/UA ≥ 0.80 and S-CVI/Ave ≥ 0.90 are recommended. The CVR measures the essentiality of an item [27]. CVR values range from −1.00 to +1.00, with higher scores indicating greater agreement among participants [42].
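The index calculations described above can be sketched in Python as follows. The CVR uses the standard formula CVR = (nₑ − N/2)/(N/2); the sketch assumes that nₑ is the number of experts rating an item relevant (3 or 4), which is consistent with the CVR values in Table 3 for six experts (five of six gives 0.67, two of six gives −0.33). The function names are hypothetical, and the example ratings are invented solely to reproduce the reported pattern of I-CVIs (12 items at 1.00, five at 0.83, one at 0.33).

```python
def item_cvi(ratings: list[int]) -> float:
    """I-CVI: share of experts rating the item 3 or 4 on the 4-point relevance scale."""
    return sum(r >= 3 for r in ratings) / len(ratings)


def scale_cvi(i_cvis: list[float]) -> tuple[float, float]:
    """Return (S-CVI/UA, S-CVI/Ave) for a set of item-level CVIs."""
    universal = sum(v == 1.0 for v in i_cvis) / len(i_cvis)  # items with full agreement
    average = sum(i_cvis) / len(i_cvis)
    return universal, average


def cvr(n_essential: int, n_experts: int) -> float:
    """Content validity ratio, (n_e - N/2) / (N/2), ranging from -1 to +1."""
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical relevance ratings from six experts for 18 items.
ratings = [[4, 4, 4, 3, 4, 3]] * 12 + [[4, 3, 4, 4, 2, 4]] * 5 + [[3, 2, 2, 4, 1, 2]]
i_cvis = [item_cvi(r) for r in ratings]
ua, ave = scale_cvi(i_cvis)                    # all 18 items
ua_17, ave_17 = scale_cvi(i_cvis[:-1])         # after dropping the weakest item
cvrs = [cvr(sum(x >= 3 for x in r), 6) for r in ratings]
```

Rounded to two decimals, these give S-CVI/UA = 0.67 and S-CVI/Ave = 0.92 for 18 items, 0.71 and 0.95 after removing the weakest item, and a mean CVR of 0.83, matching the values reported in the Results.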
The modified kappa statistic was also calculated to determine the degree of agreement beyond chance: the CVI does not adjust for chance agreement, which can inflate its values, so calculating a modified kappa alongside the CVI is recommended [27]. Kappa values range from −1.00 to +1.00, with a positive kappa indicating that inter-rater agreement occurs more frequently than would be expected by chance [43]. A kappa of +1.00 indicates complete agreement among raters; values above 0.75 are considered excellent, 0.60 to 0.74 good, 0.40 to 0.59 fair, and below 0.40 poor. A kappa of zero indicates that agreement is no greater than would be expected by chance [27].
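A short Python sketch of the modified kappa, using the widely used formula k* = (I-CVI − pc)/(1 − pc), where pc = C(N, A)·0.5ᴺ is the probability that exactly A of N experts rate an item relevant by chance. The function names are hypothetical, and the handling of values that fall between the stated interpretation bands is an assumption.

```python
from math import comb


def modified_kappa(n_relevant: int, n_experts: int) -> float:
    """I-CVI adjusted for the probability of chance agreement."""
    i_cvi = n_relevant / n_experts
    # Chance that exactly n_relevant of n_experts rate the item relevant,
    # assuming each expert does so with probability 0.5.
    p_chance = comb(n_experts, n_relevant) * 0.5 ** n_experts
    return (i_cvi - p_chance) / (1 - p_chance)


def interpret_kappa(k: float) -> str:
    """Bands as stated in the text; 0.75 itself is treated as excellent and
    values between 0.74 and 0.75 as good (an assumption)."""
    if k >= 0.75:
        return "excellent"
    if k >= 0.60:
        return "good"
    if k >= 0.40:
        return "fair"
    return "poor"


# Six experts: items rated relevant by six, five and two experts.
kappas = [modified_kappa(a, 6) for a in (6, 5, 2)]
```

With six experts this yields item kappas of 1.00, 0.82 and 0.13, the three values that appear in Table 3.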
Results
Participants
Thirty-seven experts were invited to participate in this study. Of those invited, nine participated in the GCM brainstorming activity, and six participated in the sorting and rating activities. Participants had expertise in spatial methodology and consisted of health geographers, spatial epidemiologists, and geospatial scientists from multiple countries, including Australia, New Zealand, the United States, Ireland, and the United Kingdom.
Phase 1. Group concept mapping
Nine experts generated 53 statements during the brainstorming activity. After the statements were synthesised (including editing to remove any commentary or examples provided) and items from the literature were added (n = 9), 62 unique items were included in the sorting and rating activities. The generated statements related to multiple methodological areas, including data suitability, outcome measures, research alignment, causation, specification, spatial aggregation, and spatial measures and analysis. Six experts participated in the sorting and rating activities. The potential cluster solutions were reviewed with the advisory group, and a four-cluster solution was agreed upon (Fig. 1).
Fig. 1.
Cluster point map: final four-cluster solution
Each point (dot) represents an item in the sorting and rating activity. In general, the closer two dots are, the more frequently the corresponding items were sorted together and the more likely they are to be conceptually related. The map had a stress value of 0.26, indicating a good fit between the point map and the participants’ original sorting data [35]. The four-cluster solution was considered the best representation of the domains for quality assessment and comprised: (1) methods preliminaries, (2) data quality and measurement, (3) spatial data problems, and (4) spatial analysis methods.
The Go-Zone map (Fig. 2) displays the mean importance and feasibility ratings of all items, plotted across the four quadrants of the bivariate plot. Each dot represents an item, and its colour indicates the cluster to which it belongs. The x-axis shows the range of mean importance values, and the y-axis shows the range of mean feasibility values. Dots in the green upper-right quadrant (Q1) of the Go-Zone plot represent items rated above the grand mean for both importance and feasibility in assessing spatial methodological quality; 21 (34%) of the 62 items fall into this quadrant. See Table 2 for the list of original statements and clusters, including their mean importance and feasibility ratings and each statement’s Go-Zone quadrant.
Fig. 2.
Go-Zone map: Each dot represents a quality appraisal item, coloured by cluster. The x-axis shows mean importance, and the y-axis shows mean feasibility. Quadrant 1 (green, upper right) includes items rated above the grand mean for both importance and feasibility
Table 2.
Mean importance and feasibility ratings for spatial methodology appraisal components
| Statement number | Cluster / statement | Bridging score* | Mean importance | Mean feasibility | Go-Zone quadrant† (I v F) |
|---|---|---|---|---|---|
| | 1. Methods preliminaries | 0.62 | 3.63 | 3.71 | |
| 10 | If there is a clear statement of applicability/relevance of method to research question. | 0.36 | 4.00 | 4.00 | 1 |
| 13 | Whether the paper has clearly outlined why they have used particular spatial methods in their approach. | 0.74 | 4.00 | 3.75 | 1 |
| 28 | Whether the paper has clearly defined the “issue” they are attempting to study. | 0.35 | 3.60 | 3.50 | 1 |
| 58 | If there is a clear statement of applicability/relevance of data to research question. | 0.36 | 4.00 | 4.25 | 1 |
| 3 | If there is a sufficient description of the setting to understand in what way the geography is similar to other settings. | 1.00 | 2.80 | 3.75 | 2 |
| 60 | Has due diligence been carried out to ensure the population characteristics and size make sense for the research question. | 0.89 | 3.40 | 3.00 | 3 |
| | 2. Data quality and measurement | 0.23 | 3.09 | 3.08 | |
| 18 | Data quality. | 0.00 | 4.00 | 4.00 | 1 |
| 20 | If the data are fit for purpose. | 0.26 | 3.60 | 4.00 | 1 |
| 26 | Whether assumptions about data quality have been clearly described. | 0.27 | 4.20 | 3.75 | 1 |
| 44 | Data are transparently presented for all main analyses. | 0.85 | 4.20 | 4.00 | 1 |
| 50 | Consistent and transparent definitions. | 0.13 | 4.40 | 3.75 | 1 |
| 54 | Country of analysis. | 0.96 | 1.60 | 3.75 | 2 |
| 5 | If the outcome measures are valid or reliable. | 0.11 | 3.20 | 3.25 | 3 |
| 12 | If the data collection tools are valid and reliable. | 0.00 | 3.20 | 2.75 | 3 |
| 42 | Data sources. | 0.00 | 3.60 | 3.00 | 3 |
| 43 | Geocoding quality. | 0.06 | 3.20 | 2.75 | 3 |
| 49 | Validation of data. | 0.00 | 3.75 | 2.00 | 3 |
| 51 | Appropriate definitions and classifications. | 0.37 | 4.00 | 2.75 | 3 |
| 61 | Whether assumptions about data limitations have been clearly described. | 0.89 | 3.40 | 3.25 | 3 |
| 2 | Quality of outcomes. | 0.11 | 2.60 | 3.25 | 4 |
| 4 | Ground truth of data. | 0.14 | 2.40 | 2.25 | 4 |
| 27 | If standardised measures have been used for the outcome measure. | 0.11 | 2.60 | 3.25 | 4 |
| 30 | Geocoding methods. | 0.06 | 3.00 | 3.25 | 4 |
| 32 | If a nationally consistent geographic standard has been used. | 0.46 | 2.80 | 2.75 | 4 |
| 35 | Self-reported outcome. | 0.11 | 2.00 | 1.50 | 4 |
| 38 | If the outcome measures are free from limitations. | 0.15 | 2.40 | 2.75 | 4 |
| 40 | If the outcome measures are well-established in the field. | 0.11 | 1.80 | 3.25 | 4 |
| 48 | Objective outcome. | 0.12 | 2.40 | 3.00 | 4 |
| 56 | If the subjective measures are reliable and valid. | 0.02 | 2.80 | 2.50 | 4 |
| | 3. Spatial data problems | 0.65 | 3.09 | 3.42 | |
| 16 | Spatial method considers the spatial scale. | 0.51 | 4.00 | 3.75 | 1 |
| 17 | If the relationship between the research question and spatial unit makes sense. | 0.85 | 4.00 | 3.50 | 1 |
| 21 | Has spatial autocorrelation been taken into account. | 0.50 | 3.60 | 4.75 | 1 |
| 62 | If area-level clustering has been taken into account. | 0.59 | 3.50 | 3.50 | 1 |
| 8 | Consideration of temporal issues. | 0.57 | 3.40 | 3.75 | 1 |
| 14 | Quality of match between outcome, spatial unit and modelling. | 0.60 | 3.20 | 4.00 | 1 |
| 39 | Scale of spatial units. | 0.68 | 3.00 | 3.75 | 2 |
| 59 | Ecological fallacy. | 0.48 | 3.00 | 3.50 | 2 |
| 6 | How missing data are handled. | 0.84 | 2.80 | 3.50 | 2 |
| 1 | Potential boundary issues. | 0.62 | 2.75 | 3.50 | 2 |
| 57 | Edge effects. | 0.54 | 2.60 | 3.50 | 2 |
| 55 | Sample size of spatial units. | 0.75 | 2.20 | 3.50 | 2 |
| 19 | Spatial representativeness. | 0.80 | 3.60 | 2.50 | 3 |
| 7 | If distributional assumptions have been taken into account. | 0.59 | 3.00 | 2.50 | 4 |
| 23 | Has spatial uncertainty been taken into account. | 0.67 | 3.00 | 3.00 | 4 |
| 36 | Distance/proximity. | 0.83 | 3.00 | 3.25 | 4 |
| 41 | Heterogeneity of spatial units. | 0.87 | 3.00 | 3.25 | 4 |
| 15 | Modifiable Areal Unit Problem (MAUP). | 0.47 | 2.80 | 3.00 | 4 |
| 52 | Point or region data. | 0.68 | 2.20 | 3.00 | 4 |
| | 4. Spatial analysis methods | 0.61 | 3.34 | 3.35 | |
| 11 | Sensitivity analysis - formal attempts to determine robustness of findings to assumptions and data. | 1.00 | 3.80 | 3.50 | 1 |
| 22 | Use of correct model. | 0.47 | 4.00 | 3.75 | 1 |
| 25 | If the spatial methods are fit-for-purpose. | 0.67 | 4.60 | 3.75 | 1 |
| 34 | Data interpretation. | 0.90 | 3.40 | 3.50 | 1 |
| 46 | If the analyses are conducted and presented correctly. | 0.74 | 4.40 | 3.50 | 1 |
| 47 | Multi-level models. | 0.34 | 3.20 | 3.33 | 1 |
| 33 | Spatial accessibility analysis methods. | 0.43 | 2.75 | 3.75 | 2 |
| 37 | Spatial association. | 0.49 | 2.60 | 4.25 | 2 |
| 53 | Analysis techniques. | 0.29 | 4.00 | 3.25 | 3 |
| 9 | Correlations/inter-relationships between independent variables. | 0.91 | 3.80 | 3.00 | 3 |
| 24 | If all modelled outputs are accompanied with a model of uncertainty. | 0.50 | 3.00 | 3.00 | 4 |
| 29 | Avoids controlling for variables on causal pathway. | 0.61 | 3.00 | 3.25 | 4 |
| 31 | Geographic Information Systems (GIS) methods. | 0.60 | 1.80 | 2.75 | 4 |
| 45 | Use of causal framework. | 0.65 | 2.40 | 2.25 | 4 |
| | All statements (grand mean) | | 3.29 | 3.39 | |
† Go-Zone quadrant: 1 = above the grand mean on both importance and feasibility; 2 = below the grand mean on importance and above on feasibility; 3 = above the grand mean on importance and below on feasibility; 4 = below the grand mean on both
* Bridging score: Values closer to 0 indicate more coherent clusters or anchoring statements; values closer to 1 indicate less coherent clusters or bridging statements
Before commencing Phase Two (content validation), the advisory group reviewed the final list of items (version 1.0) for repetition and for technical and grammatical consistency. A supplementary file outlines the iterations of the quality appraisal tool’s development [see Supplementary file 1]. Three items were removed during the revision process. Two items from the ‘spatial data problems’ domain, relating to area-level clustering (item 3.5) and autocorrelation (item 3.6), were removed because they were considered to be covered by an item in the ‘spatial analysis methods’ domain. One item from the ‘spatial analysis methods’ domain, pertaining to multi-level models (item 4.6), was removed because it was considered to be covered by an item relating to suitable models (item 4.2). The 18 revised items of the quality appraisal tool (version 1.1) then underwent validation.
Addition of items
The expert-derived items were compared with those extracted from the literature to identify gaps in the brainstormed items. Nine additional items were added to the final list for sorting and rating. Only two (22%) of these nine items were rated above the grand mean for both importance and feasibility in assessing spatial methodological quality, and these were included in the final tool.
Phase 2. Content validity
Six experts participated in the content validation of the quality appraisal tool items (version 1.1). The I-CVI and S-CVI were calculated from each item’s relevance ratings (Table 3). Seventeen items (94%) were identified as relevant (I-CVI 0.83 to 1.00, indicating excellent content validity), with 12 items (67%) having an I-CVI of 1.00. One item (item 3.3) was eliminated because of poor content validity (I-CVI = 0.33). For scale-level validity, the Universal Agreement approach indicated good content validity (S-CVI/UA = 0.67), and the Average method indicated excellent content validity (S-CVI/Ave = 0.92). Removing item 3.3 improved these values to 0.71 (S-CVI/UA) and 0.95 (S-CVI/Ave). The overall kappa statistic was 0.90, indicating excellent agreement. Seventeen items (94%) had a kappa between 0.82 and 1.00, whereas item 3.3 had a kappa of 0.13, reflecting poor agreement. The CVR ranged from −0.33 to 1.00 across items: 12 items (67%) had a CVR of 1.00, five (28%) had a CVR of 0.67, and one (item 3.3) had a CVR of −0.33. The average CVR was 0.83. Because item 3.3 was rated ‘somewhat relevant’ and showed poor content validity, poor agreement, and a low CVR, it was eliminated from the revised quality appraisal tool (version 1.2). Clarity and comprehensiveness were assessed using dichotomous Clear/Unclear and Yes/No responses, respectively. For clarity, an average of 77% of experts rated each item as clear. Items rated unclear were reviewed and revised, informed by participants’ suggested revisions. For comprehensiveness, all experts answered ‘yes’, indicating that the tool was comprehensive and contained all pertinent items for appraising a study’s methodological quality.
Table 3.
Content validity results for the quality appraisal tool (Version 1.1) (n = 6)
| Item | I-CVI | UA | kappa | CVR | Clarity |
|---|---|---|---|---|---|
| Item-1 | 1.00 | 1 | 1.00 | 1.00 | 0.83 |
| Item-2 | 0.83 | 0 | 0.82 | 0.67 | 0.83 |
| Item-3 | 1.00 | 1 | 1.00 | 1.00 | 0.67 |
| Item-4 | 0.83 | 0 | 0.82 | 0.67 | 0.83 |
| Item-5 | 1.00 | 1 | 1.00 | 1.00 | 0.67 |
| Item-6 | 1.00 | 1 | 1.00 | 1.00 | 0.83 |
| Item-7 | 1.00 | 1 | 1.00 | 1.00 | 0.83 |
| Item-8 | 0.83 | 0 | 0.82 | 0.67 | 1.00 |
| Item-9 | 1.00 | 1 | 1.00 | 1.00 | 1.00 |
| Item-10 | 1.00 | 1 | 1.00 | 1.00 | 0.83 |
| Item-11 | 1.00 | 1 | 1.00 | 1.00 | 0.83 |
| Item-12 | 0.33 | 0 | 0.13 | −0.33 | 0.17 |
| Item-13 | 1.00 | 1 | 1.00 | 1.00 | 0.83 |
| Item-14 | 1.00 | 1 | 1.00 | 1.00 | 0.67 |
| Item-15 | 1.00 | 1 | 1.00 | 1.00 | 0.83 |
| Item-16 | 0.83 | 0 | 0.82 | 0.67 | 0.67 |
| Item-17 | 0.83 | 0 | 0.82 | 0.67 | 0.50 |
| Item-18 | 1.00 | 1 | 1.00 | 1.00 | 1.00 |
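Key: I-CVI, item-level content validity index; UA, universal agreement (1 = all six experts rated the item relevant, 0 = otherwise); kappa, modified kappa statistic; CVR, content validity ratio; Clarity, proportion of experts rating the item clear. Item-12 corresponds to item 3.3, which was removed from version 1.2.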
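The kappa and CVR columns can be checked against two commonly used formulas, though the text does not spell them out and using them here is an assumption. The first is the modified kappa, k* = (I-CVI − pc) / (1 − pc), where pc = C(N, A) × 0.5^N is the chance probability that A of N experts agree. The second is Lawshe's CVR = (n_e − N/2) / (N/2). The sketch also assumes that the CVR was computed from the same count of experts rating each item relevant. It reproduces the item values, the overall kappa of 0.90, the average CVR of 0.83 (including −0.33 for item 3.3) and the 77% mean clarity. The names are illustrative and the counts are back-calculated from the table (proportion multiplied by 6).

```python
from math import comb

N_EXPERTS = 6


def modified_kappa(agree, n=N_EXPERTS):
    """I-CVI adjusted for chance agreement, where pc = C(n, agree) * 0.5**n."""
    icvi = agree / n
    pc = comb(n, agree) * 0.5 ** n
    return (icvi - pc) / (1 - pc)


def lawshe_cvr(essential, n=N_EXPERTS):
    """Lawshe's content validity ratio, (n_e - N/2) / (N/2), ranging from -1 to 1."""
    half = n / 2
    return (essential - half) / half


# Experts rating each item relevant (I-CVI x 6), Item-1 to Item-18.
counts = [6, 5, 6, 5, 6, 6, 6, 5, 6, 6, 6, 2, 6, 6, 6, 5, 5, 6]
# Experts rating each item clear (Clarity x 6), Item-1 to Item-18.
clear_counts = [5, 5, 4, 5, 4, 5, 5, 6, 6, 5, 5, 1, 5, 4, 5, 4, 3, 6]

for agree in sorted(set(counts), reverse=True):
    print(f"{agree}/6 relevant: kappa = {modified_kappa(agree):.2f}, CVR = {lawshe_cvr(agree):.2f}")
# 6/6 relevant: kappa = 1.00, CVR = 1.00
# 5/6 relevant: kappa = 0.82, CVR = 0.67
# 2/6 relevant: kappa = 0.13, CVR = -0.33

mean_kappa = sum(modified_kappa(c) for c in counts) / len(counts)
mean_cvr = sum(lawshe_cvr(c) for c in counts) / len(counts)
mean_clarity = sum(c / N_EXPERTS for c in clear_counts) / len(clear_counts)
print(f"mean kappa = {mean_kappa:.2f}, mean CVR = {mean_cvr:.2f}, mean clarity = {mean_clarity:.2f}")
# mean kappa = 0.90, mean CVR = 0.83, mean clarity = 0.77
```

With only six raters, chance agreement is non-trivial for split ratings (pc = 15/64 when two of six agree). This is why item 3.3 has a kappa of 0.13 even though its I-CVI is 0.33.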
Discussion
This study developed and validated the Spatial Methodology Appraisal of Research Tool (SMART) to address a critical gap in the appraisal of spatial research. Existing quality appraisal tools are not designed to evaluate the unique methodological features of spatial studies, which makes it difficult to systematically identify and assess potential biases or errors [6]. This limitation affects not only systematic reviews but also other forms of study assessment, such as critical appraisal and peer review. Without evaluating spatial methods, reviewers may overlook serious flaws and draw spurious conclusions about study quality [7]. To address this, we used GCM and expert input to develop a tool comprising four domains: methods preliminaries, data quality, spatial data problems, and spatial analysis methods. These domains reflect the key methodological components of spatial research that experts deemed most important and feasible to assess.
The methods preliminaries domain had the highest mean cluster importance and feasibility ratings. This highlights its foundational and practically achievable role in evaluating whether studies clearly justify the suitability of their data and spatial methods for addressing the research question or objectives. This domain is essential for understanding the relevance and applicability of findings to specific geographic contexts. The first item in this domain (item 1.1) aligns with items identified in the peer-reviewed literature, indicating consistency between expert consensus and existing knowledge. However, although existing quality appraisal tools evaluate items related to sampling, recruitment, exposures, outcome measures, and confounding variables, they do not assess the suitability of spatial methods and data.
The data quality domain evaluates whether the data used in a study meet its specific requirements, whether data quality concerns are addressed, and whether the data are consistently defined and transparently reported, including any necessary processing. This focus on data quality is crucial for ensuring the reliability and replicability of study findings. Existing quality appraisal tools may address the quality of outcome data, but they lack a specific focus on the unique challenges of spatial data, such as ensuring that data are adequately described and suitable for the analyses.
The spatial data problems domain examines whether the study appropriately considers the spatial units, scales, resolutions, and temporal issues relevant to the research question. This domain captures complexities and potential biases unique to spatial research (e.g., the modifiable areal unit problem, MAUP). The spatial analysis methods domain focuses on the suitability of the spatial methods and models used, how the analyses are presented, and the robustness of findings through sensitivity analyses.
Using appropriate methods and clearly presenting and interpreting results aids in drawing valid conclusions. While existing tools may evaluate the appropriateness of statistical methods and analyses in general, they do not consider specific spatial analysis methods. This tool fills a gap in existing quality appraisal tools by addressing the distinct complexities of spatial data, analysis, and interpretation.
Consistent with the development of existing quality appraisal tools, this study sought expert consensus to guide the process. Item generation for the tool combined expert opinion with insights from a scoping review of the existing literature [6]. Through a comparative analysis, we examined overlaps and gaps between the expert-derived items and those identified from the literature. Seven of the nine literature-derived items included in the GCM sorting and rating were excluded from the final tool, including highly important items such as data collection tools and outcome measures. These items were excluded because they were rated as having low feasibility for assessing spatial methodological quality. However, as data collection tools and outcome measures in health geography become more valid and reliable, their feasibility may increase, potentially warranting their inclusion in future iterations of the tool.
Validity is a fundamental consideration when developing a quality appraisal tool. Existing tools have been criticised for lacking a rationale for their criteria, for including items unrelated to internal or external validity, and for applying unjustified criteria weighting [44–46]. In this study, we assessed the content validity of the quality appraisal tool using several methods to quantify expert agreement, including the CVI and the modified kappa statistic. Both item-level (I-CVI) and scale-level (S-CVI) CVIs were calculated. After item 3.3 was removed, the retained items showed excellent content validity (I-CVI 0.83 to 1.00; S-CVI/Ave = 0.95). Best-practice quality appraisal tools should be developed through empirical research. This includes consensus on item inclusion, established validity of item construction, testing of intra- or inter-rater reliability, and guidelines for applying the tool to ensure consistency among reviewers [47]. Establishing content validity is the first step in testing this tool.
This newly developed and validated tool has the potential to enhance the reliability and transparency of spatial studies, providing a standardised framework for evaluating methodological quality. This quality appraisal tool could be used in conjunction with existing tools to ensure methodological assessment of both study design and spatial components. A follow-up study is currently underway to assess the tool’s usability and inter-rater reliability, further supporting its robustness and practical utility.
Strengths and limitations
This study describes the development of SMART, the first quality appraisal tool specifically designed to evaluate the spatial methodologies commonly used in health geography. A strength of this study is the use of GCM, an established method that rigorously incorporates expert perspectives and combines qualitative and quantitative approaches to enhance content validity. One limitation is the relatively small number of participants in the GCM activities, which means that not all perspectives in the field may have been captured. Although the literature suggests that GCM typically requires a minimum of ten participants to capture a range of perspectives, exceptions are appropriate when the study focus is narrow and participants are selected for their specific expertise, as in this study [25]. The narrow focus and the participation of internationally recognised experts with extensive knowledge and experience help mitigate this concern. Furthermore, by systematically adding literature-derived items that were missing from the brainstormed list, the tool addressed recognised gaps and provides more comprehensive coverage of spatial methodological quality. In the second phase, content validation involved six raters, meeting recommended standards [26, 29]. Alongside the excellent content validity, all experts judged the tool to be comprehensive, which supports its coverage of relevant items.
Conclusions
This study addresses the critical need for a reliable method to appraise the quality of spatial health research, a field experiencing significant growth. We developed and validated SMART, a comprehensive quality appraisal tool that provides a rigorous means of evaluating the spatial methodologies commonly used in health geography. Informed by expert consensus and a scoping review, SMART captures the key methodological components deemed most important and feasible for assessing quality. Its application in fields such as spatial epidemiology and health geography will enable more rigorous and transparent evidence synthesis, enhancing the reliability and standards of spatial health research.
Acknowledgements
The authors would like to acknowledge and thank Dr Matthew Hobbs, Dr Jesse Whitehead, Associate Professor Daniel Ierodiaconou, Mr Marcus Blake, and Dr Conor Teljeur for participating in the concept mapping activities. The authors would like to extend thanks to the participants who wish to remain anonymous—we greatly appreciate your time and expertise in developing this tool.
Author contributions
SW led the study design, concept mapping process, data analysis, and manuscript drafting. AWS, LA, KM, and VLV provided supervision and input into the study design, concept mapping process, data analysis, and manuscript drafting. AD was involved in the concept mapping design, data analysis, and manuscript drafting. NTC was involved in the concept mapping process, data analysis, and manuscript drafting. All authors read and approved the final manuscript.
Funding
This research is part of the first author’s PhD research, supported by a Deakin postgraduate Scholarship. SW, AWS, LA, NTC, and VLV are supported by the Australian Government’s Rural Health Multidisciplinary Training program.
Data availability
All data supporting the findings of this study are available within the paper and its Supplementary Information.
Declarations
Ethics approval and consent to participate
This study received Deakin University ethics approval (reference number: HEAG-H 88_2023). All participants were provided with a plain language statement before participating in any phase of this study. Electronic consent was obtained before commencing the group concept mapping study and the validation survey.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Albert DP, Gesler WM, Levergood B, editors. Spatial analysis, GIS, and remote sensing applications in the health sciences [Internet]. Chelsea, Mich: Ann Arbor Press; 2000. Available from: https://doi.org/10.1201/b12416
- 2. Apparicio P, Gelb J, Dubé A-S, Kingham S, Gauvin L, Robitaille É. The approaches to measuring the potential spatial access to urban health services revisited: distance types and aggregation-error issues. Int J Health Geogr. 2017;16:32.
- 3. Wood SM, Alston L, Beks H, Mc Namara K, Coffee NT, Clark RA, et al. The application of spatial measures to analyse health service accessibility in Australia: a systematic review and recommendations for future practice. BMC Health Serv Res. 2023;23:330.
- 4. Haddaway NR, Pullin AS. The policy role of systematic reviews: past, present and future. Springer Sci Reviews. 2014;2:179–83.
- 5. Mallen C, Peat G, Croft P. Quality assessment of observational studies is not commonplace in systematic reviews. J Clin Epidemiol. 2006;59:765–9.
- 6. Wood SM, Alston L, Beks H, Mc Namara K, Coffee NT, Clark RA, et al. Quality appraisal of spatial epidemiology and health geography research: a scoping review of systematic reviews. Health Place. 2023;83:103108.
- 7. Hurd J, Hurley O, Asghari S. Issues to consider before initiating a project in medical geography. Front Public Health. 2016;4:14.
- 8. Olivio S, Macedo L, Gadotti I, Fuentes J, Stanton T, Magee D. Scales to assess the quality of randomized controlled trials: a systematic review. Phys Ther. 2008;88:156–75.
- 9. Verhagen AP. Quality assessment of randomised clinical trials [Internet]. Maastricht University; 1999 [cited 2022 Feb 24]. Available from: https://cris.maastrichtuniversity.nl/en/publications/918e4af9-2270-4ea8-9f38-474f2e6eb6bd
- 10. Hsu C-C, Sandford BA. The Delphi technique: making sense of consensus. 2007 [cited 2022 Mar 24]. Available from: https://scholarworks.umass.edu/pare/vol12/iss1/10/
- 11. Trochim WM, McLinden D. Introduction to a special issue on concept mapping. Eval Program Plann. 2017;60:166–75.
- 12. Ludwig B. Predicting the future: have you considered using the Delphi methodology? J Ext. 1997;35:1–4.
- 13. Slieman TA, Camarata T. Case-based group learning using concept maps to achieve multiple educational objectives and behavioral outcomes. J Med Educ Curric Dev. 2019;6:2382120519872510.
- 14. Kane M, Rosas S. Conversations about group concept mapping: applications, examples, and enhancements. 2018 [cited 2025 May 1]. Available from: https://research.ebsco.com/linkprocessor/plink?id=ad4634aa-40b8-3ba5-ac3e-6405be0060bf
- 15. Nabitz U, van Randeraad-van C, Kok I, van Bon-Martens M, Serverens P. An overview of concept mapping in Dutch mental health care. Eval Program Plann. 2017;60:202–12.
- 16. Wentink C, Huijbers MJ, Lucassen PL, van der Gouw A, Kramers C, Spijker J, et al. Enhancing shared decision making about discontinuation of antidepressant medication: a concept-mapping study in primary and secondary mental health care. Br J Gen Pract. 2019;69:e777–85.
- 17. Anderson LA, Slonim A. Perspectives on the strategic uses of concept mapping to address public health challenges. Eval Program Plann. 2017;60:194–201.
- 18. Tubbing L, Harting J, Stronks K. Unravelling the concept of integrated public health policy: concept mapping with Dutch experts from science, policy, and practice. Health Policy. 2015;119:749–59.
- 19. Mehdipanah R, Malmusi D, Muntaner C, Borrell C. An evaluation of an urban renewal program and its effects on neighborhood resident’s overall wellbeing using concept mapping. Health Place. 2013;23:9–17.
- 20. Szijarto B, Bradley Cousins J. Mapping the practice of developmental evaluation: insights from a concept mapping study. Eval Program Plann. 2019;76:101666.
- 21. LaNoue MD, Gerolamo AM, Powell R, Nord G, Doty AM, Rising KL. Development and preliminary validation of a scale to measure patient uncertainty: the uncertainty scale. J Health Psychol. 2020;25:1248–58.
- 22. Rosas SR, Camphausen LC. The use of concept mapping for scale development and validation in evaluation. Eval Program Plann. 2007;30:125–35.
- 23. Trochim W, Kane M. Concept mapping: an introduction to structured conceptualization in health care. Int J Qual Health Care. 2005;17:187–91.
- 24. Rosas SR, Ridings JW. The use of concept mapping in measurement development and evaluation: application and future directions. Eval Program Plann. 2017;60:265–76.
- 25. Kane M, Trochim W. Concept mapping for planning and evaluation. Chap. 2: Preparing for concept mapping [Internet]. Thousand Oaks, CA: SAGE Publications, Inc.; 2007 [cited 2023 Mar 9]. Available from: https://methods.sagepub.com/book/concept-mapping-for-planning-and-evaluation
- 26. Polit DF, Beck CT. The content validity index: are you sure you know what’s being reported? Critique and recommendations. Res Nurs Health. 2006;29:489–97.
- 27. Rodrigues IB, Adachi JD, Beattie KA, MacDermid JC. Development and validation of a new tool to measure the facilitators, barriers and preferences to exercise in people with osteoporosis. BMC Musculoskelet Disord. 2017;18:540.
- 28. Yusoff MSB. ABC of content validation and content validity index calculation. Educ Med J. 2019;11:49–54.
- 29. Lynn MR. Determination and quantification of content validity. Nurs Res. 1986;35:382–5.
- 30. Roebianto A, Savitri I, Sriyanto A, Syaiful I, Mubarokah L. Content validity: definition and procedure of content validation in psychological research. TPM - Test. 2023;30:5–18.
- 31. Zamanzadeh V, Ghahramanian A, Rassouli M, Abbaszadeh A, Alavi-Majd H, Nikanfar A-R. Design and implementation content validity study: development of an instrument for measuring patient-centered communication. J Caring Sci. 2015;4:165–78.
- 32. Concept Systems. The Concept System® groupwisdom™ [Internet]. Ithaca, NY; 2021. Available from: http://www.conceptsystemsglobal.com
- 33. Rosenberg S, Kim MP. The method of sorting as a data-gathering procedure in multivariate research. Multivar Behav Res. 1975;10:489–502.
- 34. Kane M, Trochim W. Concept mapping for planning and evaluation. Chap. 4: Structuring the statements [Internet]. Thousand Oaks, CA: SAGE Publications, Inc.; 2007 [cited 2023 Nov 21]. Available from: https://methods.sagepub.com/book/concept-mapping-for-planning-and-evaluation
- 35. Kane M, Trochim W. Concept mapping for planning and evaluation. Chap. 5: Concept mapping analysis [Internet]. Thousand Oaks, CA: SAGE Publications, Inc.; 2007 [cited 2023 Mar 9]. Available from: https://methods.sagepub.com/book/concept-mapping-for-planning-and-evaluation
- 36. Rosas SR, Kane M. Quality and rigor of the concept mapping methodology: a pooled study analysis. Eval Program Plann. 2012;35:236–45.
- 37. Crowe M, Sheppard L. A review of critical appraisal tools show they lack rigor: alternative tool structure is proposed. J Clin Epidemiol. 2011;64:79–89.
- 38. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 4th ed. Oxford; New York: Oxford University Press; 2008.
- 39. Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42.
- 40. Coles B, Tyrer F, Hussein H, Dhalwani N, Khunti K. Development, content validation, and reliability of the Assessment of Real-World Observational Studies (ArRoWS) critical appraisal tool. Ann Epidemiol. 2021;55:57–63.e15.
- 41. Davis LL. Instrument review: getting the most from a panel of experts. Appl Nurs Res. 1992;5:194–7.
- 42. Ayre C, Scally AJ. Critical values for Lawshe’s content validity ratio: revisiting the original methods of calculation. Meas Evaluation Couns Dev. 2014;47:79–86.
- 43. Wynd CA, Schmidt B, Schaefer MA. Two quantitative approaches for estimating content validity. West J Nurs Res. 2003;25:508–18.
- 44. Bilotta GS, Milner AM, Boyd IL. Quality assessment tools for evidence from environmental science. Environ Evid. 2014;3:14.
- 45. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995;16:62–73.
- 46. Jüni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ. 2001;323:42–6.
- 47. Katrak P, Bialocerkowski AE, Massy-Westropp N, Kumar VS, Grimmer KA. A systematic review of the content of critical appraisal tools. BMC Med Res Methodol. 2004;4:22.