Table 4.
Quality reporting of observational school audit tool studies.
| ID | Audit Tool | Was formative or pilot testing done, or are adaptations described? | What type (if any) of reliability testing was done? What were the results? | What type (if any) of validity testing was done? What were the results? | Is the scoring protocol described? | Total Items Reported |
|---|---|---|---|---|---|---|
| 1 | ACTION! Staff Audit | New; pilot study | | | | 1 |
| 2 | Adachi et al, 2013 | | | | Locations described, then other items summed to reflect total number of machines, filled slots, and machine-front advertising per school | 1 |
| 3 | Belansky et al, 2013 | | | | Combined with other data to describe implementation changes, then qualitatively classified changes as effective, promising, or emerging | 1 |
| 4 | Branding Checklist | New; pilot-tested at first school | | | Combined with other data sources, qualitatively analyzed using constant comparative method to find patterns | 2 |
| 5 | Co-SEA | Adapted from ENDORSE and SPEEDY; pilot-tested to work through technical difficulties | | | Scored similarly to ENDORSE and SPEEDY, with change scores calculated from Year 1 to Year 2 | 2 |
| 5.1 | Co-SEA Unadapted | Used unadapted from original COMPASS study | | | | 1 |
| 6 | EAPRS | New; input from parks officials/users and made revisions over several iterations | Inter-rater (% agreement, ICC, Kappas: 66% of items had good-excellent reliability) | Face (several rounds of input from parks and rec staff and park users) | Variable created for each exposure category (summed binary or frequency items, averaged categorical items) | 4 |
| 7 | ENDORSE | New; reviewed by experts and pilot-tested | | | Summed or counted, re-coded into 8 “availability” variables, which were dichotomized or categorized into tertiles | 2 |
| 8 | Food Decision Environment Tool | New; developed using behavioral economics theories, modified based on feedback from school/study stakeholders throughout | Inter-rater (system to resolve discrepancies during analysis, conducted peer debriefing meetings to clarify) | Trustworthiness of methods (e.g., credibility, transferability, dependability, confirmability) was established and described | Data from observational form were summarized and triangulated using field notes, and analyzed qualitatively for emerging themes | 4 |
| 9 | GRF-OT | New; developed over several iterations with input from experts and field testing through PlayWorks | Inter-rater (weighted Kappa: 0.54–1.00, Scale ICC: 0.84); Test-retest (ICC: 0.95) | Convergent (associated with activity levels); Content (fit assessed using exploratory structural equation modeling) | 4-category items summed within subdomains | 2 |
| 10 | Hecht et al, 2017 | | Inter-rater (Kappa: 0.88–1.00) | | | 1 |
| 11 | ISAT | Adapted from SPEEDY and IDEA, then customized by country | Inter-rater (% agreement: 83.9–100%; Kappa: 0.61–0.96) | Construct (could discriminate child PA between highest and lowest quintile schools) | Binary items were reported, and items were also summed within each category | 4 |
| 12 | Laurie et al, 2017 | New; developed using pre-existing policies and guidelines and piloted in 9 schools | | | Categorical data were expressed as frequencies and percentages | 2 |
| 13 | LCFO | Adapted from SNDA and unpublished tools, based on input from nutrition professionals | Inter-rater (% agreement: >80% with gold standard researcher; monthly quality control review) | | | 2 |
| 14 | PARA | Used unadapted from original PARA study of community physical activity resources | Type not specified (rs > 0.77) | | Scored according to original PARA protocol (frequency of features, amenities, incivilities are summed; quality presented as a 3- or 4-item scale) | 3 |
| 14.1 | PARA (Adapted) | Adapted from original PARA | Inter-rater (% agreement: >80% with research lead; monthly quality control review) | | | 2 |
| 15 | Patel et al, 2009 | New; developed by members of a community advisory board in several iterations, including a mock site visit | Inter-rater (Kappa = 0.65–1.0; ICC = 1.0) | | Observers compare records to the foods/beverages that align with policy; other info was qualitatively coded with other data sources to identify themes | 3 |
| 16 | School Food Environment Scan | | Inter-rater was not conducted because observers completed it together | | Binary items summed into scale and dichotomized; frequency items summed and dichotomized (some vs. none) | 2 |
| 17 | School Lunchroom Audits | | | | Items are combined with field notes and photos to generate a scale score for each service line; summed across all service lines in each school | 1 |
| 18 | SF-EAT | New; developed based on literature review and existing policy documents, then tested for feasibility in 7 schools | | Face (circulated to Co-Is and project partners) | Items combined with other data sources into 6 pre-determined domains, each scored 1–5 based on extent to which initiatives are happening | 3 |
| 19 | SNDA-III | Adapted from previous iterations of SNDA study | | | Binary variables on audit were combined with other data sources and summed in 3 different categories | 2 |
| 20 | SNEO | New; conducted Q-sort with 8 research staff to select items | Inter-rater (Gwet’s AC1 = 0.73); Internal consistency (Cronbach’s α = 0.77–0.85) | | Two subscales were created: recommended and non-recommended items | 3 |
| 21 | SPACE Checklist | Used unadapted from SPACE (Spatial Planning and Children’s Exercise) study, but applied to schools | Inter-rater (system to resolve discrepancies on-site) | | | 2 |
| 22 | SPAN-ET | Adapted from several existing instruments | Inter-rater (Percent agreement: 80.8–96.8%; Kappa: 0.61–0.94) | Face and content (field tested, with school personnel provided subject-matter expertise) | Binary items are summed within each category, then categories are explained by a 4-item scale | 4 |
| 23 | SPEEDY | New, but based on existing green space instrument | Inter-rater (% agreement: 76–90%; Kappa: 0.67–1.0) | Face (draft sent to 3 experts); Construct (could discriminate child PA between highest and lowest quintile schools) | Binary items were summed, frequencies were weighted by response mean, scales were weighted, then all were summed within each category | 2 |
| 23.1 | SPEEDY (Adapted; Dias et al, 2017) | Adapted from SPEEDY, used only sports and play facility category | | | Scored according to original SPEEDY protocol | 2 |
| 23.2 | SPEEDY (Adapted; Harrison et al, 2016) | Slightly adapted from SPEEDY, added 3 facilities commonly recorded as ‘other’ in original audit | | | Scored according to original SPEEDY protocol | 3 |
| 23.3 | SPEEDY (Adapted; Tarun et al, 2017) | Adapted from SPEEDY, removed a few items and added “comments” (advised by local experts) | Inter-rater (Kappa: 0.4–1.0; % agreement: 61.9–100.0%) | | Scored according to original SPEEDY protocol | 3 |
| 23.4 | SPEEDY (Unadapted; Chalkley et al, 2018) | | | | | 0 |
| 23.5 | SPEEDY (Unadapted; Hyndman and Chancellor, 2017) | Used unadapted from original SPEEDY study | | | Scored according to original SPEEDY protocol | 2 |
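For readers less familiar with the reliability statistics summarized above, the sketch below shows how the two measures reported most often in Table 4, percent agreement and Cohen's kappa, can be computed for binary audit items scored by two independent observers. The observer ratings are hypothetical and are not drawn from any of the reviewed studies.

```python
# Illustrative only: hypothetical ratings, not data from the reviewed studies.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two observers gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if each observer rated independently at their own base rates.
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n)
              for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

# Presence/absence (1/0) ratings of 10 audit items by two observers (hypothetical).
observer_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]

print(f"% agreement: {percent_agreement(observer_1, observer_2):.0%}")  # 80%
print(f"Cohen's kappa: {cohens_kappa(observer_1, observer_2):.2f}")     # 0.52
```

As the example shows, high percent agreement can coexist with more modest kappa values once chance agreement is taken into account, which is why studies in Table 4 that report both statistics (e.g., ISAT, SPEEDY) give a fuller picture of inter-rater reliability than percent agreement alone.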