Table 4.
Quality reporting of observational school audit tool studies.
| ID | Audit Tool | Was formative or pilot testing done, or are adaptations described? | What type (if any) of reliability testing was done? What were the results? | What type (if any) of validity testing was done? What were the results? | Is the scoring protocol described? | Total Items Reported |
|---|---|---|---|---|---|---|
| 1 | ACTION! Staff Audit | New; pilot study | | | | 1 |
| 2 | Adachi et al, 2013 | | | | Locations described, then other items summed to reflect total number of machines, filled slots, and machine-front advertising per school | 1 |
| 3 | Belansky et al, 2013 | | | | Combined with other data to describe implementation changes, then qualitatively classified changes as effective, promising, or emerging | 1 |
| 4 | Branding Checklist | New; pilot-tested at first school | | | Combined with other data sources, qualitatively analyzed using constant comparative method to find patterns | 2 |
| 5 | Co-SEA | Adapted from ENDORSE and SPEEDY; pilot-tested to work through technical difficulties | | | Scored similarly to ENDORSE and SPEEDY, with change scores calculated from Year 1 to Year 2 | 2 |
| 5.1 | Co-SEA Unadapted | Used unadapted from original COMPASS study | | | | 1 |
| 6 | EAPRS | New; input from parks officials/users and made revisions over several iterations | Inter-rater (% agreement, ICC, Kappas: 66% of items had good-excellent reliability) | Face (several rounds of input from parks and rec staff and park users) | Variable created for each exposure category (summed binary or frequency items, averaged categorical items) | 4 |
| 7 | ENDORSE | New; reviewed by experts and pilot-tested | | | Summed or counted, re-coded into 8 “availability” variables, which were dichotomized or categorized into tertiles | 2 |
| 8 | Food Decision Environment Tool | New; developed using behavioral economics theories, modified based on feedback from school/study stakeholders throughout | Inter-rater (system to resolve discrepancies during analysis, conducted peer debriefing meetings to clarify) | Trustworthiness of methods (e.g., credibility, transferability, dependability, confirmability) was established and described | Data from observational form were summarized and triangulated using field notes, and analyzed qualitatively for emerging themes | 4 |
| 9 | GRF-OT | New; developed over several iterations with input from experts and field testing through PlayWorks | Inter-rater (weighted Kappa: 0.54–1.00, Scale ICC: 0.84); Test-retest (ICC: 0.95) | Convergent (associated with activity levels); Content (fit assessed using exploratory structural equation modeling) | 4-category items summed within subdomains | 2 |
| 10 | Hecht et al, 2017 | | Inter-rater (Kappa: 0.88–1.00) | | | 1 |
| 11 | ISAT | Adapted from SPEEDY and IDEA, then customized by country | Inter-rater (% agreement: 83.9–100%; Kappa: 0.61–0.96) | Construct (could discriminate child PA between highest and lowest quintile schools) | Binary items were reported, and items were also summed within each category | 4 |
| 12 | Laurie et al, 2017 | New; developed using pre-existing policies and guidelines and piloted in 9 schools | | | Categorical data were expressed as frequencies and percentages | 2 |
| 13 | LCFO | Adapted from SNDA and unpublished tools, based on input from nutrition professionals | Inter-rater (% agreement: >80% with gold standard researcher; monthly quality control review) | | | 2 |
| 14 | PARA | Used unadapted from original PARA study of community physical activity resources | Type not specified (rs > 0.77) | | Scored according to original PARA protocol (frequency of features, amenities, incivilities are summed; quality presented as a 3- or 4-item scale) | 3 |
| 14.1 | PARA (Adapted) | Adapted from original PARA | Inter-rater (% agreement: >80% with research lead; monthly quality control review) | | | 2 |
| 15 | Patel et al, 2009 | New; developed by members of a community advisory board in several iterations, including a mock site visit | Inter-rater (Kappa = 0.65–1.0; ICC = 1.0) | | Observers compare records to the foods/beverages that align with policy; other info was qualitatively coded with other data sources to identify themes | 3 |
| 16 | School Food Environment Scan | | Inter-rater was not conducted because observers completed it together | | Binary items summed into scale and dichotomized; frequency items summed and dichotomized (some vs. none) | 2 |
| 17 | School Lunchroom Audits | | | | Items are combined with field notes and photos to generate a scale score for each service line; summed across all service lines in each school | 1 |
| 18 | SF-EAT | New; developed based on literature review and existing policy documents, then tested for feasibility in 7 schools | | Face (circulated to Co-Is and project partners) | Items combined with other data sources into 6 pre-determined domains, each scored 1–5 based on extent to which initiatives are happening | 3 |
| 19 | SNDA-III | Adapted from previous iterations of SNDA study | | | Binary variables on audit were combined with other data sources and summed in 3 different categories | 2 |
| 20 | SNEO | New; conducted Q-sort with 8 research staff to select items | Inter-rater (Gwet’s AC1 = 0.73); Internal consistency (Cronbach’s α = 0.77–0.85) | | Two subscales were created: recommended and non-recommended items | 3 |
| 21 | SPACE Checklist | Used unadapted from SPACE (Spatial Planning and Children’s Exercise) study, but applied to schools | Inter-rater (system to resolve discrepancies on-site) | | | 2 |
| 22 | SPAN-ET | Adapted from several existing instruments | Inter-rater (Percent agreement: 80.8–96.8%; Kappa: 0.61–0.94) | Face and content (field tested, with school personnel provided subject-matter expertise) | Binary items are summed within each category, then categories are explained by a 4-item scale | 4 |
| 23 | SPEEDY | New, but based on existing green space instrument | Inter-rater (% agreement: 76–90%; Kappa: 0.67–1.0) | Face (draft sent to 3 experts); Construct (could discriminate child PA between highest and lowest quintile schools) | Binary items were summed, frequencies were weighted by response mean, scales were weighted, then all were summed within each category | 2 |
| 23.1 | SPEEDY (Adapted; Dias et al, 2017) | Adapted from SPEEDY, used only sports and play facility category | | | Scored according to original SPEEDY protocol | 2 |
| 23.2 | SPEEDY (Adapted; Harrison et al, 2016) | Slightly adapted from SPEEDY, added 3 facilities commonly recorded as ‘other’ in original audit | | | Scored according to original SPEEDY protocol | 3 |
| 23.3 | SPEEDY (Adapted; Tarun et al, 2017) | Adapted from SPEEDY, removed a few items and added “comments” (advised by local experts) | Inter-rater (Kappa: 0.4–1.0; % agreement: 61.9–100.0%) | | Scored according to original SPEEDY protocol | 3 |
| 23.4 | SPEEDY (Unadapted; Chalkley et al, 2018) | | | | | 0 |
| 23.5 | SPEEDY (Unadapted; Hyndman and Chancellor, 2017) | Used unadapted from original SPEEDY study | | | Scored according to original SPEEDY protocol | 2 |
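For readers less familiar with the reliability statistics summarized above, the sketch below shows how the two measures reported most often in Table 4, percent agreement and Cohen's kappa, can be computed for binary audit items scored by two independent observers. The observer ratings are hypothetical and are not drawn from any of the reviewed studies.

```python
# Illustrative only: hypothetical ratings, not data from the reviewed studies.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two observers gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if each observer rated independently at their own base rates.
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n)
              for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

# Presence/absence (1/0) ratings of 10 audit items by two observers (hypothetical).
observer_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]

print(f"% agreement: {percent_agreement(observer_1, observer_2):.0%}")  # 80%
print(f"Cohen's kappa: {cohens_kappa(observer_1, observer_2):.2f}")     # 0.52
```

As the example shows, high percent agreement can coexist with more modest kappa values once chance agreement is taken into account, which is why studies in Table 4 that report both statistics (e.g., ISAT, SPEEDY) give a fuller picture of inter-rater reliability than percent agreement alone.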