2019 Dec 19;15(4):858–877. doi: 10.5964/ejop.v15i4.1773

Table 2. Extraction Table of the Evaluated Studies.

Study / Sample Characteristics	Research Design	Creativity Measures (duration)	Creativity Parameters Assessed	Exercise Modality (intensity and duration)	Scoring	Outcomes and Conclusions
Blanchette et al. (2005)
N = 60 (30 males, 30 females); Age: 18–27 (M = 20)	within-subject	1. TTCT Figural Tests A and B (10 minutes per form) 2. Creative Strengths Questionnaire	Abstractness of titles, fluency, originality, elaboration, and resistance to premature closure per the TTCT scoring guide	Acute exercise protocol; primarily aerobic; self-selected (moderate intensity; 30 minutes)	Four independent authors scored all of these anonymous instruments in random order. Interrater reliability was high, with median Pearson correlations of .818 (range .766–.886) for H1, .850 (range .789–.870) for H2, and .826 (range .781–.917) for H3.	Creative potential was elevated immediately post-exercise, relative to control (p < .001). Creative potential was elevated 2 hours post-exercise, relative to control (p < .001). No statistically significant temporal differences were determined between the two exercise conditions (p = .251).
Colzato et al. (2013)
N = 96 (Age: M = 21); Experimental Group: 48 habitual exercisers; Control Group: 48 inactive individuals	between-subjects cross-over design	RAT: 30 triads (10 per condition); AUT: 3 items (1 per condition)	Flexibility, fluency, originality, and elaboration	Cycle ergometer: rest (6 minutes), moderate exercise (6 minutes), and intense exercise (6 minutes), for 12 minutes of total cycling time. Creativity was assessed during exercise for half of the participants in each group (24-minute total protocol) and immediately afterward for the remaining half (36-minute total protocol).	The RAT was scored numerically as an index of total correct responses. AUT scoring for the divergent thinking measure was completed by two independent raters, with no indication of whether participant responses were blinded to raters. Cronbach's alpha scores for fluency, flexibility, originality, and elaboration ranged from 0.74 to 1.00; we assume the authors intended to report inter-rater reliability.	Intense exercise was associated with reductions in convergent thinking among inactive participants, compared to moderate exercise (p = .002) and rest (p = .029). Creative flexibility on the AUT was higher at rest than during intense exercise for both groups (p = .011). There was no difference in AUT flexibility between rest and moderate-intensity exercise for either group (p = .150).
Curnow and Turner (1992)
N = 46 (35 females); Age: 18–24 (M = 19); Groups: A) Music and Exercise, B) Exercise Only, C) Music Only, D) Control (no exercise, no music); no sample size reported for each group	between-subjects	TTCT Figural Tests A (pre-condition) and B (3 minutes post-condition)	Fluency, originality, and elaboration	Cycle ergometer (20 minutes at a submaximal workload of 150 kpm at 55 rpm)	Assessments were scored by the Scholastic Testing Service (Earth City, MO). However, no inter-rater reliability was reported.	There were no statistically significant differences between groups on any creativity measure assessed.
Gondola and Tuckman (1985)
Control (no PA): n = 23; Experimental Group: n = 26	Mixed model	AUT, Match Sticks, and Consequences	Pre-study and post-study chronic creativity (before exercising): Match Sticks, Obvious Consequences, Remote Consequences, and AUT	8-week chronic training study (20-minute runs; 16 sessions, 2× per week)	Followed scoring guides for the convergent and divergent thinking measures, but did not detail how scoring was completed or whether it was conducted by internal or external raters. No inter-rater reliability was reported.	The experimental group outperformed the control group on the AUT (p < .01). No additional differences were determined for the included creativity assessments.
Gondola (1986)
Experimental Group 1: n = 23; Experimental Group 2: n = 19; co-ed undergraduates (no other demographics reported); Control: no sample size reported	Mixed model	AUT, Match Sticks, and Consequences	Groups 1 and 2: pre-study and post-study chronic creativity (before exercising): Match Sticks, Obvious Consequences, and Remote Consequences. Group 2: acute creativity (Match Sticks, Obvious Consequences, Remote Consequences, and AUT) measured pre- and post-exercise in session 1	Group 1: 8-week chronic training study; 20-minute runs for 16 sessions (2× per week). Group 2: 6-week chronic training study; 20-minute runs for 12 sessions (2× per week)	Scoring was completed by the author and one assistant. No inter-rater reliability was reported.	Both experimental groups performed better on the AUT relative to controls (p < .001). Group 2 scored higher on Remote Consequences than the other two groups (p < .01). For Group 2, acute creativity scores for Remote Consequences and the AUT differed statistically significantly from pre- to post-exercise (p < .001).
Gondola (1987)
N = 37 females; Age: 19–35 (M = 23); Experimental Group: n = 21; Control Group (no PA): n = 16	Mixed model	AUT and Consequences	Acute creativity assessed at baseline and 5 minutes post-exercise 1 week later (two visits)	20 minutes of moderate-to-vigorous aerobic dance	No description of scoring methods was provided for replication. No inter-rater reliability was reported.	The experimental group scored higher on the AUT than the control group (p < .0001). The experimental group scored higher on Remote Consequences than the control group (p < .01).
Herman-Tofler and Tuckman (1998)
N = 52 third graders randomized into an Experimental Group (aerobic exercise physical education) or a Control Group (traditional physical education); no sample size per group was reported	Mixed model	TTCT Figural Test Forms A (vertical parallel lines) and B (circles); time to complete the creativity assessments was not reported	Picture construction (original and detailed stories); multiple associations and divergent thinking	3 aerobic exercise sessions per week for 8 weeks	Scored per the TTCT manual. TTCT test-retest reliability coefficients were reported for the figural test forms (0.71–0.85). No inter-rater reliability was reported.	The aerobic exercise group achieved greater increases in figural fluency scores from pre- to post-intervention than the control group (p = .04). Aerobic power (measured via an 800-m run) did not change statistically significantly from pre- to post-intervention (p = .266).
Hinkle et al. (1993)
N = 85; Experimental Group: n = 42 (20 males, 22 females); Control Group: n = 43 (24 males, 19 females); (Age: M = 13)	Mixed model	Figural and Verbal versions of the TTCT administered in a group setting	Verbal: divergent thinking, fantasy, and unique thinking; Figural: elaboration, fluency, originality, and breaking set	Five outdoor running sessions per week for 8 weeks (no session duration provided)	No description of scoring methods was provided for replication. All creativity assessments were scored by one independent rater; therefore, no inter-rater reliability could be reported.	Pre-to-post scores for fluency, flexibility, and originality were marginally higher in the treatment group compared to controls (p < .05). Females, irrespective of condition assignment, achieved marginally higher increases in verbal flexibility, verbal originality, and figural elaboration (p < .05).
Oppezzo and Schwartz (2014)
Experiment 1: N = 48 undergraduate psychology students; Experiment 2: N = 48 (sit-sit, sit-tread, and tread-sit conditions); Experiment 3: N = 40 (sit-sit, sit-walk, walk-sit, and walk-walk); Experiment 4: N = 40 (sit inside, walk inside, sit outside, walk outside)	1) within-subject 2) between-subjects 3) between-subjects 4) between-subjects (2 × 2 design)	1) AUT (4 minutes × 2 tasks; 6 items total) and RAT (4 minutes for 16 triads) 2) AUT (4 minutes × 2 tasks; 6 items total) × 2 3) AUT (same as above) 4) BSE (5 minutes × 3 tasks; 16-minute total session)	Ideation, novelty, appropriate uses, appropriate novelty, and non-repetitive uses; Experiment 3 only: alfresco code ("outdoor" ideas); Experiment 4 only: analogy production coded for appropriateness, novelty, and high-quality responses, further determined by degree of detail and semantic distance	1) 12 minutes seated followed by 12 minutes of treadmill walking 2) 8 minutes of one condition followed by 8 minutes of the complementary condition (e.g., 8 minutes sitting followed by 8 minutes on the treadmill) 3) 16 minutes seated indoors; 8 minutes seated indoors and 8 minutes walking outdoors, or 8 minutes walking outdoors and 8 minutes seated indoors; 16 minutes walking outdoors	All divergent thinking parameters were subject to a priori, researcher-defined operationalizations of creativity. Analogies were further scored using Amabile's (1996) consensual assessment technique. Interrater reliability for the AUT was reported as r = .73 for Experiments 1 and 2 and r = .74 for Experiment 3. In Experiment 4, interrater reliability was r = 1.0 for detail level and r = .98 for semantic distance.	1) RAT performance decreased when walking (p = .03), while AUT performance increased when walking (p < .001). 2) The order of walking (before or after sitting) did not yield statistically significant differences at the end of the bout (p = .975). Ideation on the AUT decreased from time-point 1 to time-point 2 in the tread-sit condition (p = .016). Walking was associated with higher creativity performance on the AUT than sitting (p < .001). 3) Walking once was not statistically different from walking at both time-points on the AUT (p = .253). Walking at both time-points maintained a similar level of creativity performance on the AUT across time (p = .507). Sitting after walking mirrored the findings of Experiment 2: it was associated with AUT creativity performance comparable to that achieved during walking (p = .335). 4) Walking was associated with higher-quality, novel analogies relative to sitting. Being outdoors was independently related to novelty, albeit possibly with lower-quality responses.
Ramocki (2002)
N = 31; Experimental Group: n = 15; Control Group (no PA): n = 16; Age: 20–40	between-subjects	Baseline: AUT (20 minutes) and game development (40 minutes); Post: metaphors (20 minutes) and planning a party (40 minutes)	Creative fluency, flexibility, novelty (categorical), and global creativity (rank-ordered)	One hour of self-selected vigorous-intensity physical activity for the experimental group	Double-blinded scoring was completed by three faculty raters and three student raters (who were also participants in the study). Kendall's coefficients of concordance for the pretest assessment were W = .66 for the subject judges, W = .62 for the faculty judges, and W = .56 for all six judges combined. For the posttest assessment, they were W = .59 for the subject judges, W = .73 for the faculty judges, and W = .61 for all six judges combined.	Only the mean pre- to post-test change in fluency was statistically significant for the experimental group (p < .01).
Steinberg et al. (1997)
N = 63; Aerobic Exercise Group: 15 males, 16 females; age range 19–54; median age range 25–29. Aerobic Dance Group: 4 males, 28 females; age range 19–59; median age range 20–24. Four students were lost to attrition.	Mixed model	Unusual Uses Test of Creative Thinking (Tin Cans and Cardboard Boxes; 5 minutes per item)	Fluency, flexibility, and originality	17 minutes of aerobic exercise (defined as high-impact); 21.7 minutes of aerobic dance (defined as low-impact). A control condition was also completed (counterbalanced order), consisting of a neutral video matched to exercise duration	Unusual uses were scored by summing ratings on a four-point scale. Inter-rater reliability between two independent raters was reported at r = .89.	Flexibility was marginally higher in the exercise condition than in the video condition (p < .05). Although mood improved with exercise (p < .001), mood did not contribute to the effects on creativity (p > .05).
Tuckman and Hinkle (1986)
N = 154: n = 48 4th graders (Age: M = 9); n = 53 5th graders (Age: M = 10); n = 53 6th graders (Age: M = 11); the number of participants in the experimental and control groups was not specified	Mixed model	AUT (10 items; no duration provided)	No specific creativity parameters were mentioned	Three outdoor running sessions per week (30 minutes each) for 12 weeks; the active control group participated in regular physical education class activities	No scoring procedures were reported; thus, no inter-rater reliability was reported.	The experimental group outperformed the control group on the AUT (p < .001). Boys in the experimental group achieved marginally higher AUT scores than girls in posttest analyses (p < .05).
Zhou et al. (2017) (Studies 2a and 2b were excluded, as they did not evaluate exercise)
Study 1a: N = 63 (21 males, 42 females); Age: M = 21.25 [Study 1b: same participants]	within-subject	1a) DIT divergent thinking task 1b) CIT divergent thinking task (10 trials; 1 minute allocated to each trial)	1a) Task completion and task novelty were scored 1b) Fluency, flexibility, and novelty were scored	Studies 1a and 1b: standing, constrained walking (Figure-of-8 Walk Test, F8W), and unconstrained walking (roaming) conditions (no exercise duration provided; likely about 10 minutes)	1a) Creative novelty was rated by six experts on a scale from 1 (not original) to 5 (very original) in both experiments; Cronbach's alpha was .79 and .70, respectively. 1b) Fluency and flexibility were scored by the primary investigator.	1a) Novelty was highest in the roaming condition, compared to constrained walking and standing (p < .001). 1b) Fluency, flexibility, and novelty were highest in the roaming condition, compared to constrained walking and standing (p < .001). Constrained walking was also associated with higher fluency, flexibility, and novelty than standing (p < .001).