2019 Dec 19;15(4):858–877. doi: 10.5964/ejop.v15i4.1773

Table 2. Extraction Table of the Evaluated Studies.

Study / Sample Characteristics | Research Design | Creativity Measures (duration) | Creativity Parameters Assessed | Exercise Modality (intensity and duration) | Scoring | Outcomes and Conclusions
Blanchette et al. (2005)
Sample Characteristics: N = 60 (30 males, 30 females); Age: 18–27 (M = 20)
Research Design: within-subject
Creativity Measures (duration): 1. TTCT Figural Tests A and B (10 minutes each form); 2. Creative Strengths Questionnaire
Creativity Parameters Assessed: Abstractness of titles, fluency, originality, elaboration, and resistance to premature closure, per the TTCT scoring guide
Exercise Modality (intensity and duration): Acute exercise protocol; primarily aerobic; self-selected (moderate: 30-minute)
Scoring: Four independent authors scored all of these anonymous instruments in random order. Interrater reliability was high, with median Pearson correlations of .818 (range .766–.886) for H1, .850 (range .789–.870) for H2, and .826 (range .781–.917) for H3.
Outcomes and Conclusions: Creative potential was elevated immediately post-exercise, relative to control (p < .001). Creative potential was elevated 2 hours post-exercise, relative to control (p < .001). No statistically significant temporal differences were determined between the two exercise conditions (p = .251).
Colzato et al. (2013)
Sample Characteristics: N = 96 (Age: M = 21); Experimental Group: 48 habitual exercisers; Control Group: 48 inactive individuals
Research Design: between-subjects cross-over design
Creativity Measures (duration): 30 RAT triads (10 triads per condition); 3 AUT items (1 item per condition)
Creativity Parameters Assessed: Flexibility, fluency, originality, elaboration
Exercise Modality (intensity and duration): Cycle ergometer: rest (6-minute), moderate (6-minute), and intense (6-minute) exercise (12-minute total cycling time). Creativity was assessed during exercise for half of the participants in each group (24-minute total protocol), and immediately after for the remaining half (36-minute total protocol).
Scoring: The RAT was scored numerically via an index of total correct responses. AUT scoring for the divergent thinking measure was completed by two independent raters; there was no indication of whether participant responses were blinded to raters. Cronbach’s alpha scores for fluency, flexibility, originality and elaboration ranged from 0.74 to 1.00. We assume the authors intended to report inter-rater reliability.
Outcomes and Conclusions: Intense exercise was associated with reductions in convergent thinking among inactive participants, compared to engaging in moderate exercise (p = .002) and rest (p = .029). Creative flexibility on the AUT was higher at rest than for intense exercise (p = .011) for both groups. There was no difference in AUT flexibility performance during rest or moderate-intensity exercise for both groups (p = .150).
Curnow and Turner (1992)
Sample Characteristics: N = 46 (35 females); Age: 18–24 (M = 19); A) Music and Exercise; B) Exercise Only; C) Music Only; D) Control Group (no exercise, no music) (no sample size reported for each separate group)
Research Design: between-subjects
Creativity Measures (duration): TTCT Figural tests A (pre) and B (3-minute post condition)
Creativity Parameters Assessed: Fluency, originality and elaboration
Exercise Modality (intensity and duration): Cycle ergometer (20-minute submaximal workload of 150 kpm at a rate of 55 rpm)
Scoring: The Scholastic Testing Service, Earth City, MO scored the assessments. However, no inter-rater reliability was reported.
Outcomes and Conclusions: There were no statistically significant differences between groups for any creativity measure assessed.
Gondola and Tuckman (1985)
Sample Characteristics: Control (no PA): n = 23; Experimental Group: n = 26
Research Design: Mixed model
Creativity Measures (duration): AUT, Match Sticks and Consequences
Creativity Parameters Assessed: Pre-study and post-study chronic creativity (before exercising): Match Sticks, Obvious Consequences, Remote Consequences and AUT
Exercise Modality (intensity and duration): 8-week chronic training study (20-minute run for 16 sessions, 2× per week)
Scoring: Followed scoring guides for convergent and divergent thinking measures. Did not detail how the scoring was completed, or if scoring was conducted by internal or external raters. No inter-rater reliability was reported.
Outcomes and Conclusions: The experimental group outperformed the control group on the AUT (p < .01). No additional differences were determined for the included creativity assessments.
Gondola (1986)
Sample Characteristics: Experimental Group 1: n = 23; Experimental Group 2: n = 19; Co-ed undergraduates (no other demographics reported); Control: no sample size reported
Research Design: Mixed model
Creativity Measures (duration): AUT, Match Sticks and Consequences
Creativity Parameters Assessed:
Group 1 and 2: Pre-study and post-study chronic creativity (before exercising): Match Sticks, Obvious Consequences, Remote Consequences
Group 2: Acute creativity (Match Sticks, Obvious Consequences, Remote Consequences and AUT) measured pre- and post-exercise for session 1
Exercise Modality (intensity and duration):
Group 1: 8-week chronic training study: 20-minute run for 16 sessions (2× per week)
Group 2: 6-week chronic training study: 20-minute run for 12 sessions (2× per week)
Scoring: Scoring was completed by the author and one assistant. No inter-rater reliability was reported.
Outcomes and Conclusions: Both experimental groups performed better on the AUT relative to controls (p < .001). Group 2 scored higher on Remote Consequences than the other two groups (p < .01). Pre- and post-acute creativity scores for Remote Consequences and the AUT were statistically significantly different for Group 2 (p < .001).
Gondola (1987)
Sample Characteristics: N = 37 females; Age: 19–35 (M = 23); Experimental Group: n = 21; Control Group (no PA): n = 16
Research Design: Mixed-model
Creativity Measures (duration): AUT and Consequences
Creativity Parameters Assessed: Acute creativity assessed at baseline and 5-minute post-exercise 1 week later (two visits)
Exercise Modality (intensity and duration): 20-minute moderate-to-vigorous aerobic dance
Scoring: No description of scoring methods was provided for replication. No inter-rater reliability was reported.
Outcomes and Conclusions: The experimental group scored higher on the AUT than the control group (p < .0001). The experimental group scored higher on the Remote Consequences than the control group (p < .01).
Herman-Tofler and Tuckman (1998)
Sample Characteristics: N = 52 third graders randomized into an Experimental (aerobic exercise physical education) or Control Group (traditional physical education). No sample size per group was reported.
Research Design: Mixed-Model
Creativity Measures (duration): TTCT Figural Test-Forms A (vertical parallel lines) and B (circles). Time to complete the creativity assessments was not reported.
Creativity Parameters Assessed: Picture construction-original and detailed stories; multiple associations and divergent thinking
Exercise Modality (intensity and duration): 3 aerobic exercise sessions per week for 8 weeks
Scoring: Scoring per the TTCT manual. TTCT test-retest reliability coefficients were reported for the figural test forms (0.71–0.85). No inter-rater reliability was reported.
Outcomes and Conclusions: The aerobic exercise group achieved increased figural fluency scores pre-to-post-intervention, compared to the control group (p = .04). Aerobic power (measured via an 800-m run) was not statistically significantly different from pre-intervention to post-intervention (p = .266).
Hinkle et al. (1993)
Sample Characteristics: N = 85; Experimental Group: n = 42 (20 males, 22 females); Control Group: n = 43 (24 males, 19 females); Age: M = 13
Research Design: Mixed-Model
Creativity Measures (duration): Figural and Verbal versions of the TTCT tested in a group setting
Creativity Parameters Assessed:
Verbal: divergent thinking, fantasy, unique thinking
Figural: elaboration, fluency, originality, and breaking set
Exercise Modality (intensity and duration): Five outdoor running sessions per week for 8 weeks (no duration provided)
Scoring: No description of scoring methods was provided for replication. All creativity assessments were scored by one independent rater. Therefore, no inter-rater reliability could be reported.
Outcomes and Conclusions: Pre-to-post scores for fluency, flexibility, and originality were marginally higher in the treatment group compared to controls (p < .05). Females, irrespective of condition assignment, achieved marginally higher increases in verbal flexibility, verbal originality, and figural elaboration (p < .05).
Oppezzo and Schwartz (2014)
Sample Characteristics:
Experiment 1: N = 48 undergraduate psychology students
Experiment 2: N = 48; sit-sit; sit-tread; tread-sit conditions
Experiment 3: N = 40; sit-sit; sit-walk; walk-sit; and walk-walk
Experiment 4: N = 40; sit inside; walk inside; sit outside; walk outside
Research Design:
1) within-subject
2) between-subjects
3) between-subjects
4) between-subjects (2 × 2 design)
Creativity Measures (duration):
1) AUT (4-minute × 2 tasks consisting of 6 items total) and RAT (4-minute for 16 triads)
2) AUT (4-minute × 2 tasks consisting of 6 items total) × 2
3) AUT (same as above)
4) BSE (5-minute × 3 tasks; 16-minute total session)
Creativity Parameters Assessed:
Ideation, novelty, appropriate uses, appropriate novelty, and non-repetitive uses
3 only) alfresco code (“outdoor” ideas)
4 only) analogy production coded for appropriateness, novelty, and high-quality responses, further determined by degree of detail and semantic distance
Exercise Modality (intensity and duration):
1) 12-minute seated followed by 12-minute treadmill walking
2) 8-minute of condition; 8-minute of complementary condition (i.e., 8-minute sit followed by 8-minute tread)
3) 16-minute seated indoors; 8-minute seated indoors and 8-minute walking outdoors, or 8-minute walking outdoors and 8-minute seated indoors; 16-minute walking outdoors
Scoring: All divergent thinking parameters were subject to a priori defined, researcher operationalizations of creativity. Analogies were further scored using Amabile’s (1996) consensual assessment technique. Interrater reliability for the AUT was reported as r = .73 for Experiments #1 and #2, r = .74 for Experiment #3, and r = 1.0 for detail level and r = .98 for semantic distance in Experiment #4.
Outcomes and Conclusions:
1) RAT performance decreased when walking (p = .03), while AUT performance increased when walking (p < .001).
2) The order of walking (before or after sitting) did not yield statistically significant differences (p = .975) at the end of the bout. Decreased ideation on the AUT was determined from time-point 1 to time-point 2 in the tread-sit condition (p = .016). Walking was associated with higher creativity performance on the AUT than sitting (p < .001).
3) Walking once was not statistically different from walking at both time-points on the AUT (p = .253). Walking at both time-points resulted in a similar level of maintained creativity performance on the AUT across time (p = .507). Sitting after walking mirrored the findings of Experiment 2: it was associated with creativity performance on the AUT comparable to that achieved during walking (p = .335).
4) Walking was associated with higher-quality, novel analogies relative to individuals who sat. Being outdoors was independently related to novelty, albeit perhaps of lower-quality responses.
Ramocki (2002)
Sample Characteristics: N = 31; Experimental Group: n = 15; Control Group (no PA): n = 16; Age 20–40
Research Design: between-subjects
Creativity Measures (duration):
Baseline: AUT (20-minute), game development (40-minute)
Post: metaphors (20-minute), planning a party (40-minute)
Creativity Parameters Assessed: Creative fluency, flexibility, novelty (categorical), and global creativity (rank-ordered)
Exercise Modality (intensity and duration): One hour of self-selected vigorous-intensity physical activity for the experimental group
Scoring: Double-blinded scoring completed by three faculty and three student-raters (also participants in the study). Kendall's coefficients for the pretest assessment were W = .66 for the subject judges, W = .62 for the faculty judges, and W = .56 for all six judges combined. Kendall's coefficients for the posttest assessment were W = .59 for the subject judges, W = .73 for the faculty judges, and W = .61 for all six judges combined.
Outcomes and Conclusions: Only the mean change in pre- to post-fluency was statistically significant for the experimental group (p < .01).
Steinberg et al. (1997)
Sample Characteristics:
N = 63
Aerobic Exercise Group: 15 males; 16 females; age range 19–54; median age range 25–29
Aerobic Dance Group: 4 males; 28 females; age range 19–59; median age range 20–24
Four students were lost to attrition.
Research Design: Mixed Model
Creativity Measures (duration): Unusual Uses Test of Creative Thinking (Tin Cans and Cardboard Boxes; 5-minute per item)
Creativity Parameters Assessed: Fluency, flexibility, and originality
Exercise Modality (intensity and duration): 17 minutes of aerobic exercise defined as high-impact; 21.7 minutes of aerobic dance defined as low-impact. A control condition was completed (counterbalanced order), consisting of a neutral video matched to exercise duration.
Scoring: Scoring of unusual uses was based on ratings summed across a four-point scale. Inter-rater reliability was reported between two independent raters at r = .89.
Outcomes and Conclusions: Flexibility was marginally higher in the exercise condition, compared to the video condition (p < .05). Although favorable improvements in mood occurred with exercise (p < .001), mood failed to contribute to effects on creativity (p > .05).
Tuckman and Hinkle (1986)
Sample Characteristics:
N = 154
n = 48 4th graders (Age: M = 9)
n = 53 5th graders (Age: M = 10)
n = 53 6th graders (Age: M = 11)
Number of participants in experimental and control groups was not specified
Research Design: Mixed Model
Creativity Measures (duration): AUT (10 items; no duration provided)
Creativity Parameters Assessed: No mention of specific creativity parameters was provided
Exercise Modality (intensity and duration): Three outdoor running sessions per week (30-minute each session) for 12 weeks. Active control group participated in regular physical education class activities.
Scoring: No procedures for scoring methods were reported. Thus, no inter-rater reliability was reported.
Outcomes and Conclusions: The experimental group outperformed the control group on the AUT (p < .001). Boys in the experimental group achieved marginally higher AUT scores than girls following posttest analyses (p < .05).
Zhou et al. (2017) (Study 2a and 2b were excluded, as these did not evaluate exercise)
Sample Characteristics: Study 1a: N = 63 (21 males, 42 females); Age: M = 21.25. Study 1b: same participants
Research Design: within-subject
Creativity Measures (duration):
1a) DIT divergent thinking task
1b) CIT divergent thinking task (10 trials; 1-minute allocated to each trial)
Creativity Parameters Assessed:
1a) Scored task completion and task novelty
1b) Scored fluency, flexibility, and novelty
Exercise Modality (intensity and duration): Study 1a and 1b) standing, constrained walking (Figure-of-8 Walk Test, F8W), and unconstrained walking (roaming) conditions (no exercise duration provided; likely about 10-minute)
Scoring:
1a) Creative novelty was rated by six experts on a scale of 1 (not original) to 5 (very original) for both experiments. Cronbach’s alpha was .79 and .70, respectively, for the two experiments.
1b) Fluency and flexibility were scored by the primary investigator.
Outcomes and Conclusions:
1a) Novelty was highest in the roaming condition, compared to constrained walking and standing (p < .001).
1b) Fluency, flexibility, and novelty were highest in the roaming condition, compared to constrained walking and standing (p < .001). Constrained walking was also associated with higher fluency, flexibility, and novelty than standing (p < .001).