Author manuscript; available in PMC: 2023 Apr 1.
Published in final edited form as: Psychol Med. 2020 Aug 18;52(6):1080–1088. doi: 10.1017/S0033291720002834

Prevalence of Harm in Mindfulness-Based Stress Reduction

Matthew J Hirshberg 1, Simon B Goldberg 1,2, Melissa Rosenkranz 1,3, Richard J Davidson 1,3,4
PMCID: PMC7889774  NIHMSID: NIHMS1613718  PMID: 32807249

Abstract

Background

Mindfulness meditation has become a common method for reducing stress, stress-related psychopathology and some physical symptoms. As mindfulness programs become ubiquitous, concerns have been raised about their unknown potential for harm. We estimate multiple indices of harm following Mindfulness-based Stress Reduction (MBSR) on two primary outcomes: global psychological and physical symptoms. In secondary analyses we estimate multiple indices of harm on anxiety and depressive symptoms, discomfort in interpersonal relations, paranoid ideation and psychoticism.

Methods

Intent-to-treat analyses with multiple imputation for missing data were used on pre- and post-test data from a large, observational dataset (n = 2155) of community health clinic MBSR classes and from MBSR (n = 156) and waitlist control (n = 118) participants from three randomized controlled trials conducted contemporaneous to community classes in the same city by the same health clinic MBSR teachers. We estimate change in symptoms, proportion of participants with increased symptoms, proportion of participants reporting greater than a 35% increase in symptoms, and for global psychological symptoms, clinically significant harm.

Results

We find no evidence that MBSR leads to higher rates of harm relative to waitlist control on any primary or secondary outcome. On many indices of harm across multiple outcomes, community MBSR was significantly preventative of harm.

Conclusions

Engagement in MBSR is not predictive of increased rates of harm relative to no treatment. Rather, MBSR may be protective against multiple indices of harm. Research characterizing the relatively small proportion of MBSR participants that experience harm remains important.

Keywords: Mindfulness, psychological symptoms, harm, physical symptoms, behavioral intervention

Introduction

Mindfulness meditation has become a common method for reducing stress and stress-related psychopathology. In 2017, over 14% of American adults (~35 million) used some form of meditation in the prior year, a threefold increase from 2012 (Clarke et al., 2015; Clarke & Stussman, 2018). Mindfulness-based interventions such as Mindfulness-Based Stress Reduction (MBSR; Kabat-Zinn, 1982) and Mindfulness-based Cognitive Therapy (MBCT; Segal et al., 2018) are now implemented in in-patient and out-patient psychiatric settings as primary or adjunct treatments for stress, depression and substance abuse (Segal et al., 2018; Witkiewitz et al., 2005). In addition, they are increasingly being used with other vulnerable populations, including school children. For example, researchers in the United Kingdom are undertaking an ambitious study implementing mandatory mindfulness training with tens of thousands of school children (Hayes et al., 2019).

The proliferation of mindfulness interventions in clinical and public settings corresponds with rapid growth in research on mindfulness (American Mindfulness Research Association, 2019). Although considerable research has evaluated the efficacy of manualized mindfulness interventions (e.g., MBSR, MBCT) on clinical conditions and in healthy populations, there is a dearth of reporting on contraindications (Baer et al., 2019; Britton, 2019; Van Dam et al., 2017). As a consequence, there exist no rigorous estimates of harm following engagement in a mindfulness-based intervention (Baer et al., 2019). Scientific (Baer et al., 2019) as well as media (Grant, 2018) outlets have recently published cautionary notes about the expansion of these techniques absent valid and reliable estimates of harm.

The scientific and contemplative literatures contain reports of contraindications (Lindahl et al., 2017). A well-conducted qualitative study and anecdotal reports describe severe effects such as the onset of psychosis and mania (Lindahl et al., 2017; Van Dam et al., 2017; Wallace, 2011). However, most contraindication reports follow periods of intensive or long-term practice, not the relatively modest engagement expected in public-facing programs (Britton, 2019). Clinicians and the public are nevertheless placed in the difficult position of having to make determinations about the appropriateness of meditation interventions without all of the necessary guidance. Meta-analyses on clinical (Goldberg et al., 2018) and non-clinical populations (Khoury et al., 2015) indicate that mindfulness interventions are effective treatments for a range of conditions. Consequently, researchers have recommended them to clinicians for treatment of stress-related symptoms (Goyal et al., 2014), but these recommendations are provided absent good data on the potential for harm.

No consensus operationalization of harm exists (Linden, 2013; Taylor et al., 2012). In randomized controlled trials (RCTs), changes in groups receiving and not receiving the experimental treatment are statistically compared to determine whether rates of change are significantly different. If one group exhibits average increases in symptoms that are significantly different from another group, such a result indicates harm. However, null hypothesis testing has been criticized for statistical (e.g., detecting a significant effect is largely dependent on sample size; Freiman et al., 1992) and practical reasons (e.g., statistical significance is not necessarily practically meaningful; Thompson, 2002). In addition, detecting harm based on average rates of change can be problematic because group effects may mask individual harm events that are important to understand (Thompson, 2002).

Thresholds for within-subject or within-group percent change (e.g., > 35%) are widely used as benchmarks for treatment-response (Erzegovesi et al., 2001; Revicki et al., 2008). Although this approach could be used to estimate harm, it has also been criticized as arbitrary and unstandardized (Linden & Schermuly-Haupt, 2014). Statistically-grounded indices that ostensibly establish clinically significant change have been proposed as well (Jacobson & Truax, 1991). After computing clinical versus non-clinical symptom population cut-offs, researchers can examine the proportion of participants moving from a non-clinical to clinical symptom level. This approach has not been widely adopted and has also been critiqued (Linden, 2013).

Given a lack of consensus regarding how best to assess harm, one approach that addresses the concerns associated with any one operationalization is to report on multiple harm indices. By estimating harm across multiple indices, we can understand the sensitivity of an effect conditional on how harm is operationalized. For example, if the proportion of individuals who experience an increase in symptoms following treatment is relatively high but the proportion experiencing a >35% increase in symptoms is very low, concerns may be tempered. In contrast, if the proportion of individuals experiencing an increase in symptoms is relatively low but of those individuals a very high proportion experience large increases, there may be cause for concern about adverse outcomes. Similarly, harm can occur in many domains (e.g., global physical symptoms or interpersonal relationships). A comprehensive portrait of harm requires pairing estimates of multiple operationalizations of harm across different domains.

The purpose of this research is to provide clinicians and the public with quantitative estimates of harm following MBSR. Given the lack of consensus on how best to operationalize harm and prior reports that meditation may induce harm in multiple domains (Lindahl et al., 2017), we follow Dimidjian and Hollon’s (2010) simple definition of harm as outcomes worse than would have been expected in the absence of treatment.

On the full sample (N = 2429), we estimate average change on two primary domains: global psychological and physical symptoms. We first assess the proportion of participants reporting elevated post-treatment symptoms. Second, following the convention that a >35% increase in symptoms is clinically meaningful, we analyze the proportion of participants reporting a > 35% increase in symptoms. Third, using established clinically significant cut-offs on our measure of global psychological symptoms (Symptoms Checklist-90R Global Severity Index; SCL-90R GSI; Derogatis, 1992) we compute clinically significant change (Jacobson & Truax, 1991) and analyze the proportion of participants that experience clinically significant harm. Fourth, on the subset of the sample from whom we have item-level SCL-90R data (n = 521), we estimate the first three harm indices (average symptom change, proportion worsening, and proportion with a > 35% increase in symptoms) on five symptom domains that Lindahl and colleagues (2017) reported to be adversely affected by intensive meditation practice: anxiety and depressive symptoms, interpersonal relations, paranoid ideation, and psychoticism.

Method

Mindfulness-Based Stress Reduction

MBSR is an 8-week manualized program consisting of weekly 2.5-hour classes and a 6-hour practice day (Kabat-Zinn, 2013). It is widely implemented in health care and other public settings and has been studied extensively (Crane et al., 2017).

Data

Ethics board approval was obtained in order to access community health clinic records and pair them with the randomized controlled trial data (RCT; Table 1). RCT participants consented to participate after study procedures were fully explained. The community health clinic offers pay-for-service MBSR classes. Beginning in 2002, all individuals registered for MBSR were asked to complete the SCL-90R (Derogatis, 1992) and the Medical Symptoms Checklist (MSC; Travis, 1977) before and following MBSR. Completing the forms was not mandatory and did not affect the ability to participate in MBSR. From 2002 to 2013, the clinic program manager collected forms, entered the summed SCL-90R and MSC total scores into a spreadsheet, and then deleted the item-level data. For data from 2013 to 2016, trained undergraduate research assistants entered raw item-level data into a spreadsheet. We report on all participants from whom at least pre- or post-MBSR GSI and MSC data were collected between 2002 and 2016 (n = 2155). Based on enrollment data during this period, the current sample represents approximately 85% of the total number of health clinic MBSR participants.

Table 1.

Demographics and Descriptive Statistics by Data type

| Group | n | T1 data only (n) | T2 data only (n) | T1 & T2 data (n) | Gender (n) | Age | Race (n) | T1 GSI | T2 GSI | T1 MSC | T2 MSC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Community MBSR | 2155 | GSI (492); MSC (513) | GSI (61); MSC (86) | GSI (1602); MSC (1556) | 71% Female (1533); 26% Male (562); 3% Unknown (60) | 47.41 (12.85) | Data not collected | 61.78 (42.67) | 35.56 (31.98) | 18.29 (8.78) | 11.33 (10.35) |
| RCT MBSR | 156 | GSI (16); MSC (11) | GSI (0); MSC (12) | GSI (140); MSC (117) | 55% Female (85); 45% Male (71) | 44.42 (12.64) | 1.92% Native American (3); 3.21% Latinx (5); 3.21% African American (5); 7.05% Asian/Pacific Isl. (11); 85.89% White (134) | 27.05 (25.73) | 25.33 (22.97) | 8.78 (8.46) | 7.70 (8.13) |
| RCT WLC | 118 | GSI (5); MSC (12) | GSI (0); MSC (10) | GSI (113); MSC (96) | 49% Female (58); 51% Male (60) | 43.51 (12.64) | 2.54% Native American (3); 2.54% Latinx (3); 0% African American (0); 7.63% Asian/Pacific Isl. (9); 87.29% White (103) | 22.83 (19.80) | 27.62 (23.32) | 8.61 (7.94) | 14.77 (10.99) |

| Group | n | Anxiety T1 | Anxiety T2 | Depression T1 | Depression T2 | Interpersonal Sensitivity T1 | Interpersonal Sensitivity T2 | Paranoid Ideation T1 | Paranoid Ideation T2 | Psychoticism T1 | Psychoticism T2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Community MBSR | 247 | 0.53 (0.54) | 0.30 (0.38) | 0.85 (0.70) | 0.49 (0.51) | 0.67 (0.62) | 0.43 (0.46) | 0.40 (0.56) | 0.10 (0.28) | 0.27 (0.38) | 0.14 (0.26) |
| RCT MBSR | 156 | 0.22 (0.31) | 0.18 (0.24) | 0.42 (0.44) | 0.39 (0.43) | 0.41 (0.53) | 0.34 (0.40) | 0.32 (0.48) | 0.27 (0.42) | 0.13 (0.21) | 0.13 (0.22) |
| RCT WLC | 118 | 0.19 (0.30) | 0.20 (0.30) | 0.35 (0.37) | 0.45 (0.46) | 0.29 (0.33) | 0.35 (0.39) | 0.24 (0.36) | 0.29 (0.41) | 0.11 (0.19) | 0.14 (0.22) |

Note: T1 = Pre-test. T2 = Post-test about 10-weeks later. Community MBSR= community health clinic data; RCT MBSR = aggregated data from three consecutive NIH-sponsored clinical trials testing MBSR; RCT WLC = aggregated data from RCT 2 and 3 that included a wait-list control group. GSI = Global Severity Index (global psychological symptoms measure of the Symptom Checklist 90-Revised; Derogatis, 1992). MSC = Medical Symptoms Checklist (number of bothersome medical symptoms in the prior month; Travis, 1977). T1 MSC data missing from RCT MBSR and RCT WLC is due to technical error.

Because selection and demand biases may influence estimates from the community data, we include data pooled from three consecutive National Institutes of Health-funded RCTs (RCTs 1, 2 and 3; U01AT002114–01A1 and P01AT004952, respectively) that included MBSR (RCT MBSR, n = 156) and a waitlist control condition (WLC, n = 118; RCTs 2 and 3 only). These data are useful comparisons because they were collected contemporaneous to health clinic classes (i.e., 2004 to 2018) in the same city, and RCT MBSR classes were taught by the community MBSR teachers in the same physical space as community MBSR classes.

Outcome Measurements

The two primary outcome measures in this study are the GSI (α ≥ .95, all samples), a measure of global psychological symptom severity, and the MSC total score (α ≥ .95, all samples), a measure of the number of bothersome physical symptoms across over 100 common physical ailments. We analyze harm in four ways on the GSI and three on the MSC: 1) mean group change, 2) proportion with increased symptoms, 3) proportion with a > 35% increase in symptoms, and 4) on the GSI only, proportion with clinically significant harm. For clinically significant harm analyses, we apply Schmitz and colleagues' (2000) statistically-formulated distribution cutoffs for the GSI: functional symptom levels (GSI < 54), moderately symptomatic (54 ≤ GSI ≤ 108), and severely symptomatic (GSI > 108). Participants who moved from functional to moderately symptomatic or moderately to severely symptomatic were coded as experiencing clinically significant harm.
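The coding of the individual-level harm indices can be illustrated with a minimal Python sketch. It assumes summed GSI scores and the cutoffs stated above; the names `severity_category`, `HarmFlags` and `harm_flags` are hypothetical, and the handling of a zero pre-test score is an assumption that the source does not describe.

```python
from dataclasses import dataclass

# Cutoffs on the summed GSI score as stated in the text (Schmitz et al., 2000).
FUNCTIONAL_MAX = 54   # GSI < 54: functional symptom levels
MODERATE_MAX = 108    # 54 <= GSI <= 108: moderately symptomatic; above: severe


def severity_category(gsi: float) -> int:
    """Return 0 (functional), 1 (moderate) or 2 (severe) for a summed GSI score."""
    if gsi < FUNCTIONAL_MAX:
        return 0
    if gsi <= MODERATE_MAX:
        return 1
    return 2


@dataclass
class HarmFlags:
    increased: bool                    # post-test minus pre-test change > 0
    increase_over_35pct: bool          # > 35% increase relative to pre-test
    clinically_significant_harm: bool  # moved up at least one severity category


def harm_flags(pre: float, post: float) -> HarmFlags:
    """Code the three binary harm indices for one participant."""
    change = post - pre
    increased = change > 0
    if pre > 0:
        over_35 = change / pre > 0.35
    else:
        # Assumption: percent change is undefined at pre == 0, so any
        # increase from zero is counted as exceeding the threshold.
        over_35 = increased
    category_up = severity_category(post) > severity_category(pre)
    return HarmFlags(increased, over_35, category_up)
```

For example, a hypothetical participant moving from 50 to 60 would be flagged for increased symptoms and for a category change (functional to moderate), but not for a > 35% increase.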

In secondary analyses, we utilize the subset of the sample for whom we have item-level SCL-90R data (n = 521) to estimate all harm indices (except clinically significant change due to a lack of standardized cutoffs) on five symptom clusters. Symptom clusters were selected based on domains previously noted as showing increases in the context of meditation (Lindahl et al., 2017). Other SCL-90R clusters were less obviously relevant (e.g., phobic anxiety) and were not examined. The five clusters examined are anxiety (α = .84) and depressive symptoms (α = .89), interpersonal sensitivity (i.e., discomfort, negative expectancy and self-doubt in social relations; α = .84), and the more severe psychiatric symptom clusters of paranoid ideation (α = .70) and psychoticism (α = .73). Paranoid ideation assesses disordered thinking such as projective thought, suspiciousness and fear of loss of autonomy. Psychoticism represents a spectrum of symptoms from social withdrawal to acute psychotic symptoms.

Missing data approach

Community data had 2.83 and 3.99% missingness at pre-test (GSI/MSC) and 22.83 and 23.81% missingness at post-test (GSI/MSC). Of those participants missing post-test data, 2.00% dropped out of the MBSR class. RCT MBSR and WLC data had no pre-test missingness on the GSI and 9.60 and 8.47% missingness on the MSC, respectively. RCT MBSR and WLC data had 10.26 and 4.24% post-testing GSI missingness and 7.05 and 10.17% post-test MSC missingness, respectively.

Sensitivity analysis examining whether pre-test variables were significantly associated with post-test missingness showed that participation year (z = −2.41, p < .001) and gender (z = −2.41, p = .016) were negatively associated with providing post-test data (women were more likely to have missing post-test data), while older age (z = 4.25, p < .001) was significantly associated with presence of post-test data. Because observed variables are related to missingness, we assume data are missing at random and appropriate for multiple imputation (Graham, 2009). We used predictive mean matching through a multiple imputation with chained equations procedure, imputing 50 datasets with seed set to 1981 for replicability (Buuren & Groothuis-Oudshoorn, 2011). All data processing and analyses were conducted in R v.4.0.0 (R Team, 2014).

Statistical Analysis

We conducted intent-to-treat analysis based on the 50 imputed datasets. Rubin’s (2004) pooling rules were followed. In all regression models, age and gender were entered as covariates and data type (community MBSR, RCT MBSR, RCT WLC) was entered as the categorical independent variable of interest. We control for Type I error within each outcome (e.g., GSI, MSC, anxiety symptoms) with False Discovery Rate correction (Benjamini & Hochberg, 1995).
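The False Discovery Rate correction named above can be sketched in a few lines of Python. This is an illustrative implementation of the Benjamini-Hochberg step-up adjustment, not the authors' code (the analyses were run in R); the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for the contrasts within one outcome.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

A contrast is then treated as significant when its adjusted p-value falls below the chosen threshold (e.g., .05).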

For the analysis of average change in symptoms, we estimated a multiple regression model with post-test score as the dependent variable and pre-test score on the outcome as a covariate. For examining the proportion of participants with increased symptoms (i.e., post-test minus pre-test change > 0), we estimated a multiple logistic regression model with increased symptoms (Yes/No) as the dependent variable. For the analysis of the proportion of participants with a > 35% increase in symptoms, we estimated a multiple logistic regression model with > 35% increase (Yes/No) as the dependent variable (Erzegovesi et al., 2001). For clinically significant harm on the GSI, we estimated a multiple logistic regression model with a one or two category increase in symptoms (Yes/No) as the dependent variable (Schmitz et al., 2000).

Confidence intervals (95% CIs) for point estimates of mean change were estimated using Rubin’s (2004) rules. Standardized mean differences with their corresponding CIs are provided as an estimate of an effect’s magnitude. Point estimate CIs for proportions were estimated by bootstrapping 5000 samples of the original data, imputing 50 datasets on each bootstrapped sample, and computing an average 95% CI from the bootstrapped, imputed datasets (Schomaker & Heumann, 2018). The Absolute Risk Reduction (ARR) – the difference in the incidence of harm in MBSR versus RCT WLC – is provided as an effect size estimate for proportions. CIs for ARRs were estimated in the same way as proportion CIs.
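For readers unfamiliar with Rubin's rules, the pooling of an estimate across imputed datasets can be sketched as follows. This is a simplified Python illustration, not the authors' R code: the function name is hypothetical, and the interval uses a normal quantile as an approximation, whereas Rubin's rules proper use a t distribution with the degrees of freedom returned here.

```python
import math
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, std_errors, level=0.95):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    Returns (pooled estimate, pooled SE, degrees of freedom, approximate CI).
    """
    m = len(estimates)
    q_bar = mean(estimates)                       # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                 # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between        # total variance
    se_pooled = math.sqrt(total)
    if between > 0:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = math.inf  # no between-imputation variability
    # Assumption: normal quantile in place of the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return q_bar, se_pooled, df, (q_bar - z * se_pooled, q_bar + z * se_pooled)
```

With 50 imputations, as used here, the normal approximation is usually close to the t-based interval, but it will be slightly too narrow when the between-imputation variance is large.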

Results

Change on Primary Outcomes

Average Symptom Change

Community MBSR participants reported an average GSI reduction of 26.15 (−42.33%), compared to a 1.72 reduction in RCT MBSR (−6.36%) and a 4.75 increase in RCT WLC (+ 20.89%). Results from multiple regression analysis showed that predicted change in community MBSR was significantly different than RCT WLC (b = −9.74, se = 2.47, t(1476) = −3.95, p < .001, d = −0.30 95% CI[−0.45, −0.15]) and RCT MBSR (b = −5.93, se = 2.26, t(1075) = −2.63, p = .014, d = −0.17 [−0.30, −0.04]) (Figure 2). Change in RCT MBSR and RCT WLC was not significantly different (b = −3.80, se = 3.13, t(1333) = −1.22, p = .224, d = −0.12 [−0.31, 0.07]).

Figure 2. Average Change in Psychological (a) and Physical (b) Symptoms by Data Type.

Note: A. Residualized change on the Global Severity Index was significantly different in community MBSR compared to RCT WLC and RCT MBSR (standardized mean difference = −0.30 and −0.17, respectively). No significant difference was observed between RCT MBSR and WLC (standardized mean difference = −0.12). B. Residualized change in bothersome physical symptoms on the Medical Symptoms Checklist was significantly different in both community and RCT MBSR compared to RCT WLC (standardized mean differences = −0.70 and −0.74, respectively). Change in community and RCT MBSR was not significantly different (standardized mean difference = −0.09).

Consistent with psychological symptoms, average predicted change in physical symptoms was −6.95 (−38.00%), − 1.07 (−12.19%) and + 6.15 (+71.43%) in the community MBSR, RCT MBSR, and RCT WLC groups, respectively. Change in community MBSR was significantly different from RCT WLC (b = −8.13, se = 0.88, t(776) = −9.19, p < .001, d = − 0.70 [−0.84, −0.55]) but not RCT MBSR (b = −1.0, se = 0.73, t(1914) = −1.42, p = .157, d = −0.09 [−0.22, 0.04]). Change in RCT MBSR was significantly different from RCT WLC (b = −7.09, se = 1.08, t(1269) = −6.58, p < .001, d = −0.74 [−0.97, −0.51]).

Proportion with Increased Symptoms

Among community MBSR participants, 15.17% [13.90, 17.38] experienced greater symptoms at post-test compared to 43.67% [36.36, 51.85] and 57.61% [48.84, 66.34] of RCT MBSR and WLC participants, respectively (Figure 3). The proportion of community MBSR participants reporting increased symptoms at post-test was significantly smaller than RCT WLC (z = −9.36, p < .001, ARR = 41 [32.27, 49.97]) and RCT MBSR (z = −7.78, p < .001). The proportion of RCT MBSR reporting increased symptoms was significantly smaller than in RCT WLC (z = −2.05, p = .041, ARR = 13 [1.65, 24.79]).

Consistent with psychological symptoms, 17.64% [16.31, 19.63] of community MBSR, 39.32% [32.56, 47.77] of RCT MBSR, and 66.15% [55.25, 72.43] of RCT WLC reported greater physical symptoms at post-test. The proportion of community MBSR participants reporting increased symptoms was significantly smaller than RCT WLC (z = −9.83, p < .001, ARR = 46 [37.90, 54.64]) and RCT MBSR (z = −4.23, p < .001). The proportion of RCT MBSR reporting increased symptoms was significantly smaller than in RCT WLC (z = −5.32, p < .001, ARR = 24 [12.52, 36.11]).

Proportion with > 35% Symptom Increase

In community MBSR, 6.83% [6.64, 8.96] of participants reported a >35% increase on the GSI from pre- to post-test compared to 32.31% [25.71, 39.69] of RCT MBSR and 38.65% [29.92, 47.38] of RCT WLC participants. Community MBSR participants were significantly less likely to report >35% increases on the GSI at post-test compared to RCT WLC (z = −9.22, p < .001, ARR = 31 [22.19, 39.80]) and RCT MBSR (z = −9.03, p < .001). There was no significant difference between RCT MBSR and RCT WLC rates of > 35% increases in symptoms (z = −1.01, p = .298, ARR = 6 [−5.10, 16.63]) (Figure 3).

Community MBSR had the lowest proportion of participants reporting a > 35% increase in physical symptoms at post-test (9.62% [9.03, 11.66]) compared to RCT MBSR (29.30% [23.31, 37.16]) and RCT WLC (53.11% [41.20, 59.60]). Community MBSR had significantly fewer participants reporting >35% increases in symptoms compared to RCT WLC (z = −10.52, p < .001, ARR = 40 [30.79, 48.99]) and RCT MBSR (z = −6.44, p < .001). RCT MBSR had significantly fewer participants reporting >35% increases in symptoms than RCT WLC (z = −3.55, p = .004, ARR = 20 [8.67, 31.83]).

Clinically Significant Harm

Applying Schmitz et al.’s (2000) framework, among the subpopulation of participants reporting functional symptom levels at pre-test, 3.59% [3.19, 5.03] of community MBSR, 4.41% [1.55, 7.65] of RCT MBSR, and 9.01% [4.07, 14.19] of RCT WLC participants reported clinically significant harm (Figure 3). No significant differences in rates of clinically significant harm were observed between groups (ps > .05). The ARR relative to RCT WLC was 5 for both community [−0.09, 10.01] and RCT MBSR [−1.03, 10.76].

Change on Secondary Outcomes

Details of all secondary outcome analyses are provided in supplementary materials Table 1.

Average Symptom Change

Average change in community MBSR was significantly different than RCT WLC on depressive symptoms (p = .003, d = −0.35 [−0.49, −0.13]) and paranoid ideation (p < .001, d = −0.60 [−0.79, −0.40]), but not on psychoticism (p = .051, d = −0.23 [−0.42, −0.04]) or interpersonal sensitivity (p = .074, d = −0.19 [−0.36, −0.03]) following error correction. No difference was observed in anxiety symptoms (p = .678, d = −0.06 [−0.22, 0.10]). There were no significant differences between RCT MBSR and RCT WLC (all ps > .150).

Community and RCT MBSR change was significantly different on depressive symptoms (p = .050, d = −0.18 [−0.34, −0.01]) and paranoid ideation (p < .001, d = −0.30 [−0.48, −0.12]), but not on psychoticism following error correction (p = .053, d = −0.19 [−0.42, −0.04]). No differences were observed on anxiety symptoms (p = .868, d = −0.01 [−0.17, 0.15]) or interpersonal sensitivity (p = .438, d = −0.06 [−0.21, 0.09]).

Proportion with Worsening Symptoms

The proportion of participants reporting greater symptoms at post-test was significantly smaller in community MBSR compared to RCT WLC on depressive symptoms (p < .001, ARR = 30 [20.33, 40.82]), interpersonal sensitivity (p = .015, ARR = 19 [9.36, 30.78]), paranoid ideation (p < .001, ARR = 27 [17.61, 37.35]), and psychoticism (p < .001, ARR = 19 [8.50, 27.68]), but not anxiety following error correction (p = .093, ARR = 14 [4.14, 23.26]). RCT MBSR rates of increased symptoms were not significantly different than RCT WLC on any symptom cluster following error correction: anxiety (p = .693, ARR = 4 [−8.91, 16.74]); depression (p = .500, ARR = 7 [−4.66, 17.90]); interpersonal sensitivity (p = .150, ARR = 8 [−2.68, 20.60]); paranoid ideation (p = .054, ARR = 17 [7.33, 27.00]); and psychoticism (p = .321, ARR = 8 [−3.59, 18.72]).

Community MBSR rates of increased symptoms differed from RCT MBSR on depressive symptoms (p < .001), but not on anxiety symptoms (p = .093), interpersonal sensitivity (p = .150), paranoid ideation (p = .054), or psychoticism (p = .072) following error correction.

Proportion With > 35% Increase in Symptoms

The proportion of community MBSR participants reporting a >35% increase in symptoms was significantly smaller than RCT WLC and RCT MBSR on all secondary outcomes: anxiety symptoms (p = .002, ARR = 14 [5.21, 23.45]; p = .015); depressive symptoms (p < .001, ARR = 30 [21.27, 39.50]; p < .001); interpersonal sensitivity (p < .001, ARR = 18 [9.30, 27.95]; p = .020); paranoid ideation (p < .001, ARR = 25 [15.85, 35.01]; p = .022); and psychoticism (p < .001, ARR = 18 [7.70, 26.30]; p = .008), for comparisons with RCT WLC and RCT MBSR respectively. A significantly lower proportion of RCT MBSR compared to RCT WLC participants reported a >35% increase in symptoms on paranoid ideation (p = .019, ARR = 16 [6.64, 26.58]). There were no other differences between RCT MBSR and RCT WLC: anxiety symptoms (p = .896, ARR = 3 [−9.01, 14.69]); depressive symptoms (p = .274, ARR = 7 [−4.16, 18.45]); interpersonal sensitivity (p = .158, ARR = 7 [−3.34, 17.30]); or psychoticism (p = .206, ARR = 8 [−2.94, 18.18]).

Associations of Baseline Symptoms, Harm and Drop-out

Higher baseline symptoms were not significantly associated with any index of harm on primary or secondary outcomes or with drop-out (all ps > .05).

Discussion

Using population health records from 2155 community MBSR participants and data from 274 RCT participants collected contemporaneously, we estimate prevalence of multiple indices of harm following MBSR. Applying Dimidjian and Hollon’s (2010) definition of harm as outcomes worse than would have been expected in the absence of treatment, regardless of how harm was operationalized, the harm domain assessed (i.e., GSI, anxiety), or MBSR context (community or RCT), we find no evidence that rates of harm following MBSR are significantly greater than rates of harm following no treatment. To the contrary, on many harm indices across multiple domains, community and RCT MBSR predicted significantly less harm.

We conducted 44 contrasts between an MBSR group and RCT WLC across our 22 estimates of harm, leading to an 89.53% chance of observing at least one statistically significant (p < .05) contrast. There was not a single contrast where MBSR led to significantly greater harm, but we observed 22 contrasts in which MBSR led to significantly lower rates of harm than no treatment. We interpret these data as strong evidence that MBSR is no more harmful than no treatment on the indices of harm we estimated. Further, this pattern of results suggests that MBSR may be preventative against increased psychological and physical symptoms.
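The 89.53% figure is consistent with the standard familywise calculation for 44 tests at α = .05. The short Python check below assumes the contrasts are independent, which is what that formula requires; the function name is hypothetical.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability of at least one p < alpha among n independent null tests."""
    return 1 - (1 - alpha) ** n_tests


print(f"{chance_of_any_false_positive(44):.2%}")  # prints 89.53%
```

In other words, even if MBSR and waitlist control did not differ at all, about nine in ten such sets of 44 contrasts would be expected to contain at least one nominally significant result, which makes the absence of any contrast showing greater harm notable.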

In practical terms, our results indicate that compared to no treatment, for every 100 individuals engaged in community MBSR, 41 fewer will experience increased psychological symptoms, 31 fewer a >35% increase in psychological symptoms, and five fewer clinically significant harm. Following RCT MBSR, 13 fewer individuals will experience increased psychological symptoms, six fewer a >35% increase in psychological symptoms, and five fewer clinically significant worsening compared to no treatment over the same approximately 10-week period. Harm on bothersome physical symptoms was similar. For every 100 individuals engaged in community or RCT MBSR, 46 and 24 fewer, respectively, experience increases in physical symptoms, and 40 and 20 fewer experience a >35% increase in physical symptoms compared to no treatment.
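The "fewer per 100" figures are absolute risk reductions expressed in percentage points. A simplified Python sketch of that quantity, with a percentile bootstrap interval, is shown below. It is an illustration only: it omits the imputation performed on each bootstrapped sample and the covariate adjustment used in the reported models, and the function names, seed and example counts are hypothetical.

```python
import random


def arr_per_100(control_harmed, treated_harmed):
    """Absolute risk reduction in percentage points from lists of 0/1 harm flags."""
    p_control = sum(control_harmed) / len(control_harmed)
    p_treated = sum(treated_harmed) / len(treated_harmed)
    return 100 * (p_control - p_treated)


def bootstrap_ci(control_harmed, treated_harmed, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for the ARR, resampling each group with replacement."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        c = rng.choices(control_harmed, k=len(control_harmed))
        t = rng.choices(treated_harmed, k=len(treated_harmed))
        draws.append(arr_per_100(c, t))
    draws.sort()
    lower = draws[int((1 - level) / 2 * n_boot)]
    upper = draws[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical data: 58 of 100 controls and 15 of 100 treated participants harmed.
control = [1] * 58 + [0] * 42
treated = [1] * 15 + [0] * 85
arr = arr_per_100(control, treated)        # about 43 fewer per 100
interval = bootstrap_ci(control, treated)
```

An ARR of 43 in this hypothetical example would be read as 43 fewer harm events per 100 treated participants relative to control.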

Global metrics of psychological or physical symptoms may mask MBSR-related harm within particular domains of distress. In the subsample of participants for whom we had item-level data (n = 521), we therefore examined five psychological symptom clusters that together comprise many of the domains in which concerns about adverse effects have been reported (e.g., Lindahl et al., 2017). Consistent with primary outcome analyses, we find no evidence for increased harm but evidence for salutary MBSR effects. Notably, MBSR’s preventative benefits were observed for some metrics of harm across domains, from anxiety and depressive symptoms to perceptions of social relationships, and on more severe psychiatric symptom domains (paranoid ideation and psychoticism).

Comparisons between community MBSR and RCT WLC should be interpreted cautiously. Community MBSR participants selected into and paid for MBSR. As a result of RCT inclusion criteria (e.g., no current psychiatric diagnosis, not currently taking pain medication), community MBSR participants had significantly higher baseline symptoms on all outcomes. Most, but not all, of the evidence that MBSR is protective against increased symptoms relative to WLC base rates was from community MBSR versus RCT WLC contrasts. We are therefore circumspect about the evidence that MBSR is protective. At the same time, because community MBSR participants were more symptomatic, these data suggest that MBSR is no more harmful than no treatment even among participants reporting higher levels of baseline psychological and physical distress. Moreover, baseline symptoms were not significantly associated with harm outcomes or drop-out.

Limitations

There are a few important limitations to acknowledge. Because of sample differences, these data do not allow us to conclude that MBSR is protective against base rates of symptom increases. They also do not allow us to explore the possible mechanisms or significance behind the consistent gradation in harm when comparing community MBSR, RCT MBSR, and RCT WLC. The observed protective benefits of community relative to RCT MBSR could be explained by selection or demand biases, or regression to the mean. It is equally plausible that RCT MBSR effects are diminished, particularly when study inclusion criteria rule out symptomatic participants. MBSR is a behavioral intervention; motivation to engage is an important component in treatment outcomes (Prochaska & Velicer, 1997). Continued research is required to understand these questions and provide insight into the true effects of MBSR.

Although secondary analyses allowed us to examine MBSR-related harm in most categories that have been highlighted as areas of concern, there are other domains in need of investigation. For example, future research should examine harm in family and work life, whether any incidents of harm are related to malpractice, or whether MBSR increases unwanted events (Linden, 2013). Relatedly, our ability to examine the role of individual differences in harm was limited. Continued research on the impact of individual differences on harm is needed. In particular, because the community data did not include race/ethnicity, we were not able to examine the effect of race/ethnicity on harm. Lastly, the nature of our assessment methods did not allow us to investigate the possibility that some psychologically difficult experiences may reflect the intended change processes in meditation-based interventions (e.g., discomfort associated with disrupting habitual tendencies) and that individuals’ interpretation of these experiences may influence their impact (Lindahl et al., 2017).

Conclusions

As mindfulness and other forms of meditation rapidly expand in popularity, it is crucial to understand the potential for harm. We find no evidence that MBSR leads to increased incidence of harm and suggestive evidence that MBSR may be protective against the development of harm relative to no treatment. Results were consistent regardless of the operationalization of harm (e.g., a >35% increase in symptoms), the domain of harm (e.g., physical symptoms, anxiety), or the MBSR context (i.e., community or RCT). Coupled with research on the benefits of MBSR, our findings support Goyal and colleagues’ (2014) conclusion that clinicians should recommend MBSR for psychological stress and physical symptoms.

Although these data provide strong evidence against claims that MBSR may increase harm on the indices we estimated, concerns about adverse meditation effects extend beyond relatively brief, manualized interventions (Baer et al., 2019; Britton, 2019; Lindahl et al., 2017). The current research does not shed light on the potential for deleterious outcomes during intensive meditation practice (e.g., intensive retreat). Although the number of individuals for whom such concerns are germane is small, it is nonetheless an important area for future research. However, in the most widely disseminated manualized mindfulness program, MBSR, there appears to be little cause for concern.

Supplementary Material

Supplementary Table 1

Figure 1. Smoothed Density Plot of Pre- to Post-Test Psychological (A) and Physical (B) Symptoms Change by Data type.

Note: A. GSI = Global Severity Index from the Symptoms Checklist 90 Revised (Derogatis, 1992). B. MSC = Medical Symptoms Checklist total score. 0 point = no pre- to post-test change in symptoms.

Figure 3. Change of Psychological and Physical Symptom Indices of Harm.

Note: GSI = Global Severity Index of the SCL-90R. MSC = Medical Symptoms Checklist. ARR = Absolute risk reduction. Error bars estimated through multiple imputation on each of the 5000 bootstrapped samples. a. Percent of the sample with increased symptoms at post-test; b. Percent of the sample reporting a > 35% increase in global psychological symptoms at post-test; c. Percent of the sample reporting clinically significant change (i.e., moving from a non-clinical to clinical symptom population or from moderately to severely symptomatic); d. Percent of the sample reporting increases in bothersome physical symptoms at post-test; e. Percent of the sample reporting a > 35% increase in bothersome physical symptoms at post-test. *** p < .001; ** p < .01; * p < .05.
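
The indices named in the note above can be illustrated with a short sketch. This is not the authors' analysis code (the note indicates their estimates involved multiple imputation and bootstrapping, which are omitted here). It only shows, on made-up scores, how the percent of a sample with any symptom increase, the percent with a greater than 35% increase, and an absolute risk reduction might be computed. The function names, the toy data, the choice to skip participants with a zero baseline when computing relative increases, and the ARR sign convention (control rate minus MBSR rate) are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def harm_indices(pre: Sequence[float], post: Sequence[float],
                 threshold: float = 0.35) -> Dict[str, float]:
    """Percent of a sample with any increase and with a relative increase > threshold.

    Hypothetical helper: participants with a zero baseline are never counted
    as having a large relative increase (an assumption, not the paper's rule).
    """
    if len(pre) != len(post) or len(pre) == 0:
        raise ValueError("pre and post must be non-empty and of equal length")
    n = len(pre)
    # Index a/d in the note: any increase from pre- to post-test
    any_increase = sum(1 for a, b in zip(pre, post) if b > a)
    # Index b/e in the note: relative increase greater than 35% of baseline
    large_increase = sum(
        1 for a, b in zip(pre, post) if a > 0 and (b - a) / a > threshold
    )
    return {
        "pct_increased": 100.0 * any_increase / n,
        "pct_large_increase": 100.0 * large_increase / n,
    }


def absolute_risk_reduction(control_pct: float, treated_pct: float) -> float:
    """ARR in percentage points: control event rate minus treated event rate."""
    return control_pct - treated_pct


# Toy symptom scores (invented for illustration only)
mbsr = harm_indices(pre=[1.0, 0.8, 0.5, 1.2], post=[0.6, 0.9, 0.5, 0.7])
wlc = harm_indices(pre=[1.0, 0.8, 0.5, 1.2], post=[1.5, 0.8, 0.6, 1.0])

arr_any = absolute_risk_reduction(wlc["pct_increased"], mbsr["pct_increased"])
arr_large = absolute_risk_reduction(wlc["pct_large_increase"],
                                    mbsr["pct_large_increase"])
print(mbsr, wlc, arr_any, arr_large)
```

Under this convention a positive ARR means the harm index was less frequent in the MBSR group than in the control group. Uncertainty estimates such as the error bars described in the note would require resampling and imputation steps not shown here.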

Acknowledgment

We thank Katherine Bonus, Robert Gillespie, Heather Sorensen, Lisa Thomas-Prince and Margaret Kalscheur for collecting community MBSR data and Jeanette Mumford for consulting on the statistical analyses.

Financial Support

Financial support was provided by a National Academy of Education / Spencer Foundation postdoctoral research fellowship to MJH, the National Institutes of Health (U01AT002114–01A1 and P01AT004952) to RJD, MK and Antoine Lutz, the National Center For Complementary & Integrative Health (K23AT010879) to SBG, and by generous donations to the Center for Healthy Minds. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

Conflicts of interest

Matthew J. Hirshberg is a contracted provider at the clinic that provides community MBSR. Richard J. Davidson is the founder and president of, and serves on the board of directors for, the non-profit organization Healthy Minds Innovations, Inc. In addition, RJD served on the board of directors for the Mind & Life Institute from 1992–2017. No donors, either anonymous or identified, have participated in the design, conduct, or reporting of research results in this manuscript.

Statement of Ethics

The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. All participants in all of the studies included in this manuscript provided their written, informed consent before participating. All methods and procedures were reviewed and approved by the University of Wisconsin Madison Institutional Review Board.

References

  1. American Mindfulness Research Association. (2019). Mindfulness scientific publications. https://goamra.org/resources/
  2. Baer R, Crane C, Miller E, & Kuyken W (2019). Doing no harm in mindfulness-based programs: Conceptual issues and empirical findings. Clinical Psychology Review.
  3. Benjamini Y, & Hochberg Y (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289–300.
  4. Britton WB (2019). Can Mindfulness Be Too Much of a Good Thing? The Value of a Middle Way. Current Opinion in Psychology. 10.1016/j.copsyc.2018.12.011
  5. Buuren S, & Groothuis-Oudshoorn K (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3). http://doc.utwente.nl/78938/
  6. Chiesa A, & Serretti A (2009). Mindfulness-Based Stress Reduction for Stress Management in Healthy People: A Review and Meta-Analysis. The Journal of Alternative and Complementary Medicine, 15(5), 593–600. 10.1089/acm.2008.0495
  7. Clarke TC, & Stussman BJ (2018). Use of Yoga, Meditation, and Chiropractors Among U.S. Adults Aged 18 and Over. 325, 8.
  8. Clarke TC, Stussman BJ, & Nahin RL (2015). Trends in the Use of Complementary Health Approaches Among Adults: United States, 2002–2012. 79, 16.
  9. Crane RS, Brewer J, Feldman C, Kabat-Zinn J, Santorelli S, Williams JMG, & Kuyken W (2017). What defines mindfulness-based programs? The warp and the weft. Psychological Medicine, 47(6), 990–999.
  10. Derogatis LR (1992). SCL-90-R: Administration, scoring and procedures manual for the R (evised) version and other instruments of the psychopathology rating scale series. Clinical Psychometric Research.
  11. Dimidjian S, & Hollon SD (2010). How would we know if psychotherapy were harmful? American Psychologist, 65(1), 21.
  12. Erzegovesi S, Cavallini MC, Cavedini P, Diaferia G, Locatelli M, & Bellodi L (2001). Clinical Predictors of Drug Response in Obsessive-Compulsive Disorder. Journal of Clinical Psychopharmacology, 21(5), 488.
  13. Freiman JA, Chalmers TC, Smith H, & Kuebler RR (1992). The importance of beta, the type II error, and sample size in the design and interpretation of the randomized controlled trial. Medical Uses of Statistics, 357–373.
  14. Goldberg SB, Tucker RP, Greene PA, Davidson RJ, Wampold BE, Kearney DJ, & Simpson TL (2017). Mindfulness-based interventions for psychiatric disorders: A systematic review and meta-analysis. Clinical Psychology Review.
  15. Goyal M, Singh S, Sibinga EM, Gould NF, Rowland-Seymour A, Sharma R, Berger Z, Sleicher D, Maron DD, Shihab HM, & others. (2014). Meditation programs for psychological stress and well-being: A systematic review and meta-analysis. JAMA Internal Medicine, 174(3), 357–368.
  16. Graham JW (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576.
  17. Grant A (2018, January 19). Opinion | Can We End the Meditation Madness? The New York Times. https://www.nytimes.com/2015/10/10/opinion/can-we-end-the-meditation-madness.html
  18. Hayes D, Moore A, Stapley E, Humphrey N, Mansfield R, Santos J, Ashworth E, Patalay P, Bonin E-M, Moltrecht B, Boehnke JR, & Deighton J (2019). Promoting mental health and wellbeing in schools: Examining Mindfulness, Relaxation and Strategies for Safety and Wellbeing in English primary and secondary schools: study protocol for a multi-school, cluster randomised controlled trial (INSPIRE). Trials, 20(1), 640. 10.1186/s13063-019-3762-0
  19. Jacobson NS, & Truax P (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12.
  20. Kabat-Zinn J (1982). An outpatient program in behavioral medicine for chronic pain patients based on the practice of mindfulness meditation: Theoretical considerations and preliminary results. General Hospital Psychiatry, 4(1), 33–47.
  21. Kabat-Zinn J (2013). Full catastrophe living, revised edition: How to cope with stress, pain and illness using mindfulness meditation. Hachette UK.
  22. Khoury B, Sharma M, Rush SE, & Fournier C (2015). Mindfulness-based stress reduction for healthy individuals: A meta-analysis. Journal of Psychosomatic Research, 78(6), 519–528.
  23. Lindahl JR, Fisher NE, Cooper DJ, Rosen RK, & Britton WB (2017). The varieties of contemplative experience: A mixed-methods study of meditation-related challenges in Western Buddhists. PLOS ONE, 12(5), e0176239. 10.1371/journal.pone.0176239
  24. Linden M (2013). How to Define, Find and Classify Side Effects in Psychotherapy: From Unwanted Events to Adverse Treatment Reactions. Clinical Psychology & Psychotherapy, 20(4), 286–296. 10.1002/cpp.1765
  25. Linden M, & Schermuly-Haupt M-L (2014). Definition, assessment and rate of psychotherapy side effects. World Psychiatry, 13(3), 306–309. 10.1002/wps.20153
  26. Prochaska JO, & Velicer WF (1997). The Transtheoretical Model of Health Behavior Change. American Journal of Health Promotion, 12(1), 38–48. 10.4278/0890-1171-12.1.38
  27. Revicki D, Hays RD, Cella D, & Sloan J (2008). Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. Journal of Clinical Epidemiology, 61(2), 102–109. 10.1016/j.jclinepi.2007.03.012
  28. Rubin DB (2004). Multiple imputation for nonresponse in surveys (Vol. 81). John Wiley & Sons.
  29. Schmitz N, Hartkamp N, & Franke GH (2000). Assessing clinically significant change: Application to the SCL-90–R. Psychological Reports, 86(1), 263–274.
  30. Schomaker M, & Heumann C (2018). Bootstrap inference when using multiple imputation. Statistics in Medicine, 37(14), 2252–2266. 10.1002/sim.7654
  31. Segal ZV, Williams M, & Teasdale J (2018). Mindfulness-based cognitive therapy for depression. Guilford Publications.
  32. Taylor S, Abramowitz JS, & McKay D (2012). Non-adherence and non-response in the treatment of anxiety disorders. Journal of Anxiety Disorders, 26(5), 583–589. 10.1016/j.janxdis.2012.02.010
  33. Team, R. C. (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2012. ISBN 3–900051-07–0.
  34. Thompson B (2002). “Statistical,” “Practical,” and “Clinical”: How Many Kinds of Significance Do Counselors Need to Consider? Journal of Counseling & Development, 80(1), 64–71. 10.1002/j.1556-6678.2002.tb00167.x
  35. Travis JW (1977). Wellness Workbook: A Guide to Attaining High Level Wellness for Health Professionals. Wellness Resource Center.
  36. Van Dam NT, van Vugt MK, Vago DR, Schmalzl L, Saron CD, Olendzki A, Meissner T, Lazar SW, Kerr CE, & Gorchov J (2017). Mind the hype: A critical evaluation and prescriptive agenda for research on mindfulness and meditation. Perspectives on Psychological Science, 1745691617709589.
  37. Wallace BA (2011). Stilling the Mind: Shamatha Teachings from Dudjom Lingpa’s Vajra Essence. Simon and Schuster.
  38. Witkiewitz K, Marlatt GA, & Walker D (2005). Mindfulness-based relapse prevention for alcohol and substance use disorders. Journal of Cognitive Psychotherapy, 19(3), 211–228.
