| Study type | Methodological appraisal criteria | Yes | No | Comment |
|---|---|---|---|---|
| Screening questions: assessing ‘fatal flaws’ (Dixon-Woods 2005). Aggregative ‘fatal flaws’ based on Stewart et al (2014); configurative ‘fatal flaws’ based on the Pawson (2003) TAPUPAS framework | Aggregative assessment: | | | |
| | Configurative assessment: | | | |

Screening questions are answered on the basis of the abstract and/or a superficial reading of the full text: further appraisal is not feasible or appropriate when the answer is ‘No’ to any of the above screening questions!
Study type 1. Qualitative, e.g. (A) Ethnography (B) Phenomenology (C) Narrative (D) Grounded theory (E) Case study. Each criterion below is answered Yes / No, with a comment / confidence judgment.

| I. RESEARCH IS DEFENSIBLE IN DESIGN (providing a research strategy that addresses the question). Appraisal indicators: consider whether | Yes | No | Comment / Confidence judgment |
|---|---|---|---|
| i. there is a discussion of the rationale for the study design | | | |
| ii. the research question is clear and suited to qualitative inquiry | | | |
| iii. there are convincing arguments for the different features of the study design | | | |
| iv. limitations of the research design and their implications for the research evidence are discussed | | | |
| Rating: Defensible / Arguable / Critical / Not defensible | | | Worth continuing: |
| II. RESEARCH FEATURES AN APPROPRIATE SAMPLE (following an adequate strategy for the selection of participants). Appraisal indicators: consider whether | Yes | No | Comment / Confidence judgment |
|---|---|---|---|
| i. there is a description of the study location and how/why it was chosen | | | |
| ii. the researcher has explained how the participants were selected | | | |
| iii. the selected participants were appropriate for collecting rich and relevant data | | | |
| iv. reasons are given why potential participants chose not to take part in the study | | | |
| Rating: Appropriate sample / Functional sample / Critical sample / Flawed sample | | | Worth continuing: |
| III. RESEARCH IS RIGOROUS IN CONDUCT (providing a systematic and transparent account of the research process). Appraisal indicators: consider whether | Yes | No | Comment / Confidence judgment |
|---|---|---|---|
| i. researchers provide a clear account/description of the process by which data were collected (e.g. for an interview method, is there an indication of how interviews were conducted and of procedures for collecting or recording data?) | | | |
| ii. researchers demonstrate that data collection targeted depth, detail and richness of information (e.g. interview/observation schedule) | | | |
| iii. there is evidence of how descriptive analytical categories, classes, labels, etc. have been generated and used | | | |
| iv. the presentation of data distinguishes clearly between the data, the analytical frame used, and the interpretation | | | |
| v. if methods were modified during the study, the researcher has explained how and why | | | |
| Rating: Rigorous conduct / Considerate conduct / Critical conduct / Flawed conduct | | | Worth continuing: |
| IV. RESEARCH FINDINGS ARE CREDIBLE IN CLAIM / BASED ON DATA (providing well-founded and plausible arguments based on the evidence generated). Appraisal indicators: consider whether | Yes | No | Comment / Confidence judgment |
|---|---|---|---|
| i. there is a clear description of the form of the original data | | | |
| ii. a sufficient amount of data is presented to support the interpretations and findings/conclusions | | | |
| iii. the researchers explain how the data presented were selected from the original sample to feed into the analysis process (i.e. commentary and cited data relate; there is an analytical context to cited data, not simply repeated description; is there an account of the frequency of presented data?) | | | |
| iv. there is a clear and transparent link between data, interpretation, and findings/conclusions | | | |
| v. there is evidence (of attempts) to give attention to negative cases/outliers, etc. | | | |
| Rating: Credible claims / Arguable claims / Doubtful claims / Not credible | | | If findings are not credible, can the data still be used? |
| V. RESEARCH ATTENDS TO CONTEXTS (describing the contexts and particulars of the study). Appraisal indicators: consider whether | Yes | No | Comment / Confidence judgment |
|---|---|---|---|
| i. there is an adequate description of the contexts of data sources and how they are retained and portrayed | | | |
| ii. participants' perspectives/observations are placed in personal contexts | | | |
| iii. appropriate consideration is given to how findings relate to the contexts (how findings are influenced by, or influence, the context) | | | |
| iv. the study makes any claims (implicit or explicit) that infer generalisation (if yes, comment on appropriateness) | | | |
| Rating: Context central / Context considered / Context mentioned / No attention to context | | | |
| VI. RESEARCH IS REFLECTIVE (assessing what factors might have shaped the form and output of the research). Appraisal indicators: consider whether | Yes | No | Comment / Confidence judgment |
|---|---|---|---|
| i. appropriate consideration is given to how findings relate to the researchers' influence/own role during analysis and the selection of data for presentation | | | |
| ii. researchers have attempted to validate the credibility of findings (e.g. triangulation, respondent validation, more than one analyst) | | | |
| iii. researchers explain their reaction to critical events that occurred during the study | | | |
| iv. researchers discuss ideological perspectives/values/philosophies and their impact on the methodological or other substantive content of the research (implicit/explicit) | | | |
| Rating: Reflection / Consideration / Acknowledgement / Unreflective research | | | NB: can override a previous exclusion! |
OVERALL DECISION – EXCLUDE / INCLUDE (the study generates new knowledge relevant to the review question and complies with minimum criteria to ensure reliability and empirical grounding of that knowledge)
Sources used in this section (in alphabetical order): Campbell et al (2003); CASP (2006); CRD (2009); Dixon-Woods et al (2004); Dixon-Woods et al (2006), cited in Gough (2012); Greenhalgh & Brown (2014); Harden et al (2004), cited in SCIE & Gough (2012); Harden et al (2009); Harden & Gough (2012); Mays & Pope (1995); Pluye et al (2011); SCIE (2010); Spencer et al (2006); Thomas et al (2003).
Study type 2. Quantitative (non-randomised; randomised controlled). Each criterion below is answered Yes / No, with a comment / risk of bias judgment.

Common non-randomised designs include: (A) non-randomised controlled trials, (B) cohort studies, (C) case-control studies, and (D) cross-sectional analytical studies. Randomised designs: Randomised Controlled Trial (RCT).

The most common ways of controlling for bias due to baseline confounding are matching, which attempts to emulate randomisation; propensity score matching and related methods; stratification, where sub-groups are compared; and regression analysis, where covariates are adjusted for (see the illustrative sketch below).
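These confounding-control strategies all aim to compare like with like at baseline. Purely as an illustration for appraisers, and not as part of the appraisal tool itself, the following minimal Python sketch (simulated data; all variable names are hypothetical) shows why a naive difference in group means overstates an effect when participation depends on a baseline covariate, and how regression adjustment for that covariate recovers an estimate close to the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 'ability' is a baseline confounder that influences both
# programme participation and the test-score outcome.
n = 500
ability = rng.normal(0, 1, n)
treated = (rng.normal(0, 1, n) + ability > 0).astype(float)   # selection on ability
score = 2.0 * treated + 3.0 * ability + rng.normal(0, 1, n)   # true effect = 2.0

# Naive comparison of group means is biased upwards because treated pupils
# already had higher ability at baseline.
naive = score[treated == 1].mean() - score[treated == 0].mean()

# Regression adjustment: include the baseline covariate alongside the
# treatment indicator and read the effect off the treatment coefficient.
X = np.column_stack([np.ones(n), treated, ability])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
adjusted = beta[1]

print(f"naive difference in means: {naive:.2f}")     # inflated by confounding
print(f"covariate-adjusted effect: {adjusted:.2f}")  # close to the true 2.0
```

Matching and propensity score methods pursue the same goal by constructing comparison groups with similar covariate distributions rather than by modelling the outcome directly.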
| I. Selection bias (are participants recruited in a way that minimises selection bias?). Appraisal indicators: consider whether | Yes | No | Comment / Risk of bias judgment |
|---|---|---|---|
| i. there is a clear description of how and why the sample was chosen | | | |
| ii. the sample size is adequate to allow for representative and/or statistically significant conclusions | | | |
| iii. participants recruited to the control group were sampled from the same population as the treatment group | | | |
| iv. the group allocation process attempted to control for potential risk of bias | | | |
| Rating: Low risk of bias / Risk of bias / High risk of bias / Critical risk of bias | | | Worth continuing: |
| II. Bias due to baseline confounding (is confounding potentially controllable in the context of this study?). Appraisal indicators: consider whether | Yes | No | Comment / Risk of bias judgment |
|---|---|---|---|
| i. the treatment and control groups are comparable at baseline | | | |
| ii. matching was applied and, if so, featured sufficient criteria | | | |
| iii. the authors conducted an appropriate analysis that controlled for all potentially critical confounding domains | | | |
| iv. the authors avoided adjusting for post-intervention variables | | | |
| Rating: Low risk of bias / Risk of bias / High risk of bias / Critical risk of bias | | | Worth continuing: |
| IF RANDOMISED CONTROLLED TRIAL, SKIP I + II AND START HERE! Bias due to ineffective randomisation (is allocation of treatment status truly random?). Appraisal indicators: consider whether | Yes | No | Comment / Risk of bias judgment |
|---|---|---|---|
| i. there is a clear description of the randomisation process | | | |
| ii. the unit of randomisation and the number of participants are clearly stated (pay special attention to the balance of treatment and control locations) | | | |
| iii. eligibility criteria for study entry are specified | | | |
| iv. characteristics of the baseline and endline samples are provided (note 1) | | | Preferable condition; see note 1 |
| Rating: Low risk of bias / Risk of bias / High risk of bias / Critical risk of bias | | | If critical risk of bias, treat as a non-randomised study |
| III. Bias due to departures from intended interventions (was the intervention implemented as laid out in the study protocol?). Appraisal indicators: consider whether | Yes | No | Comment / Risk of bias judgment |
|---|---|---|---|
| i. the critical co-interventions were balanced across intervention groups | | | |
| ii. treatment switches were low enough not to threaten the validity of the estimated effect of the intervention | | | |
| iii. implementation failure was minor and unlikely to threaten the validity of the outcome estimate | | | |
| iv. it is possible that the intervention was taken up by the controls (contamination and possible crossing-over)* | | | *Whilst challenging in terms of estimating impact, spill-overs might be an important finding in themselves (e.g. teachers read to pupils/village/family members) |
| v. it is possible that knowledge of the intervention group affects how the two study groups are treated in the course of follow-up by investigators** | | | **Consider only in extreme cases in which preferential treatment is clearly evident; blinding is in general not expected in social interventions |
| Rating: Low risk of bias / Risk of bias / High risk of bias / Critical risk of bias | | | Worth continuing: |
| IV. Bias due to missing data (attrition) (are the intervention groups free of critical differences in participants with missing data?). Appraisal indicators: consider whether | Yes | No | Comment / Risk of bias judgment |
|---|---|---|---|
| i. outcome data are reasonably complete (80% or above) | | | |
| ii. if ‘no’, are missing data reported? | | | |
| iii. if data are missing: are the proportion of participants with missing data and the reasons for missing data similar across groups? | | | |
| iv. if data are missing: were appropriate statistical methods used to account for missing data (e.g. sensitivity analysis)? (see the sketch below this table) | | | |
| v. if it is not possible to control for missing data, are outcomes with missing data excluded from the analysis? | | | |
| Rating: Low risk of bias / Risk of bias / High risk of bias / Critical risk of bias | | | Worth continuing: |
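To illustrate how an appraiser might think about criteria i, iii and iv above, here is a minimal, hypothetical Python sketch (simulated data and illustrative thresholds only; not prescribed by the sources cited for this section): it checks overall completeness against the 80% guide, compares attrition across groups, and bounds the estimated group difference under extreme imputations as a crude sensitivity analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical endline data: NaN marks pupils lost to follow-up, with higher
# attrition in the control group (all names and numbers are illustrative).
treated = np.repeat([1, 0], 200)
score = rng.normal(60 + 5 * treated, 10)
score[rng.random(400) < np.where(treated == 1, 0.10, 0.25)] = np.nan

completeness = np.mean(~np.isnan(score))
print(f"overall completeness: {completeness:.0%} (criterion i flags < 80%)")
for g in (1, 0):
    print(f"missing in group {g}: {np.isnan(score[treated == g]).mean():.0%}")

# Crude sensitivity bounds (criterion iv): replace missing outcomes with the
# most pessimistic / optimistic observed values and see how far the estimated
# difference between groups can move.
def group_diff(x):
    return np.nanmean(x[treated == 1]) - np.nanmean(x[treated == 0])

worst = np.where(np.isnan(score),
                 np.where(treated == 1, np.nanmin(score), np.nanmax(score)), score)
best = np.where(np.isnan(score),
                np.where(treated == 1, np.nanmax(score), np.nanmin(score)), score)
print(f"complete-case difference: {group_diff(score):.1f}")
print(f"bounds under extreme imputations: {group_diff(worst):.1f} to {group_diff(best):.1f}")
```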
| V. Outcome reporting bias (are measurements appropriate, e.g. clear origin, or validity known?). Appraisal indicators: consider whether | Yes | No | Comment / Risk of bias judgment |
|---|---|---|---|
| i. there was an adequate period of follow-up*** | | | ***In many social science interventions, follow-up is not required to coincide with the start of the treatment; further, longer periods of follow-up are often required to measure changes. In the context of education, the question of retention is of major interest, in particular when dealing with short intervention periods (< 1 month). |
| ii. the outcome measure was clearly defined and objective | | | |
| iii. outcomes were assessed using standardised instruments and indicators | | | |
| iv. outcome measurements reflect what the experiment set out to measure | | | |
| v. the methods of outcome assessment were comparable across experimental groups | | | |
| Rating: Low risk of bias / Risk of bias / High risk of bias / Critical risk of bias | | | Worth continuing: |
| VI. Bias in selection of results reported (are the reported outcomes consistent with the outcomes proposed at the protocol stage?). Appraisal indicators: consider whether | Yes | No | Comment / Risk of bias judgment |
|---|---|---|---|
| i. it is unlikely that the reported effect estimate is available primarily because it was a notable finding among numerous exploratory analyses | | | |
| ii. it is unlikely that the reported effect estimate is prone to selective reporting from among multiple outcome measurements within the outcome domain | | | |
| iii. it is unlikely that the reported effect estimate is prone to selective reporting from among multiple analyses of the outcome measurements | | | |
| iv. the analysis includes an intention-to-treat analysis (if so, was this appropriate and were appropriate methods used to account for missing data?)**** (see the sketch below this table) | | | ****Usually in clinical RCTs, rare in social science; only rate if conducted |
| Rating: Low risk of bias / Risk of bias / High risk of bias / Critical risk of bias | | | |
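Criterion iv refers to intention-to-treat analysis. As a purely illustrative sketch (hypothetical data and variable names, not part of the tool), the Python snippet below contrasts an intention-to-treat comparison, which analyses participants as randomised, with an as-treated comparison, which can reintroduce bias when compliance is related to participant characteristics.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical trial: 'assigned' is the randomised allocation; some assigned
# pupils never receive the programme (non-compliance correlated with ability).
n = 1000
assigned = rng.integers(0, 2, n)
ability = rng.normal(0, 1, n)
received = assigned * (ability > -0.5).astype(int)      # weaker pupils drop out
score = 50 + 3 * received + 4 * ability + rng.normal(0, 5, n)

# Intention-to-treat: compare groups as randomised, regardless of receipt.
itt = score[assigned == 1].mean() - score[assigned == 0].mean()

# As-treated comparison: groups defined by actual receipt; no longer protected
# by randomisation, so the estimate absorbs the compliance-related ability gap.
as_treated = score[received == 1].mean() - score[received == 0].mean()

print(f"intention-to-treat estimate: {itt:.1f}")
print(f"as-treated estimate:         {as_treated:.1f}")   # typically inflated here
```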
OVERALL RISK OF BIAS:
Sources used in this section (in weighted order): Cochrane (2014); Stewart et al (2014); Stewart et al (2012); Higgins et al (2011); Greenhalgh & Brown (2014); Pluye et al (2011); Gough et al (2007).
Study type 3. Mixed-methods (note 2). Each criterion below is answered Yes / No, with a comment / confidence judgment. Four mixed-methods designs are distinguished:

- Sequential explanatory design: the quantitative component is followed by the qualitative one. The purpose is to explain quantitative results using qualitative findings, e.g. the quantitative results guide the selection of qualitative data sources and data collection, and the qualitative findings contribute to the interpretation of the quantitative results.
- Sequential exploratory design: the qualitative component is followed by the quantitative one. The purpose is to explore, develop and test an instrument (or taxonomy) or a conceptual framework (or theoretical model), e.g. the qualitative findings inform the quantitative data collection, and the quantitative results allow a generalisation of the qualitative findings.
- Triangulation design: the qualitative and quantitative components are concomitant. The purpose is to examine the same phenomenon by interpreting qualitative and quantitative results (bringing data analysis together at the interpretation stage), by integrating qualitative and quantitative datasets (e.g. data on the same cases), or by transforming data (e.g. the quantization of qualitative data).
- Embedded/convergent design: the qualitative and quantitative components are concomitant. The purpose is to support a qualitative study with a quantitative sub-study (measures), or to better understand a specific issue of a quantitative study using a qualitative sub-study, e.g. the efficacy or the implementation of an intervention based on the views of participants.
| I. RESEARCH INTEGRATION / SYNTHESIS OF METHODS (assessing the value added of the mixed-methods approach). Appraisal indicators: consider whether | Yes | No | Comment / Confidence judgment |
|---|---|---|---|
| Applied mixed-methods design: | | | |
| i. the rationale for integrating qualitative and quantitative methods to answer the research question is explained [DEFENSIBLE] | | | |
| ii. the mixed-methods research design is relevant for addressing the qualitative and quantitative research questions, or the qualitative and quantitative aspects of the mixed-methods research question [DEFENSIBLE] | | | |
| iii. there is evidence that data gathered by both research methods were brought together to inform new findings that answer the mixed-methods research question (e.g. forming a complete picture, synthesising findings, configuration) [CREDIBLE] | | | |
| iv. the approach to data integration is transparent and rigorous in considering all findings from both the qualitative and the quantitative module (danger of cherry-picking) [RIGOROUS] | | | |
| v. appropriate consideration is given to the limitations associated with this integration, e.g. the divergence of qualitative and quantitative data (or results) [REFLEXIVE] | | | |
For mixed-methods research studies, each component first undergoes its own critical appraisal. Since qualitative studies are either included or excluded, no combined risk of bias assessment is made, and the risk of bias assigned to the quantitative component also holds for the mixed-methods research. The above appraisal indicators refer only to the applied mixed-methods design. If this design is not found to comply with each of the four mixed-methods appraisal criteria ([DEFENSIBLE], [CREDIBLE], [RIGOROUS], [REFLEXIVE]), then the quantitative and qualitative components will be included in the review individually:
Mixed-methods critical appraisal:
Qualitative critical appraisal: Include / Exclude
Quantitative critical appraisal:
Combined appraisal: Include / Exclude mixed-methods findings judged with ____________ risk of bias
This section is based on Pluye et al (2011). Further sources consulted (in alphabetical order): Creswell & Clark (2007); Crow (2013); Long (2005); O'Cathain et al (2008); O'Cathain (2010); Pluye & Hong (2014); Sirriyeh et al (2011).
Note 1: Two theoretical exceptions to this rule apply: (i) an RCT with an appropriate randomisation procedure can be included without showing baseline data, as both experimental groups can be assumed to be equal at baseline by design; (ii) a sophisticated quasi-experimental design such as PSM or RDD could, in theory, make the same claim and not require baseline data. In both cases, the advice of an evaluation specialist will be sought, as the researcher does not have the capacity to make an informed judgment in such specialist cases.

Note 2: The mixed-methods critical appraisal is carried out for studies applying an explicit mixed-methods approach. This component is applied in addition to the criteria for the qualitative component (I to VI) and the appropriate criteria for the quantitative component (I to VI).