PLOS One. 2025 Jun 16;20(6):e0324936. doi: 10.1371/journal.pone.0324936

Applying qualitative methods to experimental designs: A tutorial for the behavioral sciences

Hidde Jelmer Leplaa 1,¤,*, Jari A Tönjes 1, Mariska Bouterse 1, Karlijn F B Soppe 1, Irene Klugkist 1
Editor: Gabriel Velez
PMCID: PMC12169552  PMID: 40522903

Abstract

Studies with experimental designs are almost invariably evaluated with quantitative outcomes and methods, both in behavioral sciences and other disciplines. We argue that there can be added value of using qualitative methods for the evaluation of (behavioral) experiments. Incorporating qualitative data can enhance the ecological validity of a study, by acquiring a more holistic understanding of the phenomenon of interest. There is, however, little methodological guidance on how to implement such an approach.

In this paper we present the different steps and considerations for a qualitative evaluation of results in experimental designs. Methodological guidelines are offered for each stage of a study, from formulation of the research goals, through data collection and data analysis, to the interpretation of a potential effect of the intervention. In addition, there is ample attention for ensuring the rigor of the research. The presented guidelines are developed and illustrated using an empirical example, in which a constructivist grounded theory approach was applied to evaluate the effect of empathy prompts on the motivation to adhere to COVID-19 regulations.

Introduction

Experimental designs are common research designs in the social and behavioral sciences [1,2]. Although there are different types of experimental designs, briefly outlined below, they almost invariably have one thing in common: quantitative data are used to evaluate the effects. In this paper, we demonstrate and discuss the use of qualitative data to evaluate experimental designs, resulting in a methodological tutorial that may be especially beneficial for studies aiming to explain human behavior.

Before elaborating further on the different, qualitative and quantitative, approaches to measuring the potential effect of an experimental manipulation, we briefly recap the different experimental designs, starting with the distinction between true experiments and quasi-experiments [3]. In both true and quasi-experiments participants are subjected to some sort of treatment or manipulation, and the effect of this manipulation is measured on the outcome(s) of interest. In a quasi-experiment participants are naturally members of one of the experimental conditions, without the possibility for the researchers to control the allocation of participants. In true experiments there is full randomization, that is, participants are randomly assigned to one of the conditions of the experiment. Random assignment enhances comparability between groups as it diminishes the effect of confounders and, therefore, positively affects internal validity. Although quasi-experiments can lead to undesired differences between conditions, in practice full randomization is not always possible. Throughout the paper, the terms ‘experiment’ and ‘experimental design’ include both true and quasi-experiments.

A different typology of experiments is the distinction between lab, field, and natural experiments [4]. Laboratory experiments are conducted in highly controlled environments, where researchers manipulate one or more independent variables to create the conditions and subsequently measure the effect of the conditions on the outcome of interest. While laboratory experiments have high internal validity due to the high level of control, they typically have low ecological validity, because the lab situation may differ from daily life [5]. Alternatively, field experiments are conducted in the everyday environment while still allowing the researcher to manipulate the experimental conditions. Field experiments therefore offer less control over extraneous (confounding) variables, potentially lowering internal validity, but they have higher ecological validity [6]. Finally, in natural experiments there is no manipulation controlled by the researcher at all; instead, naturally existing groups, that is, conditions, are compared on the outcome of interest in everyday life. Although this can provide valuable, practical information with high ecological validity, there is a higher risk of confounding variables.

Irrespective of the type of experiment the researcher must define the intended outcome, that is, the change as is defined or measured within the study that is assumed to be the consequence of being exposed to an intervention or manipulation. We will refer to the intended outcome throughout this paper as the “intervention effect”, a term that can refer to quantitative or qualitative outcomes.

In quantitative research the outcome is a measure that reflects the construct intended to be influenced by the intervention. This is generally done by providing operationalizations of the key constructs of interest as quantitatively measurable variables that are subsequently analyzed with quantitative methods. Under the empirical-analytical paradigm, common in the behavioral sciences, one assumes that the universe can be explained through researching empirical data, with the condition that the research is reproducible, replicable, systematic, and transparent [7]. To achieve these goals, a form of reductionism is implied.

Reductionism in research is the process of simplifying a situation to scores on a small number of variables [8]. Since human behavior is intricate, researchers engage in reductionism by reducing complex human beings to scores on variables. Through reductionism it is feasible to establish whether some variables are related; however, quantitative research seldom explains the exact process under investigation. To fully understand human behavior one may need to study the larger context, that is, the complex human being as a whole. Research that focuses on small segments of reality, as is done with quantitative analyses and reductionism, may lack ecological validity by not considering all aspects of the real-life situation and the complex human being of interest.

We therefore argue that it can be beneficial to utilize a more holistic approach for the evaluation of results in experimental designs, especially in the context of the behavioral sciences. Holistic approaches are typically associated with an interpretivist point of view and regularly apply qualitative methods [9]. This view builds on the idea that human beings and their behavior and motivations are complex and thus need to be studied without reducing them to a set of numerical variables. Adopting a more holistic view enables researchers to more fully understand the phenomenon of interest and as such can positively influence the ecological validity of a study [10]. Next to the holistic approach, qualitative research is often associated with an inductive goal, i.e., theory forming, which can be achieved through an abductive approach [11]. Abductive processes are by nature iterative, as they follow sequences of formulating preliminary explanations based on data, which are then tested within the same study with new data. The use of qualitative methods is rather common in several fields of research, and there is much literature available discussing ways to conduct qualitative research (e.g., [12–15]), with a clear focus on idiographic knowledge: thoroughly understanding the individual. Because understanding the individual is the focus of much qualitative research, there is little literature on using a qualitative approach specifically for the evaluation of experimental designs in behavioral science.

Some papers offering guidelines for qualitative evaluations of experiments can be found in the humanities [16], management [17], and political science [18]. To the best of our knowledge, there are no methodological papers in the field of behavioral sciences, but there are a few examples of studies that apply qualitative methods for the evaluation of experiments [19–21], and often they quantify the qualitative results [22,23]. These studies offer little to no methodological guidance for other behavioral researchers seeking to apply qualitative methods in their experimental studies.

With this methodological tutorial we aim to provide such guidelines. We present a step-by-step approach for future researchers aiming for a qualitative evaluation of an experimental design, supported with an example. This worked example describes a field study that investigates the effect of empathy-inducing interventions on the willingness to adhere to social distancing rules during the COVID-19 pandemic. De Ridder et al. [24] set up an experimental (quantitative) study for this goal, which provided the framework for another study executed independently from De Ridder et al. (Glebbeek et al. [25]; from here on referred to as GLS). GLS used qualitative methods to evaluate the effect of the intervention following a constructivist grounded theory approach, and this study is used as a worked example to develop and illustrate the general methodological guidelines presented in this paper.

The rest of the paper is organized as follows. First, we briefly describe those aspects of the study of De Ridder et al. [24] that also apply to the qualitative study by GLS, that is, the design and intervention, followed by the motivation for the GLS study. In the next section, we present the methodological guidelines for qualitative evaluation of an experiment, structured in four stages of the research process: defining the research goals, data collection, data analysis, and determining the intervention effect. A separate subsection describes an overarching fifth stage, which pertains to the process of reflecting and adjusting during the other stages. The paper concludes with a discussion of the merits of (additional) qualitative evaluation of experimental effects compared to a purely quantitative analysis. Finally, we provide recommendations for researchers considering a qualitative evaluation of an experiment.

1 Context of the worked example

In 2020, in the Netherlands, De Ridder and colleagues investigated the effect of an empathy-inducing intervention on the willingness to adhere to the social distancing rules during the COVID-19 pandemic, using a quasi-experiment. One of the measures taken by the Dutch government during the pandemic was social distancing, which included keeping 1.5 meters distance between people.

The field study employed an A-B design [26] lasting 6 weeks, with three sequences of a control week (A) and an experimental week (B) at three campus locations (the square outside the college hall, the main entrance of the lecture hall, and the entrance of the lecture rooms). The same approach, but with four instead of three sequences of A-B weeks, thus 8 instead of 6 weeks long, was executed a second time four months after the initial wave. In the data analysis the researchers assumed this to be a between-subjects design, although participants could realistically have been present on multiple occasions throughout the weeks; without personal identification this could not be detected.

The intervention consisted of prompts to induce empathy and contained three elements: (1) a social robot encouraging people to keep distance; (2) posters of student and staff models with a text expressing a prompt for empathy-based distancing (e.g., ‘I have asthma. Keep your distance for me’); and (3) a reel of movie clips of the same models used in (2), with the same texts, shown on screens. To determine the effects of the empathy prompts in promoting distancing, De Ridder et al. [24] measured the actual distance between people using camera recordings. The results of the quantitative study can be found in the paper by De Ridder et al. [24].

Whereas the camera recordings provided quantitative data on actual behavior, they do not provide insight into reasons or motivations to (not) adhere to the social distancing rules. To further investigate if and how the intervention affected those aspects, GLS added a qualitative inquiry using the context of the main study. Certain aspects of the qualitative study were thus set in stone, that is, the intervention and the data collection in the two waves of A-B sequences were predetermined. All other choices were open, providing a suitable setting for our study, that is, investigating how to perform a qualitative evaluation of results from experimental designs in the context of behavioral research. While the GLS study is an example of a quasi-experiment, the lessons and guidelines can be applied directly to true experiments as well. Note that the details and empirical results are reported in Glebbeek, Leplaa, and Soppe [25], while we use the qualitative study as a worked example to derive and illustrate the methodological guidelines presented in this paper.

The studies of De Ridder et al., and GLS as an addendum, were approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences, Utrecht University (file number 20-479). Our methodological tutorial was exempt because we neither collected nor handled data ourselves. We did, however, have access to the data for research purposes from 2019 through 2023; the data were anonymized so they could not be traced back to individual participants.

2 Methodological approach

In this section, we provide a step-by-step approach developed and illustrated by the GLS study. We explain the methodology followed in GLS, the rationale behind it, and the lessons learned throughout the study.

To ensure transparency and rigor in research, it is common practice to preregister studies and make data openly available. These are important considerations before executing a qualitative evaluation of an experiment as well.

In the context of qualitative studies, preregistration is a debated issue with proponents (e.g. [27]) and opponents (e.g. [28]). Although the discussion of compatibility of qualitative research and preregistration is beyond the scope of our study, GLS decided to preregister their research to formally document their intended approach before starting the study. The preregistration document can be accessed on the Open Science Framework: https://archive.org/details/osf-registrations-knb2d-v1.

Open data is also an essential aspect of scientific research, as it enhances transparency, accountability, and reproducibility. However, the application of this practice in qualitative research is also debated [29,30]. It is argued that openly sharing qualitative data may result in a loss of context and can compromise confidentiality [29]. Although the loss of context means the data cannot be perfectly interpreted, GLS decided to make the data publicly available through https://osf.io/e4zfw/ for the purpose of transparency.

After addressing the considerations of preregistration and open data, the actual research can begin. In Fig 1, we outline the stages involved in executing a qualitative evaluation of results in experimental designs. While the first stages of defining the research goal, collecting data, and data analysis are common to all empirical studies, the vertical stage labeled ‘reflecting and adjusting’ is more characteristic of qualitative research and is essential to its success. The arrows pointing upwards on the right-hand side of the figure highlight the need for reflection and adjustment throughout the research process: it is common in qualitative research to revisit previous stages. This not only improves the quality of the data but also enhances the final analysis and conclusions drawn from the study, including establishing the intervention effect, as shown in the final stage at the bottom of Fig 1.

Fig 1. Visualization of the methodological approach of a qualitative experimental study.


In the following sections, we will provide a detailed description of each stage shown in the figure. We will highlight general considerations that are important at each stage, as well as the specific choices made in GLS.

2.1 Defining the research goals

In order to conduct a successful qualitative evaluation of an experiment, it is crucial to define its goals explicitly. The open and iterative nature of qualitative research makes clear goal-setting even more critical, as researchers often move back and forth between various stages, which can lead to scope creep, that is, a gradual shift away from the original research goals [31]. Therefore, formulating unambiguous goals and regularly reviewing them can help maintain the intended focus, or shift the focus by updating the goals in the desired direction.

Defining clear research goals is particularly important in studies of intervention effects and requires a clear description of the intervention under investigation and the outcome of interest, including how, when, and where the effects are being examined. In some studies intervention effects can be evaluated exclusively through qualitative data: researchers may decide that the qualitative methods provide all the required information to answer the research question. In other studies, intervention effects can be studied with qualitative data as an addition to quantitative data within the same study. The latter situation requires that the qualitative component adds value to the quantitative study, for instance, through triangulation or complementarity [32].

In both scenarios, the qualitative investigation of potential intervention effects allows for a more comprehensive understanding of behavior and motivation compared to quantitative research [8,10]. This benefit can however only be achieved with in-depth knowledge of the field.

Therefore, a thorough review of existing literature is required to frame the study and ensure its relevance [33]. Familiarity with leading theories and concepts in the relevant field can lead to the identification of sensitizing concepts, which are concepts found in the literature that could be elements of the answer to the research question [13]. Sensitizing concepts can be used to design the study, including data collection and analysis, and can be valuable in defining the research goal. While these concepts are not final, as the definitive concepts and theoretical model will be drawn from the data, sensitizing concepts can guide the study towards achieving its goals.

GLS provides an example of adding a qualitative element to a quantitative study. While the quantitative study focuses on the actual distance between people, which is just one aspect of behavior, the qualitative study aims to provide an understanding of more aspects of behavior and to uncover the reasoning behind it. Empathy was identified as a sensitizing concept, and the literature search showed that empathy comprises multiple aspects, namely emotional simulation, perspective-taking, and emotion regulation [34]. Understanding these different facets is crucial for being sensitive to the various ways in which empathy can manifest in the study.

The research goal GLS defined for the qualitative study was to achieve a holistic understanding of both the behavior of participants and their motivations for this behavior in the context of COVID-19-related measures. Specifically, they aimed to identify the strategies that participants employed to adhere to the COVID-19-related measures and the motivations behind their choices. This allowed them to test the underlying theory on the effectiveness of prompts with a predominantly inductive approach, instead of a narrower assessment of the behavioral outcomes.

2.2 Data collection

In qualitative research, just as in other studies, data collection needs to be carefully planned to ensure that the chosen methods generate data that can effectively address the research question. It is both advisable and considered a standard practice in qualitative research to initiate data analysis as soon as part of the data becomes accessible. This proactive approach not only encourages critical reflection on the research process but also facilitates timely adjustments to data collection methods when needed. Typical data collection methods in qualitative research are observations, interviews, focus groups, and archival data [35]. It is important to note here that in the context of an experiment, the content of the intervention might limit the options for plausible data collection methods. Perhaps the most common methods, and what GLS used in their study, are observations and interviews. Therefore, we will focus on discussing the strengths and limitations of these two methods.

Observations are particularly valuable as they enable the study of naturalistic behavior, and covert field observations can achieve a high level of ecological validity (e.g., [36,37]). Two types of observational studies are distinguished: systematic and non-systematic. Systematic observation involves using a pre-defined checklist to quickly and accurately record behavior with high inter-rater reliability [38]. However, it may not offer a holistic understanding, since what is being observed is determined beforehand. In contrast, non-systematic observations allow researchers to capture all relevant aspects of the situation in their field notes, which provides a detailed description of the behavior and its contextual setting. This approach can result in a more thorough understanding of the behavior and the factors that contribute to it. However, non-systematic observations are more prone to observer effects, that is, lower inter-rater reliability [37].
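Inter-rater reliability of systematic checklist observations is commonly quantified with Cohen’s kappa, which corrects the raw agreement between two coders for agreement expected by chance. As a minimal illustration of the computation (the observer data below are hypothetical and not taken from GLS):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two raters coding the same items into categories."""
    assert len(coder_a) == len(coder_b) and len(coder_a) > 0
    n = len(coder_a)
    # Observed agreement: proportion of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: expected overlap given each rater's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in set(coder_a) | set(coder_b))
    return (p_o - p_e) / (1 - p_e)

# Two observers coding whether ten passers-by kept their distance.
obs_1 = ['yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'no', 'yes', 'yes']
obs_2 = ['yes', 'no',  'no', 'yes', 'no', 'yes', 'yes', 'no', 'yes', 'yes']
print(round(cohens_kappa(obs_1, obs_2), 2))  # → 0.78
```

In this sketch the raters agree on 9 of 10 items (raw agreement 0.90), but kappa is lower (about 0.78) because some of that agreement would occur by chance alone.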

Interviews are a valuable research tool for gathering detailed information on the experiences, rationales, or attitudes of individual people (e.g., [39,40]). While interviews may not provide information on naturalistic behavior, they can offer insight into intentions and experiences. Two main types of interviews are distinguished: open and semi-structured. In open interviews only a starting topic is selected, and the interviewer and interviewee discuss what seems relevant to them. Open interviews can provide a holistic understanding of the individual, allowing the interviewee to address topics the researchers may not have considered. However, the lack of structure in open interviews can lead to some topics being overlooked, creating differences between interviews; this is an example of an interviewer effect that may limit the quality of the data obtained [40].

Semi-structured interviews on the other hand follow a more predictable pattern by pre-determining to a greater extent the content of the interview, which enhances comparability across interviews but may limit exploration [40]. To structure such interviews and ensure comparability, it is recommended to use a topic list that includes the topics to be discussed, their order, and specific wording of questions [41]. Such a list can help focus on relevant information and reduce unwanted researcher bias [42]. It also enables researchers to ask relevant questions that are more likely to elicit valuable data on the relevant concepts.

In qualitative research, one common practice is to employ method triangulation, which involves using multiple types of data within a single study [43]. By using method triangulation, researchers can study the phenomenon of interest from multiple perspectives, which ultimately enhances their understanding of it. Method triangulation allows researchers to choose data collection methods that complement each other’s strengths and weaknesses. For instance, while interviews may provide insight into intentions, observations may provide information on actual behavior.

In GLS, the researchers utilized both systematic and non-systematic observation methods to investigate whether individuals adjust their behavior in response to prompts and to identify circumstances in which individuals adhere to or violate COVID-19-related measures. For the systematic observations, a simple checklist was provided to record whether individuals seemed to notice the prompts, whether they kept their distance from others, and whether they walked alone or in a group (recording group size). Non-systematic observations were included through field notes, which provided a more detailed description of individual behavior, their relationship with potential companions, and changes in behavior related to adherence to COVID-19-related measures. Participants were informed about the data collection through signs with explicit information about the research. In addition, the project’s purpose and procedures were thoroughly explained on the university website.

Six researchers conducted these observations over a seven-week period in the first wave of data collection. Assigning the same task to multiple researchers is known as researcher triangulation and reduces the influence of individual preconceptions [44,45]. A pilot week was used to determine the observation schedule and test the checklist, followed by three A-B iterations. The A-weeks served as the control condition with no intervention, while the B-weeks were the experimental condition. Topic lists for both weeks are included in Appendix A and Appendix B. A total of 52 hours of observations were conducted for the control condition and 70 hours for the experimental condition, observing a total of 1002 individuals. The sample consists predominantly of highly educated adolescents, since the study location is almost exclusively visited by students and employees of the university and the university of applied sciences.

In addition, GLS used method triangulation through short interviews that complemented the information from the observations. In the last week of data collection of the first wave, the same six researchers conducted 5-minute interviews while walking with the people they observed. The researchers used a topic list, added in Appendix C, to guide the conversation, aiming for information about the motivation for participants’ behavior. A total of 42 participants were interviewed, equally split between the control and experimental group. The interviews were transcribed using intelligent verbatim transcription to ensure the preservation of meaning [46].

The second wave of data collection was announced at a later stage and took place half a year after the first wave. It lasted eight weeks in total and consisted of four sequences of a control and an intervention week. The intervention was the same as in the first wave and was placed at the same locations. The second wave enabled GLS to collect the data needed to answer the research question, and its design was informed by the results from the first wave, since data analysis had already been conducted for that part of the data. Both the short and the long interviews were preceded by asking respondents for consent to record the conversation and to use the collected data for scientific publication. Their consent was also audio-recorded.

The key objective for the qualitative data collection in this wave was to gain a better understanding of the motivations and strategies of participants. To obtain more in-depth data, the researchers conducted longer interviews (approximately 45 minutes) with a predefined topic list to enable a comprehensive understanding of the processes that potentially influenced the participants’ behavior. The topic list was based on the data collection of the first wave and contained topics like Social life, Home situation, and Study-related contact. For each topic, a starting question was predefined (e.g., ‘Can you describe your recent interactions with fellow students and lecturers?’) as well as some follow-up questions (e.g., ‘How does this differ from before corona?’) and subtopics (e.g., agreements, activities, group composition). The topic list is added in Appendix D.

The interviews in the second wave of data collection were conducted by five researchers at one of three locations: a building near the interventions, another university building away from the interventions, and online. Of the 39 participants, six were interviewed during the intervention weeks and near the intervention location and were therefore considered part of the experimental group. The control group consisted of 33 participants who were interviewed online (3), at a location away from the intervention (13), or near an intervention location but during a control week (17). The participants of both types of interviews were all students at the university or the university of applied sciences. Apart from three non-Dutch participants in the control group, all participants were Dutch natives. While participants could have been the subject of multiple observations, no participants were interviewed more than once. The interviews were transcribed using intelligent verbatim transcription in order to stay true to the words of the respondents.

By using non-systematic and systematic observations as well as short interviews and semi-structured interviews, GLS aimed for a comprehensive overview of the potential effect of the intervention. By combining data from these different sources, they expected to be able to describe the relation between behavior and underlying motivations, attitudes and intentions with respect to COVID-19-related measures and compare results between groups with and without exposure to the intervention.

As stated earlier, it is common practice to start initial data analyses simultaneously with collecting data in qualitative research. This can offer valuable insights and enables researchers to improve their data collection methods or identify confounders not anticipated beforehand. Before we discuss the qualitative data analysis in the next section, we first elaborate on the process of reflecting and adjusting that is typical in qualitative research and is needed to ensure the quality of both the collection and the analysis of data.

2.3 Reflecting and adjusting

Maintaining rigor throughout the research process is critical to ensuring the overall quality of the research. Four critical aspects of rigor are dependability, confirmability, transferability, and credibility [43,47]. Dependability and confirmability emphasize the importance of minimizing the researcher’s influence [48]. Rigorous studies should produce objective findings that are not influenced by the researcher’s preconceptions (confirmability) and are consistent across different contexts and time frames (dependability). Credibility and transferability focus more on the quality of the findings within their specific context [48]. Rigorous results accurately reflect the truth value and the participants’ perspectives on the phenomenon of interest (credibility) and are applicable in a wide range of situations (transferability).

Qualitative research can ensure rigor through various methods, including applying thorough data collection methods, data triangulation, researcher triangulation, and reflexivity. While the first three have been introduced earlier in this paper, reflexivity is a term that has not been discussed yet.

Reflexivity can target diverse facets of the research process. In order to address all pertinent dimensions, we categorize it into two distinct forms: personal and epistemological reflexivity. Personal reflexivity addresses the unintended influences of the researcher’s values, beliefs, social identity, and experiences on the study [49]. It also acknowledges that researchers are active participants in the study, and their biases can shape the way they perceive and approach the research world. For instance, researchers may be more likely to note down observations that align with their expectations, and overlook or disregard unexpected findings. Being aware of these preconceptions can help researchers identify their unwanted influences on the data and findings.

Epistemological reflexivity on the other hand involves an examination of the potential impact of the study’s design on the research outcomes [50]. Epistemological reflexivity enables researchers to reflect on questions such as, “Could alternative methods be employed to investigate the research questions?” or “How does the study’s design impact the data and findings?”. Engaging in epistemological reflexivity allows researchers to critically evaluate the data’s origin and how the study’s design has influenced specific findings. Data collection can impact the direction of research, and epistemological reflexivity can help researchers recognize this and make necessary adjustments.

Saturation is another critical term in qualitative research, referring to the point where new data does not yield any new information [51]. Achieving saturation is often a goal for researchers as it allows them to determine when data collection can stop. To achieve saturation researchers employ theoretical sampling, where participants are selected based on their potential to advance the research [52]. This approach requires analyzing data during data collection to determine the characteristics of subjects that can further the research. Researchers can focus on negative cases, those that differ significantly from the majority of cases, to ensure the study’s completeness [53].

At the beginning of a qualitative study, it is often unclear what constitutes sufficient data, that is, enough data to achieve saturation. The research focus may also shift throughout the study. Therefore, it is recommended to commence data analysis as soon as part of the data is available and to continuously reflect on the research process, making adjustments as necessary; such reflection and adjustment are inherent in qualitative studies.

GLS employed multiple measures to ensure the rigor and quality of their research. Firstly, they utilized method triangulation by conducting both observations and interviews to increase the credibility and reliability of their findings. Additionally, they employed researcher triangulation during data collection and analysis to reduce the potential for bias, thereby enhancing confirmability and objectivity. They also used theoretical embedding to facilitate the interpretation of the results by incorporating sensitizing concepts.

To achieve data saturation within the fixed time available for data collection, GLS collected as much data as possible. They used theoretical sampling to increase the efficiency of data collection by providing their researchers with specific characteristics to look for, such as non-romantic couples or larger groups walking together, and by paying attention to negative cases. Initially, GLS recorded participants’ gender and age during the interviews. However, it soon became evident that these demographic characteristics were not analytically relevant to the aims of the study, as no meaningful patterns related to demographics emerged from the data. Instead, the social contexts in which participants operated proved to be of greater relevance. Consequently, GLS discontinued the systematic documentation of demographic variables and redirected their focus toward aspects more pertinent to the research objectives.

They also incorporated various methods to promote both forms of reflexivity in their study. They held regular meetings with different groups of researchers to discuss the collected data and preliminary analyses. Additionally, they conducted weekly meetings with researchers to monitor the progress of data collection and review the field notes. Through this process, GLS identified that the field notes lacked sufficient detail on certain aspects that the observers took for granted, such as the relationships between people walking together. To address this issue, they instructed the observers to take more detailed factual notes to enrich the data.

Finally, through epistemological reflexivity, GLS recognized that the data collected during the first wave was insufficient to answer their research question thoroughly. Rather than adjusting the research goal, they decided to use the second wave of data collection to gather the additional data required, hence the decision to use semi-structured interviews, with the topic list representing the relevant themes to discuss.

2.4 Qualitative data analysis

Throughout data collection, the (initial) analysis of data and constant reflecting and adjusting play a pivotal role. Within data analysis, too, the stages presented in this section are subject to constant reflection, with potential movement back and forth between them. Nevertheless, we structured this section in three parts, describing subsequent stages, to enhance clarity. In the first subsection, the preparation phase is discussed. Initial choices concern how data are formatted, what software is used, and which analysis framework will be applied. We will emphasize the role and impact of such choices specifically for the context of experimental qualitative data analysis.

Given the choice for a constructivist grounded theory approach, the actual analysis consisted of the phases of open and axial coding (second subsection) and the final phase of selective coding (last subsection). Selective coding results in the construction of a theoretical model. GLS initially only had data from the first wave and processed and analyzed it up until the selective coding phase. With the realization that information was lacking for the goals at hand, and with the opportunity of a second wave of data collection, the final analysis used data from both waves. Thus, although the second-wave data were needed to construct the final theoretical model, it is important to note that all data underwent rigorous processing throughout all stages of data analysis. Illustrations of this process will be provided throughout this section.

2.4.1 Preparations.

Some considerations and choices need to be made before starting the actual data analysis. Not only are there different analytical approaches and several software packages for qualitative data analysis, there are also different ways to make group comparisons when using qualitative data to evaluate effects in experimental designs. An important aspect in making some of these choices is the outcome that the study at hand aims for, as determined by the type of research question that was formulated. Kearney [54] makes a distinction between descriptive, exploratory, and explanatory research questions. Descriptive and exploratory questions aim for a relatively superficial understanding of aspects or themes at play in the situation being studied and typically result in models that remain close to the data. In contrast, explanatory research aims to provide a more thorough understanding of the process under investigation and often leads to the development of more abstract models, for which the data is transformed to a higher degree [55]. An experiment is to some extent inherently explanatory, since the main goal is to evaluate if an intervention has an effect on an outcome. However, the outcomes of interest can be more or less precisely defined, a priori, and this determines whether the study is more exploratory, descriptive, or explanatory.

An often used paradigm in qualitative studies is constructivist grounded theory (CGT) as developed by Charmaz [56]. The CGT approach aims to represent the perspectives of the participants and therefore fits well with behavioral research. The data is processed in a structured, step-by-step approach in such a way that the data speaks for itself. Some qualitative researchers argue that the high level of structure of CGT limits the creative space characteristic of qualitative research [57]. Other approaches, such as discourse analysis, narrative analysis, and phenomenology, are less structured than CGT and can be more interpretative. Discourse analysis seeks to explain behavior from an outsider perspective [58], narrative analysis focuses on reconstructing complete stories [59], and phenomenology aims to describe experiences [60]. Depending on the context of the research each of these approaches could be used for the evaluation of experiments.

The treatment or intervention of interest creates groups in the data. In a qualitative analysis of an intervention effect, one can incorporate this grouping in two ways. First, all data can be analyzed as if coming from one population (e.g. [20,61]). In this case, the last phase of the analysis is comparing the resulting model between the groups. The description of a potential intervention effect is in terms of certain parts of the model originating from one group and not the other, or in terms of certain relations being more prominent in one of the groups. This approach may be the preferred choice when the research question is merely descriptive or exploratory; it may reveal unexpected findings, such as themes that are entirely missing from one subgroup or a process that is not part of the other. Alternatively, separate analyses can be done for the treatment conditions (e.g. [17]). Then distinct models are built for the data from each of the groups and the final phase consists of comparing the resulting models. This approach is suitable when the groups are assumed to differ inherently and independent processes are expected. This approach seems more natural for studies where aims and questions are of a more explanatory nature.

Irrespective of the analytical approach and the way the grouping structure is incorporated, appropriate software is required to systematically analyze the data. Various computer-assisted qualitative data analysis software (CAQDAS) programs are available and assist researchers in structuring their analyses by identifying (sub)groups in the data, structuring codes, creating separate models, and running queries. Well-known CAQDAS programs include MAXQDA [62], Atlas.ti [63], and NVivo [64], among others. All these programs provide functionalities such as coding, compatibility with multiple data types, specification of demographics per document or case, memo writing and linking, query usage, and project file sharing. Researchers can choose a CAQDAS based on their specific needs, familiarity with the software, or availability.

GLS used CGT to answer the explanatory research question on how the empathy-inducing nudges of the intervention program potentially influenced the decision making process with respect to adherence to the distance regulations. NVivo Release 1 [64] was used for the data analysis, since it satisfied their criteria and was readily available through a campus license. Within NVivo, they distinguished between the control and intervention groups by creating attributes for each case, including control or experimental group, day, time, interviewer/observer, and week.

Despite the explanatory nature of the research question, GLS assumed that similar processes would be present in the control and experimental conditions. They therefore chose to analyze all data together to create one overall model. However, the intervention could lead to different weights in parts of this model; for instance, certain concepts or relations may be present in only one of the groups, while other parts of the model may be more prominent in one group than in the other. If such group differences emerge, GLS can conclude that the intervention imposing empathy-inducing nudges triggered different processes and thus was effective.

2.4.2 Open and axial coding.

In the remainder of this section we will describe the actual data analysis given the choice for a CGT approach. In the grounded theory tradition, three coding phases are conducted in sequence: open coding, axial coding, and selective coding [33,65].

During open coding, the aim is to identify and list all topics present in the data by letting the data speak for itself. Preconceived notions may interfere with the analysis and, therefore, it is important to avoid imposing structure at this point. To ensure the quality of open codes, researchers should keep an open mind and be prepared for unexpected themes to emerge. However, to fully understand the phenomenon of interest, it can also be beneficial to incorporate the earlier identified sensitizing concepts in this phase of coding. Although the sensitizing concepts are typically too broad to include directly, they can be translated into so-called a priori codes [66]. A priori codes are more specific than sensitizing concepts, and provide a link between data and theory. The use of a priori codes enables identification of the specific ways in which sensitizing concepts might occur in the data and helps interpret the data. An example of a sensitizing concept could be empathy, which is a theoretical, abstract construct. Participants will most likely not mention or express this construct directly, so a priori codes are needed. Examples of a priori codes for empathy could be ‘protecting a loved one’ (for interview data), or ‘steps back so other people can join a conversation’ (for observational data).

Effective open coding involves adhering to the 3C’s: content, context, and coverage. This entails applying labels that describe the content of a fragment, which spans enough data to capture the context, while ensuring that the created codes cover all topics in the data [67]. The open coding phase continues until saturation is reached, that is, until adding new data does not lead to new open codes [68].

The next phase is axial coding, which aims to structure the data and identify latent themes by exploring the relationships between the open codes that emerged from the data [69]. Although sensitizing concepts can be used to provide an initial structure, it is important to remain open-minded and avoid tunnel vision. By exploring different possible structures, researchers can gain a more nuanced understanding of the various layers in the data and potentially uncover new perspectives that were not previously considered. Throughout this process, the method of constant comparison is applied to ensure that themes are consistent across the data [70]. This involves checking how themes are present in the data of other participants and how they may differ in terms of their characteristics.

Axial coding results in a structured code tree, with well-defined main codes (themes), and relevant sub-themes and open codes placed underneath the overarching themes. Before discussing the next phase in the data analysis, we will first continue with a description of the initial analyses in GLS, and present some additional considerations that are important before moving to the third phase of analysis: selective coding.

In GLS, the open coding of the data from the first wave was conducted by six researchers ensuring researcher triangulation. Although GLS considered creating a priori codes for the sensitizing concept empathy, they decided against it to avoid over-fitting the data and over-interpreting the findings. Codes were created to describe the content, selecting fragments to capture their context, and cover all themes in the data as prescribed by the 3C’s approach. For instance, they produced codes such as Waits for passersby to pass, Avoid putting entire house in quarantine, Keeping your distance with friend is weird, and Avoids grandparents due to health issues. Saturation was achieved after coding approximately 60% of the observations and 80% of the short interviews.

The axial coding phase was carried out by two of the six researchers who had done the open coding, together with two additional researchers. In this phase the sensitizing concept empathy was applied as one of the themes, since they wanted to investigate its impact on adhering to COVID-19-related measures. Data from the observations served well for the identification of behavior and behavior changes. The data from the short interviews was supposed to add insights into participants’ strategies and motivations, but in this phase of analyzing the data it became clear that the short interviews provided insufficient information to fully address their research goals. With the current data GLS could only provide a description of visible behavior, the direct influence of the nudges, and a very general enumeration of considerations for the behavior. It became clear that an in-depth understanding to explain which motivations were active in both groups, and how that led to certain strategies to behave in relation to COVID-19-measures, would not be achieved.

In such a situation there are two options: collecting additional data (if possible), or reporting the insights gained from the available data, even if the original questions cannot be fully answered. GLS were fortunate that a second wave of data collection was announced, allowing them to collect more in-depth data and to pursue a complete answer to their original research question.

Resuming the data analysis with the additional data from the interviews, in the axial coding phase they initially still used the overarching theme of empathy to view the data through this lens. With the data from the second wave however, at some point they decided to drop empathy as a sensitizing concept. It appeared to lead to over-fitting the data, since most themes could be interpreted as sub-themes of empathy. Excluding empathy enabled the researchers to gain a more comprehensive understanding of the data.

The new data provided more context and explanation to the observations’ results and supplemented the information acquired from the short interviews. To address the research goal, two main themes were formulated: Motives and Strategies. Several sub-themes were identified that represented aspects of these main themes, e.g., ‘no fear’ under motives, or ‘use of self-tests’ under strategies. A further illustration of part of the coding scheme is provided in Table 1 and shows how themes and underlying sub-themes emerged from observations and text fragments from the interviews. Such a visualization of results forms the basis for the final phase, that is, selective coding.

Table 1. A selection of subthemes, open codes, and fragments to serve as illustrations for the coding scheme resulting from the axial coding phase.
Theme Subtheme Open code Fragment
Motives Influence others Mother sees little risk in being close ‘Well my mother, she works at home. And she only goes to the supermarket and back. There is so little risk of being infected.’ ‘That is how I rationalize it: she is vaccinated, she sees barely any people, I don’t see many either. We think it is okay.’
Keeping friends at a distance feels weird ‘It is so weird to keep your distance if you are not used to doing that. That is why we don’t do that.’ ‘If I wanted to [edit: keep distance between friends], it is not realistic.’
“Hugger” kept giving hugs ‘One friend was always used to giving hugs. And he kept doing that. But with other friends we give a box.’
No fear for COVID-19 Risk of getting infected is low ‘For as far as I can tell, the risk for us of getting infected is low.’ ‘My friends, the four I regularly see, are very calm, timid, boys. They don’t go out or do crazy student things. So if you are at home, the chance is much lower of getting infected. I take that risk.’
Risk of infection is low ‘I am still young, I am not afraid I will get really sick from the virus.’
Strategies Use of self-tests Tests regularly ‘I meet a small number of different people, most of whom are vaccinated by now, and I test regularly, because I received a self-test-thingy at the University.’
Testing before meeting family ‘Before going home for Christmas I did a self-test to be sure that I had not caught COVID-19, so I could go home without too much risk.’
Limited number of people gathering Mostly one-on-one meetings ‘I meet friends mostly one on one.’ ‘We had the rule to only allow one external person in the kitchen at any time.’
Avoiding large gatherings ‘We are relatively safe, we do not organize or attend any large illegal parties.’ ‘Well I invite less people to my home.’

2.4.3 Selective coding.

Selective coding is the third and final phase of data analysis in the grounded theory tradition [71]. This phase is the culmination of the study process, which allows the qualitative researcher to engage in a dialogue with the data to search for the most accurate representation of the truth value. Its main objective is to develop a theoretical model that explains the phenomenon under investigation.

The outcome of axial coding, which includes the code tree, definitions, and memos generated during the open and axial coding phases, serves as the foundation for building the theoretical model in the selective coding phase. The selective coding process involves identifying the relationships between the themes that emerged from the previous phases, such as developing a chronological story, typology, or identifying which themes can or cannot coexist [13]. Providing clear instructions for this phase can be challenging, as the approach may vary depending on the research situation. A common strategy is to identify a core category that connects the themes previously identified [70]. This strategy is particularly useful when multiple models are created and each sub-population needs to be analyzed, as is typically the case in experimental designs. The core category acts as the missing piece that connects the different models and explains differences between the sub-populations. This will be further elaborated in the subsection about evaluating the intervention effect.

Throughout selective coding the process of constant comparison is applied [13]. With constant comparison researchers continuously check how modifications to the model work for other, already analyzed, data. A potential pitfall to qualitative research is to let the final round of data collection be the deciding factor for the final model, that is, the patterns present in the last observed data are used to determine the definitive descriptions. Through constant comparison, each addition or adaptation of the model is verified with earlier data. This process is essential to ensure that the final model correctly represents the participants’ views.

The outcome of selective coding is a theoretical model that explains how participants perceive the phenomenon or process of interest [55]. The resulting theoretical model should adhere to four key principles: it should fit within the field of study, be understandable to those in the field, be generalizable, and provide some level of control to the researcher in the form of understanding the process or phenomenon of interest [68]. By setting up the study as described in the previous sections, the results acquired should meet these criteria. The use of sensitizing concepts helps fit the theory within the existing literature, while theoretical sampling ensures generalizability. The CGT coding procedure ensures that the resulting theory is understandable and provides control to the researcher.

To illustrate the process of creating and verifying a theoretical model from the qualitative data, we present the results of GLS in Fig 2. Here, we zoom in on a few aspects of the model, to give an impression of the process and result of selective coding. Note that in this phase they did not distinguish between participants in the intervention or control condition; they built one overall model for all data obtained.

Fig 2. Theoretical model as a result from the selective coding, where forms of behavior are explained by different kinds of circumstances, intended behavior, and considerations.

Fig 2

In GLS, the stories of participants demonstrated some common ground that helped understand different aspects of their behavior and motivations. For instance, the researchers observed a recurring pattern in the participants’ explanation of circumstances that affected their choices for actual behavior. From observations and interviews it became clear that social interactions are context-dependent, that is, behavior does not only depend on intentions (based on reasoning about measures and the motivations to adhere to them) but also on, for instance, how friends or other contacts behave, what their intentions are, and on perceived risk of not behaving in the intended way (see Fig 2). To give just one illustration, the following quote is exemplary for several similar remarks about the influence of other people’s behavior:

Participant D2505B2: ‘It is sort of legitimized because the people around you don’t do it either.’

To verify the conclusion that social interactions are context-dependent, a revisit of all data was made to validate this pattern with the data itself as the method of constant comparison describes. An example of such a verification in the data was given by a participant that described intended behavior depending on who they were with.

Participant D1006O1: ‘When I’m with family, then I’m a bit easier about it [edit: ‘it’ refers to keeping distance]. But if I’m just in the supermarket then I’m a bit more careful with that. So that’s actually kind of where I draw the line. People close to me, that’s where I am very lenient with it. But if it’s just about public places then I am more cautious.’

The researchers saw similar things happening in the observations. Most people ignored the hand-sanitation station, but the following happened when people entered together:

Observation M1611D215: ‘PP does not keep distance from either friends. First friend disinfects their hands, they stand close to each other while waiting. When the first friend is done, PP disinfects their hands, and waits until the third person has disinfected their hands as well. When they walk away they do not keep distance.’

Similar things happened on a regular basis: when one person of a group adheres to a measure, the others follow their lead. With the developed theoretical model, GLS could more easily classify new data and categorize topics mentioned by the participants.

2.5 The intervention effect

The final phase in determining evidence for a potential intervention effect is the comparison of the intervention and control group. In research where separate models are created for the groups, this means comparing the models on their similarities and differences. This entails describing the processes per group, in order to understand at what point they take meaningfully different paths. Subsequently, the data is again consulted to get a comprehensive understanding of participants’ perspectives in both groups. The final challenge is to describe them accordingly, so the readers can grasp the essential characteristics for both groups.

In research where a singular model is employed, researchers often utilize queries as valuable tools provided by CAQDAS. These queries generate cross-tabulations, which elucidate the frequency of various aspects of the theoretical framework observed within different groups. A basic query involves tallying the occurrences of specific (sub)themes across these groups. Furthermore, researchers can analyze the co-occurrence of themes and discern potential disparities between intervention and control groups regarding the prevalence of theme relationships. However, it is imperative to note that while queries offer initial insights into the underlying processes, a thorough understanding and interpretation of the results within a theoretical framework necessitate continual data analysis through constant comparison.
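To make the idea of such queries concrete, the sketch below shows how the tallies they produce could be computed outside a CAQDAS package. The participant records, group sizes, and theme names are invented for illustration; the two functions mirror a simple frequency query and a co-occurrence (‘relations’) query like those behind Table 2:

```python
# Hypothetical sketch of CAQDAS-style queries: per-group proportions of
# participants mentioning a (sub)theme, and of participants mentioning
# two themes together. All data below are invented for illustration.
from collections import defaultdict

# Each participant: experimental group and the set of themes coded in their data.
participants = [
    {"group": "control", "themes": {"use_of_self_tests", "limited_gatherings"}},
    {"group": "control", "themes": {"limited_gatherings"}},
    {"group": "intervention", "themes": {"use_of_self_tests", "no_fear"}},
    {"group": "intervention", "themes": {"no_fear"}},
]

def proportions(participants, theme):
    """Proportion of participants per group whose coded data contains `theme`."""
    counts, totals = defaultdict(int), defaultdict(int)
    for p in participants:
        totals[p["group"]] += 1
        counts[p["group"]] += theme in p["themes"]
    return {g: counts[g] / totals[g] for g in totals}

def co_occurrence(participants, theme_a, theme_b):
    """Proportion per group mentioning both themes (a 'relations' query)."""
    counts, totals = defaultdict(int), defaultdict(int)
    for p in participants:
        totals[p["group"]] += 1
        counts[p["group"]] += {theme_a, theme_b} <= p["themes"]
    return {g: counts[g] / totals[g] for g in totals}

print(proportions(participants, "use_of_self_tests"))
# {'control': 0.5, 'intervention': 0.5}
print(co_occurrence(participants, "use_of_self_tests", "no_fear"))
# {'control': 0.0, 'intervention': 0.5}
```

As the surrounding text stresses, such tallies only flag where the groups may diverge; interpreting any difference still requires returning to the raw data through constant comparison.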

In addition to quantifying the results with queries, a qualitative assessment is indispensable. Qualitative researchers often describe this process as “engaging in a dialogue with the data”. Essentially, researchers aim to uncover potential relationships using various tools such as queries, memos, and theoretical frameworks. These relationships are then rigorously tested by revisiting the original raw data, a practice akin to the concept of constant comparison introduced earlier.

GLS categorized all data entries as belonging to either the intervention or control group. Table 2 provides an illustration of some comparisons between these groups. The researchers chose the themes to inspect partly based on the theory behind how prompting with empathy would influence behavior: specifically the subthemes under motivation. The table displays the proportions of participants from each group who reported a particular theme or combination of themes. By examining these proportions, GLS could gain initial insights into the potential differences between the groups and identify the particular aspects in which they may diverge.

Table 2. Example of query used to process selective coding to crosstabs: proportion of participants mentioning subthemes of strategy, motivation, and relations between subthemes in the control and intervention groups.

Used strategy Control Intervention
Limited number of people .64 .33
Meeting outdoors .24 .33
Use of self-tests .15 .50
Behavior contacts/friends .64 .67
Motivation Control Intervention
No fear for COVID-19 .16 .50
Perceived risk for self .64 .67
Duration of measures .36 .67
Relations Control Intervention
Perceived risk for self vs. Use of self-tests .12 .33
Perceived risk for self vs. Meeting outdoors .12 .00

Subsequently, the investigated queries prompted the generation of preliminary explanations, which in the context of abductive reasoning [11] can be seen as a form of (data-generated) hypotheses. To illustrate the analytical process, consider the theme of No fear for COVID-19, which was mentioned less frequently in the control group than in the intervention group (control: .16; intervention: .50). The researchers’ initial expectation was that this difference might have resulted in divergent intentions to adhere to COVID-19-related measures. However, upon revisiting the data, no clear differences in intentions between the groups were observed, that is, the intervention group did not overall seem less willing to adhere to COVID-19-measures than the control group.

To explain the finding that the intervention group, despite mentioning the motivation No fear for COVID-19 more frequently, expressed similar intentions to adhere to COVID-19-related measures, a new preliminary explanation was formulated. GLS speculated that differences in empathy, which the intervention aimed to induce, could account for the absence of divergent intentions. It was theorized that although fewer participants experienced fear for COVID-19 themselves, the empathy-based nudges might have influenced their intended behavior, resulting in a comparable level of adherence to measures. However, upon conducting another reassessment of the data, this preliminary explanation was not supported either. Instead, during this round of data reassessment, two other topics consistently emerged in conjunction with the mention of No fear for COVID-19. It became apparent that participants who expressed no fear for COVID-19 often mentioned employing specific strategies to mitigate the risk, such as using self-tests and limiting their social interactions to a small number of people. The following quote serves as an illustration of this observation:

Participant E1905L1: ‘I have one group of friends that I actually see [edit: without keeping distance] every week. Yes, if they all stay within that group of friends, then it’s not too bad with how many contacts you have’.

The researchers’ final theory was that, notwithstanding minor differences between the groups, no relevant effects attributable to the prompts were observable. To validate this assertion, they conducted a test wherein two researchers were tasked with reviewing two transcripts of semi-structured interviews, while the experimental status was masked, and asked to allocate these to either the control or experimental group. The researchers were unable to accurately allocate the transcripts, supporting the preliminary explanation that the prompts had no relevant effects in this study. A further discussion of the empirical findings of the COVID-19-study is beyond the scope of this paper and can be found in Glebbeek et al. [25].
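The logic of such a masking check can be made explicit with a small chance calculation: if the transcripts carried no group signal at all, how likely would a given number of correct allocations be? The sketch below is illustrative only; it assumes each masked transcript is allocated independently with probability one half, which is a simplification of the actual procedure:

```python
# Illustrative sketch (not GLS's actual analysis): the probability, under
# pure chance, of allocating at least k of n masked transcripts correctly,
# assuming independent 50/50 guesses per transcript.
from math import comb

def prob_at_least_k_correct(n_transcripts: int, k: int, p: float = 0.5) -> float:
    """P(at least k of n independent allocations are correct) under chance."""
    return sum(
        comb(n_transcripts, i) * p**i * (1 - p) ** (n_transcripts - i)
        for i in range(k, n_transcripts + 1)
    )

# With only a handful of allocations, even a perfect score is not rare
# under chance (e.g., 4 out of 4 correct has probability 1/16), so such
# a masking check provides informal rather than statistical evidence.
print(prob_at_least_k_correct(4, 4))  # 0.0625
```

This also clarifies why the failed allocation supports, but cannot by itself prove, the absence of an intervention effect: with so few masked transcripts, the check has little power either way.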

This concludes the data analysis. Our intention was to demonstrate the iterative process of generating and validating preliminary explanations through repeated data reassessment. This iterative approach is vital in answering the central question of qualitative evaluation: how can we describe and comprehend the effects of the intervention on the desired outcomes? By engaging in this meticulous and iterative process, researchers strive to gain a comprehensive understanding of the intervention’s impact on participants’ behavior and the motivation behind that behavior. Both the quantitative [24] and the qualitative [25] study referred to in this paper concluded that no support for an intervention effect was found. However, by developing a model based on empirical qualitative data, GLS were able to provide deeper insight into the underlying mechanisms behind the quantitative findings, offering an explanation of the internal processes that contributed to those outcomes.

3 Discussion

In this methodological tutorial, we provided guidance to assist (quantitative) researchers in incorporating qualitative methods into their experiments. Compared to existing literature in this area, we aimed to provide a practical and comprehensive overview of the stages of the research process from start to finish. Furthermore, the focus on applying qualitative research to experimental designs fills a gap in the behavioral sciences. We argued that, by following the stages outlined in Fig 1, researchers can obtain a more holistic understanding of their research problem than with solely quantitative methods. Here, we add Table 3, which summarizes the essential elements for conducting a rigorous qualitative investigation of potential intervention effects in experimental designs and gives an overview of key references for these elements. With this paper we aimed to guide researchers who are well-versed in quantitative methods towards integrating qualitative methods into their studies. By providing detailed guidance and an illustration, we hope to facilitate a smooth transition and ensure the quality and rigor of qualitative experiments.

Table 3. The most relevant topics to consider when evaluating experiments with qualitative methods, ordered per stage of the research.

Stage: Research goal
Description: In this stage the basis for the research is laid down. The approach is to be determined and the scope of the research is formulated, making use of the strengths of qualitative methods to add value.
Terms [references]: Scope creep [31]; Holistic approach [8,10]; Sensitizing concepts [13]; Constructivist GT [56]; Discourse analysis [58]; Narrative analysis [59]; Phenomenology [60]; Content analysis [72]

Stage: Data collection
Description: During this stage, regardless of the approach, the data collection is designed, making sure the data is collected such that it leads to a comprehensive answer to the research question.
Terms [references]: Qualitative data [35]; Observations [36,37]; Researcher triangulation [44,45]; Data & method triangulation [43]; Interviews [39,40]; Topic list [41,42]; Verbatim transcription [46]

Stage: Reflecting & adjusting
Description: It is important to critically assess the rigor of the study. By applying multiple strategies to improve different aspects of rigor, the quality of the research can be held high.
Terms [references]: Rigor [43,47]; Reflexivity (epistemological) [50]; Reflexivity (personal) [49]; Saturation [51]; Theoretical sampling [52]; Negative cases [53]

Stage: Data analysis
Description: As soon as the first data is collected, the analysis starts. Essential in the approach of constructivist grounded theory is keeping an open mind for unexpected information.
Terms [references]: Open coding [71]; A priori codes [66]; Axial coding [69,71]; Selective coding [71]; Core category [70]; Constant comparison [13]

Stage: Intervention effect
Description: In the final research stage, the comparison between experimental groups is conducted to determine the intervention’s effect, if any.
Terms [references]: Types of theoretical models [55]; Key principles of theories [68]

Previous literature on qualitative experiments in behavioral sciences primarily engaged in philosophical contemplation regarding the suitability of qualitative methods within experimental contexts [16,18] or provided individual empirical reports. In contrast, our focus is on highlighting the stages that researchers must undertake to conduct their own qualitative evaluation of an experimental intervention. Where earlier studies focused mainly on the design of the research [17], we discuss considerations and decisions for all stages of the project. Notably, our contribution places a greater emphasis on the application of reflexivity throughout the study and advocates for a flexible and iterative process.

The degree of flexibility inherent to qualitative research is one of the main differences compared to the quantitative approach. Refining the study design throughout the research process adds to the rigor of the research. In quantitative research, there should be a clear a priori plan for data collection and analysis. If it is necessary to explore how different elements work, a pilot study can be employed before the actual study, and these pilot data should not be included in the actual analysis. In contrast, in qualitative research adjustments to the study design and analysis can be made as needed throughout the study. Reflexivity should improve the rigor of the study and assist in finding the most reliable and comprehensive answer possible. In the worked example, GLS adjusted their plans several times. In the second wave of data collection, for instance, they included more extensive interviews to get a better understanding of the motives and strategies of the participants. In different phases of the analysis, they moved back and forth between ignoring and enforcing empathy as a sensitizing concept or through a priori codes, searching for the right balance between guiding and structuring the data analysis and letting the data speak for itself without giving too much direction.

To elaborate on this last point: the approach to data analysis in general, and to the sensitizing concept of empathy specifically, was highly flexible. While theory suggests that a priori codes should be formulated for sensitizing concepts, the researchers chose not to do so, as they were concerned that this could lead to over-fitting the data. At the start of axial coding, they explored the fit of the sensitizing concept with the data, a common practice in qualitative research. Ultimately, the researchers decided to discard the concept at a certain point during axial coding. This shows once more that qualitative studies involve ongoing reflection, iteration, and adaptation, even in a constructivist grounded theory approach, which is known to be relatively structured compared to other qualitative approaches. The decision-making process should be driven by the research goals and a commitment to rigor and transparency.

Two limitations of our study are important to mention. In the second wave of data collection in the empirical example, GLS managed to interview only a small number of participants in the intervention group. Even when invited for the interview near the intervention location, most people preferred the offered option of doing the interview at another time, often incidentally during a control week, or they preferred an online interview. This changed their status to control group participants since they did not encounter the intervention shortly before the interview. Despite observing this throughout the data collection phase, and stimulating the interviewers to motivate people to be interviewed at the intervention location during an intervention week, the final number remained low. It is a clear example of reflecting and adjusting throughout the study (by adapting instructions to interviewers), but in this example it did not lead to more interviews in the experimental condition. We believe the negative impact was manageable and that the data were still rich enough to draw conclusions with regard to the absence of an effect of the intervention. Either way, the worked example did enable a thorough testing of all methodological stages, including the final comparison between the intervention and control group, leading to the methodological recommendations presented in this paper. Though a larger experimental group might have enriched the data analysis even further, we think the identified steps would not have changed with more data.

Basing our paper largely on the worked example, however, has another limitation: a somewhat narrow view of the different opportunities that qualitative research methods offer. We discussed several approaches in relative depth (e.g., interviews, observation, the CGT approach) because they were applied in the worked example, and left out others (e.g., focus groups, archival data, case studies) that were not. To be more exhaustive, Table 3 lists several additional approaches and methods, including relevant literature references.

Following our recommendations, supplemented by the information in Table 3, researchers will gain the necessary tools to effectively employ qualitative methodologies for evaluating experimental effects. Compared to our example, it will be more straightforward to apply this approach in more controlled experimental settings, like in a laboratory environment. An evident challenge highlighted in the worked example was the variance in participant engagement with the intervention, a common occurrence in field experiments where researchers contend with less control compared to lab experiments. Augmenting or even replacing traditional quantitative methods with qualitative approaches could lead to unforeseen revelations regarding intervention effects.

In conclusion, our paper provides guidelines for the qualitative evaluation of experimental designs. While these guidelines are not all-encompassing, they provide a useful starting point for researchers looking to incorporate qualitative methods in their experimental research. We hope to have convinced readers that sometimes experiments in social and behavioral research can be studied in a more meaningful and more insightful way if one either changes to qualitative methods, or adds a qualitative evaluation of results to the more traditional quantitative way of measuring intervention effects.

Supporting information

Appendix A. Observation list control week.

The translated version of the observation list as was used during the control weeks.

(PDF)

Appendix B. Observation list intervention weeks.

The translated version of the observation list as was used during the intervention weeks.

(PDF)

Appendix C. Topic list short interviews.

The translated version of the topic list as was used during the short interviews.

(PDF)

Appendix D. Topic list long interviews.

The translated version of the topic list as was used during the long semi-structured interviews.

(PDF)


Acknowledgments

We are grateful to Prof. Dr. Denise de Ridder for setting up the larger overarching study and for allowing the qualitative study that served as our worked example to be executed. We thank Dr. Marie-Louise Glebbeek for offering a critical reflection during the design and execution of the empirical study. We thank Efe Adu, Timo Keijzer, Jamie Keurntjes, Jonas van Oosten, & Danja Roelofs for their contributions to the data collection. We used ChatGPT for textual improvements.

The preregistration for this study can be accessed via this link: https://archive.org/details/osf-registrations-knb2d-v1

Data Availability

The authors of the qualitative study, the worked example in our paper, have decided to make the data openly available. We have added the relevant information in the submission, with a link to the repository (https://osf.io/e4zfw/).

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1.Cohen BH, Lea RB. Essentials of statistics for the social and behavioral sciences. Wiley; 2004. [Google Scholar]
  • 2.Stangor C. Research methods for the behavioral sciences. Cengage Learning. 2014. [Google Scholar]
  • 3.Gribbons B, Herman J. True and quasi-experimental designs. Pract Assessm Res Eval. 1996;5(14). doi: 10.7275/fs4z-nb61 [DOI] [Google Scholar]
  • 4.List JA. Field experiments: a bridge between lab and naturally occurring data. BE J Econ Anal Policy. 2007;6(2):1–45. doi: 10.2202/1538-0637.1747 [DOI] [Google Scholar]
  • 5.Schmuckler MA. What is ecological validity? A dimensional analysis. Infancy. 2001;2(4):419–36. doi: 10.1207/S15327078IN0204_02 [DOI] [PubMed] [Google Scholar]
  • 6.Roe BE, Just DR. Internal and external validity in economics research: tradeoffs between experiments, field experiments, natural experiments, and field data. Am J Agri Econ. 2009;91(5):1266–71. doi: 10.1111/j.1467-8276.2009.01295.x [DOI] [Google Scholar]
  • 7.Tijmstra J, Boeije H. Wetenschapsfilosofie in de context van de sociale wetenschappen. Boom. 2011. [Google Scholar]
  • 8.Verschuren PJM. Holism versus reductionism in modern social science research. Qual Quant. 2001;35(4):389–405. doi: 10.1023/a:1012242620544 [DOI] [Google Scholar]
  • 9.Smaling A. The pragmatic dimension. Qual Quant. 1994;28(3):233–49. doi: 10.1007/bf01098942 [DOI] [Google Scholar]
  • 10.van Steenbergen B. Potential influence of the holistic paradigm on the social sciences. Futures. 1990;22(10):1071–83. doi: 10.1016/0016-3287(90)90008-6 [DOI] [Google Scholar]
  • 11.Timmermans S, Tavory I. Theory construction in qualitative research. Sociol Theory. 2012;30(3):167–86. doi: 10.1177/0735275112457914 [DOI] [Google Scholar]
  • 12.Maxwell JA. Designing a qualitative study. In: The SAGE handbook of applied social research methods. SAGE Publications, Inc.; 2009. [Google Scholar]
  • 13.Boeije H. Analysis in qualitative research. SAGE Publications Ltd.; 2010. [Google Scholar]
  • 14.Denzin NK, Lincoln YS. The SAGE handbook of qualitative research. Sage Publications, Inc.; 2017. [Google Scholar]
  • 15.Ritchie J, Lewis J, McNaughton Nicholls C, Ormston R. Qualitative research practice: a guide for social science students and researchers. Sage Publications, Inc.; 2013. [Google Scholar]
  • 16.Naber A. Qualitative experiment as a participating method in innovation research. Hist Soc Res. 2015;40:233–57. http://www.jstor.org/stable/24583154 [Google Scholar]
  • 17.Prowse M, Camfield L. Improving the quality of development assistance. Prog Develop Stud. 2012;13(1):51–61. doi: 10.1177/146499341201300104 [DOI] [Google Scholar]
  • 18.Levy Paluck E. The promising integration of qualitative methods and field experiments. ANNALS Am Acad Politic Soc Sci. 2010;628(1):59–71. doi: 10.1177/0002716209351510 [DOI] [Google Scholar]
  • 19.Luo Y, Sun F, Jiang L, Zhang A. The stress and coping experiences among Chinese rural older adults in welfare housing: through the lens of life review. Aging Ment Health. 2019;23(9):1086–94. doi: 10.1080/13607863.2019.1612322 [DOI] [PubMed] [Google Scholar]
  • 20.Pelak CF, Duncan S. Using a social science–fictional play to teach about global capitalism and macro-structural systems in introduction to sociology. Teach Sociol. 2016;45(4):334–46. doi: 10.1177/0092055x16663804 [DOI] [Google Scholar]
  • 21.Vors O, Marqueste T, Mascret N. The trier social stress test and the trier social stress test for groups: qualitative investigations. PLoS One. 2018;13(4):e0195722. doi: 10.1371/journal.pone.0195722 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Rossman GB, Wilson BL. Numbers and words. Eval Rev. 1985;9(5):627–43. doi: 10.1177/0193841x8500900505 [DOI] [Google Scholar]
  • 23.van Grootel L, Balachandran Nair L, Klugkist I, van Wesel F. Quantitizing findings from qualitative studies for integration in mixed methods reviewing. Res Synth Methods. 2020;11(3):413–25. doi: 10.1002/jrsm.1403 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.de Ridder D, Aarts H, Benjamins J, Glebbeek M-L, Leplaa H, Leseman P, et al. “Keep your distance for me”: a field experiment on empathy prompts to promote distancing during the COVID-19 pandemic. J Community Appl Soc Psychol. 2022;32(4):755–66. doi: 10.1002/casp.2593 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Glebbeek M, Leplaa HJ, Soppe KFB. The power of prompting: a qualitative experiment on prompting empathy. 2024.
  • 26.Kratochwill TR, Hitchcock J, Horner RH, Levin JR, Odom SL, Rindskopf DM. Single-case design technical documentation. 2010.
  • 27.Haven TL, Van Grootel L. Preregistering qualitative research. Account Res. 2019;26(3):229–44. doi: 10.1080/08989621.2019.1580147 [DOI] [PubMed] [Google Scholar]
  • 28.Humphreys M, Sanchez de la Sierra R, van der Windt P. Fishing, commitment, and communication: a proposal for comprehensive nonbinding research registration. Polit anal. 2013;21(1):1–20. doi: 10.1093/pan/mps021 [DOI] [Google Scholar]
  • 29.Chauvette A, Schick-Makaroff K, Molzahn AE. Open data in qualitative research. Int J Qualit Methods. 2019;18. doi: 10.1177/1609406918823863 [DOI] [Google Scholar]
  • 30.Class B, de Bruyne M, Wuillemin C, Donzé D, Claivaz J-B. Towards open science for the qualitative researcher: from a positivist to an open interpretation. Int J Qualit Methods. 2021;20:1–15. doi: 10.1177/16094069211034641 [DOI] [Google Scholar]
  • 31.Roy S, Searle M. Scope creep and purposeful pivots in developmental evaluation. Canadian J Prog Eval. 2020;35(1):92–103. doi: 10.3138/cjpe.56898 [DOI] [Google Scholar]
  • 32.Bryman A. Integrating quantitative and qualitative research: how is it done? Qualit Res. 2006;6(1):97–113. doi: 10.1177/1468794106058877 [DOI] [Google Scholar]
  • 33.Glaser BG. Theoretical sensitivity. Mill Valley, CA: The Sociology Press. 1978. [Google Scholar]
  • 34.Cuff BMP, Brown SJ, Taylor L, Howat DJ. Empathy: a review of the concept. Emot Rev. 2014;8(2):144–53. doi: 10.1177/1754073914558466 [DOI] [Google Scholar]
  • 35.Carr D, Boyle EH, Cornwell B, Correll S, Crosnoe R, J F. The art and science of social research. W.W. Norton. 2021. [Google Scholar]
  • 36.Angrosino M, Rosenberg J. Observations on observation. 4th ed. SAGE. 1994. p. 467–78. [Google Scholar]
  • 37.Mulhall A. In the field: notes on observation in qualitative research. J Adv Nurs. 2003;41(3):306–13. doi: 10.1046/j.1365-2648.2003.02514.x [DOI] [PubMed] [Google Scholar]
  • 38.Reiss AJ. Systematic observation of natural social phenomena. Sociol Methodol. 1971;3:3. doi: 10.2307/270816 [DOI] [Google Scholar]
  • 39.Dearnley C. A reflection on the use of semi-structured interviews. Nurse Res. 2005;13(1):19–28. doi: 10.7748/nr2005.07.13.1.19.c5997 [DOI] [PubMed] [Google Scholar]
  • 40.Evers J. Kwalitatief interviewen: kunst en kunde. Boom/Lemma. 2015. [Google Scholar]
  • 41.Kallio H, Pietilä A-M, Johnson M, Kangasniemi M. Systematic methodological review: developing a framework for a qualitative semi-structured interview guide. J Adv Nurs. 2016;72(12):2954–65. doi: 10.1111/jan.13031 [DOI] [PubMed] [Google Scholar]
  • 42.McGrath C, Palmgren PJ, Liljedahl M. Twelve tips for conducting qualitative research interviews. Med Teach. 2019;41(9):1002–6. doi: 10.1080/0142159X.2018.1497149 [DOI] [PubMed] [Google Scholar]
  • 43.Korstjens I, Moser A. Series: Practical guidance to qualitative research. Part 4: trustworthiness and publishing. Eur J Gen Pract. 2018;24(1):120–4. doi: 10.1080/13814788.2017.1375092 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Archibald MM. Investigator triangulation. J Mixed Methods Res. 2015;10(3):228–50. doi: 10.1177/1558689815570092 [DOI] [Google Scholar]
  • 45.Patton MQ. Enhancing the quality and credibility of qualitative analysis. Health Serv Res. 1999;34(5 Pt 2):1189–208. [PMC free article] [PubMed] [Google Scholar]
  • 46.McMullin C. Transcription and qualitative methods: implications for third sector research. Voluntas. 2023;34(1):140–53. doi: 10.1007/s11266-021-00400-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Guba EG. Criteria for assessing the trustworthiness of naturalistic inquiries. ECTJ. 1981;29(2). doi: 10.1007/bf02766777 [DOI] [Google Scholar]
  • 48.Lincoln YS, Guba EG. Naturalistic inquiry. Sage. 1985. [Google Scholar]
  • 49.Palaganas E, Sanchez M, Molintas VP, Caricativo RD. Reflexivity in qualitative research: a journey of learning. TQR. 2017. doi: 10.46743/2160-3715/2017.2552 [DOI] [Google Scholar]
  • 50.Mauthner NS, Doucet A. Reflexive accounts and accounts of reflexivity in qualitative data analysis. Sociology. 2003;37(3):413–31. doi: 10.1177/00380385030373002 [DOI] [Google Scholar]
  • 51.Fusch PI, Ness LR. Are we there yet? Data saturation in qualitative research. Qualit Rep. 2015;20:1408–16. [Google Scholar]
  • 52.Coyne IT. Sampling in qualitative research. Purposeful and theoretical sampling; merging or clear boundaries?. J Adv Nurs. 1997;26(3):623–30. doi: 10.1046/j.1365-2648.1997.t01-25-00999.x [DOI] [PubMed] [Google Scholar]
  • 53.Henry P. Rigor in qualitative research: promoting quality in social science research. Res J Recent Sci. 2015;4:25–8. [Google Scholar]
  • 54.Kearney MH. Levels and applications of qualitative research evidence. Res Nurs Health. 2001;24(2):145–53. doi: 10.1002/nur.1017 [DOI] [PubMed] [Google Scholar]
  • 55.Sandelowski M, Barroso J. Classifying the findings in qualitative studies. Qual Health Res. 2003;13(7):905–23. doi: 10.1177/1049732303253488 [DOI] [PubMed] [Google Scholar]
  • 56.Charmaz K. Constructivist grounded theory. J Posit Psychol. 2016;12(3):299–300. doi: 10.1080/17439760.2016.1262612 [DOI] [Google Scholar]
  • 57.Henwood K, Pidgeon N. Remaking the link: qualitative research and feminist standpoint theory. Feminism Psychol. 1995;5(1):7–30. doi: 10.1177/0959353595051003 [DOI] [Google Scholar]
  • 58.Hodges BD, Kuper A, Reeves S. Discourse analysis. BMJ. 2008;337:a879. doi: 10.1136/bmj.a879 [DOI] [PubMed] [Google Scholar]
  • 59.Franzosi R. Narrative analysis—or why (and how) sociologists should be interested in narrative. Annu Rev Sociol. 1998;24(1):517–54. doi: 10.1146/annurev.soc.24.1.517 [DOI] [Google Scholar]
  • 60.Wojnar DM, Swanson KM. Phenomenology: an exploration. J Holist Nurs. 2007;25(3):172–80; discussion 181-2; quiz 183–5. doi: 10.1177/0898010106295172 [DOI] [PubMed] [Google Scholar]
  • 61.Palsola M, Renko E, Kostamo K, Lorencatto F, Hankonen N. Thematic analysis of acceptability and fidelity of engagement for behaviour change interventions: the Let’s Move It intervention interview study. Br J Health Psychol. 2020;25(3):772–89. doi: 10.1111/bjhp.12433 [DOI] [PubMed] [Google Scholar]
  • 62.VERBI Software. Maxqda 2022. 2021.
  • 63.ATLAS ti Scientific Software Development GmbH. ATLAS.ti 22 Windows. 2022.
  • 64.QSR International Pty Ltd. NVivo. 2020.
  • 65.Corbin JM, Strauss A. Grounded theory research: procedures, canons, and evaluative criteria. Qual Sociol. 1990;13(1):3–21. doi: 10.1007/bf00988593 [DOI] [Google Scholar]
  • 66.Stemler S. An overview of content analysis. Pract Assessm Res Eval. 2000;7:137–46. doi: 10.7275/z6fm-2e34 [DOI] [Google Scholar]
  • 67.Williams M, Moser T. The art of coding and thematic exploration in qualitative research. Int Manag Rev. 2019;15:45–55. [Google Scholar]
  • 68.Glaser BG, Strauss A. The discovery of grounded theory: strategies for qualitative research. Aldine. 1967. [Google Scholar]
  • 69.Scott C, Medaugh M. Axial coding. The international encyclopedia of communication research methods. Wiley. 2017. 1–2. doi: 10.1002/9781118901731.iecrm0012 [DOI] [Google Scholar]
  • 70.Giles TM, de Lacey S, Muir-Cochrane E. Coding, constant comparisons, and core categories: a worked example for novice constructivist grounded theorists. ANS Adv Nurs Sci. 2016;39(1):E29-44. doi: 10.1097/ANS.0000000000000109 [DOI] [PubMed] [Google Scholar]
  • 71.Strauss A, Corbin J. Basics of qualitative research: grounded theory procedures and techniques. Sage. 1990. [Google Scholar]
  • 72.Stone PJ, Dunphy DC, Smith MS, Ogilvie DM. The general inquirer: a computer approach to content analysis. MIT Press. 1966. [Google Scholar]

Decision Letter 0

Gabriel Velez

11 Oct 2024

PONE-D-24-29638

A qualitative evaluation of an experiment: Studying the effects of empathy-inducing probes on distancing during COVID-19 to derive methodological guidelines

PLOS ONE

Dear Dr. Leplaa,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. This manuscript has a lot of value and this approach in particular enhances the understanding of quantitative results through qualitative findings. At the same time, both reviewers bring up serious concerns that I share. I will note that my decision was on the border between major revision and reject, but I do see the value of this manuscript and want to give the opportunity to address these concerns. In particular, I would note:

  • Need to qualify language and assertions (mentioned by both reviewers)

  • Clarify purposes of the analyses (see Reviewer 1)

  • Specifically address point 7 from Reviewer 1

Please submit your revised manuscript by Nov 25 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Gabriel Velez, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements: When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

3. In the online submission form, you indicated that "We did not collect data ourselves for this study. In this methodological paper we propose a specific way to analyze experimental research. Data was obtained through contact with the authors of an empirical study, which will be submitted in the next few months. We have an overlap with some of the authors, which made permission to support our methodological approach with a proof of conduct realistic. The authors of the other paper are the owners of the data, and decided to make it available upon request. In their paper they will specify the necessary procedure to acquire the data." All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either 1. In a public repository, 2. Within the manuscript itself, or 3. Uploaded as supplementary information. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons on resubmission and your exemption request will be escalated for approval.

4. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well.

5. Please include a separate caption for each figure in your manuscript.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear authors,

The article submitted for my review is entitled “A qualitative evaluation of an experiment: Studying the effects of empathy-inducing probes on distancing during COVID-19 to derive methodological guidelines”. The article is classically structured and clearly argued. The theme of the article is stimulating: designing a qualitative evaluation method for a quasi-experiment. Nevertheless, I feel that certain elements of the article need to be improved or supplemented. I'd like to point them out below:

1) In the introduction, despite the authors' didactic efforts, the distinction between an experiment and a quasi-experiment is not clear. The authors do not give a clear definition of a quasi-experiment and what distinguishes it from a true experiment. This is important, however, as the originality of the article seems to rest on the study of a quasi-experiment.

2) Still in the introduction, the authors write “there is little literature on using a qualitative approach for the evaluation of experimental designs in behavioral science.” Firstly, I don't really agree with this peremptory assertion. Numerous books or articles deal to a greater or lesser extent with the qualitative evaluation of experimental design in behavioral science (for example: Kite & Whitley, 2012. Principles of research in behavioral science; Orcher, 2016. Conducting research: Social and behavioral science methods; Maxwell, 2008. Designing a qualitative study. The SAGE handbook of applied social research methods; etc.). Secondly, there are a multitude of methods for evaluating qualitative experiments (Rossman & Wilson, 1985. Numbers and words: Combining quantitative and qualitative methods in a single large-scale evaluation study. Evaluation Review, 9(5), 627-643; Horsburgh, 2003. Evaluation of qualitative research. Journal of Clinical Nursing (Wiley-Blackwell), 12(2); Hennink et al., 2020. Qualitative research methods. Sage; Yadav, 2022. Criteria for good qualitative research: A comprehensive review. The Asia-Pacific Education Researcher, 31(6), 679-689; etc.). I fail to see where the contribution of the new method proposed by this article lies. It should be clarified in a very precise and documented way. In particular, it should be specified whether the proposed method concerns quasi-experiments or experiments, whether it concerns mixed or totally qualitative methods, or even experiments that are mainly quantitative but incorporate qualitative data; otherwise, it's hard to see where the article's contribution lies. At the end of the introduction, the authors state “we provide a step-by-step approach for future researchers wishing to carry out a qualitative evaluation of a (quasi-)experimental design”, but it is not clear whether this is really a research gap.
In this respect, in the “Methodological approach” section, the authors state “Our own study is an example of adding a qualitative element to a quantitative study”, so we'd be tempted to believe that the gap lies here, but we'd have to justify it much earlier in the article.

3) Still in this “Methodological approach” section, the authors mention a large number of bibliographical references (references 20 to 61). However, in a research article, bibliographical references usually appear in the introduction or discussion, but not in the methodological or results section. This gives the impression that your work is more like a literature review than a concrete experiment. The authors should clearly state their position on this point. Is the article a methodological review or the testing of a qualitative experimental method? This is not very clear.

4) In the “data collection” section, you state that you have carried out interviews and observations, but it's hard to understand why you are doing so (for what purpose) and with whom (list of interviewees). What's more, if I understand correctly, the purpose of these observations and interviews is to justify the results of the method you describe, but your results contain very little verbatim information about these observations and interviews. So I don't see how these observations and interviews validate your method.

5) Still in the presentation of the method, you insert a few verbatims after figure 2 and then announce that “Observations and interviews clearly showed that social interactions are context-dependent, i.e. that behavior depends not only on intentions (based on reasoning about actions and motivations for complying with them) but also, for example, on how friends or other contacts behave, their intentions and the perceived risk of not behaving as intended (see figure 2).”. It's possible to agree with this statement, but I don't see how it validates your method of evaluating a qualitative experiment? In my opinion, to validate a method, it is necessary to provide very factual evidence, for example by comparing the responses of the experimental group and the control group, and by showing that each stage of the method improves the results. In my opinion, your article fails to do this. Table 2 compares the results between the two groups, but I fail to see how this validates your method. To validate your method, it would probably have been more appropriate to set up two groups, an experimental group studied using the new method, and a control group not studied using this method.

6) To conclude the presentation and justification of your method, you write that “Our intention was to demonstrate the iterative process of hypothesis generation and validation through repeated re-evaluation of the data.” Firstly, I don't think this intention corresponds to the initial objective of your study set out in the introduction. Secondly, I discover that your method ultimately consists in determining hypotheses by re-evaluating data. Figure 1, which presents your method, and Table 3 do not mention the construction of these hypotheses, which, in my opinion, is part of a qualitative abductive research methodology not presented in this work.

7) I'm not very convinced by your discussion, as you fail to demonstrate the validity of your method. You write “Table 3 summarizes the essential elements for conducting a rigorous qualitative investigation of potential research projects.” Table 3 presents a literature review of the various stages (Research objective; Data collection; …).

Personally, I regularly conduct qualitative studies and apply these steps from the literature, but I've never tested the validity of the overall method, and I don't know if the method is valid. After reading your work, I still don't know. You have carried out a literature review of the design stages of a qualitative study, which is interesting, but it is not possible to say, as you do, that this method is rigorous and reproducible. To write that, it would have been necessary, at the very least, to apply this method to several studies, or to analyze the results between an experimental group and a control group. I'm not saying that your method isn't correct, but you don't provide proof of it.

In conclusion, I'm not convinced by your work and the evidence you provide. You do a good job of analyzing the literature on the stages of qualitative research (but this has already been done several times in the social sciences), but you don't provide any proof of the reliability of your methodological approach. To publish your work, I think you need to start again from Table 3 and test this method in its entirety on several qualitative studies in order to conclude whether or not this method is reliable and rigorous.

Thank you for your attention, and I encourage you to take this work further.

Yours faithfully

Reviewer #2: Dear Editor, Dear Authors,

Thank you for the opportunity to read this interesting manuscript, in which the methodology of a qualitative study is described clearly and step-by-step. In my opinion, qualitative research often holds an unjustly subordinate position in the research landscape compared to quantitative research. However, the qualitative approach provides a valuable opportunity to explain interventions or quantitative results, thereby offering a more comprehensive understanding of (complex) interventions. This is precisely the approach taken by the authors of this manuscript, as they aim to demonstrate a possible implementation and pave the way for future researchers to adopt a similar methodology. However, there are a few points I would like to emphasize. Below, you can find my suggestions:

• Abstract: Please include the qualitative research design, specifying that the qualitative evaluation was conducted using Grounded Theory. I would also suggest clarifying this earlier in the manuscript to help guide the reader (e.g., a guide for a qualitative evaluation based on Grounded Theory).

• In your manuscript, you frequently refer to the 'effect of the intervention' or 'intervention effect' in the context of the qualitative approach. The term 'effect' is more commonly associated with quantitative research, although it is debated in the literature. I find your approach very interesting, particularly in the chapter 'The intervention effect,' which, among other things, addresses the quantitative representation of qualitative results and hypothesis generation. However, please clarify what you mean by this in the qualitative context. You often discuss how qualitative approaches enrich quantitative (quasi-experimental) studies, and at times you refer to the 'intervention effect' as something that could potentially replace quantitative approaches. In this context, I would like to refer to the sentence on page 8: 'Note that intervention effects can be evaluated exclusively through qualitative data, or the qualitative data can be an addition to quantitative data within the same study.' There may be a language issue here, but I believe this statement requires further clarification. In my view, qualitative research cannot replace quantitative intervention research, but it can certainly enhance it by providing a clearer understanding of why an intervention works or doesn't work.

• In the ethics statement and data availability section (provided by the submission system), you mention that you obtained permission to describe the data and conduct of the qualitative study by Glebbeek et al. (2024) in this manuscript. You also note an overlap in authorship, which leads me to conclude that some of the authors were involved in both the Glebbeek study and this manuscript. Furthermore, in the manuscript itself, you mention that there was a quantitative study (De Ridder) and a qualitative one (Glebbeek) related to the COVID-19 intervention, with the latter being discussed in this manuscript. That was clear to me. However, the sudden use of phrases such as 'The research goal we defined for the qualitative study…,' as well as 'In our study...' and 'We did...' in the context of the qualitative study was somewhat confusing. I suggest clarifying the distinction for the reader by describing this issue more clearly in the manuscript itself.

• Worked example, page 6: How long, or how many weeks, was an A or B sequence in the A-B design?

• Table 3 is great because it provides an excellent overview of the topic. One suggestion I have is to introduce an additional column and separate the references into those used for your grounded theory and those that can also be applied (as further reading). Furthermore, I understand that you cannot cover all possible qualitative aspects here, but I would like to point out that qualitative content analysis is also a viable option to be applied in this context.

• Discussion, Limitation, the following sentence: 'Even when invited for the interview near the intervention location, most people preferred the offered option of doing the interview online or at an other time. This changed their status to control group participants.' Please explain why their status was changed to that of the control group.

• Discussion: I would find it interesting if you could revisit and discuss the two research approaches using the studies of Reed 2021 and Geed 2024. You could highlight the benefits that this qualitative study brings to the quantitative results of Reed.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jun 16;20(6):e0324936. doi: 10.1371/journal.pone.0324936.r003

Author response to Decision Letter 1


Please see the uploaded file for a better lay-out.

Reviewer comments and author responses:

Rev 1

The article submitted for my review is entitled “A qualitative evaluation of an experiment: Studying the effects of empathy-inducing probes on distancing during COVID-19 to derive methodological guidelines”. The article is classically structured and clearly argued. The theme of the article is stimulating: designing a qualitative evaluation method for a quasi-experiment. Nevertheless, I feel that certain elements of the article need to be improved or supplemented.

Response: We thank the reviewer for their kind words regarding the theme of the paper, as well as the structure and arguments in the manuscript. Before responding to all of the comments one by one, we have a more general response as well.

From the questions and comments of reviewer 1, it is clear to us that we need to better outline the main goal of this study and paper. By doing that we trust that this also solves many of the more specific comments but of course we will also still address them one by one.

The goal of this paper is to show (quantitative) researchers in the behavioral sciences how to use qualitative methods in the context of experimental research. We use the paper by Glebbeek et al. (2024) as a worked example; it was executed in the larger context of research by De Ridder et al. (2020). In our manuscript, we summarize the steps needed to execute a qualitative analysis in an experimental study. The aim of this paper, therefore, is to serve as a methodological tutorial. It is developed by describing and discussing the methodological steps and considerations of the empirical example (Glebbeek et al.) that is used as illustration throughout the paper.

To bring this structure and the scope of our paper across we made several changes to the manuscript: 1) we changed the title, 2) we have rewritten the abstract, 3) throughout the introduction we rephrased the scope of the paper (mostly on p.2 and p.4-5).

1. In the introduction, despite the authors' didactic efforts, the distinction between an experiment and a quasi-experiment is not clear. The authors do not give a clear definition of a quasi-experiment and what distinguishes it from a true experiment. This is important, however, as the originality of the article seems to rest on the study of a quasi-experiment.

Response: On p.2, in the second paragraph, we reformulated the definitions of quasi- and true experiments to better clarify the distinction.

As outlined above, we made several changes to better communicate the scope of this article and its originality. Since this is not the presentation of a quasi-experiment we made additional changes in formulations and explanations. In most places, we replaced the term “(quasi-) experiments” (comprising both options) with experimental designs (also comprising both options) and are explicit about when we mean true experiment or quasi-experiment. We added an explanation on p.6, that, while our illustrative example concerns a quasi-experiment, the methodological guidelines we present in this manuscript can also be applied to other experimental designs.

2. Still in the introduction, the authors write “there is little literature on using a qualitative approach for the evaluation of experimental designs in behavioral science.” Firstly, I don't really agree with this peremptory assertion. Numerous books or articles deal to a greater or lesser extent with the qualitative evaluation of experimental design in behavioral science (for example: [1]Kite & Whitley, 2012. Principles of research in behavioral science; [2]Orcher, 2016. Conducting research: Social and behavioral science methods; [3] Maxwell 2008. Designing a qualitative study The SAGE handbook of applied social research methods; etc.). Secondly, there are a multitude of methods for evaluating qualitative experiments ([4]Rossman & Wilson, 1985, Numbers and words: Combining quantitative and qualitative methods in a single large-scale evaluation study. Evaluation review, 9(5), 627-643; [5] Horsburgh, 2003, Evaluation of qualitative research. Journal of Clinical Nursing (Wiley-Blackwell), 12(2); [6] Hennink, et al (2020). Qualitative research methods. Sage; [7] Yadav, 2022. Criteria for good qualitative research: A comprehensive review. The Asia-Pacific Education Researcher, 31(6), 679-689. etc.).

2B. I fail to see where the contribution of the new method proposed by this article lies. It should be clarified in a very precise and documented way. In particular, it should be specified whether the proposed method concerns quasi-experiments or experiments, whether it concerns mixed or totally qualitative methods, or even experiments that are mainly quantitative but incorporate qualitative data; otherwise, it's hard to see where the article's contribution lies. At the end of the introduction, the authors state “we provide a step-by-step approach for future researchers wishing to carry out a qualitative evaluation of a (quasi-)experimental design”, but it is not clear whether this is really a research gap. In this respect, in the “Methodological approach” section, the authors state “Our own study is an example of adding a qualitative element to a quantitative study”, so we'd be tempted to believe that the gap lies here, but we'd have to justify it much earlier in the article.

Response: Thank you for providing suggestions for relevant literature. We were familiar with almost all, and checked the other references. While these works are indeed about qualitative research, they lack the specific methodological focus on the qualitative evaluation of experimental designs in behavioral sciences.

We added several of the suggested references for further reading on qualitative methods in general (p.4). Also the references about quantifying qualitative results that were suggested by the reviewer, are now included (p.4).

Response 2B: We trust that with a better presentation of the scope of this paper (see first response), and a better positioning with regards to type of designs (see response to remark 1), we have now clarified what the contribution of this paper is.

3. Still in this “Methodological approach” section, the authors mention a large number of bibliographical references (references 20 to 61). However, in a research article, bibliographical references usually appear in the introduction or discussion, but not in the methodological or results section. This gives the impression that your work is more like a literature review than a concrete experiment. The authors should clearly state their position on this point. Is the article a methodological review or the testing of a qualitative experimental method? This is not very clear.

Response: Thanks again for pointing out that the scope of the paper was really not clearly presented. We prefer the term methodological tutorial over methodological review but, indeed, the main message is not about the results of the illustrative example we used; and thus the manuscript does not follow the typical structure of a paper presenting empirical research. See also our first general response.

With regards to questions 3, 4 and 5: We do realize that we were also using confusing language to refer to the different studies that play different roles in this work. This manuscript presents a methodological tutorial. The worked example is the qualitative study that evaluates a quasi-experiment presented in Glebbeek et al. The study by Glebbeek et al. is part of a larger study presented by De Ridder. The different studies are now explicitly and consistently referred to as, respectively, “our study” (the tutorial), the worked example/illustration/Glebbeek et al. (the empirical qualitative study) and the bigger context study/De Ridder et al. (providing the study design and experimental manipulation).

4. In the “data collection” section, you state that you have carried out interviews and observations, but it's hard to understand why you are doing so (for what purpose) and with whom (list of interviewees). What's more, if I understand correctly, the purpose of these observations and interviews is to justify the results of the method you describe, but your results contain very little verbatim information about these observations and interviews. So I don't see how these observations and interviews validate your method.

Response: We feel that this remark follows from the same misconception about the general scope and the role of the empirical example in this paper. In the paper by Glebbeek et al., the empirical study and its results are presented in detail. Here, we just use it as an illustration to present a (relatively unknown and not often used) method. See also response 3.

As research methodologists with expertise and experience in qualitative methods, we feel we are in the position to write this tutorial, outlining how experimental designs can be evaluated using qualitative data and analysis. The aim is not to validate a method through an example study.

5. Still in the presentation of the method, you insert a few verbatims after figure 2 and then announce that “Observations and interviews clearly showed that social interactions are context-dependent, i.e. that behavior depends not only on intentions (based on reasoning about actions and motivations for complying with them) but also, for example, on how friends or other contacts behave, their intentions and the perceived risk of not behaving as intended (see figure 2).” It's possible to agree with this statement, but I don't see how it validates your method of evaluating a qualitative experiment? In my opinion, to validate a method, it is necessary to provide very factual evidence, for example by comparing the responses of the experimental group and the control group, and by showing that each stage of the method improves the results. In my opinion, your article fails to do this. Table 2 compares the results between the two groups, but I fail to see how this validates your method. To validate your method, it would probably have been more appropriate to set up two groups, an experimental group studied using the new method, and a control group not studied using this method.

Response: We assume this comment is resolved by the changes as discussed in our first general response, combined with the responses for 3 and 4.

6. To conclude the presentation and justification of your method, you write that “Our intention was to demonstrate the iterative process of hypothesis generation and validation through repeated re-evaluation of the data.” Firstly, I don't think this intention corresponds to the initial objective of your study set out in the introduction. Secondly, I discover that your method ultimately consists in determining hypotheses by re-evaluating data. Figure 1, which presents your method, and Table 3 do not mention the construction of these hypotheses, which, in my opinion, is part of a qualitative abductive research methodology not presented in this work.

Response: We thank the reviewer for pointing out the confusing use of the word hypothesis in this part of the manuscript. To better convey our message we replaced the term ‘hypothesis’ with ‘preliminary explanation’, as we agree that ‘hypothesis’ can be misleading in this context. The iterative process of formulating ideas on potential themes and relations between themes, and checking/testing those ideas with the other available data is inherent to the processes of axial and selective coding, both included in Table 3.

By replacing the term, and defining the replacing term on p.26, we trust this point gets across more clearly now.

7. I'm not very convinced by your discussion, as you fail to demonstrate the validity of your method. You write “Table 3 summarizes the essential elements for conducting a rigorous qualitative investigation of potential research projects.” Table 3 presents a literature review of the various stages (Research objective; Data collection; …). Personally, I regularly conduct qualitative studies and apply these steps from the literature, but I've never tested the validity of the overall method, and I don't know if the method is valid. After reading your work, I still don't know. You have carried out a literature review of the design stages of a qualitative study, which is interesting, but it is not possible to say, as you do, that this method is rigorous and reproducible. To write that, it would have been necessary, at the very least, to apply this method to several studies, or to analyze the results between an experimental group and a control group. I'm not saying that your method isn't correct, but you don't provide proof of it.

Response: See our earlier responses, especially for comment 4. Testing the validity of a method is not the scope of this paper. We aim to provide a methodological tutorial on applying qualitative methods in the context of experimental designs.

Rev 2

1. Abstract: Please include the qualitative research design, specifying that the qualitative evaluation was conducted using Grounded Theory. I would also suggest clarifying this earlier in the manuscript to help guide the reader (e.g., a guide for a qualitative evaluation based on Grounded Theory).

Response: Thank you for pointing this out. We agree and added this to the abstract (p.1) and in the introduction at an earlier stage (p.4).

2A. In your manuscript, you frequently refer to the 'effect of the intervention' or 'intervention effect' in the context of the qualitative approach. The term 'effect' is more commonly associated with quantitative research, although it is debated in the literature. I find your approach very interesting, particularly in the chapter 'The intervention effect,' which, among other things, addresses the quantitative representation of qualitative results and hypothesis generation. However, please clarify what you mean by this in the qualitative context.

2B. You often discuss how qualitative approaches enrich quantitative (quasi-experimental) studies, and at times you refer to the 'intervention effect' as something that could potentially replace quantitative approaches. In this context, I would like to refer to the sentence on page 8: 'Note that intervention effects can be evaluated exclusively through qualitative data, or the qualitative data can be an addition to quantitative data within the same study.' There may be a language issue here, but I believe this statement requires further clarification. In my view, qualitative research cannot replace quantitative intervention research, but it can certainly enhance it by providing a clearer understanding of why an intervention works or doesn't work.

Response 2A: We used the term intervention effect to describe any change that is assumed to be the consequence of being exposed to an intervention/manipulation, irrespective of how this change is defined or measured (i.e., with quantitative outcomes/variables or in a qualitative way). However, we agree that readers may interpret the word ‘effect’ only as a quantitative measure. We therefore added such an explanation on p.3.

Response 2B: First of all, we see how our phrasing could be confusing, and adjusted it (p. 8). Obviously, we agree with the reviewer that qualitative experimental research cannot in general replace quantitative experimental research. However, we do believe and argue that one could, on occasion, decide to focus solely on qualitatively defined and measured effects.

3. In the ethics statement and data availability section (provided by the submission system), you mention that you obtained permission to describe the data and conduct of the qualitative study by Glebbeek et al. (2024) in this manuscript. You also note an overlap in authorship, which leads me to conclude that some of the authors were involved in both the Glebbeek study and this manuscript. Furthermore, in the manuscript itself, you mention that there was a quantitative study (De Ridder) and a qualitative one (Glebbeek) related to the COVID-19 intervention, with the latter being discussed in this manuscript. That was clear to me. However, the sudden use of phrases such as 'The research goal we defined for the qualitative study…,' as well as 'In our study...' and 'We did...' in the context of the qualitative study was somewhat confusing. I suggest clarifying the distinction for the reader by describing this issue more clearly in the manuscript itself.

Attachment

Submitted filename: Response to Reviewers.docx

pone.0324936.s005.docx (32.8KB, docx)

Decision Letter 1

Gabriel Velez

3 Apr 2025

PONE-D-24-29638R1
Applying qualitative methods to experimental designs: A tutorial for the behavioral sciences
PLOS ONE

Dear Dr. Leplaa,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. I, as well as the reviewer, appreciate all the effort that went into addressing the comments in the first round. It is clear that much of this was taken into account and integrated into the manuscript. Still, there is a major concern the reviewer and I both share in terms of what is not included. Specifically, it is impossible to assess the quality, reliability, and validity of the qualitative method proposed by authors without presenting the data used to develop and test this method. Much more is needed in this regard, as the reviewer lays out. As a note, unfortunately, if it is not addressed in this round, the paper will most likely be rejected.

Please submit your revised manuscript by May 18 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Gabriel Velez, Ph.D.

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear authors,

I would like to thank you very much for the efforts you have made to rework and revise your article. The objective of the article is much clearer and the concepts are much better defined. To further improve this work, I recommend that the authors take the following remarks into consideration:

1) The objective of the article is now clearly defined: “to propose a step-by-step approach for future researchers who wish to conduct a qualitative evaluation of a (quasi-)experimental design”. However, in the title of the article, but also in the text, it is difficult to understand whether the qualitative evaluation proposed relates to an experiment or a quasi-experiment. The authors would do well to be consistent throughout the article and to refer to either qualitative evaluation of experiment or qualitative evaluation of quasi-experiment.

2) Also in the introduction, I understand that the research gap is as follows: “there is little literature on the use of a qualitative approach for the evaluation of experimental models in the field of behavioral sciences”. Therefore, is it not necessary to specify in the title of the article and in the research objective that it is a qualitative evaluation in the field of behavioral sciences? Moreover, on pages 6 and 7, the authors again indicate their research objective and mention that the objective is “the study of how to conduct a qualitative study of a (quasi-)experiment in the context of behavioral research.” In my opinion, it is necessary to be consistent and to follow the same research objective throughout the article.

3) In the methodological presentation, the authors first present the previous work of De Ridder et al. and Glebbeek et al., which seem to be the work on which the methodological study of this article is based.

Therefore, it would be appropriate to present the samples of each of these two studies in detail. How many people did these studies cover? Who were the respondents and interviewees? How were the control and intervention groups, if any, constituted? Throughout the methodological presentation that follows, the characteristics of the samples and data that were studied are not known. You present the different stages of the qualitative evaluation but we never know which people were questioned or interviewed, how these people were chosen, or how the control and intervention groups were designed. This is very problematic because we want to believe you, but it is difficult to evaluate the relevance of a qualitative method if you do not present the data used.

4) In your proposed evaluation method (figure 1), you emphasize data collection and analysis as essential steps, but in your own study you do not present this data and you do not indicate how it was collected. Once again, I understand that you have used secondary data from the work of De Ridder et al and Glebbeek et al. but you must present the characteristics of this data otherwise it is very difficult to judge the quality of your study and the qualitative method you propose. It is not a question of giving us access to all the data but of presenting it. When were they collected? From whom? How did the interviews go? What were the socio-demographic characteristics of the interviewees? You state on page 7 that “The opening up of data is also an essential aspect of scientific research, as it reinforces transparency, responsibility and reproducibility”. However, you do not present the data from your study. This is rather problematic.

5) Tables 1 and 2 are interesting because they present the data coding elements, but we still do not know the data used (table 1) and the characteristics of the people interviewed; nor do we know the make-up of the control and intervention groups (table 2). Without a presentation of the socio-demographic characteristics of these data, for my part, it is impossible for me to say whether your evaluative method is reliable and valid. In my opinion, no one can comment on the validity and reliability of a method without knowledge of the data used to test that method. Despite all the kindness that one can have for your study, it is impossible.

6) The discussion and contributions are interesting but once again, without a detailed presentation of your data, it is not possible to tell you whether or not your contributions and your 4-step methodology are valid. Furthermore, you state that the qualitative method reinforces the quantitative methods, but it is not very clear how, if this is the case, the qualitative data of Glebbeek et al. reinforce the quantitative data of De Ridder et al.

I therefore recommend that you present your secondary data from the two previous studies in detail (in one or two additional tables, for example). Without the presentation of this data, I cannot comment on the results of your study and the reliability of your method, even if I can assume that it is sound. I am not asking for the identity of the people interviewed or questioned, but simply for their number, their socio-demographic characteristics, the way in which they were divided into the control and test groups, the duration of the interviews, the interview guide and any other relevant information that would enable the reliability and validity of your qualitative method to be evaluated. Thank you for your understanding.

Thank you very much for giving me the opportunity to read your work, which has a potentially very interesting objective.

Best regards

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jun 16;20(6):e0324936. doi: 10.1371/journal.pone.0324936.r005

Author response to Decision Letter 2


See the cover letter for a better edited version of this response:

We understand the confusion created by using a quasi-experiment as the worked example, while the methodological tutorial addresses experimental designs in general (including true experiments and quasi-experiments, as specified on lines 10-11). However, the methodological steps presented in this tutorial are equivalent for the two types, so it does not make sense to restrict the general presentation to quasi-experiments only, just because our worked example is of that type. We understand that this needs explicit communication and consistency in language throughout the paper and we thank the reviewer for pointing out that this needs improvement.

On lines 20-22, we added an explicit statement: “Throughout the paper, when we use the term ‘experiment’ or ‘experimental design’ this includes both true and quasi-experiments.”

We trust that adding this explanation avoids the potential ambiguity at several positions in the paper; e.g., lines 84-86 state: “We provide a step-by-step approach for future researchers aiming for a qualitative evaluation of an experimental design, supported with a worked example.” Also, in the title we use the term “experimental designs” and not “experiment” or “quasi-experiment.”

In the presentation of the worked example, on lines 136-138 we state explicitly: “While the GLS study is an example of a quasi-experiment, the lessons and guidelines can be applied directly to true experiments as well.”

To further avoid confusion, throughout the paper we have removed every use of the term “(quasi-)experimental” and changed this into “experimental design” (lines 60, 135, 169, 451).

We agree with the reviewer that the title should capture exactly that. However, in our opinion, it does, because the title states: “Applying qualitative methods to experimental designs: A tutorial for the behavioral sciences.”

We are therefore not sure what the reviewer is missing (or perhaps the change in title between version 1 and 2 of the manuscript was not noticed?). If we are misunderstanding this remark, and it is not the term behavioral that is considered problematic but instead it is another reference to the confusion we created with the term (quasi-) experiments, then we refer to our response 1.

Our methodological tutorial is indeed based on the empirical study of Glebbeek et al., but not directly on the study by De Ridder et al. What we aimed to communicate is that we made use of the intervention that De Ridder et al. set up for their own study, but other than that the Glebbeek study was executed independently from the De Ridder study. We made the following changes to better explain this in our manuscript:

On lines 86-100 we rephrased the introduction and description of both studies and their relation to the methodological tutorial presented in this manuscript.

We also changed the section title “Worked example” into “Context of the worked example” in order to better represent the goal and content of this section. The section describes the study design and interventions of the De Ridder et al. study, because Glebbeek made use of this set-up, but the De Ridder study itself is not our worked example (Glebbeek et al. is).

We seriously considered if presenting information about the De Ridder study (e.g. sample characteristics) would be a useful addition to this paper and decided against it. The data collected by De Ridder et al. is not used in this paper; the analyses we use as illustrations in this paper are only based on the Glebbeek study.

We do understand the request for additional information about the Glebbeek study and have provided much of the requested information.

We have added sample information in Section 2.2: Data collection. On lines 294-296 we give more information on who was observed, and on lines 331-334 we give more information on who was interviewed.

We also added appendices A through D, containing the observation schemes and topic lists used for data collection by Glebbeek et al. We introduce these appendices on lines 292, 302, and 324.

In Section 2.3: Reflecting and Adjusting, we briefly discuss the theoretical sampling plan (lines 404-413) and explain that we limited the amount of information collected on participants in order to prevent undue intrusion on their privacy. Collecting qualitative data always means collecting a considerable amount of personal data, and limiting this to only the relevant information protects participants from possible harm. With that in mind, Glebbeek et al. stopped registering, for example, gender, age, and other demographic information that was deemed not relevant in this study.

See our responses to 3/4 and corresponding additions to the paper.

In addition, we would also like to stress again that the worked example is not used to validate the method but to illustrate the methodological steps presented in the tutorial. As such, the tutorial is based on a combination of a review of the literature on qualitative research in the context of the behavioral sciences and our extensive personal experience with qualitative research in general, and with this illustrative example evaluating results from an experimental design in particular. We very carefully checked that nowhere in the paper do we claim that illustrations from the worked example are presented as evidence of validity.

Also, we would like to refer to lines 841-848 (last paragraph of the discussion) for a modest formulation of what we offer: ‘a useful starting point, without claiming that the presented guidelines are all-encompassing.’

Decision Letter 2

Gabriel Velez

4 May 2025

Applying qualitative methods to experimental designs: A tutorial for the behavioral sciences

PONE-D-24-29638R2

Dear Dr. Leplaa,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Gabriel Velez, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Gabriel Velez

PONE-D-24-29638R2

PLOS ONE

Dear Dr. Leplaa,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Gabriel Velez

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Appendix A. Observation list control week.

    The translated version of the observation list as was used during the control weeks.

    (PDF)

    pone.0324936.s001.pdf (58.6KB, pdf)
    Appendix B. Observation list intervention weeks.

    The translated version of the observation list as was used during the intervention weeks.

    (PDF)

    pone.0324936.s002.pdf (74.7KB, pdf)
    Appendix C. Topic list short interviews.

    The translated version of the topic list as was used during the short interviews.

    (PDF)

    pone.0324936.s003.pdf (43.3KB, pdf)
    Appendix D. Topic list long interviews.

    The translated version of the topic list as was used during the long semi-structured interviews.

    (PDF)

    pone.0324936.s004.pdf (60.4KB, pdf)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0324936.s005.docx (32.8KB, docx)

    Data Availability Statement

    The authors of the qualitative study, the worked example in our paper, have decided to make the data openly available. We have added the relevant information in the submission, with a link to the repository (https://osf.io/e4zfw/).
