Abstract
Introduction
Recently, validity as a social imperative was proposed as an emerging conceptualization of validity in the assessment literature in health professions education (HPE). To further develop our understanding, we explored the perceived acceptability and anticipated feasibility of validity as a social imperative with users and leaders engaged with assessment in HPE in Canada.
Methods
We conducted a qualitative interpretive description study. Purposeful and snowball sampling were used to recruit participants for semi-structured individual interviews and focus groups. Each transcript was analyzed by two team members and discussed with the team until consensus was reached.
Results
We conducted five focus groups and eleven interviews with two different stakeholder groups (users and leaders). Our findings suggest that the participants perceived the concept of validity as a social imperative as acceptable. Regardless of group, participants shared similar considerations regarding: the limits of traditional validity models, the concept’s timeliness and relevance, the need to clarify some terms used to characterize the concept, the similarities with modern theories of validity, and the anticipated challenges in applying the concept in practice. In addition, participants discussed some limits of current approaches to validity in the context of workplace-based and programmatic assessment.
Conclusion
Validity as a social imperative can be interwoven throughout existing theories of validity and may represent how HPE is adapting traditional models of validity in order to respond to the complexity of assessment in HPE; however, challenges likely remain in operationalizing the concept prior to its implementation.
Introduction
In recent years, different assessment approaches have gained popularity, including narrative assessment,1 rater-based and workplace-based assessment,2,3 and programmatic assessment.4,5 In parallel, researchers and educators have paid increasing attention to integrating a variety of data sources in the context of a comprehensive validation process.4,6 Unfortunately, current validation practices may not be well suited to recent changes in assessment practices, and therefore our ways of thinking about, and engaging with, validity need to be adapted. Marceau et al.7 suggested that an emerging conceptualization of validity, validity as a social imperative, may reflect a shift in response to more recent assessment practices that necessitate consideration of validity evidence beyond psychometric indicators of quality. In a concept analysis, Marceau et al.7 identified four characteristics (also referred to as attributes in appendixes 1 and 2) of the concept of validity as a social imperative: 1) validity evidence seen as credible by society; 2) validity built into the assessment process; 3) interpretation of the combination of assessment findings; and 4) validity evidence includes quantitative and qualitative data (Table 1).
Table 1.
Characteristics of the concept of validity as a social imperative identified in a concept analysis
Characteristics | Definition |
---|---|
Validity evidence seen as credible by society | Teaching institutions and regulatory agencies must be able to document, in a way that is perceived as credible by society, decisions made regarding a student’s knowledge, attitude, skills and competencies. |
Validity built into the assessment process | Validity evidence includes the justification of decisions made during the development and administration of an assessment, and the interpretation of assessment results. This evidence includes consideration of the potential consequences that the interpretation of the assessment scores could have on the individual, the institution, and society. |
Interpretation of the combination of assessment findings | Assessment data generated within an assessment program are often combined to make a final judgment. Validity evidence should be collected to support the combined or total score interpretation; the evidence should align with the intended score use. |
Validity evidence includes quantitative and qualitative data | Validity evidence must be collected using rigorous approaches, and attention should be paid to quantitative and qualitative data sources as legitimate validity evidence. |
Adapted from Marceau et al.7
Our current understanding of the concept of validity as a social imperative is based on analyses of published literature, generated primarily within medical education, and therefore reflects an academically oriented discussion.7,8 We know little about whether, and how, this emerging conceptualization of validity resonates with, is understood by, or is perceived by members of the health professions education (HPE) community engaged in assessment, validity, and validation. Understanding the perceived strengths, weaknesses, anticipated challenges, and implications of validity as a social imperative for assessment in HPE is critical to further refine the concept, to better understand its place within current approaches to validity at play in HPE, and to engage stakeholders in shaping its operationalization for use. In this study, we explored the perceived acceptability and anticipated feasibility of the concept of validity as a social imperative with users and leaders engaged with assessment in HPE in Canada.
Methods
We conducted a qualitative interpretive description study adapted from Thorne's approach9–11 to ground findings in practice and to generate meaningful results, with consideration for multiple possible viewpoints. The constructivist paradigm guided our methodological decisions at each step (e.g., data collection and analysis) to maximize consistency across the different phases and populations included in the study.12,13 We targeted individuals from two stakeholder groups for this study: 1) educators and assessment committee members (subsequently referred to as Users) and 2) individuals engaged in HPE assessment and scholarship (subsequently referred to as Leaders). A brief description of the study procedures for the two stakeholder groups is presented in Table 2. For Users, we chose focus groups to gather different perspectives enriched by the interaction between participants,14 but due to participant availability, some were interviewed individually. For Leaders, we conducted semi-structured individual interviews, which allowed us to explore the subject in depth with the participants. Both stakeholder groups were selected to reflect the diversity within Canadian assessment communities in HPE. The study was conducted in Canada, which has been identified as a leader in research in HPE (e.g., number of publications per medical school), in assessment and validity in medical education specifically,15 and in medical education more broadly.16
Table 2.
Procedure used with stakeholders
Procedure | Stakeholders | |
---|---|---|
Population | Users | Leaders |
Participants | Educators and assessment committee members | Individuals involved in the Canadian HPE research and assessment community |
Sampling | Purposeful sampling | Purposeful sampling; snowball sampling |
Recruitment | Email to educational program directors | Direct email |
Duration | October and November 2016 | January to July 2017 |
Method | Focus groups and individual interviews; length: 60-90 minutes | Individual phone interviews; length: 60-90 minutes |
Material | Sociodemographic questionnaire; semi-structured interview guide | Sociodemographic questionnaire; semi-structured interview guide |
Ethical consideration
This study was approved for both participant populations by the board of Education and Social Sciences, Université de Sherbrooke (2016-34-ESS). For interviews and focus groups with users, ethics approval was subsequently obtained from the Quebec universities that deliver nursing, medicine, physiotherapy or occupational therapy programs (our HPE programs of interest). Consent was obtained prior to the start of the interview and no compensation was provided. Participation was voluntary, and participants could withdraw at any time. The research team ensured the confidentiality of the data collected and removed all identifying information from the transcripts prior to analysis.
Participants and Recruitment
Users: We recruited educators and assessment committee members using purposeful sampling to encourage different perspectives from different programs at four universities. To be included, participants had to have been professors, lecturers, or assessment committee members for more than one year. They had to be involved in nursing, medicine, physiotherapy, or occupational therapy programs at the targeted Quebec universities (region-specific data collection in English and French was done to facilitate face-to-face focus groups). We included participants engaged in the design, validation, or monitoring of assessment strategies within their local context. We excluded individuals who: 1) deliver only a few lectures, 2) are small-group tutors or instructors, or 3) solely supervise clinical trainees.
Leaders: We approached individuals involved in the Canadian HPE research and assessment community. To build our list of potential participants, we purposefully included researchers recognized for their contributions to assessment or validity and individuals with governance roles in assessment in HPE. We considered individuals across the professional spectrum (e.g., junior to senior researchers), representing various domains of work in HPE (e.g., licensure programs, membership on assessment policy committees, undergraduate, postgraduate) and a range of geographic locations. Snowball sampling was used to expand our list of potential participants. Participants in our sample had published an average of 100 publications relevant to HPE and held or had held leadership positions within HPE or within their specific research domain. There were no exclusion criteria.
Data collection
Preparatory documents: Prior to the interview, all participants received two documents: 1) a two-page summary of three conceptualizations of validity in HPE, including a description of validity as a social imperative through the lens of a discourse analysis (Appendix A)8; and 2) a two-page summary of the results from the concept analysis describing the antecedents, characteristics, and consequences of the concept of validity as a social imperative (Appendix B).7 Participants were expected to review these documents before the interview.
Sociodemographic questionnaire: Participants completed a short sociodemographic questionnaire asking about their age group, gender, initial training, and experience in the field of assessment and validity.
Focus group and semi-structured interview guides: The focus group and semi-structured interview guides (see Appendix C for users, and Appendix D for leaders) comprised key open-ended questions, with probes used as needed, and were adapted as relevant for the different stakeholder groups.17,18 Participants were asked to describe their views on the acceptability and feasibility of the characteristics of validity as a social imperative, to share their opinion on the concept as a whole, and to add any further comments.
Procedure
Users: We emailed educational program directors and asked them to circulate a short description and an invitation to participate in the study to relevant individuals. Interested individuals contacted the research team to obtain a copy of the consent form and preparatory documents. MM collected data through in-person focus groups and semi-structured individual interviews. Focus groups and individual interviews lasted 60 to 90 minutes. All interviews were recorded and transcribed.
Leaders: We directly contacted participants by email. Those who expressed interest received additional information by email, including a consent form, a sociodemographic questionnaire, and the above-mentioned preparatory documents. Two experienced research professionals (KD, LA) conducted semi-structured individual phone interviews. Interviews lasted 60 to 90 minutes and were recorded and transcribed. Interviewers received coaching (from MM) throughout the study to ensure data collection quality. We used interviewers from outside the research team so that participants would feel they could speak freely and critically.
Data Analysis
Descriptive statistics were used to summarize participant characteristics. Results from each stakeholder group were analyzed separately. Two team members (MM, FG) carried out the qualitative analysis and all team members discussed the interpretation until a consensus was reached.18 Data organization and management were facilitated by Dedoose.19 The analysis was guided by the three concurrent analysis cycles in Miles et al.’s method.20
Data condensation: Our study was informed by Sidani and Braden's21 definitions of acceptability (perception of the concept regarding relevance, convenience, effectiveness, and risks associated with the concept or adherence to the concept) and feasibility (the possible application of the concept in practice). Through an inductive process, additional codes were added throughout the analysis.18,20
Data display: We explored participants’ perceived acceptability and anticipated feasibility for each of the four characteristics of validity as a social imperative (described in Table 1). We analyzed data from each stakeholder group independently and then compared the two data sets. Conceptually clustered matrices helped to organize and visualize the data, to draw conclusions,20 and to make visible discrepancies, similarities and relations, whether between the different characteristics of the concept or between the stakeholder groups.
Drawing and verifying conclusions: Regular team discussion and reflection throughout the analysis process helped to enhance the interpretation of the results. After each leader interview, the research team synthesized main themes in the transcript, identified exemplary quotes, and shared this summary with participants for review. Leaders were given the opportunity to expand, adapt, or suggest modifications to the summary as needed via email.22,23 Only one participant requested adjustments to the summary to provide more context supporting an exemplary quote.
Results
Twenty-three users and seven leaders participated. We conducted five focus groups (n = 19 participants; 3-8 participants per group) and four individual interviews with users. All leaders (n = 7) participated in individual interviews. Therefore, our data set was generated by a total of five focus groups and 11 individual interviews. Participant characteristics are presented in Table 3.
Table 3.
Sociodemographic characteristics of participants
Characteristics | Users (n = 23) | Leaders (n = 7) |
---|---|---|
Gender n (%) | ||
Men | 12 (52%) | 5 (71%) |
Women | 11 (48%) | 2 (29%) |
Initial training n (%) | ||
Medicine | 7 (30%) | 3 (43%) |
Nursing | 8 (35%) | - |
Physiotherapy | 2 (9%) | - |
Occupational Therapy | 2 (9%) | - |
Education | 4 (17%) | 1 (14%) |
Others (e.g., psychology) | 4 (17%) | 3 (43%) |
We noticed that educators and assessment committee members tended to move between discussing issues of assessment and issues of validity throughout the interviews, demonstrating the interdependence of the two concepts. To synthesize our findings across our datasets, we summarize the similarities and differences across participant groups in their views of the concept of validity as a social imperative (Table 4).
Table 4.
Users’ and leaders’ perspectives
Description | Themes |
---|---|
Similarities across stakeholder groups | Relevance of the concept in the current context |
Similarities across stakeholder groups | Required clarification of terms used to describe validity as a social imperative |
Similarities across stakeholder groups | Similarities and differences with modern theories of validity and validity as a social imperative |
Similarities across stakeholder groups | Challenges in the application of the concept for practice |
Differences between the two stakeholder groups | Differing conceptualizations of the importance of assessment and validity |
Differences between the two stakeholder groups | Society as a driving force to achieve different ends |
Similarities across stakeholder groups
Participants shared similar views regarding: 1) the relevance of the concept in the current context; 2) the need to clarify some terms used to describe validity as a social imperative; 3) the similarities and differences with modern theories of validity and validity as a social imperative, and 4) the challenges related to the application of the concept for practice.
Relevance of the concept in the current context
Participants suggested that the concept of validity as a social imperative offers an opportunity for “integrating it [validity] as part of an entire program” (L7). As such, they highlighted the importance of considering assessment results generated through programmatic assessment, as suggested by this participant:
The modern validity or traditional validity, you can focus on just an exam itself and just a rating scale itself … as a part (…) thinking about the whole and what the whole means, that’s a new thing. (L7)
Participants suggested that the uptake or consideration for validation practices built on qualitative data is “taking it a step further in saying those forms of evidence can be quantitative and qualitative, and I think that’s right, absolutely right.” (L3) They indicated integration and formal consideration of qualitative methods, alone or in combination with quantitative methods, is an important dimension of validity as a social imperative and reflects current and emerging validation practices in HPE.
For users, the current implementation of competency-by-design in training programs in Canada could benefit from the application of the concept of validity as a social imperative. More specifically, participants (FoG1, FoG4, FoG5) emphasized the coherence of the concept with the changes made in the various training programs, such as the adoption of a programmatic approach or the use of Entrustable Professional Activities in medical education.
Required clarification of terms used to describe validity as a social imperative
Participants frequently queried the definition of “society” or “social imperative” and requested clarification regarding the intended meaning, suggesting that the choice of words could help or hinder the interpretation of social imperative, as illustrated by this participant:
For me the concept of society, it is not clear. Because is society my mother, should I prove to my mother that we are finally certifying good physiotherapy students? (FoG3)
Having varied understandings of the intended interpretation of ‘society’ could lead to different uses or applications of the concept. For example, “If we’re using evidence that is considered credible to society, then validity will become very very challenging to define” (L1).
The combination of assessment findings (third characteristic) was understood in two different ways. First, it was understood as the global interpretation of different assessment instances, where “integrating doesn’t mean just adding up scores on these separate tests” (L3). Second, it was understood as the global interpretation resulting from more than one person (groups of people making decisions). For example, in “the clinical environment, when you can’t make a judgment as an individual practitioner, you go out and you make a collective decision, because it’s more complex.” (L4).
Participant comments focused on aspects associated with combining assessment data (rather than the combination of validation practices) to obtain a more defensible judgment of a learner's knowledge, attitude, skills, or competencies.
Similarities and differences with modern theories of validity and validity as a social imperative
Participants questioned the notion that the concept of validity as a social imperative was new or emerging, stating that “… both of those [Messick and Kane’s24–26 theories of validity] take into account consequential validity and sort of impact on society.” (P2). From the participants’ point of view, these similarities between modern theories and the concept of validity as a social imperative may undermine its acceptability as a ‘new’ concept; they suggest instead that it may be a new operationalization of some key features of preexisting validity approaches.
Leaders agreed that validation is an ongoing process throughout the development and validation stages. For two participants (P3 and P6), validation embedded throughout the assessment process was to be expected, and one participant (P4) felt that validation built into the assessment process (second characteristic) was coherent with purposeful programmatic assessment and contributed to the credibility and defensibility of the validation process.
In addition to similarities, participants identified strengths that distinguish the concept of validity as a social imperative from other conceptualizations of validity. For example, they found the social role of validity more explicit (FoG2, FoG5, E5) and they highlighted the importance given to the anticipation of consequences (FoG5).
Challenges in the application of the concept for practice
Participants anticipated several potential barriers and facilitators associated with the feasibility of the concept of validity as a social imperative, with particular focus on the operationalization, implementation, and uptake of the concept.
Some participants were uncertain whether the concept of validity as a social imperative would facilitate or compound the work expected of comprehensive validation practices, stating that “people might look at this and say this is just adding a whole other layer of complexity and I don’t see the benefit.” (L5)
A leader (L6) expressed doubts about the ability and willingness of medical educators and assessment administrators to put in the time and effort to complete a validation process aligned with the concept of validity as a social imperative. Educators listed several challenges concerning the limits of their own context, such as implicit or explicit institutional values. More specifically, the anticipated costs, time and effort associated with implementing the conceptualization of validity as a social imperative, although currently unknown, were assumed to be a potential limit to its applicability.
It's a bit of a creation of work. They [assessors] kind of want to buy something and just use it and so there is an implication here of time, cost, money, people’s time. So if you got too carried away with this, that could be an extra burden. A school’s education programs doesn’t have a lot of time or money (L3).
It was difficult for participants to explicitly name what they felt would be needed to apply the concept because they felt caught between “the responsibility towards the population” and having “to survive as faculty members” (FoG5).
Differences between the two stakeholder groups
There were two areas in which the stakeholder groups appeared to be using lenses that foreground different dimensions of the same underlying themes. Specifically, both groups discussed 1) differing conceptualizations of the importance of assessment and validity and 2) society as a driving force to achieve different ends.
Differing conceptualizations of the importance of assessment and validity
Users expressed that, through taking part in the study, they were given the opportunity to reflect on the social responsibility of a professor. Interviewees described their awareness of the impact of assessment on the learner and society:
I've never heard of that, but now that I've heard about it, I wonder, 'God, why did it take so long before thinking about this, it's so obvious that we need to do things [assessment] that are valid, it has consequences for society’ (FoG4)
Leaders reported appreciating that within the concept of validity as a social imperative there is a consideration for assessors’ and administrators’ social responsibility regarding all components of an assessment and validation process.
So it [validity as a social imperative] sits well with me because it’s saying you have a social responsibility to make a compelling, a judicious argument, about validity. The final product but also there’s a process of assessment (L5).
Society as a driving force to achieve different ends
Users expressed the importance of the quality of the assessment to accurately judge the performance of the future health professionals and thus ensure public safety. They clearly stated the value that society places on university degrees that serve as proof of the competence of the professionals, and therefore their role in ensuring that a university degree does reflect competence.
Some leaders considered the first characteristic of the concept (validity evidence seen as credible by society) to be the ‘added value’ of this emerging concept. They noted that the role of a program or a high-stakes exam is to certify the competence of the future health professional, a decision that has a large impact on society.
So I think it’s the relevance in terms of (…) some method that can stand up and tell society, look: We believe in these individuals that they are competent, that we are there, like we have that belief in them (L4).
Discussion
The evolution of validity as a social imperative stems from a pragmatic need: the difficulty of using existing validation practices in the current context of assessment in HPE.27–31 Participants recognized these difficulties and acknowledged the need to bridge the gap between current assessment practices and existing validity theories, and that validity as a social imperative may be an acceptable means to bridge that gap. However, participants asked for clarification of some aspects of the concept of validity as a social imperative (acceptability and feasibility), and anticipated challenges in applying the concept (feasibility).
In our results, two key elements seem to generate hesitation regarding the acceptability (uptake by the community) and feasibility (anticipated applicability) of the concept of validity as a social imperative. First, participants expressed the need to better understand the similarities and differences between validity as a social imperative and other conceptualizations of validity. Within the concept of validity as a social imperative, Marceau et al.7 have made explicit the social considerations linked to validity, which had only been implicitly put forward by various validity theorists, such as Mislevy,32 Messick,24,25 and Kane.26 While the “social” is present in the original writings of these authors, it does not appear to have been explicitly described as these theories have been imported into HPE. More specifically, Messick initially described the need to document consequential evidence during a validation process in the unified theory of validity.24,25 Kane,26,33 in the argument-based validation framework, argued that consequences of assessment are important to the validation process to support inferences. Recently, Cook and Lineberry34 suggested that consequential validity is the most important evidence to consider in a validation process. Participants asked how validity as a social imperative differs from, or adds to, existing conceptualizations of validity. We argue that validity as a social imperative is distinguished by the necessity to anticipate consequences throughout the whole assessment process, and not only to measure consequences a posteriori, extending traditional descriptions of consequences within validity frameworks.
In validity as a social imperative we re-emphasize the importance of the consequences of assessment, which are too often neglected in validation processes, as supported by Cook et al.35 and Labbé et al.,36 who found that few studies measured response processes or consequences of assessment scores across different HPE contexts. Furthermore, the concept of validity as a social imperative specifically encourages consideration of the impact of assessment and score interpretation on individual students and on society as a whole, and as such can be interwoven with other validity frameworks. This explicit consideration for individual students and society makes validity as a social imperative well suited to many newer assessment practices, and it can be used to focus how we go about collecting validity evidence.
A second key consideration brought forward by participants was the need to further define “social” and “society” in the concept of validity as a social imperative. A consideration for society in assessment and validity comes with several challenges–the first of which being the way in which we define ‘society’. Cook and Lineberry34 consider all those affected by the consequences of validity, including “learners, educators, and educational institutions; patients, providers, and health care institutions; and even society at large.”34(p788) For our participants, the notion of ‘society’ in validity as a social imperative was under-specified. This under-specification may permit local contextual factors to drive the identification of the relevant consideration of ‘society’–for local leaders and users to craft a meaningful contextualization of ‘society’ for their unique context. While this may contribute to a non-uniform operationalization of validity as a social imperative across contexts, this is not necessarily a bad thing–varied interpretations of society could be beneficial. To support transparency, we recommend that the relevant “society” be articulated and justified when conducting a validation study.
To better understand common uses of the term ‘social,’ we examined literature describing social accountability to explore how ‘social’ is typically defined or understood. The World Health Organization37 has specified that medical schools are accountable to health consumers, health authorities and graduates. Hanson et al.38 describe social responsibility as directed towards patients, families and society as a whole. The Royal College of Physicians and Surgeons of Canada39 refers to the accountability of the physicians to patients, communities, and broader populations they serve. Communities' and society's needs evolve over time,40 which necessitates evolutions in health care delivery, and consequently in health professional education. The needs of “society” as reflected in social accountability are constantly changing. The word society is flexibly defined in discussions of social accountability, it is not surprising that there was a varied understanding of “social” for our participants when discussing validity as a social imperative.
Both groups of stakeholders anticipated challenges concerning the application of the concept into practice (e.g., time, effort, and cost). This uncertainty regarding resource commitment may arise because the concept of validity as a social imperative has not yet been formally operationalized for use in the way that the Standards41 operationalized Messick’s25 validity theory; such operationalization remains an important next step. The challenges highlighted by our participants are consistent with Sidani and Braden's21 framework concerning the feasibility of an intervention (i.e., the application of the concept in our study). Indeed, the factors influencing feasibility include the context (including the physical and social environment), the resources available, and the training of the individuals involved.21 Our findings also mirror those of Onyura et al.42 who identified factors hindering knowledge translation in medical education, with common barriers being the fear of work, role overload, and financial and human resource limitations.
Strengths and limitations
Through purposeful and snowball sampling, we interviewed participants from various backgrounds, organizations, and programs, which contributes to diversity in our data. Triangulation from different perspectives, an inductive approach and the open-ended questions provided us with rich and varied views from a wide variety of participants. In addition, we relied on Sidani and Braden’s21 framework to explore acceptability and feasibility of the concept. Adopting an interpretive description approach10 provided a rigorous approach for analyzing, synthesizing and transforming data. Co-coding by two team members10,18,43 and verification of conclusions with leaders and research team23,43 ensured both credibility and confirmability. Thick and rich description of the research process and the sample provided in the article enables readers to establish the transferability to their context.10 Meetings between the members of the research team gave the opportunity to discuss the influence of experiences, values, and beliefs on findings (reflexivity)23 and thus better focus on the interpretation of the participants’ perspectives.
This study has limitations. The transferability of our results may be limited by the Canadian perspective embedded in our study design. However, we believe our study represents a broad sample of educators, committee members and leaders who work to improve the quality of assessment practices in a similar context of implementing and enacting competency-based assessment (encompassing narrative assessment, rater- and workplace-based assessment, and programmatic assessment). Furthermore, we interviewed individuals from different professions, disciplines, universities, and backgrounds. Most of the participants work in medicine and nursing, and it may have been beneficial to have deliberately approached participants involved in other HPE programs. The research team are the same individuals who described the concept of validity as a social imperative in a previous concept analysis, and this may induce some researcher bias. For the leaders, we attempted to minimize this bias by relying on interviewers who did not know the participants and were not involved in the original research work describing validity as a social imperative. Further, some participants were critical about validity as a social imperative, suggesting social desirability bias did not significantly limit critique of the concept under study.
Implications
The concept of validity as a social imperative expands our understanding of validation practices that consider the impacts of assessment and validity on society and on future health professionals. Furthering our understanding of how to engage with validity practices adapted to our current assessment approaches to improve the quality of assessment is relevant for several actors including learners, educators, and patients. We believe that the findings from this study will allow us to begin to operationalize the concept for use. A broader exploration within the international HPE community is a good avenue to enhance the acceptability and the feasibility of the concept. Identifying potential or perceived barriers to implementation before finalizing the operationalization of the concept of validity as a social imperative enables us to mitigate perceived difficulties and ensure the concept is accessible to users and leaders. Demonstrating the utility of the concept depends on the concept being refined and translated for use, and on careful documentation of how it has been implemented.
Conclusion
The concept of validity as a social imperative appears to resonate with stakeholders (users and leaders in HPE) and reinforces discussions in the literature regarding moving beyond the traditional focus on psychometrics for validation practices reported in HPE. Participants identified several avenues for further refinement of the concept, and these remain important areas for future research, specifically the operationalization of validity as a social imperative for use in assessment validation in HPE.
Acknowledgments
The authors would like to thank Kathleen Day and Lesley Ananny, Faculty of Medicine, University of Ottawa, Ontario, Canada, who conducted the interviews, and the participants for their generosity and availability.
Appendix A
Summary of three conceptualizations of validity in HPE8
Validity is one of the most debated constructs in our field; debates abound about what is legitimate and what is not, and the word continues to be used in ways that are explicitly disavowed by current practice guidelines. The resultant tensions have not been well characterized, yet their existence suggests that different uses may maintain some value for the user that needs to be better understood. We conducted an empirical form of Discourse Analysis to document the multiple ways in which validity is described, understood, and used in the health professions education field. We created and analyzed an archive of texts identified from multiple sources, including formal databases such as PubMED, ERIC and PsycINFO as well as the authors’ personal assessment libraries. An iterative analytic process was used to identify, discuss, and characterize emerging discourses about validity.
Three discourses of validity were identified. Validity as a test characteristic is underpinned by the notion that validity is an intrinsic property of a tool and could, therefore, be seen as content and context independent. Validity as an argument-based evidentiary-chain emphasizes the importance of supporting the interpretation of assessment results with ongoing analysis such that validity does not belong to the tool/instrument itself. The emphasis is on process-based validation (emphasizing the journey instead of the goal). Validity, as a social imperative, foregrounds the consequences of assessment at the individual and societal levels, be they positive or negative. The existence of different discourses may explain – in part – results observed in recent systematic reviews that highlighted discrepancies and tensions between recommendations for practice and the validation practices that are actually adopted and reported. Some of these practices, despite contravening accepted validation ‘guidelines’, may nevertheless respond to different and somewhat unarticulated needs within health professional education.
Summary of the three discourses
| | Validity as a Test characteristic | Validity as an argument-based evidentiary-chain | Validity as a Social imperative |
---|---|---|---|
Definition | The degree to which the test actually measures what it purports to measure. | The evidences presented to support or refute the meaning or interpretation assigned to assessment results. | A bird’s eye view of assessment that foregrounds broader individual and societal issues |
Characteristics | Validity is a goal or a gold seal of approval. | Validity is a journey on which one embarks to provide evidence supporting the interpretation of scores. | Validity and validation are matters of social accountability. |
Validity is viewed as… | Static | Fluid | Built-in |
Focus of evidence is on… | Individual tools can be considered valid, and the validity can generalize to the tool format (« MCQs are valid ») | Defensible interpretation of scores | Individual and societal impact of assessment |
Things made possible | The quest for the holy grail of assessment; one tool that is more valid than the others. | Validation approaches and standards | Holistic and a priori consideration for societal impact of assessment |
Validation occurs… | A posteriori (mainly) | A priori (mainly) | |
Validation data focused on… | Psychometric | Mostly psychometric | Mostly expert judgment |
Appendix B
Summary of the results from the concept analysis describing the antecedents, characteristics, and consequences of the concept of validity as a social imperative
Context
The adoption of competency-based education in health professions education has been the catalyst for several changes in assessment, such as the use of assessment that mimics professional practice (authentic assessment) or the purposeful combination of different assessment strategies, contexts, and measurement times (programmatic assessment). Since assessment scores can have far-reaching consequences on future healthcare professionals and on society, it is essential to measure the quality (validity) of assessment strategies.44 However, there is a gap between the validation practices currently available to us and the current assessment approaches.44 This gap prompted reflections within the medical education literature, which were summarized in a recent discourse analysis.8 One of the results of this discourse analysis is a new conceptualization of validity: validity as a social imperative. Here, we explore this concept to describe it in the context of assessment in health professions education (HPE).
Concept analysis
The first phase of this research was to explore validity as a social imperative using a concept analysis, according to Rodgers’ method.45 Concept analysis is a useful method for identifying, clarifying, and fine-tuning an unexplored concept.45,46 Rodgers’ framework46 relies on the literature to describe the antecedents, attributes, and consequences surrounding a concept. More concretely, attributes are the characteristics that define the concept.46 Antecedents are what precedes the concept, and consequences are what happens as a direct result of the concept.46
Results of the concept analysis
Now, it's your turn!
Since our current understanding of this conceptualization is mainly based on an in-depth analysis of the literature, your participation in this interview will allow us to explore the acceptability of the characteristics of the concept of validity as a social imperative.
Attributes (characteristics) | Description |
---|---|
Demonstration of the use of evidence considered credible by society to document the quality of assessments | The various professional bodies (teaching institutions and professional orders) must be able to document with certainty—for society—the decisions made regarding the learners’ academic pathways and their level of competency for starting a professional career independently and competently. For example: A university is accountable to society for the decisions made based on the assessment of learners. “When students graduate from a university, the degree indicates to society that the graduates have a certain level of skill and expertise.”47 |
Validation embedded through the assessment process and score interpretation | When constructing an assessment program, elements which compose it must be chosen purposefully.5 We should carefully consider how validity can be ‘built-in’ to the assessment process during its development.48 This consideration for validity throughout the assessment development process increases the credibility, defensibility and accuracy of the score interpretation.27 Ebel49 also argued that validity can be a ‘built-in’ feature of an assessment method. We take the view that all assessment at the three bottom layers of Miller’s pyramid can be controlled and optimized: materials can be scrutinized, stakeholders prepared, administration procedures standardized, psychometric procedures put in place, etc.48 Another element to be considered during the development of the assessment program is the consequences of the assessment process and subsequent score interpretation. The person responsible for the assessments must anticipate the potential consequences and implement measures or strategies to minimize them. The consequences measured should not be solely limited to the impacts of the construct, but rather all possible consequences. “(… ) the measurement or scoring procedure (e.g., irrelevant, unreliable, or omitted test items); the specific interpretation (e.g., an inappropriate pass/fail cut point); the attribute being measured (i.e., the wrong construct); or the response (e.g., the actions that follow the decision).”34 |
Documented validity evidence supporting the interpretation of the combination of assessment findings | The interpretation of assessment scores must be done from the perspective of a “whole” (the assessment program in its entirety) that is greater than the sum of its parts. “The central key is that the programme of assessment is set up to allow the whole picture of a student’s competence to be obtained by a careful selection of assessment methods, formulation of rules and regulations and design of organizational systems.”4 |
Demonstration of a justified use of a variety of evidence (quantitative and qualitative) to document the quality of all assessment strategies. | Since traditional quantitative analyses (e.g., Cronbach’s alpha, psychometric analysis, etc.) often lack applicability for demonstrating the quality of a set of assessment strategies (i.e., assessment program), the combination of quantitative and qualitative evidence appears to be a solution to be considered. For qualitative assessments, the synthesis of individual pieces of qualitative data to form an insightful, accurate and defensible interpretation is analogous to quantitative generalization. Whereas we treat inter-rater variability as error for most numeric scores, in qualitative assessments we view observer variability as representing potentially valuable insights into performance (different perspectives). The method for selecting and synthesizing data from different sources (triangulation) and deciding when to stop (saturation) will inform the Generalization inference for qualitative data.50 |
Appendix C
Focus group and semi-structured individual interview guide used for users in a qualitative study to explore perceived acceptability and anticipated feasibility of the concept of validity as a social imperative, 2016
1. | What is your name and what is your role at the University of <…>? |
2. | Would you tell me about your vision of the validation process (the process that measures the quality of assessment) in your course or your program? Probes: Can you explain your point of view? What brings you to ...? Can you give me an example? |
3. | What do you think of the characteristics that we identified to describe validity as a social imperative? (Discuss one by one) Probes: Can you explain your point of view? What brings you to ...? Can you give me an example? What do you like about this characteristic? What don’t you like about this characteristic? |
4. | Overall, what is your personal opinion of validity as a social imperative? Probes: Can you explain your point of view? What brings you to ...? Can you give me an example? What do you like about the concept? What don’t you like about the concept? |
Appendix D
Semi-structured individual interview guide used for leaders in a qualitative study to explore perceived acceptability and anticipated feasibility of the concept of validity as a social imperative, 2017
1. | Can you tell me what motivated you to participate in this study? |
2. | Would you tell me what validity in the context of assessment means to you? Probes: Can you explain your point of view? What brings you to ...? Can you give me an example? |
3. | What do you think of the characteristics that we identified to describe validity as a social imperative? (Discuss one by one) Probes: Can you explain your point of view? What brings you to ...? Can you give me an example? What do you like about this characteristic? What don’t you like about this characteristic? |
4. | Overall, what is your personal opinion of validity as a social imperative? Probes: Can you explain your point of view? What brings you to ...? Can you give me an example? What do you like about the concept? What don’t you like about the concept? |
5. | Do you feel that we have missed important aspects of validity as a social imperative? Probes: Do you have anything else you would like to add? Something else we should consider as we move this work forward? What else can you say about that? Can you explain your point of view? Can you build on that? |
Footnotes
In the text and quotes, participants are identified by the following legend:
P: Interviews with users (educators and committee members)
FoG: Focus group with users (educators and committee members)
L: Interviews with leaders
Conflicts of interest
The authors declare that they have no conflict of interest.
Funding
This study was supported by the MEES-Universités programme (in the form of a scholarship obtained by MM), the Social Sciences and Humanities Research Council (in the form of a grant provided to CSO and MY, no. 435-2014-2159) and the Paul Grand’Maison de la Société des médecins Research Chair in Medical Education (in the form of a scholarship provided by CSO).
Author contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Mélanie Marceau, Christina St-Onge, Frances Gallagher and Meredith Young. The first draft of the manuscript was written by Mélanie Marceau and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
References
- 1. Cook DA, Kuper A, Hatala R, Ginsburg S. When assessment data are words: validity evidence for qualitative educational assessments. Acad Med. 2016;91(10):1359–69. doi:10.1097/ACM.0000000000001175
- 2. Govaerts MJB, Schuwirth LWT, Van der Vleuten CPM, Muijtjens AMM. Workplace-based assessment: effects of rater expertise. Adv Health Sci Educ. 2011;16(2):151–65. doi:10.1007/s10459-010-9250-7
- 3. Govaerts M, van der Vleuten CPM. Validity in work-based assessment: expanding our horizons. Med Educ. 2013;47(12):1164–74. doi:10.1111/medu.12289
- 4. Schuwirth LW, Van der Vleuten CP. Programmatic assessment: from assessment of learning to assessment for learning. Med Teach. 2011;33(6):478–85. doi:10.3109/0142159X.2011.565828
- 5. van der Vleuten CPM, Schuwirth LWT, Driessen EW, et al. A model for programmatic assessment fit for purpose. Med Teach. 2012;34(3):205–14. doi:10.3109/0142159X.2012.652239
- 6. Brown KK, Maryman J, Collins T. An evaluation of a competency-based public health training program for public health professionals in Kansas. J Public Health Manag Pract. 2017;23(5):447–53. doi:10.1097/PHH.0000000000000513
- 7. Marceau M, Gallagher F, Young M, St-Onge C. Validity as a social imperative for assessment in health professions education: a concept analysis. Med Educ. 2018;52(6):641–53. doi:10.1111/medu.13574
- 8. St-Onge C, Young M, Eva KW, Hodges B. Validity: one word with a plurality of meanings. Adv Health Sci Educ. 2017;22(4):853–67. doi:10.1007/s10459-016-9716-3
- 9. Hunt MR. Strengths and challenges in the use of interpretive description: reflections arising from a study of the moral experience of health professionals in humanitarian work. Qual Health Res. 2009;19(9):1284–92. doi:10.1177/1049732309344612
- 10. Thorne SE. Interpretive description: qualitative research for applied practice. New York, NY: Routledge; 2016.
- 11. Thorne S, Kirkham SR, O’Flynn-Magee K. The analytic challenge in interpretive description. Int J Qual Methods. 2004;3(1):1–11. doi:10.1177/160940690400300101
- 12. Creswell JW, Creswell JD. Research design: qualitative, quantitative, and mixed methods approaches. 5th ed. Los Angeles, CA: SAGE; 2018.
- 13. Morse JM, Niehaus L, Wolfe RR, Wilkins S. The role of the theoretical drive in maintaining validity in mixed-method research. Qual Res Psychol. 2006;3(4):279–91.
- 14. Krueger RA, Casey MA. Focus groups: a practical guide for applied research. 5th ed. Thousand Oaks, CA: SAGE Publications; 2015.
- 15. Young M, St-Onge C, Xiao J, Vachon Lachiver E, Torabi N. Characterizing the literature on validity and assessment in medical education: a bibliometric study. Perspect Med Educ. 2018;7(3):182–91. doi:10.1007/s40037-018-0433-x
- 16. Doja A, Horsley T, Sampson M. Productivity in medical education research: an examination of countries of origin. BMC Med Educ. 2014;14(1):1–9. doi:10.1186/s12909-014-0243-8
- 17. Brinkmann S, Kvale S. Interviews: learning the craft of qualitative research interviewing. 3rd ed. Thousand Oaks, CA: SAGE; 2015.
- 18. Patton MQ. Qualitative research & evaluation methods: integrating theory and practice. 4th ed. Thousand Oaks, CA: SAGE Publications; 2015.
- 19. Dedoose Version 8.2. Web application for managing, analyzing, and presenting qualitative and mixed method research data [Internet]. Los Angeles, CA: SocioCultural Research Consultants, LLC; 2018. Available from: www.dedoose.com
- 20. Miles MB, Huberman AM, Saldana J. Qualitative data analysis: a methods sourcebook. 3rd ed. Thousand Oaks, CA: SAGE Publications; 2014.
- 21. Sidani S, Braden CJ. Design, evaluation, and translation of nursing interventions. Chichester: Wiley-Blackwell; 2011.
- 22. Birt L, Scott S, Cavers D, Campbell C, Walter F. Member checking: a tool to enhance trustworthiness or merely a nod to validation? Qual Health Res. 2016;26(13):1802–11. doi:10.1177/1049732316654870
- 23. Varpio L, Ajjawi R, Monrouxe LV, O’Brien BC, Rees CE. Shedding the cobra effect: problematising thematic emergence, triangulation, saturation and member checking. Med Educ. 2017;51(1):40–50. doi:10.1111/medu.13124
- 24. Messick S. Validity. In: Linn RL, editor. Educational measurement. New York, NY: Macmillan; 1989. p. 13–103.
- 25. Messick S. Standards of validity and the validity of standards in performance assessment. Educ Meas Issues Pract. 1995;14(4):5–8. doi:10.1111/j.1745-3992.1995.tb00881.x
- 26. Kane MT. Validation. In: Brennan RL, editor. Educational Measurement. 4th ed. Westport, CT: American Council on Education/Praeger; 2006. p. 17–64.
- 27. Berendonk C, Stalmeijer RE, Schuwirth LWT. Expertise in performance assessment: assessors’ perspectives. Adv Health Sci Educ. 2013;18(4):559–71. doi:10.1007/s10459-012-9392-x
- 28. Gingerich A, Kogan J, Yeates P, Govaerts M, Holmboe E. Seeing the “black box” differently: assessor cognition from three research perspectives. Med Educ. 2014;48(11):1055–68. doi:10.1111/medu.12546
- 29. Hodges B. Assessment in the post-psychometric era: learning to love the subjective and collective. Med Teach. 2013;35(7):564–8. doi:10.3109/0142159x.2013.789134
- 30. Schuwirth LW, van der Vleuten CP. A plea for new psychometric models in educational assessment. Med Educ. 2006;40(4):296–300. doi:10.1111/j.1365-2929.2006.02405.x
- 31. Ginsburg S, McIlroy J, Oulanova O, Eva K, Regehr G. Toward authentic clinical evaluation: pitfalls in the pursuit of competency. Acad Med. 2010;85(5):780–6. doi:10.1097/ACM.0b013e3181d73fb6
- 32. Mislevy RJ. Validity by Design. Educ Res. 2007;36(8):463–9. doi:10.3102/0013189X07311660
- 33. Kane M. The argument-based approach to validation. Sch Psychol Rev. 2013;42(4):448–57. doi:10.1080/02796015.2013.12087465
- 34. Cook DA, Lineberry M. Consequences validity evidence: evaluating the impact of educational assessments. Acad Med. 2016;91(6):785–95. doi:10.1097/ACM.0000000000001114
- 35. Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Adv Health Sci Educ. 2014;19(2):233–50. doi:10.1007/s10459-013-9458-4
- 36. Labbé M, Young M, Nguyen LHP. Validity evidence as a key marker of quality of technical skill assessment in OTL-HNS. Laryngoscope. 2018;128(10):2296–300. doi:10.1002/lary.27085
- 37. Boelen C, Heck JE, World Health Organization. Defining and measuring the social accountability of medical schools. Geneva: World Health Organization; 1995.
- 38. Hanson JL, Rosenberg AA, Lane JL. Narrative descriptions should replace grades and numerical ratings for clinical performance in medical education in the United States. Front Psychol. 2013;4:668. doi:10.3389/fpsyg.2013.00668
- 39. Frank JR, Snell L, Sherbino J, editors. CanMEDS 2015 Physician Competency Framework. Ottawa: Royal College of Physicians and Surgeons of Canada; 2015.
- 40. Royal College of Physicians and Surgeons of Canada. Competence by Design (CBD) [Internet]. 2014. Available from: https://www.royalcollege.ca/rcsite/documents/educational-strategy-accreditation/royal-college-competency-by-design-ebook-e.pdf
- 41. American Educational Research Association, American Psychological Association, National Council on Measurement in Education, Joint Committee on Standards for Educational and Psychological Testing (U.S.). Standards for educational and psychological testing. Washington, DC: American Educational Research Association; 1999.
- 42. Onyura B, Légaré F, Baker L, et al. Affordances of knowledge translation in medical education: a qualitative exploration of empirical knowledge use among medical educators. Acad Med. 2015;90(4):518–24. doi:10.1097/ACM.0000000000000590
- 43. Morse JM. Critical analysis of strategies for determining rigor in qualitative inquiry. Qual Health Res. 2015;25(9):1212–22. doi:10.1177/1049732315588501
- 44. van der Vleuten CPM, Schuwirth LWT. Assessing professional competence: from methods to programmes. Med Educ. 2005;39(3):309–17. doi:10.1111/j.1365-2929.2005.02094.x
- 45. Rodgers BL. Concepts, analysis and the development of nursing knowledge: the evolutionary cycle. J Adv Nurs. 1989;14(4):330–5. doi:10.1111/j.1365-2648.1989.tb03420.x
- 46. Rodgers BL, Knafl KA. Concept development in nursing: foundations, techniques, and applications. 2nd ed. Philadelphia: W. B. Saunders; 2000.
- 47. Boley P, Whitney K. Grade disputes: considerations for nursing faculty. J Nurs Educ. 2003;42(5):198–203. http://www.ncbi.nlm.nih.gov/pubmed/12769423
- 48. van der Vleuten CPM, Schuwirth LWT, Scheele F, Driessen EW, Hodges B. The assessment of professional competence: building blocks for theory development. Best Pract Res Clin Obstet Gynaecol. 2010;24(6):703–19. doi:10.1016/j.bpobgyn.2010.04.001
- 49. Ebel RL. The practical validation of tests of ability. Educ Meas Issues Pract. 1983;2(2):7–10. doi:10.1111/j.1745-3992.1983.tb00688.x
- 50. Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: a practical guide to Kane’s framework. Med Educ. 2015;49(6):560–75. doi:10.1111/medu.12678