INTRODUCTION
The artificial intelligence (AI) movement, which some link to the 4th industrial revolution, 1 , 2 is infiltrating and impacting many aspects of academia, including teaching and assessment, 3 lab‐based research, and pertinent to this special issue, qualitative research. 4 There are examples of AI in anatomy education that are shaping the way we teach and learn, 3 and thus there is interest in how AI could also be used to support the education research process within the fields of anatomy and health professions education.
There are articles debating AI ethics in qualitative research and how to engage both commercially available and proprietary AI at different points in the research process. 4 , 5 , 6 , 7 For example, computer scientist Luca Longo 4 presents a detailed illustration of the potential roles of AI in the qualitative research process, examining how AI can be integrated across research stages, including participant recruitment, methodological design, data analysis, and dissemination. In this brave new world, one could picture AI bots running an online focus group, utilizing neural networks that dynamically adjust questions to probe deeper based on participant responses. Large language models (LLMs) could then be used to develop a rigorous codebook, classifying themes and patterns with a level of consistency that human analysts might struggle to maintain. 8 , 9 Simultaneously, an LLM could generate a written analysis, synthesizing and visualizing findings into a polished report. 10 Rigorous qualitative data analysis is known to take considerable researcher time, so there is substantial interest in how AI can support, augment, and expedite this aspect of the research process. 5 , 6 But does the hype represented in the literature match the reality in practice?
While AI LLM technologies like GPT‐4, DeepSeek, and Gemini offer promising features to enhance the efficiency and scalability of qualitative data analysis, concerns persist regarding AI's ability to interpret data in a way that is meaningful for humans, potential biases, methodological transparency (including ethical considerations for data use and storage), and propensity for output variability and hallucinations (i.e., incorrect or misleading information presented as facts). 4 This means that human expertise remains crucial in qualitative data analysis, playing an essential role in ensuring contextual accuracy, addressing ethical considerations, and providing critical interpretation of AI‐generated outputs. Thus, integrating AI into qualitative research should be approached as a collaborative effort, combining the strengths of AI and human judgment. Ultimately, the goal is to achieve insightful research outputs and a quality approach to the research process—known in qualitative work as rigor or trustworthiness. 11 , 12 However, as humans shift their roles toward assessing AI outputs rather than generating insights themselves, emerging evidence suggests that this reliance may inadvertently diminish critical thinking in various domains and could negatively impact not only research findings but also their adoption and application into practice. 13 With such risk in mind, how can researchers use AI to support qualitative data analysis in a rigorous and trustworthy manner?
We draw on our experiences using AI to support qualitative data analysis (Box 1), combined with the literature on rigorous approaches to qualitative data analysis, to provide recommendations for researchers interested in AI‐supported qualitative data analysis. Our research engaged AI in the form of a reflexive expressions language parser, which identifies linguistic patterns within reflective writing based on an established conceptual framework. 14 This AI reflexive expressions language parser was developed and refined by one of the co‐authors (AG), specifically for the purpose of identifying reflexive expressions. The parser is not a commercially available resource (see Box 1), thus reducing some of the concerns related to rigor, ethics, and logistics that occur when using for‐profit developed AI in the research process. 15 Further details on the development of the language parser are previously described. 14 These reflexive expressions can be positional (e.g., how someone positions themselves within the narrative) or expressive (e.g., how someone describes themselves within the narrative). For instance, a diary entry that stated that the author ‘had done a lot’ would indicate a positional reflexive expression, and if the same diary described how the author felt while engaging in these prior events, the reflexive expression would be coded to an expressive theme.
BOX 1. Case study.
Background
This case study is based on a secondary analysis of reflective diary entries completed by medical students (n = 41) at an Australian medical school undertaking clinical rotations in 2020. Participants were asked to reflect on their experiences of uncertainty and certainty at six timepoints across an academic year, and to record these in audio, typed, or handwritten diary entries (n = 230). Primary analysis identified stimuli of uncertainty, 19 factors influencing or moderating participants' experiences of uncertainty, 20 and how participants described responding to uncertainty. 21 Based on these analyses, we identified an important role for critical reflection in developing learners' skills for managing uncertainty. 20 , 21 These studies did not provide clarity, however, on how best to support learners' critical reflection. For instance, authors GCS and MDL noted that reflections varied in depth, quality, and focus, and thus wanted to more effectively consider how to support learners engaging in high‐quality critical reflection. This led to MDL and GCS reaching out to colleagues in other fields, including critical reflection (AG) and learning analytics (LZ & RM). In doing so, we ‘partnered’ with the AI language parser as well as the statistical approach of ENA to gain a deeper understanding of the patterns in this population's critical reflections.
Research question
What are the patterns in medical students' reflexive expressions within reflective diary entries focused on uncertainty and certainty?
Worldview
Interpretivism, where knowledge and knowing are socially co‐constructed by those experiencing them and may include multiple perspectives.
Study design
The dataset comprised the uncoded diary entries (n = 230) submitted as part of the prior studies. 19 , 20 , 21 An AI model created with machine learning techniques and specialized in detecting reflexive expressions, termed a reflexive expressions language parser, coded these reflections. 14 ENA was then used to cluster coded diaries into groups according to patterns of co‐occurrence of reflexive expressions. Following ENA clustering, randomly selected diaries were reviewed (n = 15/cluster) by MDL and GCS for reflexive expressions using deductive thematic analysis based on Gibson et al., 14 with cluster themes identified.
AI reflexive expressions language parser
Reflexive expressions analysis uses a theoretically grounded computational model that identifies reflexive n‐grams (groups of words) associated with eight categories. 14 Data for its development comprised 13,841 short written reflections on personal experience, written by undergraduate and postgraduate students. N‐grams that were common in the British National Corpus—a large reference corpus of English writing samples 22—were removed during the modeling process to ensure that the remaining n‐grams were particularly characteristic of reflection. The resulting reflexive expressions were topic independent, representing reflective aspects of the text rather than its content. For example, in the text “I've been thinking about my recent prac at XX High School,” the model would represent “I've been thinking about my,” but not “recent prac” or “XX High School.” This feature differentiates reflexive expression analysis from other text analysis techniques, which tend to focus on the content of the text.
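As an illustration only (not the published parser), the corpus-filtering step described above can be sketched in a few lines: n-grams that also appear in a reference corpus of everyday writing are discarded, leaving candidates that are characteristic of reflective text. All data, thresholds, and function names here are hypothetical.

```python
from collections import Counter

def extract_ngrams(text, n):
    """Return all n-grams (as tuples of lowercased words) from a text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def reflexive_candidates(reflections, reference_corpus, n=5, min_count=2):
    """Keep n-grams frequent in reflections but absent from a reference corpus.

    This mirrors, in spirit, the filtering step described above: n-grams
    common in everyday writing are removed. The thresholds are illustrative.
    """
    reflection_counts = Counter(
        ng for text in reflections for ng in extract_ngrams(text, n)
    )
    reference_set = {
        ng for text in reference_corpus for ng in extract_ngrams(text, n)
    }
    return {
        ng: c for ng, c in reflection_counts.items()
        if c >= min_count and ng not in reference_set
    }

# Hypothetical data: two short reflections and a non-reflective reference text.
reflections = [
    "i've been thinking about my recent prac at the school",
    "i've been thinking about my last shift on the ward",
]
reference = ["the school opened a new building near the river last year"]
candidates = reflexive_candidates(reflections, reference, n=5)
```

Here the topic-independent phrase "i've been thinking about my" survives the filter because it recurs across reflections but never appears in the reference corpus, while topic-specific phrases occur only once and are dropped.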
Summary of study findings
Three distinct coding patterns were identified among reflective diary clusters: “superficially‐,” “partially‐,” and “deeply‐” reflective. Participants' clustering patterns were static throughout the study, indicating that individuals' reflective depth did not significantly change over time. Reflections focused on uncertainty had a greater frequency of expressive codes across all three clusters, suggesting that asking students to focus on uncertainty in critical reflections may support their engagement in the reflective process.
Benefits of using AI in this study
Pattern recognition across a large, complex dataset.
Efficient application of an existing theoretical framework (i.e., reflexive expressions).
Capacity to quickly adjust and rerun analysis when meaningful clusters could not be identified.
Study challenges
The AI language parser lacked contextual understanding. For instance, the language parser had challenges identifying whether emotions in reflective writing were experienced by the participant or related to the participants' observations of others.
Considerable researcher time still required for data analysis.
No clear endpoint when the data analysis was complete, with this still needing to be determined by human interpretation.
We also used epistemic network analysis (ENA), a technological adjunct to this qualitative data analysis, to explore the co‐occurrence of reflexive expression codes across the dataset (https://app.epistemicnetwork.org/login.html). ENA was originally developed to model the patterns of associations among cognitive elements, such as knowledge, skills, and meaning, that characterize the thinking of individuals. 16 More recently, ENA has been adopted to analyze wider scenarios, where the co‐occurrence of codes is expected to capture more subtle insights compared with the single occurrence of codes in isolation. 17 This is similar to how qualitative data analysis software can look for co‐occurrence of codes (e.g., cross‐tabulation or matrix analyses). ENA, however, can provide statistical analysis of this co‐occurrence and develop network diagrams to represent these connections across codes, allowing the human researcher to develop a deeper understanding of the patterns of code co‐occurrences. We employed ENA to analyze writing samples, specifically reflexive diary entries, which were coded by the reflexive expressions language parser, to facilitate the researchers' understanding of how students used reflexive expressions when reflecting on experiences of certainty and uncertainty in health professions education (Box 1).
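To illustrate the underlying idea, the sketch below counts pairwise co-occurrence of codes within each unit of analysis (here, a diary entry). This is only the counting step that ENA builds upon; ENA itself adds normalization, dimensional reduction, and network projection. The code labels and diary data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(coded_units):
    """Count pairwise co-occurrence of codes within each coded unit.

    Each unit (e.g., one diary entry) contributes one count per unordered
    pair of distinct codes it contains.
    """
    counts = Counter()
    for codes in coded_units:
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return counts

# Hypothetical coded diary entries, using category names from the
# reflexive expressions framework for illustration.
diaries = [
    ["Affective", "Epistemic", "Retroreflexive"],
    ["Affective", "Epistemic"],
    ["Contending", "Epistemic"],
]
pairs = cooccurrence_counts(diaries)
```

A matrix or network built from such counts shows which codes tend to appear together, which is the pattern the researchers then interpret.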
Using this case study, this viewpoint commentary guides readers to consider the potential roles for AI in qualitative research and methods for ensuring rigor in such work. This case study illustrates how AI is challenging us to redefine and reshape what the field considers rigor for qualitative research. Approaches to ensuring rigor vary according to worldview. While many worldviews may lend themselves to AI‐integrated research and ENA, this article focuses on interpretivist worldviews and “Big Q” qualitative research. 18 For the purposes of this article, we will use the term ‘researchers’ to refer to humans and will identify AI through either AI in general or specific types (e.g., LLMs).
AI: A QUALITATIVE RESEARCH ANALYST OR TOOL?
While the case study (Box 1) led to some valuable insights into critical reflection (publication in process), the study also inspired us to consider some key questions about these methods and tools in the context of qualitative rigor. Typical criteria for qualitative research rigor in the interpretivist paradigm include credibility, dependability, confirmability, transferability, and reflexivity. We explore how AI may relate to qualitative research rigor by considering this case study in more detail in the following sections. Critically, all of these elements need to be considered at the study outset and throughout the research process. A summary of these considerations relevant to researchers, reviewers, and editors is provided in Table 1.
TABLE 1.
Overview of components of rigor in qualitative research, key question addressed in establishing each component, and specific considerations for qualitative data analysis supported by AI.
| Component of rigor | Key question addressed | Considerations for qualitative data analysis supported by AI |
|---|---|---|
| Credibility | Was the research conducted with integrity, resulting in plausible findings? | |
| Dependability | What details are needed for another researcher to replicate the study? | |
| Reflexivity | How have researchers critically examined their positionality and approaches throughout the research process? | |
| Transferability | How does the study context influence whether results can be applied to populations beyond the study? | |
| Confirmability | What links are provided between the primary data and study findings? | |
Credibility
Credibility considers all elements of the research process holistically and how each element contributes to plausible findings. 11 , 23 , 24 A core concept that can help establish credibility is internal coherence, or the alignment of philosophy, methodology, and methods of research. 24 Failure to consider internal coherence could result in researchers overlooking or missing details that are required to answer their research questions. The research paradigm for the larger project from which our case study is drawn (Box 1) was interpretivism, which privileges multiple perspectives and viewpoints on a topic. To explore these, we therefore utilized qualitative longitudinal methodology, 25 reflective diary entries as our method, and deductive thematic analysis using reflexive expressions. 18
When considering credibility in the context of AI‐supported qualitative data analysis, researchers should examine how AI usage supports the internal coherence of the study. While human analysts are adept at identifying semantic content in small quantities of documents, it can be challenging to consistently identify lexical patterns over large numbers of documents. This is where a computational analysis can provide significant assistance. 14 For our study, we chose AI that could identify different approaches to reflection and assist with identifying patterns within our large and complicated dataset. 14 Through the lens of credibility, the nature of the dataset (e.g., large and complicated) could mean that AI would enhance credibility as, depending on its programming and training, it may be better equipped to manage this type of data than a human brain. However, reflexivity (see below) will still be a vital component of data analysis.
Further approaches to establishing credibility that are used in specific research methodologies and methods are included in checklists for reporting qualitative research. 26 , 27 Such approaches include audit trails, member checking, and triangulation or crystallization. Audit trails involve researchers documenting the research process from inception to completion, including decisions made during data analysis. 11 In the context of AI‐supported qualitative data analysis, audit trails could include which parts of the analysis were researcher or AI led and, where relevant to the AI used, the prompts researchers used to elicit AI responses. Member checking involves discussing preliminary study findings with participants to explore how their perspectives align with those of the researchers. 28 For studies using AI, member checking could involve the researchers sharing and discussing researcher‐evaluated AI outputs with participants.
Triangulation is a concept drawn from positivist paradigms where multiple ‘experiments’, or data collection methods, are used collectively to support findings. 28 This concept is debated in interpretivist and Big Q qualitative research, with crystallization proposed as an alternative. 28 Rather than confirming findings across different forms of data, crystallization supports using multiple data collection approaches to build a richer, more nuanced understanding of the phenomenon being studied. We considered crystallization in the larger study from which this case study was drawn, wherein participants completed semi‐structured interviews that explored ideas initially expressed in diary entries. 19 , 20 , 21 In the present case study, AI contributed to crystallization as a source of data analysis, which, in combination with the researcher team, may add a level of nuance to the findings. Crystallization would, by contrast, not be supported if AI were running autonomously, as it is the human–AI interactions that promote crystallization of findings. Although further interviews were not completed for the present study, interview approaches that more deeply explore preliminary findings of AI‐supported analyses may be an ideal approach to support the credibility of future research engaging AI for qualitative data analysis.
Dependability
To demonstrate dependability, researchers need to provide sufficient detail in the study methods to enable another researcher to conduct similar research in their own context. This typically includes details about data collection, such as interview protocols, how participants were recruited, the approach to data analysis, etc. In considering dependability for our study (Box 1), we also discussed which details needed to be provided about our use of AI, such as our choice of the reflexive expressions language parser, awareness of how it was developed and trained, and how the language parser training dataset compared with our study dataset. We decided that the reflexive expressions language parser was suitable to support our research due to the alignment between our narrow and specific research questions and the program capabilities. This reflexive expressions language parser is quite different from commercially available LLMs (e.g., GPT‐4, DeepSeek, etc.) in that the development and training of the parser are clearly described. 14 The reflexive expressions language parser model was trained using a human‐guided process of word embedding, clustering, human labeling, and supervised machine learning (see Box 1 for more information).
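As a toy illustration of such a human-guided pipeline (emphatically not the actual parser's implementation), the sketch below walks through the four stages named above: represent n-grams numerically, cluster them, have a human assign labels, and train a supervised model on the labeled examples. The "embedding" features, labels, and data are all stand-ins.

```python
from collections import defaultdict

def embed(ngram):
    """Toy 'embedding': counts of first-person and cognition markers.

    A real pipeline would use learned word embeddings; this stand-in keeps
    the example self-contained.
    """
    first_person = sum(w in {"i", "my", "i've", "me"} for w in ngram)
    cognition = sum(w in {"think", "thinking", "felt", "realised"} for w in ngram)
    return (first_person, cognition)

def cluster(ngrams):
    """Group n-grams with identical toy features (stand-in for clustering)."""
    groups = defaultdict(list)
    for ng in ngrams:
        groups[embed(ng)].append(ng)
    return groups

def train_nearest_centroid(labeled):
    """Supervised step: compute one feature centroid per human-assigned label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for ngram, label in labeled:
        x, y = embed(ngram)
        s = sums[label]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}

def predict(centroids, ngram):
    """Classify a new n-gram by its nearest labeled centroid."""
    x, y = embed(ngram)
    return min(centroids, key=lambda lab: (centroids[lab][0] - x) ** 2
                                          + (centroids[lab][1] - y) ** 2)

# In practice a human inspects the clusters and assigns labels;
# here the (hypothetical) labels are simply given.
labeled = [
    (("i've", "been", "thinking"), "epistemic"),
    (("my", "thinking", "about"), "epistemic"),
    (("the", "ward", "round"), "non-reflexive"),
]
centroids = train_nearest_centroid(labeled)
label = predict(centroids, ("i", "was", "thinking"))
```

The point of the sketch is the division of labor: the machine proposes structure, a human supplies meaning through labels, and the supervised model then applies those labels consistently at scale.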
When considering dependability in relation to AI usage in qualitative data analysis, researchers should also consider and detail what the AI “knows.” While researchers have knowledge and knowings based on their personal experiences, including memories, sensations (e.g., taste, vision, tactile, hearing), and contextual clues, AI's knowings are comparatively limited. The knowledge that AI has and can use is based on the tangible (e.g., programmable and accessible data), and any learning the AI undertakes is through its programming—but this has limits compared with biological learning. 29 The reflexive expressions language parser ‘knows’ things because of how it was trained and the data it was trained on (see Box 1). Critically examining what (and how) the AI ‘knows’ and in what ways this shapes the research forms an important part of reflexivity (see next section). In our case study, the limitations of this training were illustrated when the language parser grouped reflexive diaries that included strong emotions. When we, the researchers, reviewed these diaries, however, we noted that the strong emotions conveyed were often observed in others rather than experienced by the students themselves. An example was a participant who described observing other students who were “scared” of speaking during learning activities. Our engagement as researchers was therefore vital to ensure that AI outputs could be refined to best answer our research question.
In contrast to the language parser we utilized, many proprietary AI programs fail to be “explainable.” Explainable AI “shows its work” in a way that end‐users can understand what the AI program is doing, how it is doing it, and who it will impact. 30 In our case, we decided to use AI for a very specific, narrow purpose—to identify linguistic patterns based on an existing conceptual framework, which is explained in previously published work. 14 By contrast, commercial LLMs, at the time of writing, are often not programmed to identify all the elements required for dependability, creating substantial issues with this aspect of rigor.
Reflexivity
In addition to the details about the AI program, qualitative researchers should also detail how they worked with the support of AI. Reflexivity is the process of critically examining one's beliefs throughout the research process, the details of which are typically summarized in a reflexivity statement within the methods section of a research article. Of all elements of trustworthiness, it is perhaps the one that is most challenging to engage with and detail for readers. Terry & Hayfield 31 discuss reflexivity in their article on reflexive thematic analysis in this special issue, noting that positionality is important to reflexivity: “Positionality becomes reflexivity when researchers can develop insights into how they shape the entire research process—from conception, to design, data generation, and critically, data analysis and reporting.” The term ‘positionality’ in this context is where the researcher considers their worldview and how this relates to the research methods and context being studied. For more on positionality in qualitative research, see Dueñas et al. 32 in this special issue.
This is an interesting idea when working with AI, as AI may influence the positionality of the researchers. In considering positionality, researchers typically need to consider their relationship with the participants: Do researchers have similar experiences to participants? If so, the researchers may be relative ‘insiders,’ but if not, the researchers may state in their reflexivity that they are ‘outsiders’. 31 In the case of outsiders, the addition of AI may further ‘color’ the research process as the ‘AI’ is a definitive outsider given its aforementioned lack of contextual knowing. Failure to consider positionality could, for instance, result in researchers deferring to AI outputs as being ‘correct,’ instead of critically examining how outputs align with different perspectives, including those of insiders.
Importantly, interpretivist or “Big Q” qualitative research rejects notions of researcher “bias” (i.e., a source of potential error according to positivist and post‐positivist research worldviews) and embraces researcher subjectivity during data interpretation. 33 Reflexivity is thus vital to demonstrate the factors that inform researchers' interpretation of the data. In other words, human researchers critically reflect on and transparently communicate what positivist researchers would consider their biases. However, researchers may not be able to identify the biases within the datasets that AI is trained on and how these inform data analysis due to, at present, limited transparency about this. Thus, for qualitative human researchers using AI‐supported qualitative data analysis, critical reflexivity and consideration and identification of potential ‘biases’ influencing AI outputs are increasingly relevant.
Practically, a manuscript should include an overview of the researchers' reflexivity and AI engagement. Due to the typical space limitations of a manuscript, a summary could include a flowchart that outlines the researchers' role in decision‐making and includes points at which AI was integrated into this research process, as well as the researchers' positionality. Such a flowchart would support the ‘human‐in‐the‐loop’ paradigm of technology, 34 or rather the human‐leading‐the‐loop, where ongoing and iterative reflexivity ensures that the positionality of the researchers is not usurped by the technology, which is ultimately a human choice. A flowchart can help demonstrate the research process, which communicates elements of dependability (e.g., similar to the approach of an audit trail). However, reflexivity will need to include further details, such as worldview, perspectives, and experiences (as noted above). Depending on the space limitations of journals, more extensive details on reflexivity could also be included in supplementary materials, in addition to a summary within the methods section of a research article.
Transferability
The concept of transferability considers how qualitative research findings may be relevant in contexts beyond those studied. 11 , 23 In traditional qualitative research, transferability is enhanced by considering similarities in context, relevant frameworks, conceptual models, and theory that apply to elements of the study. 23 , 35 The same could be considered when researchers integrate AI into the research process. For the case study, the AI reflexive expressions language parser draws on existing theoretical frameworks for reflection developed from an extensive decade‐long set of human‐led studies. 36 , 37 , 38 In this way, we—the case study researchers—have considered transferability in the selection of the AI‐collaboration tool. Herein, we chose AI that was purpose‐built from data and analysis related to the theoretical underpinnings of the research question, and a closed system that only has access to the datasets it was programmed on and those we explored. 14
Typically, discussions of transferability also reference study context and potential similarities with other settings and populations. For instance, the case study included medical student participants in Australia and their experiences of uncertainty. Due to the ubiquity of uncertainty in healthcare, findings may be transferable to other health professions learners. Thus, this case study illustrates how theory and conceptual frameworks can be used to enhance transferability in both the selection of the AI and the applicability of the findings.
Confirmability
Confirmability relies on researchers demonstrating links between the primary data and study findings. In qualitative research in general, this is typically demonstrated by researchers sharing data extracts, such as illustrative quotations assigned to themes, in their dissemination outputs. In some instances, larger parts or even whole datasets may be shared (e.g., with participant permission, deidentified as per ethical approval) as part of online data repositories or supplementary materials. Within the context of AI engagement for data analysis, confirmability may be further enhanced by providing outputs such as coded data. In our case study, the language parser provides a fully coded dataset, including colored highlighting and references to the coding framework (Figure 1), samples of which could form supplementary material to support confirmability.
FIGURE 1.

Example of a reflective diary entry coded by the reflexive expressions language parser. Colored portions of text represent the reflexive n‐grams (i.e., groups of words) identified by the parser. The categories of reflexive n‐grams identified in this diary entry are AF = Affective, CN = Contending, EP = Epistemic, ER = Egoreflexive, RR = Retroreflexive, VR = Vertoreflexive. See Gibson et al. 14 for definitional details.
CONCLUSIONS
While much of the for‐profit AI company hype casts emerging technologies as research saviors for their ‘efficiency’ and ‘bias‐free’ approach in the qualitative research process, this assumption can result in the violation of core tenets of qualitative research rigor if elements of rigor are not intentionally considered in the research process. Rather, qualitative researchers are encouraged to consider how existing concepts of rigor need to evolve to address the engagement and selection of AI. Indeed, AI may be the trigger we need to more deeply consider the elements of rigor in qualitative research more broadly, and the critical role that humans (and our knowing) play in the qualitative research process.
AUTHOR CONTRIBUTIONS
Michelle D. Lazarus: Conceptualization; formal analysis; methodology; project administration; validation; visualization; writing – original draft; writing – review and editing. Linxuan Zhao: Data curation; formal analysis; methodology; visualization; writing – review and editing. Andrew Gibson: Software; methodology; writing – review and editing. Roberto Martinez‐Maldonado: Methodology; writing – review and editing. Georgina C. Stephens: Conceptualization; data curation; formal analysis; investigation; methodology; project administration; validation; visualization; writing – review and editing; writing – original draft.
ACKNOWLEDGMENTS
The authors acknowledge the people of the Kulin Nations and the Turrbal and Yuggera peoples as the traditional owners of the unceded lands on which many of us work and respectfully recognize Elders past and present. We acknowledge that technological advances, including AI, can negatively impact Country and that First Nations people here and around the world have technological evolutions which are in harmony with Country. The authors also wish to thank participants from the original study from which this case study is drawn, and those involved with the studies that helped train the AI language parser supporting this study. We would also like to thank Professor Rashina Hoda for the critical review of the manuscript. Open access publishing facilitated by Monash University, as part of the Wiley ‐ Monash University agreement via the Council of Australian University Librarians.
Biographies
Michelle D. Lazarus, SFHEA, PhD, is a professor and Director of the Centre for Human Anatomy Education in the Department of Anatomy and Developmental Biology, Biomedical Discovery Institute, Faculty of Medicine, Nursing and Health Sciences at Monash University, Clayton, VIC, Australia. She is also the Deputy Director for the Monash Centre for Scholarship in Health Education. Michelle's research focuses on understanding how anatomy education can foster healthcare workforce professional identity. She leads the clinical anatomy portion of the undergraduate entry medical degree at Monash University.
Linxuan Zhao is a research fellow in the Centre for Learning Analytics at Monash in the Department of Human‐Centered Computing. His main research focus is using multimodal learning analytics to facilitate teaching and learning in physical environments, such as the classroom.
Andrew Gibson, PhD, is an information scientist researching the intersection of cognition, time, and personal well‐being. He is a Senior Lecturer in Information Science within the School of Information Systems, Faculty of Science, Queensland University of Technology (QUT). His research investigates structures and processes of cognitive systems and involves both conceptual and computational modeling. This work includes topics such as the nature of abduction, the significance of complexity and holism in cognitive systems, the experience of flow and temporal distortion, the relationship to well‐being and learning, and cognitive reflexivity for learning.
Roberto Martinez‐Maldonado, PhD, is an associate professor of learning analytics and human‐centered AI and Deputy Director of the Centre for Learning Analytics at Monash University. His research explores multimodal learning analytics, AI‐assisted teamwork assessment, and data storytelling to enhance learning and decision‐making in complex domains such as healthcare and education. His work integrates computational and qualitative methods to understand and support social interactions, professional practice, and learning at scale.
Georgina C. Stephens MBBS (Hons.), PhD, FHEA is a Senior Lecturer in the Centre for Human Anatomy Education in the Department of Anatomy and Developmental Biology, Biomedical Discovery Institute, Faculty of Medicine, Nursing, and Health Sciences at Monash University, Clayton, VIC, Australia. She teaches clinical anatomy with a focus on developing professional skills through donor dissection. Her research focuses on health professions learners' development of uncertainty tolerance and the intersection of donor dissection and person‐centered care.
Contributor Information
Michelle D. Lazarus, Email: michelle.lazarus@monash.edu.
Georgina C. Stephens, Email: georgina.stephens@monash.edu.
REFERENCES
- 1. Mhlanga D. Exploring the evolution of artificial intelligence and the fourth industrial revolution: an overview. FinTech and artificial intelligence for sustainable development. Sustainable development goals series. Cham: Palgrave Macmillan; 2023. 10.1007/978-3-031-37776-1_2
- 2. Sahai AK, Rath N, Elngar AA, Panda SK, Mishra V, Balamurali R. Artificial intelligence and the 4th industrial revolution. Artificial intelligence and machine learning in business management. Volume 1. 1st ed. Boca Raton: CRC Press; 2022. p. 127–143. 10.1201/9781003125129-8
- 3. Lazarus MD, Truong M, Douglas P, Selwyn N. Artificial intelligence and clinical anatomical education: promises and perils. Anat Sci Educ. 2024;17(2):249–262. 10.1002/ase.2221
- 4. Longo L. Empowering qualitative research methods in education with artificial intelligence. In: Costa A, Reis L, Moreira A, editors. Computer supported qualitative research. WCQR 2019. Advances in intelligent systems and computing, vol 1068. Cham: Springer; 2020. 10.1007/978-3-030-31787-4_1
- 5. Bano M, Hoda R, Zowghi D, Treude C. Large language models for qualitative research in software engineering: exploring opportunities and challenges. Autom Softw Eng. 2024;31(1):8. 10.1007/s10515-023-00407-8
- 6. Hitch D. Artificial intelligence augmented qualitative analysis: the way of the future? Qual Health Res. 2024;34(7):595–606. 10.1177/10497323231217392
- 7. Morgan DL. Exploring the use of artificial intelligence for qualitative data analysis: the case of ChatGPT. Int J Qual Methods. 2023;22:10. 10.1177/16094069231211248
- 8. Yan L, Echeverria V, Fernandez Nieto G, Jin Y, Swiecki Z, Zhao L, et al. Human‐AI collaboration in thematic analysis using ChatGPT: a user study and design recommendations. In: Williamson JR, Sas C, editors. Extended abstracts of the 2024 CHI Conference on Human Factors in Computing Systems. New York: Association for Computing Machinery (ACM); 2024. article 191. 10.1145/3613905.3650732
- 9. Yang Y, Alba C, Wang C, Wang X, Anderson J, An R. GPT models can perform thematic analysis in public health studies, akin to qualitative researchers. J Social Comput. 2024;5(4):293–312. 10.23919/jsc.2024.0024
- 10. Kabir A, Shah S, Haddad A, Raper DMS. Introducing our custom GPT: An example of the potential impact of personalized GPT builders on scientific writing. World Neurosurg. 2025;193:461–468. 10.1016/j.wneu.2024.10.041
- 11. Ayton D. Rigour. In: Ayton D, Tsindos T, Berkovic D, editors. Qualitative research—a practical guide for health and social care researchers and practitioners. Melbourne, Australia: Monash University; 2023. 10.60754/chqr-dn78
- 12. O'Brien BC, Rees EL, Palermo C. Quality in health professions education research. In: Rees CE, Monrouxe LV, O'Brien BC, Gordon LJ, Palermo C, editors. Foundations of health professions education research. Hoboken, NJ: Wiley; 2023. p. 58–80.
- 13. Lee H‐P, Sarkar A, Tankelevitch L, Drosos I, Rintel S, Banks R, et al. The impact of generative AI on critical thinking: self‐reported reductions in cognitive effort and confidence effects from a survey of knowledge workers. CHI Conference on Human Factors in Computing Systems (CHI '25), April 26–May 01, 2025, Yokohama, Japan. New York, NY: ACM; 2025. 10.1145/3706598.3713778
- 14. Gibson A, Vine LD, Canizares M, Willis J. Reflexive expressions: towards the analysis of reflexive capability from reflective text. International Conference on Artificial Intelligence in Education. Cham: Springer Nature Switzerland; 2023. p. 353–364.
- 15. Christou P. A critical perspective over whether and how to acknowledge the use of artificial intelligence (AI) in qualitative studies. Qual Rep. 2023;28(7):1981–1991. 10.46743/2160-3715/2023.6407
- 16. Shaffer DW, Collier W, Ruis AR. A tutorial on epistemic network analysis: analyzing the structure of connections in cognitive, social, and interaction data. J Learn Anal. 2016;3(3):9–45. 10.18608/jla.2016.33.3
- 17. Elmoazen R, Saqr M, Tedre M, Hirsto L. A systematic literature review of empirical research on epistemic network analysis in education. IEEE Access. 2022;10:17330–17348.
- 18. Braun V, Clarke V. Thematic analysis—a practical guide. London, UK: SAGE; 2022.
- 19. Stephens GC, Sarkar M, Lazarus MD. 'A whole lot of uncertainty': a qualitative study exploring clinical medical students' experiences of uncertainty stimuli. Med Educ. 2022;56(7):736–746. 10.1111/medu.14743
- 20. Stephens GC, Sarkar M, Lazarus MD. Medical student experiences of uncertainty tolerance moderators: a longitudinal qualitative study. Front Med. 2022;9:864141. 10.3389/fmed.2022.864141
- 21. Stephens GC, Sarkar M, Lazarus MD. 'I was uncertain, but I was acting on it': a longitudinal qualitative study of medical students' responses to uncertainty. Med Educ. 2024;58(7):869–879. 10.1111/medu.15269
- 22. University of Oxford. British National Corpus. 2014. Available from: http://www.natcorp.ox.ac.uk/
- 23. Monrouxe LV, Brown MEL, Ottrey E, Gordon LJ. Introducing interpretivist approaches in health professions education research. In: Rees CE, Monrouxe LV, O'Brien BC, Gordon LJ, Palermo C, editors. Foundations of health professions education research. Hoboken, NJ: Wiley; 2023. p. 122–144.
- 24. Palermo C, Reidlinger DP, Rees CE. Internal coherence matters: lessons for nutrition and dietetics research. Nutr Diet. 2021;78(3):252–267. 10.1111/1747-0080.12680
- 25. Rees CE, Ottrey E. "Lives and times": the case for qualitative longitudinal research in anatomical sciences education. Anat Sci Educ. 2026;19(2):218–230. 10.1002/ase.2514
- 26. O'Brien BC, Harris IB, Beckman TJ, Reed DA, Cook DA. Standards for reporting qualitative research: a synthesis of recommendations. Acad Med. 2014;89(9):1245–1251. 10.1097/ACM.0000000000000388
- 27. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32‐item checklist for interviews and focus groups. Int J Qual Health Care. 2007;19(6):349–357. 10.1093/intqhc/mzm042
- 28. Varpio L, Ajjawi R, Monrouxe LV, O'Brien BC, Rees CE. Shedding the cobra effect: problematising thematic emergence, triangulation, saturation and member checking. Med Educ. 2017;51(1):40–50. 10.1111/medu.13124
- 29. Song Y, Millidge B, Salvatori T, Lukasiewicz T, Xu Z, Bogacz R. Inferring neural activity before plasticity as a foundation for learning beyond backpropagation. Nat Neurosci. 2024;27:348–358. 10.1038/s41593-023-01514-1
- 30. Xu F, Uszkoreit H, Du Y, Fan W, Zhao D, Zhu J. Explainable AI: a brief survey on history, research areas, approaches and challenges. In: Tang J, Kan MY, Zhao D, Li S, Zan H, editors. Natural language processing and Chinese computing. NLPCC 2019. Lecture Notes in Computer Science (vol. 11839). Cham: Springer; 2019. 10.1007/978-3-030-32236-6_51
- 31. Terry G, Hayfield N. Reflexive thematic analysis and men's embodiment following injury or illness: a worked example. Anat Sci Educ. 2025.
- 32. Dueñas AN, Lazarus MD, Byram JN. There is a method to the madness, and a madness to the method: a beginner's guide to qualitative research. Anat Sci Educ. 2026;19(2):166–180. 10.1002/ase.70055
- 33. Braun V, Clarke V. Toward good practice in thematic analysis: avoiding common problems and be(com)ing a knowing researcher. Int J Transgender Health. 2022;24(1):1–6. 10.1080/26895269.2022.2129597
- 34. Mosqueira‐Rey E, Hernández‐Pereira E, Alonso‐Ríos D, Bobes‐Bascarán J, Fernández‐Leal Á. Human‐in‐the‐loop machine learning: a state of the art. Artif Intell Rev. 2022;56:3005–3054. 10.1007/s10462-022-10246-w
- 35. Firestone W. Alternative arguments for generalizing from data as applied to qualitative research. Educ Res. 1993;22(4):16–23. 10.2307/1177100
- 36. Gibson A, Aitken A, Sándor Á, Buckingham Shum S, Tsingos‐Lucas C, Knight S. Reflective writing analytics for actionable feedback. Proceedings of the Seventh International Learning Analytics & Knowledge Conference. New York, NY, USA: Association for Computing Machinery; 2017. p. 153–162.
- 37. Gibson A, Kitto K, Bruza P. Towards the discovery of learner metacognition from reflective writing. J Learn Anal. 2016;3(2):22–36.
- 38. Gibson A, Willis J. Ethical challenges and guiding principles in facilitating personal digital reflection. Ethics of digital well‐being: a multidisciplinary approach. Cham: Springer International Publishing; 2020. p. 151–173.
