Abstract
In this design case, we present the integration of Generative Artificial Intelligence (GenAI) into the development of a self-paced online course on program evaluation for biomedical sciences research training programs. We designed and developed this online course to be hosted on the Moodle platform. We used GenAI tools, including ChatGPT, Copilot, Gemini, and Audiate, at various stages of the instructional design process. The tools assisted with several design and development tasks, such as brainstorming content ideas, drafting video scripts, generating audio narration, and creating assessments. As a design team, we treated the GenAI tools as co-thinkers to leverage their generative capabilities while ensuring a human was always present to make informed decisions. We also navigated ethical issues related to bias, accessibility, and trust in AI-generated multimedia products. Throughout the process, we made deliberate design choices, including revising learning objectives, developing AI-assisted assessments aligned with higher-order learning outcomes, and scripting multimedia content suitable for synthetic voice delivery, to balance innovation with ethical responsibility and pedagogical needs. This case study outlines our experience of developing instructional materials with emerging technologies. Our experiences suggest that while GenAI can enhance design efficiency, the final design must remain pedagogically sound, ethically responsible, and human-centered.
INTRODUCTION
The emergence of GenAI is beginning to influence how instructional designers think, plan, and create. This design case tells the story of how the idea for our project emerged from recognizing a barrier to the sustained success of biomedical research training programs. The National Institutes of Health (NIH), National Institute of General Medical Sciences (NIGMS) observed that many biomedical research programs lacked the internal capacity to conduct meaningful evaluations, which limited their long-term impact. We responded to the call by developing an online course, Evaluation for Biomedical Research Programs, aimed at strengthening program evaluation capacity across institutions (National Institutes of Health [NIH], National Institute of General Medical Sciences [NIGMS], 2023).
Around the same time, we were witnessing the rapid growth of GenAI tools and became curious about how they might support our instructional design (ID) process. As we explored current practice among instructional designers, we saw increasing interest in using AI tools for content generation, drafting, assessment design, and visual creation. However, there was still little transparency around how these tools were being applied in real instructional design contexts (Luo et al., 2025). As instructional designers, we struggled to find practical guidance on GenAI integration. We have seen few works that offer rich, detailed, practice-based reflections on GenAI integration in design cases or other forms of scholarship in the ID field.
We believe it is essential to develop reflective design narratives that illustrate the complexities, challenges, and evolving nature of the ID process. In approaching this challenge, we treated the design process itself as a form of experimentation. We viewed GenAI as a collaborator in an iterative workflow to enhance our design process. This allowed us to reflect on our practices and to examine how AI might support human-centered practices. Throughout the design process, we found ourselves rethinking assumptions about design, pedagogical judgment, and human decision-making in ID workflows.
We wrote this design case to explore what we encountered in our own instructional design work. It offered us an opportunity to critically reflect on our practice and to open a space for us to discuss what it means to design with emerging technologies.
PROJECT CONTEXT
This design case centers on the development of a self-paced, asynchronous online course designed to build evaluation capacity within biomedical sciences (BMS) research training environments. Across different contexts, faculty and staff involved in graduate-level training in the biomedical sciences are increasingly being asked to assess and report on the impact of their training programs. They also carry demanding schedules and varying responsibilities. We chose an online medium to meet these needs and are currently developing a course that teaches program evaluation concepts in an interactive, user-friendly format, with flexibility across contexts as a central component.
Project Team
We are a multidisciplinary team with expertise in online education, program evaluation, biomedical research, and instructional design. The first and second authors worked as graduate assistants on this project and were mainly responsible for the content and design development of the online course, conducting user tests with potential users, and revising and refining the modules as needed. They have expertise in instructional design and human-computer interaction. The remaining authors were principal investigators at a higher education institution and also worked as subject matter experts (SMEs), contributing expertise in instructional design, human-centered design approaches, program evaluation, curriculum development, and biomedical research.
We also worked closely with Information Technology (IT) professionals to oversee Moodle implementation and functionality, and with the accessibility team to ensure our materials were accessible to the audience. After the initial designs of the first three modules, an external evaluator with expertise in instructional design and evaluation reviewed the modules to make sure that the project was meeting its goals, stakeholders’ expectations, and learners’ needs.
Module Overview
We aimed to design short and digestible modules. Each module included 5–7-minute instructional videos, H5P-based interactive activities, downloadable planning templates, and self-check questions. We also incorporated case-based examples to help learners apply the evaluation concepts in their own fields. Table 1 presents the module titles and learning focus.
TABLE 1.
Overview of the modules and their respective learning focus.
| MODULE | MODULE TITLE | LEARNING FOCUS |
|---|---|---|
| 1 | Fundamental Concepts of Evaluation | Introduce core evaluation definitions, purposes, and types |
| 2 | Approaches to Program Evaluation | Major evaluation frameworks such as the Kirkpatrick model and the logic model |
| 3 | Steps of Designing and Conducting Evaluation | Stages of planning and implementing an evaluation, such as writing evaluation questions and sampling |
| 4 | Design and Development of Evaluation Instruments | Instrument design and data collection methods for quantitative and qualitative data |
| 5 | Data Gathering and Analysis Techniques in Evaluation | Quantitative and qualitative data collection and analysis approaches in evaluation |
| 6 | Ethical Considerations and Standards of Evaluation | Ethics, professional standards, and dilemmas in program evaluation |
| 7 | Communicating the Evaluation Result | Structure reports, apply data visualization, and tailor communication for stakeholders |
Our target learners are:
- Program directors and training grant managers responsible for planning and reporting.
- Administrative staff and faculty involved in curriculum design, learner assessment, and outcomes tracking.
- Graduate students in biomedical sciences preparing for research or academic careers with evaluation responsibilities.
AI Tools Integrated
According to Ch’ng (2023), AI not only creates but also contributes across the entire design process, from ideation to the product phase, by generating instructional assets such as visual materials and structural outlines. Keeping this in mind, we incorporated GenAI tools into several phases of development to support the timeline and scope of the project. The specific GenAI tools we utilized included:
- ChatGPT-4o and Microsoft Copilot: generating video narration drafts, example scenarios, quiz items, and summaries of key concepts.
- Canva, DALL-E, and Gemini: creating illustrations and layout visuals.
- Audiate and Camtasia: producing audio narration using AI-generated voices.
DESIGN PROCESS
We adapted the Successive Approximation Model (SAM), an iterative approach to instructional design built around repeated cycles of design, feedback, and revision. The three phases are: 1. Preparation, 2. Iterative Design, and 3. Iterative Development (see Figure 1).
FIGURE 1.

SAM framework for supporting the iterative implementation of our instructional materials.
Preparation Phase
Our first step was to build a strong foundation for the project, which included clarifying our objectives, understanding our learners, defining the scope of content, and choosing the right tools to support our design. Keeping this in mind, we conducted a needs analysis to understand who our learners are.
Needs Analysis
We leveraged GenAI to simulate early-stage learner research and understand our target audience through learner personas. Creating learner personas is a common practice in ID.
These personas help design teams align on learner goals, challenges, and preferences. They also serve as a lens through which to make instructional decisions. In our case, we used AI to both generate and explore these personas in depth.
To begin our analysis of our learners, we drafted a detailed persona, “User 101,” a biomedical program coordinator without formal training in program evaluation but highly motivated to use evaluation in their work. We then asked ChatGPT to simulate a structured interview with User 101, in which the AI acted as both the interviewer and the interviewee.
We used AI to elicit information across four areas: prior knowledge and adjacent skills; zone of proximal development (ZPD); preferred learning frames and instructional strategies; and suggested learning objectives based on learner goals and constraints (see Table 2 for a snippet; full prompt and report are available in Table S1 in the Appendix).
TABLE 2.
Snippet of ChatGPT Prompt used for needs analysis, and the report generated.
| PROMPT | OUTPUT |
|---|---|
| You are an expert instructional designer who specializes in learner needs analysis. I will give you a list of information to generate and a learner persona. Your task is to use your expertise in learner needs analysis to conduct a deep needs analysis for the learner persona and the specified course. You must act as both the interviewer and the respondent. Your task is to both ask and respond to a series of probing questions in order to assess: User 101’s prior knowledge and skills on the topic; User 101’s Zone of Proximal Development (ZPD), including a list of concepts and skills that are in and out of scope, and an explanation of why they are in or out of scope; User 101’s preferred mode of delivery, including preferred instructional strategy and preference for online asynchronous, blended, or in-person; the relevant objectives for this course, given their goals, ZPD, and available learning time. You must ask probing questions. For each of the four areas, you must ask at least two follow-up questions in order to get an in-depth understanding of the learner and their needs. Afterward, you must create a final report about the interview, detailing what you learned from the needs analysis. Learner persona: Name: User 101; Age: 45; Course: Program Evaluation Course for Biomedical Scientists, Biomedical program coordinators, and researchers; Role: Program Coordinator of a biomedical science program; Learning Needs: Evaluate biomedical science programs and training | User 101 has no formal experience in program evaluation but has a solid foundation in project management and basic quantitative data interpretation. They are familiar with analyzing participant feedback and reporting outcomes to stakeholders, but lack experience in designing evaluation frameworks. |
| | In Scope: |
| | Out of Scope (for now): |
| | User 101 prefers online, asynchronous learning due to their unpredictable schedule, though they would appreciate occasional live discussions. They benefit from mixed content types, with a preference for case studies, concise videos, and infographics. Long theoretical readings would likely be less engaging. |
We chose to focus on these aspects because they provide a holistic view of the target learners. Prior knowledge helps us identify what learners already know and what additional skills they can gain from the learning experience. The Zone of Proximal Development gives insights into the scaffolding and support needed. Preferred learning frames tell us more about individual differences and how learners prefer to engage with the content. Lastly, suggested learning objectives ensure that the instructional design aligns with learner goals and contextual needs, and they also provide insights for content development and assessment.
This simulation generated a robust, conversational needs analysis that revealed aspects of User 101’s motivations, challenges, and learning preferences (see Table S2 in the Appendix). While the AI-driven approach is not a replacement for actual learner input, it sets the stage. It provided a low-cost starting point for informing an instructional strategy. This report helped us guide early design decisions and reinforced our commitment to human-centered design by enabling empathy-driven planning from the start.
This learner analysis helped us define our goal for how we were going to proceed with the design process. It influenced our decisions in identifying the preferred mode of delivery, learning objectives, and module components. We also had to take a stance on how to use GenAI in our materials development, as there was hesitation around compliance and institutional guidelines.
This hesitation came from the lack of clear policies on the use of Al-supported content and concerns about data privacy, copyright, and originality. We wanted to make sure that any use of AI tools in our design process aligned with ethical standards and maintained transparency in how the content was developed.
Brainstorming
After reviewing the existing NIH guidelines on utilizing GenAI for research, we decided to use AI tools as co-thinking partners to support our ideation process, improve accessibility, and explore other opportunities in module development. We brainstormed and explored the various AI tools available for designing our modules and content. Based on the findings from our learner analysis, we began by drafting module-level learning objectives to guide content development and assessment strategies. These objectives were initially developed by SMEs, and the goals were outlined in our grant proposal.
Module Level Objectives (MLOs)
We used GenAI tools, including ChatGPT, to refine the wording of the learning objectives. This process helped us draft an initial set of objectives. However, the AI-generated objectives were not always accurate. They often included non-measurable verbs, vague phrasing, or combined multiple actions within a single statement (for example, “identify and differentiate”).
Table 3 presents examples of the AI-generated drafts alongside the finalized learning objectives. We revised and refined these drafts through iterative collaboration with our SMEs (the project PIs). Together, we ensured that each objective aligned with Bloom’s Taxonomy, used measurable verbs, and fit the expectations of a self-paced learning environment. These finalized objectives became the foundation for all subsequent content development and assessment design phases.
TABLE 3.
Examples of AI-generated module-level objective (MLO) drafts alongside the revised versions.
| AI-GENERATED MLO DRAFTS | REVISED MLO DRAFT |
|---|---|
| For Module 3 on “Planning, Designing, and Conducting Evaluation,” here are the suggested learning objectives: | By the end of the course, you will be able to: |
| These objectives will help learners grasp the key components of designing and conducting effective evaluations. | |
Experimenting with AI Tools
After finalizing the module objectives, we began experimenting with a range of GenAI tools to produce key components such as video scripts, graphics, narration, presentation slides, and assessments. We used ChatGPT first and then moved to Microsoft Copilot and Gemini for scripting and ideation, as our university held an enterprise license that provided data protection, and to InVideo for video generation (see Figure 3). Canva and Microsoft Designer were tested for slide and visual creation, while ElevenLabs, Zapier, and Speechify were explored for audio narration.
FIGURE 3.

A video clip generated by InVideo to illustrate evaluation vs. research. View the video through this link.
Although these tools offered a fast start, we encountered several limitations. The scripts often lacked the contextual understanding needed for our audience. Graphics generated through Gemini were frequently distorted or included misspelled text (see Figure 2). We were not happy with the videos produced by InVideo because we had no control over the visuals, which were repetitive. We also found the narration voice-over very robotic. Most importantly, it lacked cohesion across segments. Therefore, we decided to create the videos ourselves with AI audio narration. Experimenting with several free tools and some with free trials, we were impressed by the wide range of available voices. However, all of the AI audio struggled to deliver a natural tone and rhythm, which made the content feel impersonal and disconnected. These limitations reminded us of the importance of integrating human expertise throughout our process and guided us to re-evaluate which tools could be most effectively used in combination with traditional design methods.
FIGURE 2.

A graphic generated by Gemini intended to depict “external evaluation,” showing spelling errors.
We collaborated with our design team to create a unified logo and slide template consistent with the structure of our modules. We then decided to purchase the CREATE subscription plan from Camtasia for AI video and audio editing instead of using free AI tools. We chose Camtasia because of the features it offers, such as customizing the voiceover mood, speed, and accent, and because of its consistency. Many free AI voiceover tools lack transparency regarding how data is used or generated, so to ensure reliability and ethical use, we preferred a paid and more transparent platform. We used Audiate to record and edit audio for our course, which helped us adjust pacing, pronunciation, and tone. For the video portion, we used Camtasia to combine our audio and slide content into more cohesive multimedia presentations.
As for interactivity, we worked with the H5P plugin within Moodle at first, but we found that it had accessibility issues and was inherently limited in design flexibility. With some exploration and with the aid of GenAI tools, we brainstormed and found creative ways to build interactives outside of Moodle while still providing accessible alternative formats for all learners.
These initial challenges, together with our practice of reflection, helped us make meaning of our experience and chart a clearer path forward, underscoring the need to begin the next phase of design on a sounder, more intentional, and more balanced footing.
Iterative Design Phase
With our module-level objectives, GenAI tools, and design plan in place, we began this stage by producing a complete draft of the course content for our first module. We used ChatGPT and Microsoft Copilot to assist with aligning the content to the learning objectives, video scripts, and assessments, and to support tasks such as content organization and formatting. Through this approach, we ensured coherence across course elements while maintaining full control over the instructional material.
Case Scenarios
While our course focuses on program evaluation, the needs of our target audience, graduate students and professionals in biomedical research, required a different approach than a general evaluation course. We wanted the content to reflect the characteristics of research training environments, not only generic evaluation practices. Therefore, one of our primary goals was to integrate realistic case scenarios throughout the course to situate learning in authentic contexts. We consulted GenAI tools to support the technical layout and organization of the case materials, but we developed the final cases ourselves to ensure relevance (see Table 4).
TABLE 4.
Early ChatGPT-generated case scenario featuring a clinical context.
| AI-GENERATED CASE SCENARIO |
|---|
| Scenario: Training Graduate Students in Biomedical Lab Techniques and Research Skills |
| As part of your graduate program in biomedical sciences, you are participating in a rotation at a university-affiliated clinical research center that focuses on translational cancer research. During your rotation, you are assigned to a project investigating biomarkers for early-stage lung cancer detection. The aim is to develop a minimally invasive blood test to detect specific genetic markers associated with lung cancer progression. |
| Training Objectives: |
| This experience not only improves your technical lab skills but also enhances your ability to communicate complex findings in a clinical context. |
Video Script Content
We also used GenAI to draft video scripts to streamline production, since the final videos would use AI-generated voiceovers. This created additional problems. The AI-generated scripts often did not sound as though they were written to be read aloud, which became even more apparent during voice synthesis. Issues such as repetitive wording, awkward transitions, and the overused phrase “let’s dive into” were common. The drafts also frequently included long dashes (—) and contractions (e.g., “I’ve” instead of “I have”), which made reading awkward and created problems during documentation and narration.
Given these issues and the time needed to fix AI-generated drafts, the team spent a considerable amount of time rewriting them. In some cases, the effort to revise AI-generated drafts was close to, or greater than, the effort of writing the scripts from scratch. Nonetheless, AI still contributed to the process. It helped generate a base structure, offered language variations that inspired new approaches, and supported ideation in the early stages.
Editing and reviewing drafts relied heavily on our input to restructure sentences, trim wordy passages, and make sure that the pacing, tone, and formatting of the scripts were appropriate for synthetic voice delivery while remaining accurate and accessible to the target audience.
This drafting process also helped us develop an internal structure for writing AI-supported script content. For example, we avoided long sentences to reduce learners’ cognitive load in the instructional videos, avoided idiomatic phrasing, and included pauses to support voice emphasis. This experience also showed us that writing scripts for an AI voice-over differs from writing scripts for human narrators: AI voice-over requires more attention to rhythm and readability.
In addition to revising AI-generated scripts for tone and contextual relevance, we cross-checked all content with the literature and other trusted sources in program evaluation. While GenAI tools helped us generate draft material quickly, they often produced inaccuracies or omitted details that made the content unsuitable for the learners. We also noticed a lack of transparency about the data sources used by these tools, along with potential copyright concerns, which required extensive review and verification on our part. Table 5 illustrates an example of an initial output drafted by AI and the final video script.
TABLE 5.
Snippet of the initial output drafted by AI and the final revised video script.
| AI GENERATED | REVISED |
|---|---|
| “Hello everyone! In this video, we’ll dive into the basics of program evaluation. We’ll cover what evaluation is, why it’s crucial, and how it can enhance the effectiveness of programs. Through real-world examples, you’ll see how evaluation helps us identify areas for improvement and ensures programs reach their full potential. Let’s get started! | In this video, we will explore what evaluation is, why it is important, and how it can help improve the effectiveness of programs. We will also see how it differs from research, although they share certain commonalities. You will see how evaluation helps us identify areas for improvement and ensures programs reach their full potential. Let’s get started! |
| What is Evaluation? | |
| Now, whether we realize it or not, we evaluate things every day. Whether it’s choosing the best treatment approach for a patient or deciding on a new methodology for research, we constantly assess what works best. In a similar way, evaluation helps us understand how well something is working, but in a more organized and systematic way. | “Now, whether we realize it or not, we actually evaluate things every day. Whether it’s choosing the best route to work or deciding which recipe to cook for dinner, we are constantly figuring out what works best in our daily lives. |
| In a similar way, evaluation helps us figure out how well something is working, but in a more organized and systematic way. So, what exactly is program evaluation? | |
| So, let’s explore what program evaluation means. To make this clearer, let’s start with a biomedical research scenario. | According to the American Evaluation Association, ‘Evaluation is a systematic process to determine merit, worth, value, or significance.’ In simpler terms, it is about identifying problems and making improvements to help a program achieve its goals. |
| Imagine you’re part of a team that’s developed a new community health program aimed at reducing Type 2 diabetes risk. You’ve run successful pilot tests, and now the program is rolling out in several hospitals. | There are three main reasons why evaluation is critical for training programs: |
| Improve Future Programs: By identifying areas of improvement, we ensure programs evolve and become more effective. | |
| However, after some time, you notice issues: patient participation rates are low, clinics struggle to track patient data, and healthcare staff aren’t fully trained on the program’s details. Despite the program’s potential, it’s not producing the results you anticipated. | Decide Program Continuity: It helps us determine whether to continue, modify, or end a program. |
| Justify Investments: Evaluation provides evidence to show that programs deliver tangible benefits, making a strong case for their value to stakeholders. | |
| So, let’s look into what program evaluation means with an example. | |
| “Meet Daniel. He is part of a university outreach team launching an exciting new initiative to inspire high school students to explore careers in biomedical science.” … |
Assessments
Another important step in the process was to create contextualized, well-written assessment questions situated in biomedical research and aligned with the learning objectives. One of the most challenging parts of this phase was creating questions that could be auto-graded in a self-paced online course without instructor intervention. This meant we needed to re-evaluate our learning outcomes and consider what was possible in an interactive format, such as H5P, that would support self-paced learning and still demonstrate higher-order thinking skills.
We used ChatGPT to generate preliminary drafts of assessment questions, only to realize that many of the questions it generated could not assess higher-level skills and remained in the domain of simple recall, such as defining a term or stating a fact. In some cases, the correct answer was always the longest response option, which could undermine the validity of the question. At other times, the distractors were too similar to each other, making it difficult for learners to differentiate between them.
Figure 4 illustrates typical issues such as overly long correct answers and imbalanced multiple-choice options. Despite these challenges, GenAI was helpful for drafting automated feedback, which is highly important for a self-paced and auto-graded course.
FIGURE 4.

Initial assessment questions generated by ChatGPT and Microsoft Copilot, and a finalized question.
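To make this pattern easier to catch during review, a simple automated check can flag items whose correct option is noticeably longer than every distractor. The sketch below is a minimal Python illustration of such a check; the sample item, function name, and length threshold are hypothetical assumptions for this sketch and were not part of our actual workflow.

```python
# Minimal sketch (hypothetical data): flag multiple-choice items whose correct
# option is much longer than every distractor, a pattern we observed in
# AI-drafted questions that can cue learners to the answer.

items = [
    {
        "stem": "What is the primary purpose of a formative evaluation?",
        "options": [
            "To assign final grades to trainees",
            "To provide ongoing feedback that guides improvement of a program "
            "while it is still being implemented",
            "To audit program finances",
            "To rank training programs",
        ],
        "correct_index": 1,
    },
]


def flags_longest_correct(item, margin=1.3):
    """Return True if the correct option is at least `margin` times longer
    (by word count) than the longest distractor."""
    lengths = [len(option.split()) for option in item["options"]]
    correct_len = lengths[item["correct_index"]]
    distractor_lens = [n for i, n in enumerate(lengths) if i != item["correct_index"]]
    return correct_len >= margin * max(distractor_lens)


for item in items:
    if flags_longest_correct(item):
        print(f"Review needed (correct answer much longer): {item['stem']}")
```

A check like this only surfaces candidates for human review; it does not judge whether an item targets higher-order thinking, which still requires SME review.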
We also wanted to embed interactive elements within the video content to maintain engagement. We used GenAI to brainstorm ideas for these interactions, ranging from reflective prompts and short questions to scenario-based challenges. These ideas were curated, adapted, and aligned with the learning objectives to ensure they provided meaningful reinforcement of the learning goals.
Prototype
We designed the presentation slides for each module. This stage was important before moving into multimedia production. With Copilot, we outlined the key concepts from the narration scripts, then brainstormed ways to visualize complex ideas clearly and engagingly. Design suggestions embedded in Microsoft PowerPoint and additional input from ChatGPT helped us to develop new layout and visualization ideas. Every slide was then carefully customized to align with our visual identity and instructional goals. Before finalizing, we reviewed the slide decks for consistency, clarity, and visual impact to ensure they effectively supported the learning experience.
Once we had our visual assets, we generated the audio narration of the script with the help of Audiate (see Figure 5). While the AI narration was produced almost instantly, we still had to adjust pitch and speed and add pauses. We also noted a few pronunciation errors and had to make changes. For example, the AI could not pronounce the word “mentees,” which we fixed with a simple hyphen, “men-tees,” although such simple fixes did not work in every case. Using an AI voice-over saved us time compared with recording it ourselves, and we did not have to worry about recording equipment, accents, editing, mispronunciations, or the frustration of recording mistakes. Another advantage of using Audiate was that we could migrate the audio to Camtasia for easier video editing and continue to refine it during that process.
FIGURE 5.

Sample audio clip generated using Audiate’s text-to-speech tool for instructional narration. Listen to the audio track through this link.
In Camtasia, we matched the slides to the audio, generated the videos, and exported them to YouTube, where we used the AI-powered auto-captioning feature, making sure the caption timings were aligned. Finally, we uploaded the videos, texts, H5P interactives, assessments, and templates to the Moodle platform and completed our first module.
Initial Review
We maintained an iterative process with the SMEs who were in our research team through ongoing feedback, initial user tests with potential users of the modules, and through external evaluation. Since our design process is still ongoing, we are sharing the initial reactions to our modules in this case.
A). SME REVIEW:
Experts on our design team (the PIs) were involved in reviewing the design throughout the design process. We held biweekly project meetings to review the modules and also collected written feedback in our shared design folders. We found that the following elements of the design required extensive revision during the SME review process because of GenAI-related design issues.
Video Narration and Content Accuracy: Our primary sources for developing the module content were textbooks, peer-reviewed articles, and credible websites.
When we experimented with using AI tools for content generation, we encountered outputs that lacked depth and included inaccuracies. Without an extensive review process, such errors could easily go unnoticed. For instance, in the first module, the AI-generated draft incorrectly presented data analysis as a research method, which was not accurate in our context (see Figure 6). In such cases, SME review cycles were quite helpful in strengthening the accuracy of the content.
Assessment Questions: AI-drafted assessment questions were often misaligned with the intended learning objectives and tended to target lower-level recall rather than higher-order thinking skills. We also identified recurring structural patterns in the AI-created responses that limited validity. For example, the correct answer choice was always the longest, which made it easy to identify the correct answer without understanding the content. The experts continually reviewed the questions to establish validity and confirm that they were accurate and aligned with the learning objectives and audience (see Figure 6).
FIGURE 6.

Screenshots of some of the SME review comments.
B). USER TESTS:
Once we completed our first prototype, we invited four participants who represented our target users of the online modules. The group included two faculty members, one administrator, and one graduate student in the field of biomedical science. To gather feedback, we conducted one-hour online tryout sessions where participants explored the module platform, engaged with interactive activities, reviewed the content, and navigated the Moodle system.
During these sessions, participants were encouraged to think aloud and share their impressions of the design, content, and usability. They were also interviewed briefly about their user experiences. This process helped us observe how real users interacted with the prototype and identify the areas we needed to revise before moving forward (see Figure 7).
FIGURE 7.

Selected screenshots from user testing sessions illustrating interactions with course materials and feedback points.
Users followed a similar sequence to what actual learners would experience in the course. We asked them to start the interactions on the main page, review the learning objectives, and then navigate through the module sections. They watched an example instructional video narrated by an AI voice and completed interactive assessment questions to experience how the module incorporated multimedia and interactivity.
Scenarios: Scenarios were an important part of our design because we wanted to make the content meaningful for biomedical researchers. They were also one of the main areas where we had GenAI support. During user tests, participants encountered GenAI-drafted scenarios that introduced fictional program contexts and evaluation challenges. They interacted with these by analyzing each situation and identifying the evaluation focus. However, users said that the prototyped scenarios did not match their own work contexts. The early versions focused mostly on clinical examples (see Table 4). Users suggested adding new scenarios, such as assessing the impact of training programs on research productivity and evaluating outreach programs. In response, we decided to collect scenarios directly from real users instead of integrating GenAI suggestions.
AI Voice: We had some concerns about using AI voice-overs in our instructional videos. Although they helped speed up the development process, we were not sure whether they would sound authentic to learners. Surprisingly, users mentioned that the AI voice did not affect their engagement; they said they were focusing on the content rather than the voice. As a result, we decided to keep using an AI voice to narrate our instructional videos. It is worth noting, however, that we tended to use a male AI voice because it sounded more natural to both users and experts. This choice raised concerns for us about gender representation in our design.
Assessment Questions: Users interacted with drag-and-drop, multiple-choice, and scenario-based questions and received immediate feedback. They responded positively, enjoyed the interactive format, and found it helpful for engaging with the content. However, similar to their feedback on the scenario section, users mentioned that the scenario-based questions did not fully reflect their own work contexts. In response, we decided to revise all the scenario-based questions based on the new scenarios we would collect from real users.
ITERATIVE DEVELOPMENT PHASE
The final phase of our process was where all the planning, design work, and feedback implementation came to life. This is where we produced multimedia content and deployed it on the learning platform, while continuing to collect and respond to feedback. Table 6 provides an overview of the design decisions we incorporated into the final draft.
TABLE 6.
The summary of key instructional design decisions and AI involvement.
| DESIGN AREA | INITIAL APPROACH WITH AI | CHALLENGES ENCOUNTERED | FINAL DECISION |
|---|---|---|---|
| Development of Learning Objectives/Curriculum | Prompting ChatGPT to draft course-level and module-level objectives and curriculum mapping | ChatGPT produced learning objectives that included more than one Bloom’s taxonomy verb or used verbs that were not measurable. | Rewritten using one measurable verb per objective, aligned to assessments. |
| Video Script Drafting | Used ChatGPT to accelerate drafting and structure content. | AI overused mechanical phrases like “deep dive” or “let’s explore,” leading to repetitive and unnatural tones. AI also frequently used em dashes (—) and contractions. | Scripts were edited manually to improve flow, reduce repetition, and ensure a natural, conversational tone. |
| Case Development | Prompted ChatGPT to draft realistic biomedical education scenarios. | Initial outputs lacked contextual specificity and nuance required for the learners’ real context. | We asked real learners to provide examples from their own context instead of relying on AI generated examples. |
| AI Images for Course Graphics | Experimented with AI-generated course banners and illustrations early in the design process. | Generated images were inconsistent. | Discontinued AI image generation; hired graphic designers. |
| AI Images for Scenarios | Used AI to draft visual elements for case scenarios. | Generated images contained misspelled text and lacked cohesion. | We decided to use our own images instead; AI tools lacked the authenticity and accuracy needed for the design and also raised questions about creativity. |
| Assessment Questions | Used ChatGPT and Copilot to draft H5P-type and scenario-based items | Issues included patterns such as the correct answer always being the longest option and scenarios not reflecting real contexts. | Revised AI-generated questions based on expert feedback; when prompts were refined, results were more satisfactory. For creating automated feedback text for correct and incorrect answers, AI was quite helpful. |
| Accessibility | Used AI to suggest alternative text and screen-reader-friendly formats | AI offered helpful format ideas. | Used AI-generated suggestions as a starting point; final materials reviewed manually for full accessibility compliance. |

Content Refinement
We refined our content based on the feedback received from our user tests. This time, instead of using the general conversational GenAI interface, we started using a custom GPT, in which we applied prompting techniques such as role-based and few-shot prompting to avoid restating our instructions each time we tried to refine the output. This helped the AI understand what we were looking for and return the results we needed with fewer prompts (see Figure 8).
FIGURE 8.

Custom GPT prompt and our final version of the slides based on the generated output.
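For readers curious how role-based and few-shot prompting can be combined outside the custom GPT interface, the sketch below shows one possible structure using the OpenAI Python SDK. It is illustrative only: the model name, system role text, few-shot example, and helper function are assumptions for this sketch and do not reproduce our actual custom GPT configuration.

```python
# Illustrative sketch: a role-based system prompt plus one few-shot example,
# similar in spirit to the instructions we embedded in our custom GPT.
# Requires the `openai` package and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

SYSTEM_ROLE = (
    "You are an instructional designer writing narration scripts for a "
    "self-paced program evaluation course for biomedical researchers. "
    "Use short sentences, avoid idioms and contractions, and write for an "
    "AI voice-over."
)

# One worked example (few-shot) demonstrating the style we want returned.
FEW_SHOT_INPUT = "Draft a two-sentence introduction to formative evaluation."
FEW_SHOT_OUTPUT = (
    "Formative evaluation takes place while a program is still running. "
    "It provides feedback that helps the team improve the program before it ends."
)


def draft_script(task: str) -> str:
    """Send the role, the few-shot example, and the new task to the model."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name for illustration
        messages=[
            {"role": "system", "content": SYSTEM_ROLE},
            {"role": "user", "content": FEW_SHOT_INPUT},
            {"role": "assistant", "content": FEW_SHOT_OUTPUT},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content


print(draft_script("Draft a three-sentence introduction to summative evaluation."))
```

Embedding the role and example once, whether in a custom GPT or in a reusable script like this, is what reduced the need to restate instructions with every prompt.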
The deep-thinking feature, which provided sources, gave us a chance to validate the content’s authenticity and increased our confidence in using the material. With these advancements, we created our materials for the first module.
Once the instructional materials were complete and the editing quality was verified, we uploaded the full modules to our learning management system, Moodle. We structured the modules to guide learners through a cohesive learning journey. Each module began with a clearly stated set of learning objectives, followed by the instructional content, including narrated videos embedded with interactive components (see Figure S1 in the Appendix).
We aimed to promote engagement by embedding interactive H5P-based activities and tools in the course, such as drag-and-drop activities, decision-making activities, and self-assessment tools (see Figure 9 and Figure S2 in the Appendix). In addition, downloadable project templates offered students an opportunity to apply knowledge in the real world (see Figure S3 in the Appendix). A reflective checklist supported students in monitoring their understanding and progress, while a summary section helped reinforce key takeaways. Each module ended with a formative quiz to assess students’ conceptual clarity before continuing to the next unit. This layered structure was intended to scaffold learning in an accessible and consistent way across all modules.
FIGURE 9.

H5P activity visual suggestion and adapted version for our online module.
External Evaluation
In addition to SME reviews and user tests, we worked with an external evaluator to gain an independent perspective on the flow, clarity, and overall design of the learning experience. The evaluator’s feedback helped us refine both instructional and visual elements. The feedback also provided critical input on GenAI-related aspects of the design.
The evaluator stated that the AI male voice lacked natural pauses and tonal variation, which made the narration sound mechanical and cognitively demanding for learners. Although this AI voice was initially perceived as more natural by both users and experts during early testing, the evaluator raised concerns about gender representation and suggested alternating between male and female AI voices for better inclusivity and engagement. Following this feedback, we decided to incorporate both male and female voices across modules to enhance learner experience. The evaluator also recommended stronger alignment between AI narration and on-screen visuals by introducing progressive text reveals and clearer visual signaling.
The external evaluator also emphasized the importance of assessment authenticity. Quiz questions closely mirrored the video examples, which created redundancy for learners. The evaluator encouraged the team to design more applied and contextually varied questions that would help learners transfer knowledge to their own projects. Feedback was also inconsistent across quizzes: some questions provided detailed rationales, while others gave minimal confirmation of correctness. Based on these comments, we standardized feedback to include explanations for both correct and incorrect answers. In some cases, we attributed feedback to subject matter experts to strengthen teaching presence and authenticity.
Regarding content and instructional flow, the evaluator highlighted the need to strengthen learner motivation early in the course. They recommended starting Module 1 with a relatable scenario illustrating how weak evaluation design can lead to project failure. This would help learners see the real-world relevance of evaluation concepts from the start.
Finally, the evaluator emphasized accessibility and user experience concerns, including inconsistent performance of interactive H5P elements and the need for keyboard-navigable alternatives. These recommendations led to technical revisions and simplified interaction designs, such as replacing drag-and-drop activities with checklists or multiple-choice formats.
Overall, this external evaluation played a critical role in strengthening both the instructional design and the AI integration process. While AI tools helped us accelerate video and quiz development, the evaluator’s insights reminded us that human review and iteration are indispensable for maintaining pedagogical depth in the ID process. SME expertise, user testing, and external evaluation together allowed us to view our design through multiple human-centered lenses and refine it iteratively.
DESIGN DILEMMA
We would also like to share our core design dilemmas during this process. We were using GenAI tools in the instructional design process at a time when everything was still new, and most people were experimenting rather than relying on evidence-based practices. There were moments when we were unsure how much to trust AI or how to document our use in a transparent and ethical way. We were learning as we went. To reflect on our experience, we summarize our design dilemmas as follows:
The Core Dilemma of Efficiency vs. Creativity:
We repeatedly questioned the appropriateness and ethics of relying on GenAI tools to generate educational content. This question became one of the biggest tensions for us. While AI tools helped us generate content in a timely way, they also caused us to question the authenticity, originality, and pedagogical depth of what we were producing. We often found ourselves pausing to consider whether we were using AI as a supportive tool or beginning to depend on it too much.
Human Judgement vs. Automation:
Through multiple rounds of SME feedback, user testing, and external evaluation, we gradually built a clearer workflow. We used AI for early ideation and drafting, but we handled all alignment, accuracy checks, and contextualization through human intervention. Our faculty mentors played an important role in helping us distinguish between content that was merely correct on the surface and content that was pedagogically meaningful for learners. Over time, we reached the conclusion that AI could help us start the work, but it could not make pedagogical decisions.
Ownership vs. AI Generation:
The use of AI also raised important questions about copyright and attribution. Because GenAI tools draw on large and often non-transparent datasets, we were never certain about the originality of some outputs, which made us question who owns AI-generated content. Is it the user, the developer, or the system itself? This uncertainty made us cautious about how we used AI. To stay on the safe side, we limited its use to idea generation and made sure that all final content was written and verified in our own words.
Iterative Refinement vs. Control:
Interestingly, we discovered that iterative prompting or refinement with AI did not always lead to improvement, especially with image generation. At times, repeated iterations introduced errors or redundancies, or diverged from the intended pedagogical focus. We learned that human control was critical at each stage, especially after the first draft, to keep the work from straying from the educational focus (see Figure 10).
FIGURE 10.

Comparison of two AI-generated image styles using DALL·E. The first contains minor spelling errors; the second avoids text and is better suited for visualizing case-based content.
REFLECTIONS
As we progressed through these phases, several key human-AI collaboration themes emerged based on our personal reflections during the design process:
AI as Not a Replacement:
AI provided a quick way to generate content ideas, but it did not replace human judgment, contextual knowledge, or pedagogical reasoning. We discovered that AI-generated ideas require continual human input for accuracy and relevance. AI can be helpful in supporting some instructional design tasks, but it cannot replicate the reasoning of instructional designers, who draw on their background knowledge and experience to make nuanced decisions.
Decision-Making and Trust in AI Outputs:
One of our largest unknowns was gauging when to trust AI-generated materials. We found ourselves completely restructuring some of the AI-generated materials, and at times we began to lose trust in AI-generated text altogether. Over time, however, we developed the ability to distinguish AI-generated content that was usable from content that was cumbersome and needed significant modification. These reflections support the perspective that prompt engineering, along with subject-area expertise, influences the quality of outputs, and that instructional designers must be engaged in reviewing and editing content generated by AI.
The Role of AI in Structuring Designers’ Thinking:
AI tools assisted in scaffolding our design thinking because they provided structure in the form of outlines, frameworks, and examples related to the biomedical sciences that we may not have developed independently. Yet we are also worried about building a dependency on AI outputs, because it risks diminishing authorship and agency in the instructional design process. We should not focus on designing GenAI out of the learning experience, or designing it into the learning experience, but on designing instruction so students actually learn.
Ethical and Practical Considerations:
As we considered the practical use of AI in generating instructional materials and educational content, it raised a number of ethical discussions regarding data privacy, authorship, and the validity of the generated materials. Our team created practices for maintaining ethical integrity, such as documenting AI contributions clearly, maintaining human oversight, and conducting an SME review at every stage.
Instructional designers and e-learning developers using ChatGPT should exercise care and caution and should not rely on the platform to create content that must be factually accurate. Specificity, awareness of its limitations with factual content, and using personas that are not associated with harmful or discriminatory opinions can help prevent unethical or offensive output. The integration of GenAI throughout instructional design workflows brings both opportunities and risks, further pointing to the importance of using it responsibly, being ethically aware, and keeping a human at the helm.
CONCLUSIONS
Although AI allowed us to be more efficient in producing content, it sometimes limited our creative freedom and traded some fluidity for robotic-sounding language.
Betz (2023) describes two primary forms of artificial intelligence: “capability-based” and “functionality-based.” Capability-based AI is meant to exceed human ability, while functionality-based AI is intended to use its learned abilities to adapt and respond to its environment. GenAI aligns more with functionality-based AI because it generates responses based on data patterns.
However, it does not truly understand educational goals, learner needs, or contextual details. Instead, it produces content based on statistical probabilities rather than critical thinking or teaching insight. Therefore, its outputs need careful review, revision, and shaping by instructional designers, who provide the necessary subject matter knowledge and teaching foundation to ensure quality and relevance.
To address these challenges, we treated AI-generated materials as scaffolds rather than finished products. Every AI output was reviewed and adjusted by human judgment. This approach allowed us to benefit from GenAI while maintaining the ethical standards and academic integrity vital in instructional design.
As GenAI becomes more common in educational workflows, designers will take on roles beyond content creators. They will also be curators, editors, and evaluators of AI-generated outputs. This changing role requires new skills, such as prompt engineering, critical evaluation of AI content, and a solid understanding of how to connect technology with teaching objectives.
Going forward, a discerning integration of GenAI within instructional design will require ongoing reflection, testing, and collaboration across different fields. We must continue to document real-life examples that analyze the outcomes of GenAI practice in different educational settings, and as GenAI tools improve, instructional designers will need to keep adjusting their practices to use these tools to their full potential. This will include guidance on responsible use, developing policies for the ethical integration of AI at the institutional level, and fostering an atmosphere of transparency about the design process.
At the end of the day, GenAI does not add value because it might replace some piece of human expertise; engaging and maintaining human expertise remains essential. Thoughtful and ethical use of GenAI can help instructional designers create learning experiences that are more efficient, authentic, and innovative.
However, this requires consistent balance and care, with a sustained focus on teaching, ethics, and learner-centered design practices. By treating GenAI as a collaborator rather than a substitute for human thinking, instructional designers can help create a future where technology supports education with integrity and effectiveness.
Supplementary Material
ACKNOWLEDGMENTS
Research reported in this publication was supported by the National Institutes of Health (NIH) under award number R25GM154009. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
AI DISCLOSURE STATEMENT
We acknowledge the assistance of AI tools in the preparation of this design case. AI tools such as Grammarly, ChatGPT, and Gemini supported the grammar and language editing to enhance readability. Also, these tools assisted in outlining the design case structure. In addition, we included AI-generated images in this manuscript to showcase AI-generated artifact examples, which are the core component of this case. All the authors carefully reviewed all content and ensured the content was original and compliant with ethical guidelines.
Contributor Information
Divya Suresh, Ph.D. student in the Human-Computer Interaction program at Iowa State University. Her research explores the use of Artificial Intelligence in education.
Melis Dilek, Ph.D. student in the Educational Technology and Human-Computer Interaction programs at Iowa State University. Her research interests include Artificial Intelligence in education, teacher education, and distance education.
Aliye Karabulut-Ilgu, director of Curricular Assessment and Teaching Support (OCATS) at the College of Veterinary Medicine at Iowa State University. Her work focuses on curricular assessment, program evaluation, and technology in higher education.
Evrim Baran, Professor of Educational Technology in the School of Education at Iowa State University. Her research investigates the design and evaluation of learning technologies within teacher education and STEM learning contexts.
Michael Kimber, Chair and Professor in the Department of Biomedical Sciences at Iowa State University College of Veterinary Medicine. His research investigates molecular and physiological mechanisms in parasitic nematodes.
REFERENCES
- Betz, S. (2023). 7 types of artificial intelligence. Built In. https://builtin.com/artificial-intelligence/types-of-artificial-intelligence
- Ch’ng, L. K. (2023). How AI makes its mark on instructional design. Asian Journal of Distance Education, 18(2), 32–41. https://doi.org/10.5281/zenodo.8188576
- Luo, T., Muljana, P. S., Ren, X., & Young, D. (2025). Exploring instructional designers’ utilization and perspectives on generative AI tools: A mixed methods study. Educational Technology Research and Development, 73, 741–766. https://doi.org/10.1007/s11423-024-10437-y
- National Institutes of Health (NIH), National Institute of General Medical Sciences (NIGMS). (2023). Modules for enhancing biomedical research workforce training (R25 - independent clinical trial not allowed). https://grants.nih.gov/grants/guide/pa-files/PAR-24-040.html