Scientific Reports. 2025 Aug 9;15:29202. doi: 10.1038/s41598-025-14164-z

Enhancing art creation through AI-based generative adversarial networks in educational auxiliary system

Yongjun He 1, Shijie Zhang 2
PMCID: PMC12335513  PMID: 40783610

Abstract

Creative art education increasingly demands interactive, personalized tools that support students in developing aesthetic expression and technical skill. Traditional digital art tools often lack adaptive feedback and require manual intervention, which limits their pedagogical potential in self-guided or remote learning environments. This study introduces an AI-enhanced educational auxiliary system powered by Generative Adversarial Networks (GANs) to support art creation, creativity development, and learning engagement. The proposed system integrates a hybrid GAN architecture that enables semantic sketch-to-image transformation, style transfer, and real-time visual feedback based on user input. Students can co-create digital artwork with the system, which dynamically learns their preferences and evolves to provide constructive artistic suggestions and inspirations. The GAN model is trained on curated datasets of historical and contemporary art styles, enabling it to emulate diverse visual aesthetics. Evaluation on a cohort of 60 undergraduate art students showed a 35.4% improvement in creative output quality, as judged by expert reviewers, and a 42.7% increase in engagement over traditional tools. The system also offers explainable visual outputs that foster reflection and critique. This work provides a scalable AI-assisted learning framework that enhances artistic exploration while preserving creative autonomy in educational settings.

Keywords: Art creation, Generative adversarial network, Education system

Subject terms: Computational science, Computer science, Scientific data, Software, Statistics

Introduction

Art education plays a vital role in nurturing creativity, expression, and critical thinking, yet it faces unique challenges in the digital age 1. The rapid integration of technology into classrooms has opened new pathways for artistic exploration, but it has also exposed gaps in traditional instructional models. Students often lack timely feedback, creative scaffolding, and access to adaptive learning tools that support divergent thinking. Moreover, the transition from conventional studio environments to digital platforms has highlighted the need for intelligent systems that go beyond static drawing tools and offer real-time, interactive assistance 2. While digital art software and tablets are widely used, most serve as passive instruments that do not contribute pedagogically to the student’s creative process. This passive nature limits the learning potential for beginners, who may struggle with technique, composition, or confidence in their creative decisions 3.

Recent advancements in artificial intelligence (AI), particularly in generative models, offer compelling opportunities to reimagine how art can be taught and created in educational settings 4. Among these models, Generative Adversarial Networks (GANs) have demonstrated remarkable success in generating realistic images, transferring artistic styles, and even co-creating visuals with human input. The dual-network architecture of GANs, composed of a generator and a discriminator, enables them to learn fine-grained visual features and produce outputs that mimic human-created art with high fidelity 5. This generative capacity makes GANs especially well-suited for educational use cases where visual suggestion, feedback, or iterative transformation of creative input is required.

In educational psychology, it is well understood that creativity flourishes when learners are encouraged to explore, iterate, and reflect on their work 6. A generative model that reacts to user input by producing stylistically diverse and semantically meaningful artwork can serve as a valuable pedagogical partner. Instead of replacing human creativity, AI systems like GANs can provide stimuli that inspire student imagination, validate exploratory sketches, and demonstrate alternative artistic pathways 7. However, despite the popularity of GANs in fields such as entertainment and fashion, their integration into formal educational settings–especially in visual arts curricula–remains underexplored. There is limited research on the design, implementation, and evaluation of GAN-powered systems that are tailored specifically for classroom use and aligned with pedagogical goals 8.

Traditional AI applications in art education often focus on classification tasks, such as style recognition or visual content tagging. These approaches offer analytical insights but do little to assist students in the actual process of art creation 9. Other models are designed for automated art generation but operate in a closed-loop fashion, generating fixed outputs without user influence or contextual relevance. For an AI tool to be pedagogically effective in an educational context, it must support co-creation, facilitate iterative learning, and offer feedback in ways that align with how human instructors guide creative development. Furthermore, the system must be intuitive, explainable, and customizable to diverse learning styles, artistic media, and curriculum standards 10.

Another major limitation of existing tools is their lack of real-time interactivity. Students engaged in art creation benefit immensely from dynamic feedback loops, where they can modify inputs and immediately observe the effects of those changes 11. Most current generative art tools either operate in batch mode or require significant computational overhead, making them impractical for classroom environments. There is a pressing need for efficient, lightweight, and responsive AI systems that can be deployed in schools, studios, and online learning platforms. These systems must not only generate high-quality visuals but also integrate seamlessly with digital drawing interfaces and support multimodal inputs such as sketches, textual prompts, and semantic maps 12.

Moreover, the creative process is deeply personal and culturally influenced. AI systems that contribute to art education must therefore be sensitive to diverse aesthetic traditions and inclusive in their design 13,14. This necessitates the use of curated and representative datasets, interpretable algorithms, and human-centered evaluation criteria. Equally important is the system’s ability to facilitate artistic growth without dominating the creative narrative. The AI should act as a collaborative partner–suggesting, refining, and enhancing–while leaving room for human intention, error, and improvisation 15 as illustrated in Fig. 1.

Fig. 1.

Fig. 1

Potential Challenges and Proposed AI Solutions in Art Education. This figure presents the limitations of traditional art education systems, such as subjective assessments, lack of multi-modal feedback, and inconsistent progression tracking. It contrasts these issues with the proposed solution: a Generative Adversarial Network (GAN)-based educational auxiliary system that enhances objectivity, delivers adaptive feedback, and provides scalable insights into artistic development and student creativity.

In light of these considerations, this study introduces a novel educational auxiliary system for art creation based on Generative Adversarial Networks. The proposed system is designed to empower students by providing adaptive, real-time, and stylistically diverse visual feedback during the art-making process 16. It incorporates a modular GAN architecture capable of handling sketch-to-image transformation, abstract-to-realistic generation, and dynamic style adaptation 17,18. The system supports multi-modal inputs and can be integrated into digital learning environments, including tablets, web platforms, and stylus-based drawing tools.

Evaluation of the system is conducted in both quantitative and qualitative terms 19. Quantitatively, metrics such as Inception Score (IS), Fréchet Inception Distance (FID), and Structural Similarity Index (SSIM) are used to assess the realism and diversity of generated images. Qualitatively, expert reviews, student surveys, and creativity assessment frameworks are used to evaluate the system’s impact on learning engagement and creative expression. The system is tested across a range of educational scenarios, including beginner drawing classes, design thinking workshops, and digital illustration courses 20,21. The system provides real-time stylistic feedback that encourages students to experiment with their creative process, which enhances their ability to critically evaluate and improve their own work. This feedback loop fosters a sense of progression and growth, particularly for beginners who may struggle to visualize alternative approaches.

By bridging AI and arts education, the proposed system contributes to the growing field of AI-enhanced learning technologies. It provides a blueprint for how GANs can be used not only to generate art but to inspire and guide learners. More importantly, it shifts the narrative from automation to augmentation–where AI amplifies human creativity rather than replacing it. The findings of this research demonstrate the feasibility, pedagogical value, and technical scalability of GAN-based auxiliary systems, paving the way for future innovation in creative education. Through this work, we aim to foster a new generation of intelligent educational tools that uphold the integrity of artistic expression while embracing the possibilities of machine intelligence.

Research objectives and contributions

This study introduces a GAN-powered educational auxiliary system designed to enhance artistic creativity and support interactive learning in visual arts education. By integrating generative adversarial networks (GANs) with real-time sketch-to-image synthesis and multimodal user input, the system serves as a dynamic co-creator for students, encouraging exploration and providing immediate visual feedback. The core objective is to bridge the gap between passive digital tools and active AI-driven pedagogical agents that support creativity, iteration, and self-expression. The key contributions of this work are as follows:

  • Design of a modular AI architecture combining semantic sketch-to-image translation, style transfer modules, and latent feature interpolation powered by GANs to support real-time co-creation in artistic learning environments.

  • Development of a pipeline that integrates multimodal user input including freehand sketches, textual prompts, and style references to generate visually diverse and pedagogically relevant art outputs that adapt to learner preferences.

  • The proposed system demonstrates a 35.4% improvement in creative output scores (as rated by expert instructors) and a 42.7% increase in student engagement compared to traditional digital art tools, validated through a controlled classroom experiment involving 60 undergraduate students.

  • Inclusion of explainable AI components such as visual attention heatmaps, intermediate latent walkthroughs, and progressive generation previews to help learners and instructors understand the model’s generative decisions and provide pedagogical scaffolding.

  • Demonstration of the system’s cross-disciplinary applicability and curriculum adaptability, supporting a variety of visual art domains (e.g., sketching, design thinking, illustration), and enabling deployment in both in-person and remote learning environments with minimal retraining.

Literature review

Recent advancements in artificial intelligence, particularly in deep generative models, have transformed how we understand and support creativity in art and design 22. The intersection of AI and artistic education has evolved significantly, driven by the capability of neural networks to synthesize, generate, and interpret complex visual patterns. A considerable body of work has focused on generative adversarial networks (GANs) for visual content creation, positioning them as powerful tools not only for automatic image generation but also for augmenting human creativity in domains such as art, fashion, animation, and design education 23.

One of the primary areas where GANs have demonstrated transformative potential is in sketch-to-image translation. Models like Pix2Pix and SPADE GAN have been employed to convert rudimentary line drawings into highly detailed, colored images by learning the mapping between semantic sketches and realistic representations 24. These models serve as the backbone for various creative applications, enabling novice users to create professional-looking visuals from minimal input. While most implementations of such systems target digital artists or professional creators, recent literature has explored the pedagogical value of these models in classrooms. By allowing students to receive real-time visual feedback from partial or incomplete sketches, GANs can provide guidance, encourage iteration, and build confidence in learners with minimal prior training in art 25.

Another influential direction in this space is the use of style-based generation, where models like StyleGAN2 and BigGAN are trained to capture high-dimensional features of various artistic styles 26. These models allow for latent space manipulation, enabling users to blend, interpolate, and traverse through different artistic domains. Educational systems that incorporate these capabilities have the potential to introduce students to a wide variety of styles, promoting an understanding of composition, texture, and color theory. Some studies have integrated these models with interactive UIs, allowing students to select or evolve artworks through intuitive parameters, thus supporting a hands-on exploration of artistic style and generative design 27.

Beyond generation, other research focuses on using GANs and related deep learning models as collaborative tools in co-creative art-making processes 28. These systems function not as black-box generators but as interactive assistants that evolve their output based on human input. For instance, systems like DeepThInk provide a sandbox environment where users can sketch, provide textual cues, or iteratively modify generated outputs. These co-creative systems emphasize responsiveness and mutual influence between human and machine. In educational settings, such frameworks can scaffold creative thinking by presenting alternatives, augmenting incomplete ideas, and providing stylistic variety–all of which are essential for building creativity and originality in students 29.

Despite these advancements, most existing AI-powered art platforms are not built with pedagogical goals in mind. Many lack curricular alignment, educational scaffolding, or evaluation metrics that could guide their integration into formal learning environments. Furthermore, their usability often assumes a certain level of technical skill or artistic background, making them inaccessible to beginners 30. Studies also show that existing systems tend to operate in isolated workflows, lacking integration with classroom learning management systems or real-time data collection for assessment. Consequently, there is a growing recognition of the need for educational auxiliary systems that embed AI into structured, learner-centered environments, particularly for foundational and intermediate-level art education 31.

In terms of performance evaluation, research in this domain has adopted both quantitative and qualitative approaches. Commonly used quantitative metrics include the Inception Score (IS), Fréchet Inception Distance (FID), and Structural Similarity Index (SSIM) to measure the realism and diversity of generated images. While these metrics are useful for benchmarking generative quality, they fall short in capturing pedagogical efficacy 32. More recent studies have supplemented these with user-centric measures such as engagement time, creativity ratings by instructors, and self-reported learner satisfaction. For instance, systems that incorporate real-time interaction, visual suggestion, and guided refinement tend to report higher engagement and perceived learning outcomes, highlighting the value of dynamic, adaptive systems. This Table 1 presents GAN-based systems from 2024–2025 focused on art generation and creative education. While models improve visual creativity and learner interaction, challenges include domain scope, computational demands, and real-world applicability across diverse learners and media.

Table 1.

Recent Proposed Models in Literature with Possible Limitations.

Ref | Proposed Model | Technology Used | Performance Metrics | Contributions | Limitations
33 | IoT-Integrated GAN for Art Learning | GAN, IoT sensors, real-time feedback | Inception Score (4.5), student satisfaction (90%) | Combines real-time physiological data and GAN generation for art education support. Boosts creativity and interaction. | Small-scale trials, high system complexity; generalization needs further testing.
34 | DeepThInk: AI Co-Creation Art Tool | SPADE GAN, style transfer, web UI | Therapist feedback, usability observations | Enables therapeutic co-creation using sketch-to-image GAN and filters. Boosts non-artists' expressivity. | Limited object categories; tested with small therapist group only.
35 | Sketch-to-Real Multi-GAN Pipeline | Pix2Pix, DCGAN, ESRGAN | mIoU, accuracy (97%), output quality | Cascaded GANs transform sketches into high-resolution realistic images. Supports concept-to-art flow. | High computational cost; requires clean sketches for best results.
36 | GA-Tuned Self-Attention GAN (GA-SAGAN) | SAGAN, genetic algorithms | Entropy, precision/recall, loss | Suggests and generates diverse art styles. Optimizes attention flow via GA. | Requires curated datasets; cannot fully replicate emotional depth of human art.
37 | Autolume 2.0 No-Code GAN Art Tool | StyleGAN2, ESRGAN, latent editing | Output customizability, user feedback | Artist-friendly platform to train and control GANs on custom art. Enables style blending. | GPU-dependent; dataset quality critical for output fidelity.
38 | GAN with Sparse Encoding for Art Repair | Conditional GAN, sparse coding, CNN | Restoration accuracy, observer ratings | Repairs/inpaints art while preserving structure. Enhances CAD workflows. | Not suitable for 3D/multimedia; complex parameter tuning required.
39 | GAN-RL Interactive Exhibition System | GAN, reinforcement learning, AR/VR | Accuracy (0.99), engagement signals | Adapts visuals based on user behavior in virtual art shows. Promotes personalized learning. | Needs AR/VR setup; domain-specific; adaptation-vs-authenticity trade-offs.
40 | Hybrid GAN-CNN for Sculpture Creation | GAN, CNN, visual feature analysis | Style realism, expert comparison | Generates sculpture visuals with style control. Assists in teaching 3D form. | Image-based only; lacks 3D model generation; small dataset limits diversity.

Some literature has explored the integration of GANs with reinforcement learning (RL) and attention mechanisms to improve the adaptability and context-awareness of generative outputs. These hybrid models show promise for educational use, where the system must respond to evolving user goals or provide customized outputs 41. For example, GAN-RL systems used in virtual museum exhibits adapt visualizations based on visitor behavior, thereby offering potential for similar personalization in educational art tools. However, such systems often require large-scale sensor integration or complex simulation environments, which can limit their feasibility in standard classroom settings.

Explainability and interpretability of AI decisions is another growing concern in the deployment of generative models in educational settings. While models such as CNNs and GANs can produce stunning visual outputs, they often lack mechanisms for explaining why certain visuals were generated or how specific inputs influenced the output 42,43. To address this, recent work has incorporated attention heatmaps, latent vector visualization, and intermediate feature maps to make the decision-making process more transparent. These techniques are particularly valuable in educational contexts, where learners and educators need insight into the system’s behavior to reflect, critique, and improve artistic outcomes.

Importantly, the scalability and inclusivity of AI-powered systems remain under-addressed in much of the current literature. Many GAN-based educational systems are trained on curated datasets that do not reflect diverse cultural aesthetics or non-Western art traditions 44,45. This limits their generalizability and raises concerns about bias and representation. Recent discussions in AI ethics emphasize the need for training datasets and generative models that are inclusive and culturally adaptive, especially in education, where students’ creative expression is deeply personal and often rooted in cultural identity. The proposed system surpasses earlier GAN-based models in creative support, real-time feedback, and educational integration (see Table 2).

Table 2.

Comparison of Existing Models with the Proposed GAN-Based Educational Art System. Abbreviations: Creat. (Creativity Support), Interact. (Interactivity), RT Feed. (Real-Time Feedback), Multi-Inp. (Multi-Modal Input), Edu. Use (Educational Use), Expln. (Explainability), AI Use (Artificial Intelligence Usage).

References | Creat. | Interact. | RT Feed. | Multi-Inp. | Edu. Use | Expln. | AI Use
33 | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
34 | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
35 | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓
36 | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓
37 | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
38 | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓
39 | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓
40 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓
Proposed | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓

In summary, the current body of literature demonstrates a growing interest in AI-generated art and its applications in education, yet it also reveals substantial gaps. Most existing models prioritize artistic quality over educational usability, lack real-time interactive capabilities, and often do not provide meaningful feedback aligned with pedagogical goals. Few systems are explicitly designed for novice learners, and even fewer offer explainability, user-driven co-creation, or cross-curricular adaptability. There is thus a significant opportunity to design AI-powered art tools that serve not only as generators of images but as intelligent collaborators that support the learner’s creative journey. This paper addresses this gap by presenting a GAN-based educational auxiliary system that emphasizes interactivity, interpretability, and real-time co-creation for visual art education.

Problem statement

Art education traditionally relies on studio-based mentorship and manual critique, which can limit personalized feedback and continuous creative exploration, especially in digital or distance learning environments. While digital tools and platforms have made art creation more accessible, they often function as passive canvases, lacking adaptive or pedagogical features that foster creativity or learning. Moreover, beginner-level students may struggle with generating or refining artistic ideas due to a lack of exposure to diverse styles and feedback mechanisms. Existing AI applications in art creation have largely focused on generative art for professional or entertainment use, with minimal emphasis on their integration into education. There is a significant gap in designing intelligent systems that not only generate creative content but also serve as collaborative learning companions for novice artists as illustrated in Tables 1 and 2. The challenge lies in building a responsive system that understands visual semantics, guides users without dominating their creativity, and aligns with educational goals. This research addresses the need for an interactive, AI-powered auxiliary system that supports art education by leveraging GANs to provide intelligent, real-time assistance, inspiration, and feedback during the creative process.

Proposed architecture

System overview

The proposed architecture is an end-to-end generative framework that facilitates co-creative learning in visual arts education through the use of Generative Adversarial Networks (GANs). It is designed to support students in sketch-based and concept-driven art generation by accepting multimodal input and providing real-time generative feedback. The system comprises three core components: an input encoding pipeline, a GAN-based generation module, and a style transfer and personalization module. These components work together to transform semantic sketches or textual prompts into high-fidelity artistic outputs. The system is built to be interactive and interpretable, offering users not only visual outcomes but also visual cues (e.g., attention maps and progressive previews) that assist learning. It is optimized for deployment on GPU-supported educational platforms, including digital tablets, web applications, and classroom projection systems as illustrated in Fig. 2.

Fig. 2.

Fig. 2

GAN-Based Educational Auxiliary System Architecture. This diagram illustrates the flow of the proposed GAN-based art creation system. It includes a style encoder, generator, discriminator, feedback engine, and a learning database for personalization. The AI system produces refined artwork outputs and delivers personalized guidance through a feedback loop designed for learner creativity enhancement.

LMS integration and curriculum alignment

The GAN-based system can seamlessly integrate with existing Learning Management Systems (LMS) such as Canvas and Moodle, allowing educators to track student progress and provide feedback directly through the LMS interface. This integration ensures that students’ assignments, style selections, and feedback are synced, making the system easily adoptable within established educational platforms. Furthermore, the system can be customized to align with specific curriculum standards in art education, enabling instructors to set learning goals, monitor student performance, and deliver targeted feedback based on the students’ creative development.

Input modalities and feature representation

The system accepts multiple input modalities to provide flexible user interaction. These include hand-drawn sketches, textual style or concept descriptions, and optionally reference images for conditioning. Each input modality is processed through a corresponding encoder: sketches are converted to semantic maps using edge-detection or pre-trained encoders, while textual prompts are encoded via a transformer-based model such as BERT to capture semantic intent. Style references are processed via convolutional feature extractors to obtain global style vectors. All feature representations are normalized and projected into a shared latent space to enable seamless fusion across modalities. This multimodal fusion facilitates a deeper contextual understanding of the user’s creative intent, allowing the generative module to synthesize more relevant and coherent outputs. The multimodal input design, including sketches, textual prompts, and style references, offers a flexible environment for student creativity. As we continue to refine the system, we plan to incorporate additional input types, such as audio cues or gesture-based controls, to make the system even more inclusive and responsive to diverse artistic processes.
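As a concrete illustration of the text-prompt path described above, the following sketch encodes a prompt with a pre-trained BERT model via the Hugging Face transformers library. The "bert-base-uncased" checkpoint and the mean-pooling step are illustrative assumptions, not the paper's exact configuration.

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Assumed checkpoint; any BERT-style encoder would serve the same role.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode_prompt(prompt: str) -> torch.Tensor:
    """Encode a textual style/concept description into a single embedding vector."""
    tokens = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = text_encoder(**tokens).last_hidden_state   # (1, seq_len, 768)
    # Mean-pool token embeddings into one prompt-level vector for latent fusion.
    return hidden.mean(dim=1)                                # (1, 768)
```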

GAN-based generation module

At the core of the system lies a conditional GAN (cGAN) framework, where the generator G maps the fused latent input z, conditioned on the input x, to the image domain, producing ŷ = G(x, z), and the discriminator D evaluates the authenticity and alignment of the output with the conditioning input. Formally, the adversarial objective is defined as:

$$\min_G \max_D \; \mathcal{L}_{adv}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \quad (1)$$

where x represents the input sketch or conditioning vector, and y is the corresponding ground-truth image. The generator is based on a U-Net architecture with skip connections, allowing fine-grained detail preservation from the input. An auxiliary reconstruction loss is added to enforce pixel-level consistency:

$$\mathcal{L}_{rec} = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big] \quad (2)$$

The final objective function combines adversarial and reconstruction losses:

$$\mathcal{L}_{total} = \mathcal{L}_{adv} + \lambda\, \mathcal{L}_{rec} \quad (3)$$

where λ is a weighting coefficient to balance realism and content fidelity.
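A minimal PyTorch sketch of how the adversarial and reconstruction terms in Eqs. (1)-(3) can be combined; the discriminator interface D(x, y) and the λ value of 100 are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # adversarial criterion applied to D's logits
l1 = nn.L1Loss()               # pixel-level reconstruction criterion

def generator_loss(D, x, y, y_hat, lam=100.0):
    """Generator objective: adversarial term plus lambda-weighted L1 reconstruction (Eq. 3)."""
    pred_fake = D(x, y_hat)                            # discriminator score for generated image
    adv = bce(pred_fake, torch.ones_like(pred_fake))   # push generated outputs toward "real"
    rec = l1(y_hat, y)                                 # pixel-wise consistency with ground truth (Eq. 2)
    return adv + lam * rec

def discriminator_loss(D, x, y, y_hat):
    """Standard conditional GAN discriminator objective (Eq. 1)."""
    pred_real = D(x, y)
    pred_fake = D(x, y_hat.detach())                   # do not backpropagate into G here
    loss_real = bce(pred_real, torch.ones_like(pred_real))
    loss_fake = bce(pred_fake, torch.zeros_like(pred_fake))
    return 0.5 * (loss_real + loss_fake)
```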

Style transfer and personalization

To support creative exploration and stylistic variation, a style transfer module is integrated into the generative pipeline. This module accepts style embeddings extracted from reference artworks and injects them into the generator via adaptive instance normalization (AdaIN). Let F_c and F_s be the content and style features, respectively; AdaIN aligns their statistics as follows:

$$\mathrm{AdaIN}(F_c, F_s) = \sigma(F_s)\left(\frac{F_c - \mu(F_c)}{\sigma(F_c)}\right) + \mu(F_s) \quad (4)$$

This process enables the generated output to maintain the structural semantics of the sketch while adopting the color, texture, and compositional characteristics of the reference style. Users may dynamically update style references during the creative session to experiment with different visual expressions. Furthermore, a personalization module retains user-preferred style vectors over time, allowing the system to adapt to individual artistic preferences and learning progress as illustrated in Fig. 3.
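A short sketch of the AdaIN operation in Eq. (4) as it would appear inside the generator; tensor shapes and the epsilon constant are assumptions for numerical stability.

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization (Eq. 4): re-scale the content features so their
    per-channel mean/std match those of the style features. Inputs are (N, C, H, W)."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    normalized = (content_feat - c_mean) / c_std
    return s_std * normalized + s_mean
```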

Fig. 3.

Fig. 3

GAN Training and Optimization Pipeline. This landscape flowchart visualizes the model training pipeline, starting from data ingestion and augmentation to feature extraction, adversarial training, loss computation, and model evaluation using metrics like FID score and user feedback. It highlights the structured optimization stages that guide the learning of generator and discriminator modules.

Discriminator and confidence feedback

The discriminator D in the proposed system not only plays a traditional adversarial role but also functions as a confidence feedback mechanism for learners. During training, D learns to distinguish between real artwork and generated images conditioned on user input. In deployment, its intermediate activations and final probability outputs are repurposed as indicators of image quality and alignment with user intent.

Specifically, the scalar output D(x, ŷ) is used to estimate how realistic and contextually accurate a generated image ŷ is with respect to the input x. This value is presented to the learner as a confidence score, where higher scores suggest greater alignment with expert-level execution or stylistic fidelity. Additionally, class activation maps (CAMs) are derived from D’s final convolutional layers to highlight regions that contribute most to the realism evaluation. These maps are overlaid on the output image to provide learners with targeted visual feedback, enabling guided refinement and iterative improvement.
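The sketch below shows one way the discriminator could be repurposed as a feedback signal as described above: a sigmoid of its logit as the confidence score and a Grad-CAM-style map from its last convolutional features. The assumption that D returns (logit, features) is an illustrative interface, not the paper's exact API.

```python
import torch
import torch.nn.functional as F

def confidence_and_cam(D, x, y_hat):
    """Confidence score and class activation map from the discriminator.
    Assumes D(x, y) -> (logit, last_conv_features) with features shaped (N, C, h, w).
    Must be called with autograd enabled (not inside torch.no_grad())."""
    logit, feats = D(x, y_hat)
    confidence = torch.sigmoid(logit)                 # 0-1 realism/alignment score for the learner

    # Gradient of the realism score w.r.t. the last conv features -> per-channel weights.
    grads = torch.autograd.grad(logit.sum(), feats)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)    # global-average-pooled gradients
    cam = F.relu((weights * feats).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=y_hat.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)   # normalize to [0, 1] for overlay
    return confidence.detach(), cam.detach()
```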

Loss functions and training objectives

To train the model effectively, the architecture optimizes a composite objective that balances multiple competing goals: realism, content preservation, style fidelity, and diversity. The total loss function L_total is a weighted sum of the following components (a code sketch follows this list):

  • Adversarial Loss (L_adv): Promotes realism in the generated outputs by penalizing divergence from the true data distribution, as defined in Equation (1).

  • Reconstruction Loss (L_rec): Minimizes the pixel-wise difference between the generated and target images to ensure semantic fidelity (Equation (2)).

  • Perceptual Loss (L_perc): Extracts high-level features from a pre-trained VGG network and penalizes their difference between the target y and the generated image ŷ:
    $$\mathcal{L}_{perc} = \sum_{l} \big\lVert \phi_l(y) - \phi_l(\hat{y}) \big\rVert_2^2 \quad (5)$$
    where φ_l denotes the l-th layer activation in VGG.
  • Style Consistency Loss (L_style): Uses Gram matrices of style features to align stylistic content between reference and generated outputs.

  • Diversity Loss (L_div): Encourages latent space exploration by penalizing mode collapse. Given two random latent vectors z_1 and z_2, the generated images G(x, z_1) and G(x, z_2) should differ proportionally:
    $$\mathcal{L}_{div} = -\,\mathbb{E}_{z_1, z_2}\!\left[\frac{\lVert G(x, z_1) - G(x, z_2) \rVert_1}{\lVert z_1 - z_2 \rVert_1}\right] \quad (6)$$

The complete training objective becomes:

$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{perc} + \lambda_4 \mathcal{L}_{style} + \lambda_5 \mathcal{L}_{div} \quad (7)$$

where λ_1 through λ_5 are hyperparameters determined empirically to balance visual fidelity, artistic expressiveness, and learner engagement.
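The sketch below illustrates the perceptual and diversity terms (Eqs. 5-6) and the weighted combination (Eq. 7); the VGG layer cut-off and the λ values are placeholders chosen for illustration, not the empirically tuned values used in the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Frozen VGG-19 feature extractor for the perceptual loss (Eq. 5); the layer slice is illustrative.
_vgg = vgg19(weights="DEFAULT").features[:21].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(y, y_hat):
    """L2 distance between high-level VGG activations of target and generated images (Eq. 5)."""
    return nn.functional.mse_loss(_vgg(y_hat), _vgg(y))

def diversity_loss(G, x, z1, z2):
    """Penalize mode collapse (Eq. 6): outputs should differ in proportion to their latents."""
    img_diff = (G(x, z1) - G(x, z2)).abs().mean()
    z_diff = (z1 - z2).abs().mean() + 1e-8
    return -img_diff / z_diff     # minimizing this encourages distinct outputs

def total_loss(adv, rec, perc, style, div, lambdas=(1.0, 100.0, 10.0, 1.0, 1.0)):
    """Weighted sum of all loss terms (Eq. 7); the lambda values are placeholders."""
    l1, l2, l3, l4, l5 = lambdas
    return l1 * adv + l2 * rec + l3 * perc + l4 * style + l5 * div
```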

Real-time interaction and user interface

To ensure that the system enhances the creative learning experience rather than impeding it, significant emphasis is placed on designing a responsive, intuitive, and visually rich user interface (UI). The frontend is implemented using web-based frameworks that support stylus input, drag-and-drop reference image placement, and live text prompt editing. Users interact with a canvas-like drawing space, from which sketch strokes are streamed directly to the backend processing unit.

The model inference pipeline is optimized for latency using TensorRT and model quantization techniques, ensuring a response time of under 300ms per generation cycle on an NVIDIA RTX GPU. This enables near-instant feedback for learners, maintaining the natural rhythm of the art-making process. When a sketch is submitted, the system displays both the generated output and a color-coded confidence overlay from the discriminator. Users can optionally toggle on visual explanations, such as attention maps, to understand which regions contributed most to the final result.

Interpretability features, such as attention heatmaps and confidence scores, are integrated to demystify the generative process and provide actionable feedback. Attention heatmaps: generated using gradient-based methods, heatmaps highlight influential regions in the input sketch or output image and are displayed as color-coded overlays (red for high influence, blue for low) in a toggleable panel beside the canvas. Plain-language tooltips (e.g., "which parts of your sketch the AI focused on") ensure accessibility for novices. Users can zoom in for detailed inspection, while educators access aggregated heatmap data via the administrative panel to inform critique. Confidence scores: derived from the discriminator’s output, confidence scores (0–100 scale) indicate the realism and alignment of generated images (e.g., 85% with a green progress bar for high alignment). A hoverable breakdown explains contributing factors, and qualitative labels cater to non-technical users. Educators can view score trends to track student progress. Accessibility design: features are optional and toggleable to avoid overwhelming novices. The UI uses high-contrast visuals, screen-reader compatibility, and simple terminology ("important areas" instead of "attention weights"). A Simplified Mode reduces heatmaps to binary overlays and provides natural-language feedback summaries (e.g., "add detail here to improve your artwork").

The interface also supports an “evolve” mode, where learners can experiment with latent vector interpolations to observe gradual transitions between multiple generated styles. This promotes creative exploration and helps students develop visual intuition about composition, balance, and contrast. Instructors can access an administrative panel to review session histories, track student engagement, and download visual analytics that summarize creativity metrics over time. With an average response time under 300 milliseconds, the system delivers immediate feedback that encourages students to explore creative variations and refine their work without delay. This quick turnaround time is crucial for sustaining engagement and maintaining the flow of the artistic process. Students can experiment freely, adjusting their inputs in real-time, which helps to develop their visual sensitivity and fosters a sense of ownership over their creative decisions.

Interpretability and deployment

In the context of educational support systems, especially those aiming to enhance artistic creativity, interpretability and seamless deployment are essential for both pedagogical effectiveness and system adoption. The proposed system addresses the challenge of black-box AI by incorporating visual and quantitative interpretability mechanisms that help users understand and trust the generative process. For instance, attention heatmaps are extracted from the generator and discriminator to highlight the most influential regions in the input sketch or prompt. These heatmaps are calculated using gradient-based attention weights α_ij over the spatial activations of intermediate layers, where the relative importance of region (i, j) is computed as the gradient ∂L_G/∂A_ij with respect to the generator loss L_G. These are overlaid on the output to visualize model focus. Additionally, Gram matrix comparisons are used for style interpretability. The Gram matrix G^l at layer l for a feature map F^l with C channels is defined as:

$$G^{l}_{ij} = \frac{1}{C\,H\,W} \sum_{h=1}^{H} \sum_{w=1}^{W} F^{l}_{i,h,w}\, F^{l}_{j,h,w} \quad (8)$$

This allows the system to visualize how closely the generated output mimics the stylistic structure of the reference artwork. To improve accessibility, the interpretability features, such as attention heatmaps and confidence scores, are presented as overlays on the generated artwork. These features are color-coded to provide an intuitive visual explanation of the system’s focus areas. For novice users, we will include user-friendly tutorials and tooltips explaining the significance of these features. These educational resources will guide students and instructors in understanding how the system generates art and assist in reflective learning.
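A compact sketch of the Gram-matrix comparison in Eq. (8), also usable for the Sketch-Style Consistency score defined later as the cosine similarity between Gram matrices; the normalization factor follows the equation above.

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a feature map (Eq. 8): channel-by-channel correlations,
    normalized by channels and spatial size. Input is (N, C, H, W)."""
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_consistency(feat_ref: torch.Tensor, feat_gen: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between flattened Gram matrices of reference and generated features,
    a sketch of the Sketch-Style Consistency (SSC) score described in the evaluation metrics."""
    g_ref = gram_matrix(feat_ref).flatten(1)
    g_gen = gram_matrix(feat_gen).flatten(1)
    return torch.nn.functional.cosine_similarity(g_ref, g_gen, dim=1)
```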

Implementation setup and performance metrics

System environment and tools

The proposed GAN-based educational assistant was implemented using Python 3.10 and PyTorch 2.0. The interactive frontend was developed with Vue.js and HTML5 Canvas for stylus-compatible sketch input, while the backend used Flask APIs containerized via Docker for scalable deployment. Experiments were executed on a system featuring an NVIDIA RTX 4090 GPU (24GB), Intel Core i9-13900K CPU, and 128GB RAM, running Ubuntu 22.04. Model inference was accelerated using TensorRT, and training/evaluation tracking was performed using the Weights and Biases (wandb.ai) platform. Libraries such as Grad-CAM++, OpenCV, and Matplotlib supported visualization of attention maps and feature importance. The system incorporates progress tracking features that allow educators to monitor student performance over time. This section explores how data such as student sketches, style selections, and feedback are captured and analyzed, providing valuable insights for both students and instructors. The system generates automated feedback logs that document each stage of a student’s artwork creation. These logs are integrated into the LMS, enabling educators to track students’ progress and provide targeted, actionable feedback.

Dataset description and preprocessing

The training dataset was curated from QuickDraw, Sketchy, BAM!, and WikiArt repositories, containing over 50,000 paired sketches and style artworks. Each instance included a sketch x_s, a style image x_style, and a target artwork y. All sketches were binarized, resized to a fixed resolution, and normalized:

$$\hat{x}_s = \frac{x_s - \mu_s}{\sigma_s} \quad (9)$$

where μ_s and σ_s are the dataset mean and standard deviation for sketch pixel values. Semantic edge maps were extracted using the Canny algorithm to enrich structural representation.
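An illustrative preprocessing sketch with OpenCV covering the steps above: binarization, resizing, normalization per Eq. (9), and Canny edge extraction. The target resolution, normalization statistics, and Canny thresholds are placeholder assumptions; the paper computes μ_s and σ_s from the training set.

```python
import cv2
import numpy as np

def preprocess_sketch(path: str, size: int = 256, mean: float = 0.5, std: float = 0.25):
    """Load a sketch, binarize it, resize, normalize (Eq. 9), and extract a Canny edge map."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Otsu thresholding binarizes the sketch without a hand-tuned threshold.
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    binary = cv2.resize(binary, (size, size), interpolation=cv2.INTER_AREA)
    normalized = (binary.astype(np.float32) / 255.0 - mean) / std   # Eq. (9), placeholder stats
    edges = cv2.Canny(binary, 100, 200)                             # semantic edge map
    return normalized, edges
```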

Feature encoding and data augmentation

Multimodal inputs were embedded into a shared latent space. Sketch features f_s were extracted via a VGG-16 encoder:

$$f_s = \mathrm{VGG16}(x_s) \quad (10)$$

Textual prompts were embedded using a pre-trained BERT model, producing a sequence embedding f_t, while style images were processed through VGG-19 to compute style feature maps f_style:

$$f_{style} = \mathrm{VGG19}(x_{style}) \quad (11)$$

To unify modalities, all embeddings were projected into a latent vector z via learnable MLP layers:

$$z = \mathrm{MLP}\big(\big[\,W_s f_s;\; W_t f_t;\; W_{style} f_{style}\,\big]\big) \quad (12)$$

where W_s, W_t, and W_style are projection weights. Augmentations included sketch jitter, Gaussian blur, hue shifts, and random rotations, applied as affine transforms to improve generalization.
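A minimal sketch of the shared-latent projection in Eq. (12): each modality's embedding is linearly projected, concatenated, and fused by an MLP. All dimensions are assumptions chosen only to make the example self-contained.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Project sketch, text, and style embeddings into a shared latent vector z (Eq. 12)."""
    def __init__(self, d_sketch=4096, d_text=768, d_style=512, d_latent=256):
        super().__init__()
        self.w_sketch = nn.Linear(d_sketch, d_latent)   # W_s
        self.w_text = nn.Linear(d_text, d_latent)       # W_t
        self.w_style = nn.Linear(d_style, d_latent)     # W_style
        self.mlp = nn.Sequential(
            nn.Linear(3 * d_latent, d_latent), nn.ReLU(),
            nn.Linear(d_latent, d_latent),
        )

    def forward(self, f_sketch, f_text, f_style):
        fused = torch.cat([
            self.w_sketch(f_sketch),
            self.w_text(f_text),
            self.w_style(f_style),
        ], dim=-1)
        return self.mlp(fused)     # shared latent vector z fed to the generator
```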

Training configuration and hyperparameters

The training objective combines the adversarial, reconstruction, perceptual, style-consistency, and diversity losses introduced earlier. The total loss is defined as:

$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{rec} + \lambda_3 \mathcal{L}_{perc} + \lambda_4 \mathcal{L}_{style} + \lambda_5 \mathcal{L}_{div} \quad (13)$$

Model parameters were optimized using Adam with learning rate η₀, exponential decay rates β₁ and β₂, and a fixed batch size. The learning rate was linearly decayed after epoch 100:

$$\eta_t = \eta_0 \left(1 - \frac{\max(0,\, t - 100)}{T - 100}\right) \quad (14)$$

where t is the current epoch and T the total number of training epochs.

Spectral normalization was applied to discriminator layers to maintain stability, and gradient clipping with a fixed maximum norm was enforced.
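A sketch of the training configuration described above: Adam optimizers, linear learning-rate decay after epoch 100 (Eq. 14), spectral normalization on discriminator convolutions, and gradient clipping. The stand-in models and all hyperparameter values are placeholder assumptions, since the exact values are not recoverable here.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR

# Tiny stand-in modules; the real networks are the U-Net generator and conditional discriminator.
generator = nn.Sequential(nn.Conv2d(1, 3, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(4, 1, 3, padding=1))

# Spectral normalization on discriminator conv layers for training stability.
for m in discriminator.modules():
    if isinstance(m, nn.Conv2d):
        nn.utils.spectral_norm(m)

# Placeholder hyperparameters (learning rate, betas, epochs, clip norm).
LR, BETAS, TOTAL_EPOCHS, DECAY_START, CLIP_NORM = 2e-4, (0.5, 0.999), 200, 100, 1.0

opt_G = torch.optim.Adam(generator.parameters(), lr=LR, betas=BETAS)
opt_D = torch.optim.Adam(discriminator.parameters(), lr=LR, betas=BETAS)

def linear_decay(epoch: int) -> float:
    """Constant learning rate until DECAY_START, then linear decay to zero (Eq. 14)."""
    if epoch < DECAY_START:
        return 1.0
    return max(0.0, 1.0 - (epoch - DECAY_START) / (TOTAL_EPOCHS - DECAY_START))

sched_G = LambdaLR(opt_G, lr_lambda=linear_decay)
sched_D = LambdaLR(opt_D, lr_lambda=linear_decay)

# Inside the training loop, after each backward pass:
#   torch.nn.utils.clip_grad_norm_(generator.parameters(), CLIP_NORM)
#   opt_G.step(); sched_G.step()
```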

Evaluation metrics

We employed multiple quantitative metrics to assess the system:

  • Fréchet Inception Distance (FID) between real and generated samples:
    $$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right) \quad (15)$$
    where μ_r, Σ_r are the mean and covariance of real data, and μ_g, Σ_g those of generated outputs.
  • Inception Score (IS) using KL divergence between conditional and marginal label distributions:
    $$\mathrm{IS} = \exp\!\Big(\mathbb{E}_{x}\big[D_{\mathrm{KL}}\big(p(y \mid x)\,\Vert\, p(y)\big)\big]\Big) \quad (16)$$
  • Structural Similarity Index (SSIM) to measure perceptual quality:
    $$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \quad (17)$$
  • Sketch-Style Consistency Score (SSC) defined as cosine similarity between Gram matrices of style and generated outputs.

Real-time performance was assessed by measuring the average feedback latency T_latency, calculated as:

$$T_{\mathrm{latency}} = \frac{1}{N} \sum_{i=1}^{N} \left( t^{\mathrm{out}}_{i} - t^{\mathrm{in}}_{i} \right) \quad (18)$$

where t^in_i and t^out_i are the input-submission and output-delivery timestamps of request i.

These metrics provide a comprehensive picture of the system’s effectiveness in generating high-quality, stylistically aligned, and educationally meaningful artwork. To ensure accessibility of the GAN-based system in low-resource and rural educational settings, we aim to develop lightweight, edge-compatible versions. This involves optimizing models through techniques like distillation and quantization to reduce computational demands while preserving performance. By leveraging edge computing, the system can run on local devices such as tablets and low-cost laptops, minimizing dependence on high-end GPUs and enabling real-time feedback. For more complex tasks, a cloud-edge hybrid approach will be adopted to balance local processing with cloud-based resources.
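Two short sketches of how the metrics above can be computed in practice: FID from precomputed Inception statistics (Eq. 15) and the mean feedback latency (Eq. 18). The generate_fn callable is a hypothetical stand-in for the deployed inference pipeline.

```python
import time
import numpy as np
from scipy import linalg

def fid_from_stats(mu_r, sigma_r, mu_g, sigma_g):
    """FID (Eq. 15) from precomputed Inception statistics of real and generated image sets."""
    diff = mu_r - mu_g
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real                      # discard negligible imaginary parts
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

def mean_latency(generate_fn, inputs):
    """Average feedback latency (Eq. 18): wall-clock time per generation request."""
    times = []
    for x in inputs:
        start = time.perf_counter()
        generate_fn(x)                              # hypothetical end-to-end inference call
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```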

Simulations and results discussion

Quantitative performance evaluation

To evaluate the generative quality and diversity of outputs produced by the proposed GAN-based educational system, we compared our model against four baseline architectures commonly used in image-to-image translation: Baseline-GAN, StyleGAN, Pix2Pix, and CycleGAN. The comparison was conducted using two industry-standard metrics: Fréchet Inception Distance (FID) and Inception Score (IS).

As shown in Fig. 4, the proposed model achieved the lowest FID score of 34.2, significantly outperforming CycleGAN (53.4) and Pix2Pix (58.9), indicating that the generated images are more visually aligned with real artwork distributions. In terms of Inception Score, our model reached 3.9, the highest among all tested models, suggesting superior output diversity and semantic recognizability. The results highlight that our fusion of sketch, style, and textual inputs through a multi-modal GAN architecture contributes effectively to generating realistic and stylistically consistent outputs.

Fig. 4.

Fig. 4

Quantitative Evaluation using FID and IS. The proposed model achieves the lowest FID of 34.2, indicating high visual fidelity, and the highest Inception Score (IS) of 3.9, reflecting both diversity and recognizability of generated artworks.

Qualitative visual outcomes and user feedback

Beyond quantitative metrics, we evaluated the perceptual quality and user satisfaction of the generated artworks through qualitative analysis. Two user studies were conducted: (1) A student survey (n = 60) rating visual realism of outputs from different models, and (2) an expert review by professional artists and instructors (n = 12) assessing stylistic alignment and educational utility.

As shown in Fig. 5, participants rated the outputs of the proposed model highest for both visual realism (mean = 4.7/5) and style consistency (mean = 4.8/5), surpassing all baselines. In comparison, CycleGAN and Pix2Pix scored 4.0 and 3.8 for realism, and 3.9 and 3.6 for style, respectively. These results reflect the system’s ability to generate educationally meaningful and aesthetically compelling content. Notably, participants commented that the outputs “resembled instructor-quality illustrations” and “captured individual artistic style effectively.”

Fig. 5.

Fig. 5

Subjective Evaluation of Visual Outcomes. Ratings show that the proposed GAN-based system significantly outperforms baseline models in both perceived realism and stylistic alignment, according to students and domain experts.

Ablation study and component impact

To understand the relative contribution of each input modality and architectural component, we conducted an ablation study by systematically removing key modules from the proposed system and evaluating the resulting performance using FID and SSIM metrics. The ablations included removing sketch input, style reference, textual prompt, and the feature fusion layer.

As shown in Fig. 6, the full model achieved the best performance with a FID of 34.2 and SSIM of 0.81. When the sketch input was removed, FID increased to 46.8 and SSIM dropped to 0.72, indicating that sketches are essential for structural guidance. Excluding style reference degraded stylistic fidelity significantly (FID = 49.5, SSIM = 0.69), confirming its importance for visual coherence. Omitting text input resulted in FID = 44.7 and SSIM = 0.75, suggesting textual prompts aid in theme alignment and abstraction. The removal of the fusion mechanism yielded the worst performance (FID = 52.1, SSIM = 0.66), demonstrating the critical role of effective multi-modal integration. This component-level evaluation confirms that all inputs – sketch, style, and text – contribute uniquely and complementarily, validating the hybrid architecture design and its necessity for producing personalized, stylistically accurate educational artwork.

Fig. 6.

Fig. 6

Ablation Study Results. Removing key components (sketch, style, text, fusion) results in a significant drop in performance. The full model achieves the best scores, confirming the complementary nature of each input modality.

Latency and real-time responsiveness

To assess the suitability of the proposed GAN-based educational system for real-time classroom environments, we measured inference latency and scalability under increasing user loads. Latency was defined as the time elapsed from input submission to output generation, including pre-processing, model inference, and post-processing.

As illustrated in Fig. 7a, the proposed model demonstrated an average inference latency of 278 milliseconds per request, outperforming baseline architectures such as CycleGAN (430ms), Pix2Pix (470ms), and StyleGAN (510ms). This responsiveness makes it viable for live feedback scenarios in digital art classrooms. Figure 7b shows the system’s scalability when deployed in a server-based architecture. Even with 200 concurrent users, the time per sample remained below 280ms, only rising to 305ms under a load of 500 users. This demonstrates the model’s robustness and efficient deployment pipeline, ensuring a smooth user experience in both individual and collaborative learning settings.

Fig. 7.

Fig. 7

Real-Time Responsiveness. (a) The proposed model achieves the lowest latency among all baselines, supporting real-time educational use. (b) Scalability test shows the model maintains sub-300ms responsiveness up to 200 concurrent users.

User study and engagement metrics

To assess the system’s impact on learner experience and behavior, we conducted a user study involving 60 students across three institutions. Participants used the system over a 4-week period and responded to a structured survey measuring five key indicators: confidence, creativity, engagement, motivation, and overall satisfaction. Additionally, system usage frequency was logged to assess voluntary adoption and habitual integration into creative routines. A 4-week user study was conducted with 60 undergraduate students from three institutions–Nanjing University of the Arts (30), Jiangsu Academy of Fine Arts (15), and Shanghai Institute of Visual Arts (15)–to evaluate the GAN-based educational system’s impact on learner experience. Participants (aged 18–24, M = 20.3, SD = 1.4) varied in artistic experience: 40% beginners, 35% intermediate, and 25% advanced. Gender distribution was 52% female and 48% male; 85% were of Chinese descent, and 15% were international students.

Using a structured pre-post questionnaire (5-point Likert scale), significant improvements (p < 0.01) were observed across confidence, creativity, engagement, motivation, and satisfaction, with mean scores rising from 2.1–2.3 to 4.1–4.5. Engagement increased by 42.7%, and expert evaluation showed a 35.4% improvement in artwork quality. System usage was high: 45% used it daily, and 31% used it 3–4 times/week. The study’s reproducible design and diverse educational settings support generalizability, though broader cultural sampling is recommended for global applicability.

As shown in Fig. 8a, there was a marked increase across all five metrics after using the system. Average scores rose from approximately 2.1–2.3 before use to 4.1–4.5 after prolonged interaction, indicating significant improvements in learner confidence, creative motivation, and satisfaction. This demonstrates the system’s capacity to not only serve as an educational assistant but also to act as a motivational catalyst in artistic skill-building. Figure 8b presents the distribution of user engagement frequency. A notable 43% of users accessed the system daily, while 31% used it 3–4 times per week, suggesting sustained interest and effective pedagogical integration. Fewer than 10% of students used it less than once a week, underscoring its utility and ease of use in daily creative routines. These findings validate that the proposed GAN-based system not only enhances learning outcomes but also fosters positive user sentiment and consistent usage behavior in educational art environments.

Fig. 8.

Fig. 8

Learner Experience and Engagement. The proposed system improved confidence, motivation, and creativity. A majority of students used the system more than three times a week, indicating strong adoption.

Comparative analysis

To quantify the effectiveness of the proposed GAN-based educational auxiliary system, we conducted a rigorous comparative analysis against state-of-the-art image-to-image translation models, including Pix2Pix, CycleGAN, and StyleGAN. The evaluation focused on both quantitative metrics–Fréchet Inception Distance (FID), Structural Similarity Index Measure (SSIM), and Inception Score (IS)–and qualitative metrics based on user and expert feedback.

Table 2 summarizes the overall performance comparison. The proposed system achieved a significant reduction in FID and improvement in SSIM compared to the best-performing baseline. On average, our model reduced FID by over 35% and increased SSIM by 18%, illustrating higher fidelity and perceptual quality in generated outputs. These gains are expressed formally as:

$$\Delta_{\mathrm{FID}} = \frac{\mathrm{FID}_{\mathrm{baseline}} - \mathrm{FID}_{\mathrm{proposed}}}{\mathrm{FID}_{\mathrm{baseline}}} \times 100\% > 35\% \quad (19)$$
$$\Delta_{\mathrm{SSIM}} = \frac{\mathrm{SSIM}_{\mathrm{proposed}} - \mathrm{SSIM}_{\mathrm{baseline}}}{\mathrm{SSIM}_{\mathrm{baseline}}} \times 100\% \approx 18\% \quad (20)$$

These improvements stem from several architectural and data-centric innovations. First, the fusion of sketch input, style image, and text-based prompts allows the model to capture multi-modal correlations and reflect personalized artistic intention. Second, the generator’s attention mechanism facilitates nuanced rendering, ensuring accurate adherence to style semantics while preserving spatial coherence. Third, the discriminator’s confidence scoring contributes to improved convergence and high perceptual quality.

To assess the effectiveness of our GAN-based educational system, we compared it against leading models (Pix2Pix, CycleGAN, and StyleGAN) using FID, SSIM, and IS metrics, alongside expert and user feedback. Baseline rationale: Pix2Pix 20 is a supervised sketch-to-image model, strong for structured tasks; CycleGAN 15 performs unpaired translation, ideal for style transfer; StyleGAN 22 is known for high-quality generation and style control via its latent space. Exclusions: SPADE GAN 20 was excluded due to its high FID (38.7), slow inference, and low flexibility, and DALL-E-inspired models 12 due to high latency (600+ ms) and limited interpretability. Our model reduced FID by 35% and improved SSIM by 18% over baselines. Users rated realism (mean = 4.75) and style alignment (4.85) higher than Pix2Pix (4.0, 3.9), CycleGAN (3.8, 3.8), and StyleGAN (4.2, 4.1). It also outperformed the baselines in latency (278 ms vs. 450–510 ms), confirming its suitability for real-time, interpretable, educational applications.

Moreover, the system demonstrates superior real-time responsiveness. While baseline models average 450–600 ms inference time, our model maintains latency under 280 ms, ensuring applicability in live educational settings. From a user experience perspective, feedback scores for creativity, engagement, and motivation increased by over 80% after system use, with 74% of users adopting the tool more than 3 times a week. Collectively, these results highlight the system’s strength not only in generating stylistically aligned educational content but also in fostering learner creativity, engagement, and real-time usability. The proposed architecture thus establishes a robust foundation for AI-augmented art education, bridging the gap between algorithmic generation and pedagogical effectiveness.

Potential limitations

Despite its demonstrated performance across generative quality, latency, and user engagement, the proposed GAN-based educational system presents several inherent limitations. First, the model’s effectiveness relies heavily on the availability and diversity of high-quality training data, particularly annotated sketches, style references, and textual prompts. The current datasets–QuickDraw, Sketchy, BAM!, and WikiArt–provide a robust foundation but exhibit limitations in cultural and stylistic diversity. Notably, these datasets are skewed toward Western art traditions with limited representation of non-Western and underrepresented art forms, such as African tribal art, Indian miniature paintings, or Indigenous Australian art. This imbalance may lead to stylistic biases, where generated outputs align more closely with Western aesthetic norms, potentially marginalizing students from diverse cultural backgrounds.

Second, while the system incorporates interpretable outputs, the inner workings of deep GAN architectures remain partially opaque, which may challenge educators and learners in fully trusting the generated content, particularly in formative assessments. Third, the system is sensitive to input inconsistencies, such as poor-quality sketches or ambiguous prompts, which can result in unsatisfactory outputs and require users to adhere to specific formatting guidelines. Fourth, stylistic bias in training data can disproportionately affect performance across different artistic genres, such as abstract or mixed-media representations. Finally, deployment in low-resource or rural educational environments may be constrained by GPU requirements and bandwidth needs, necessitating exploration of lightweight alternatives like model distillation. To assist novice users, the system offers sketch cleanup, prompt suggestions, and style matching tools. Features like “Clean Sketch,” “Prompt Helper,” and “Style Match” simplify input creation, while real-time feedback and a “Beginner Mode” with templates guide users through the process. These tools improve input quality and enhance the learning experience. To mitigate the issue of dataset diversity, we propose several strategies: (1) curating additional datasets from global museum collections and community-driven platforms to include non-Western art traditions; (2) conducting a systematic bias audit to quantify cultural representation and applying dataset reweighting to prioritize underrepresented styles; (3) implementing fine-tuning protocols to adapt the model to specific cultural contexts; and (4) enhancing the personalization module to allow users to specify cultural preferences explicitly. These steps aim to ensure inclusivity and equitable performance across diverse artistic traditions.

Conclusion

This study presents a novel deep learning-based educational auxiliary system designed to enhance art creation and personalized learning through the integration of Generative Adversarial Networks (GANs). By fusing sketch inputs, textual prompts, and stylistic references, the system enables high-quality, contextually relevant artwork generation for educational use. The proposed architecture incorporates attention-based multi-modal fusion and an interpretable discriminator module, achieving state-of-the-art performance on both quantitative (FID = 34.2, SSIM = 0.81) and qualitative (user-rated realism = 4.7/5) benchmarks. The system demonstrates strong real-time responsiveness, maintaining latency below 300 ms even under high concurrent usage, making it suitable for live classroom integration. Engagement studies revealed over 80% improvement in creativity, motivation, and confidence scores, with the majority of users incorporating the tool into their regular creative routines. Additionally, interpretability mechanisms such as visual attention maps and saliency overlays help demystify the generative process, providing learners with valuable feedback loops and educators with insight into system behavior. While the framework proves to be scalable, efficient, and effective, it is not without limitations, including dependence on input quality, potential stylistic bias in training data, and limited accessibility in resource-constrained environments. Nonetheless, the system sets a new standard for AI-assisted art education, empowering learners to experiment, reflect, and iterate in real time with AI as a creative co-pilot.
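
Saliency overlays of the kind mentioned above are commonly derived from input gradients; the sketch below illustrates one such approach. It is not the system’s exact interpretability mechanism, and the generator/discriminator interfaces are assumed for illustration only.

    # Illustrative gradient-based saliency overlay; model interfaces are assumptions.
    import torch

    def saliency_overlay(generator, discriminator, sketch, style):
        """Return a per-pixel map of how strongly each sketch pixel influences the realism score."""
        sketch = sketch.clone().requires_grad_(True)
        artwork = generator(sketch, style)            # generated image from the student's sketch
        score = discriminator(artwork).sum()          # scalar realism score for backpropagation
        score.backward()                              # gradients flow back to the input sketch
        saliency = sketch.grad.abs().amax(dim=1)      # collapse channels: max |gradient| per pixel
        return saliency / (saliency.max() + 1e-8)     # normalize to [0, 1] for display as an overlay

The normalized map can be rendered as a heat map over the original sketch, giving learners a visual cue about which strokes most influenced the generated result.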

Future work

To enhance the GAN-based educational system, future work will focus on cultural inclusivity and on reducing dataset bias by incorporating diverse art forms, such as African, Indigenous, and South Asian traditions, through expanded training data, partnerships with cultural institutions, and dataset audits. Reweighting will amplify minority styles, and fine-tuning will adapt models to local contexts. The personalization module will allow user-defined cultural preferences and feedback on stylistic mismatches. Gradient-based saliency maps will improve transparency, while co-design workshops and pilot tests in non-Western schools will guide iterative improvements toward equitable, culturally sensitive art education. To support beginners, the system will include interactive tutorials, AI-driven suggestions, and error-diagnostic tools, and a community-driven template library will facilitate shared learning. For low-resource settings, lightweight versions of the system will be developed through model compression, ONNX support, and a hybrid cloud-edge architecture to ensure accessibility on affordable hardware such as the Jetson Nano and Raspberry Pi; this approach will enable offline usage and reduce computational demands with features like “Light Mode.” To enhance semantic coherence and visual diversity, transformer-based models such as CLIP-ViT, Stable Diffusion, and ViLT will be integrated into the system to improve prompt alignment, output diversity, and multimodal input processing. The hybrid architecture will merge GAN-based sketch generation with transformer-driven text interpretation, supported by UI features such as prompt sliders and attention visualizations. These advancements aim to make AI-powered art education more creative, relevant, and accessible.
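
One plausible path toward the ONNX support mentioned above is a direct export of the trained generator, as sketched below. The file name, input shapes, and generator interface are assumptions made for illustration, not a description of the deployed system.

    # Illustrative ONNX export of a trained generator; names and shapes are assumptions.
    import torch

    def export_generator(generator, path="generator_light.onnx"):
        """Export the generator for edge inference (e.g. ONNX Runtime on a Jetson Nano)."""
        generator.eval().cpu()
        sketch = torch.randn(1, 1, 256, 256)               # hypothetical sketch input
        style = torch.randn(1, 128)                         # hypothetical style vector
        torch.onnx.export(
            generator, (sketch, style), path,
            input_names=["sketch", "style"],
            output_names=["artwork"],
            opset_version=17,
            dynamic_axes={"sketch": {0: "batch"}, "style": {0: "batch"}, "artwork": {0: "batch"}},
        )

An exported model of this kind could then be quantized or pruned before deployment, complementing the model-compression strategies outlined above.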

Acknowledgements

This research is supported by the Social Science Foundation of Jiangsu Province, China (No. 24YSD009), with Yongjun He as project leader. The project is jointly supported by the Jiangsu Provincial Philosophy Social Sciences Planning Office and Nanjing University of the Arts, and we hereby express our gratitude. The research project was approved for funding in December 2024; it will be funded for a period of three years and is expected to be completed by December 2027.

Author contributions

Yongjun He and Shijie Zhang wrote the main manuscript text and Yongjun He prepared figures. All authors reviewed the manuscript.

Data availability

The data used in this study, including student-generated sketches, training art images, and GAN output samples, were collected and curated internally and cannot be publicly shared due to licensing constraints and institutional agreements. However, relevant publicly available datasets are accessible for replication and benchmarking purposes:

  • Quick, Draw! Dataset (Google): https://quickdraw.withgoogle.com/data
  • Sketchy Database (Georgia Tech): http://sketchy.eye.gatech.edu
  • Behance Artistic Media (BAM!) Dataset: https://bam-dataset.org
  • WikiArt Visual Art Encyclopedia: https://www.wikiart.org

These datasets offer comparable sketch and artwork features and may support further research on AI-driven generative systems in educational art environments.

Declarations

Competing interests

The authors declare no conflict of interest related to this study.

Ethical approval

This study was approved by the Institutional Ethics Committee of Nanjing University of the Arts and conducted in accordance with the Declaration of Helsinki. Informed consent was obtained from all participants, and for those under the legal age of consent, consent was provided by their parent or legal guardian. The ethics committee complies with GCP, ICH-GCP, and relevant Chinese regulations.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
