Artificial intelligence (AI) in education is experiencing a transformative shift, fueled by foundation models with unprecedented capabilities. These advancements are reshaping educational paradigms and addressing challenges such as diverse student needs, resource gaps, and engagement.1 This paper examines three key trends: the shift from perception to cognition, the transition from generalized to personalized learning, and the rise of multimodal systems, as shown in Figure 1. Together, these trends open up new opportunities to tackle persistent challenges in the education sector.
Figure 1. Evolution trends in AI for education
From perception to cognition
The transition of AI from perception-based to cognition-driven systems represents a significant milestone in the AI-for-education field.2 Early AI applications focused primarily on perception: recognizing basic interactions and responding to student queries. With the advent of foundation models, however, AI has acquired higher-order cognitive abilities such as context understanding, knowledge reasoning, and complex problem solving. Organizations like DeepSeek have introduced models whose reasoning is, on some tasks, comparable to that of human instructors. These models can interpret complex contexts, perform multi-step reasoning, and analyze student queries in depth rather than simply returning direct answers. This shift transforms AI from a basic informational tool into an intelligent mentor that fosters independent thinking and creative problem-solving skills. However, current research on cognitive AI faces significant challenges, particularly regarding model transparency. Without clear reasoning paths, students find it hard to understand the "how" and "why" behind conclusions, which is crucial in educational contexts. For example, when AI tutors such as DeepSeek's models are used in biology courses, they can generate detailed explanations of cellular processes, yet students often struggle to trace the reasoning behind intricate biochemical pathways, which can lead to superficial understanding. This limitation hinders reflective learning and deep conceptual understanding. Moreover, current AI systems struggle to integrate diverse knowledge sources to support complex knowledge construction, which limits the depth of learners' cognitive growth.
From generalized to personalized
AI for education is moving learning from a "one-size-fits-all" approach toward a more individualized model. Traditional education has often struggled to meet learners' diverse needs. With AI, and foundation models in particular, learning content can be adjusted dynamically to each student's needs and progress. These models analyze student performance in real time, tailoring content and pacing to enhance learning outcomes. For example, platforms like Khan Academy use AI to personalize practice exercises based on individual student performance. Recent studies have shown that adaptive AI tutors can align teaching strategies with individual learning speeds and comprehension levels, improving academic performance.3 However, widespread personalized learning raises concerns about data privacy and fairness, so future research should focus on developing transparent and equitable algorithms that protect student data and reduce algorithmic bias. Beyond these concerns, personalized learning still faces challenges in creating dynamic learning paths, assessment methods, and motivational strategies. Existing AI systems struggle to design learning paths around student interests, which can undermine motivation and engagement. Personalized assessment also remains limited, as students are still evaluated primarily through traditional standardized tests.
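One well-established way to estimate a student's mastery from performance in real time is Bayesian Knowledge Tracing (BKT). This is a minimal sketch of that technique, not a description of how Khan Academy or any other platform works. After each answer, the sketch updates the probability that a skill has been mastered. A hypothetical policy then decides whether to serve more practice. The parameter values in `BKTParams` and the 0.95 threshold in `next_exercise` are illustrative assumptions, not fitted values.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    """Standard BKT parameters; the default values are illustrative assumptions."""
    p_init: float = 0.2    # P(L0): prior probability the skill is already mastered
    p_learn: float = 0.15  # P(T): probability of learning the skill on a practice step
    p_slip: float = 0.1    # P(S): probability of a wrong answer despite mastery
    p_guess: float = 0.2   # P(G): probability of a right answer without mastery


def update_mastery(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Return the updated mastery estimate after observing one answer."""
    if correct:
        evidence = p_mastery * (1 - params.p_slip)
        posterior = evidence / (evidence + (1 - p_mastery) * params.p_guess)
    else:
        evidence = p_mastery * params.p_slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - params.p_guess))
    # Account for the chance that the student learned the skill during this step.
    return posterior + (1 - posterior) * params.p_learn


def next_exercise(p_mastery: float, threshold: float = 0.95) -> str:
    """Hypothetical pacing policy: keep practising until mastery crosses a threshold."""
    return "advance to next skill" if p_mastery >= threshold else "serve another practice item"


params = BKTParams()
p = params.p_init
for answer in [False, True, True, True, True]:
    p = update_mastery(p, answer, params)
    print(f"correct={str(answer):5}  P(mastery)={p:.3f}  -> {next_exercise(p)}")
```

A wrong first answer lowers the estimate slightly. A run of correct answers then raises it past the threshold, so pacing adapts to the individual student rather than following a fixed schedule.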
From single modality to multimodality
The integration of multimodal AI represents a significant advancement in education. Traditional text-based systems were limited in their ability to convey complex concepts. By combining text, audio, video, and images, multimodal AI provides richer, more immersive learning experiences.4 Innovations such as GPT-4 Vision and Meta's work on multimodal data integration support diverse learning styles through multiple sensory channels. For example, in science education, AI can integrate experimental videos, audio explanations, and interactive questions to enhance comprehension. Language learning tools such as Rosetta Stone use visual cues, speech recognition, and dialogue interactions to accelerate language acquisition. Platforms like Coursera combine video lectures, text, and quizzes to cater to various learning preferences. Medical tools like Virtual Anatomy allow students to explore 3D models with audio-guided explanations, offering a more comprehensive understanding of anatomy. These multimodal approaches improve accessibility, engagement, and learning outcomes. Although multimodal AI can simulate real-world interactions and spark curiosity, challenges remain. Current systems still struggle to integrate multiple information sources seamlessly, which limits immersion and effectiveness, particularly in fields such as medical and art education. Moreover, real-time feedback based on facial expressions, voice tones, and body language remains underdeveloped, hindering personalized instruction and emotional engagement.
In summary, the rise of foundation models is transforming AI for education and creating new opportunities to reshape educational practices. Three interconnected trends (enhanced cognition, personalization, and multimodal integration) complement one another. Cognition-driven AI enables personalized learning pathways. Multimodal integration supports diverse learning styles and creates dynamic, intuitive learning experiences. Together, they allow AI to engage learners more actively and to provide individualized support tailored to each student's needs.
Insights of AI for education
Knowledge-enhanced cognitive AI education
Current cognitive AI struggles to use external knowledge sources for complex contextual understanding and advanced reasoning, which limits its ability to support diverse educational scenarios. Future research could strengthen these capabilities by developing models that perform multi-step reasoning and decompose complex tasks while drawing on domain-specific knowledge, thereby better emulating human cognitive processes. The integration of knowledge graphs (KGs) with foundation models is a promising research direction.5 By incorporating structured KGs into foundation models, AI can improve its capacity for complex reasoning and knowledge integration. This would let it better support deep conceptual learning in education. For instance, in a history learning application, a KG could map relationships among historical events, figures, and dates, enabling AI to provide more insightful and contextually rich explanations. Furthermore, integrating principles from the learning sciences into AI's cognitive processes will be essential for guiding learners to construct deep knowledge and for fostering critical thinking, creativity, and a more profound understanding of subject matter.
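To make the history example concrete, here is a minimal sketch of multi-hop reasoning over a small KG. The triples are a hypothetical mini-graph, and `build_graph`, `find_path`, and `path_to_context` are illustrative helpers rather than part of any specific KG-plus-LLM system. A breadth-first search finds the chain of relations linking two entities. The resulting path is rendered as plain sentences. One common integration pattern, assumed here, is to pass such grounded facts to a foundation model as context so that its explanation follows an explicit, inspectable reasoning chain.

```python
from collections import defaultdict, deque

# Hypothetical mini knowledge graph of (subject, relation, object) triples.
TRIPLES = [
    ("Treaty of Versailles", "ended", "World War I"),
    ("World War I", "began_in", "1914"),
    ("Assassination of Franz Ferdinand", "triggered", "World War I"),
    ("Gavrilo Princip", "carried_out", "Assassination of Franz Ferdinand"),
    ("Treaty of Versailles", "signed_in", "1919"),
]


def build_graph(triples):
    """Index triples in both directions so paths can be followed from either end."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((f"inverse_of_{rel}", subj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for the shortest chain of relations linking two entities."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None  # no connection in the graph


def path_to_context(path):
    """Render a path as plain sentences that could be supplied to a model as grounding."""
    return " ".join(f"{s} {r.replace('_', ' ')} {o}." for s, r, o in path)


graph = build_graph(TRIPLES)
path = find_path(graph, "Gavrilo Princip", "Treaty of Versailles")
print(path_to_context(path))
```

The search links Gavrilo Princip to the Treaty of Versailles through the assassination and World War I. This is the kind of multi-step connection a learner is asked to explain, and the explicit path makes each reasoning step traceable.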
Advancing multimodal integration
The shift from single modality to multimodality faces challenges in efficiency and coherence. Existing multimodal integration technologies struggle to seamlessly combine multiple information sources, limiting immersion and learning effectiveness. Future efforts should focus on improving the efficiency and coherence of multimodal integration, including developing more sophisticated data fusion techniques that can synchronize and process inputs from various modalities, such as text, audio, and visual data. For example, using transformer-based architectures capable of handling multimodal inputs simultaneously may enhance the integration process. Additionally, exploring multimodal applications across different disciplines is essential. In physics and chemistry education, AI-driven virtual laboratories that incorporate virtual experiments, animated simulations, and real-time feedback can help clarify complex concepts and enable students to engage in hands-on learning experiences remotely. Language acquisition can also benefit from virtual reality scenarios that simulate real-life conversations, providing learners with opportunities to practice speaking and listening skills in an immersive environment. Furthermore, integrating gesture recognition and eye-tracking technologies can offer more interactive and responsive learning interfaces.
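Transformer-based multimodal models typically fuse modalities with cross-attention, in which tokens from one modality attend over representations from another. The minimal sketch below shows only the scaled dot-product step in pure Python. It assumes the embeddings have already been projected into a shared space. Real architectures also apply learned query, key, and value projections and multiple heads, both of which are omitted here. The toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query (e.g. a text token) attends over
    keys/values from another modality (e.g. image regions or audio frames)."""
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other modality's values.
        fused.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return fused


# Hypothetical embeddings: two text tokens attending over three image regions.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
for row in cross_attention(text_tokens, image_regions, image_regions):
    print([round(x, 3) for x in row])
```

Each fused vector is a mixture of image-region features weighted by their relevance to the text token. This is one concrete mechanism by which text, audio, and visual inputs can be synchronized within a single model.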
Comprehensive evaluation techniques
The evaluation of AI-generated content still faces significant challenges related to authenticity, reliability, and creativity. Comprehensive evaluation requires mechanisms that assess authenticity and reliability to ensure educational accuracy and credibility. This is particularly critical in education, where both students and teachers rely on trustworthy content for effective learning. Future research should prioritize automated verification and fact-checking tools that use natural language processing and information retrieval techniques to cross-reference AI-generated responses against trusted data sources, thereby mitigating the spread of misinformation. Assessing creativity is equally important. It should cover both divergent thinking (e.g., generating novel and diverse ideas) and convergent thinking (e.g., problem-solving effectiveness and logic). Standardized metrics that quantify these aspects are therefore essential; adapting the Torrance Tests of Creative Thinking to AI-generated content, for example, offers a structured starting point. Integrating user feedback and expert evaluations can further refine such tools to capture the nuances of creative output. The goal is AI-generated content that inspires student imagination and innovation while adapting to the creative needs of different educational contexts. Furthermore, enhancing the transparency and interpretability of AI-generated content is crucial for helping teachers and students understand the sources and logic behind it. Such transparency not only improves the credibility of AI-generated content in education but also allows educators to guide students in using it for critical and inquiry-based learning.
Implementing explainable AI (XAI) frameworks can provide insights into AI model decision-making, allowing educators to trace the reasoning behind specific content generation. Tools like attention maps and model-agnostic explanation methods can be integrated into educational AI systems to visualize reasoning pathways. By developing comprehensive evaluation techniques for AI-generated content, we can better align AI with educational goals, providing students with a more holistic and meaningful learning experience.
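Attention maps require access to a model's internals. Model-agnostic methods instead treat the model as a black box and probe it with perturbed inputs. Occlusion (leave-one-out) attribution is one of the simplest members of that family: a token's importance is the drop in the model's score when that token is removed. LIME and SHAP refine the same idea by evaluating many combinations of perturbed features rather than removing one token at a time. In the minimal sketch below, `toy_grader` and its rubric weights are hypothetical stand-ins for an opaque educational model.

```python
from typing import Callable, Dict, List


def occlusion_attribution(tokens: List[str], score_fn: Callable[[List[str]], float]) -> Dict[str, float]:
    """Leave-one-out attribution for any black-box score_fn.
    Keys are "index:token" so repeated words stay distinct."""
    base = score_fn(tokens)
    return {
        f"{i}:{tok}": base - score_fn(tokens[:i] + tokens[i + 1:])
        for i, tok in enumerate(tokens)
    }


# Hypothetical rubric weights used by the stand-in model.
KEY_TERMS = {"chlorophyll": 0.5, "light": 0.3, "glucose": 0.2}


def toy_grader(tokens: List[str]) -> float:
    """Stand-in for an opaque grading model: sums weights of key terms present."""
    present = set(tokens)
    return sum(w for term, w in KEY_TERMS.items() if term in present)


answer = "plants use chlorophyll to capture light and make glucose".split()
ranked = sorted(occlusion_attribution(answer, toy_grader).items(), key=lambda kv: -kv[1])
for name, importance in ranked:
    print(f"{name:>14}  {importance:+.2f}")
```

The ranking shows a teacher which words in the student's answer drove the score, which is the kind of traceable reasoning pathway the paragraph above calls for.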
In conclusion, advancements in AI for education, driven by foundation models, are reshaping educational practices through enhanced cognition, personalization, and multimodal integration. Addressing the associated challenges will enable AI to provide more equitable, engaging, and effective learning experiences for all students.
Acknowledgments
This paper is funded by the NSFC (no. 62172393) and the Major Public Welfare Project of Henan Province (no. 201300311200).
Declaration of interests
The authors declare no competing interests.
Published Online: March 19, 2025
References
1. Kasneci E., Seßler K., Küchemann S., et al. ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education. Learn. Individ. Differ. 2023;103:102274. doi: 10.1016/j.lindif.2023.102274.
2. Xu Y., Wang F., An Z., et al. Artificial Intelligence for Science—Bridging Data to Wisdom. Innovation. 2023;4:100525. doi: 10.1016/j.xinn.2023.100525.
3. Ayeni O.O., Al Hamad N.M., Chisom O.N., et al. AI in Education: A Review of Personalized Learning and Educational Technology. GSC Adv. Res. Rev. 2024;18:261–271. doi: 10.30574/gscarr.2024.18.2.0062.
4. Wang Y., Li J., Cui Y. Science Communication with Science Fiction Movies. Innovation. 2024;5:100589. doi: 10.1016/j.xinn.2024.100589.
5. Pan S., Luo L., Wang Y., et al. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Trans. Knowl. Data Eng. 2024;36:3580–3599. doi: 10.1109/TKDE.2024.3352100.

