Scientific Reports. 2025 Mar 28;15:10725. doi: 10.1038/s41598-025-95802-4

Development and effectiveness verification of AI education data sets based on constructivist learning principles for enhancing AI literacy

Seul-Ki Kim, Tae-Young Kim, Kwihoon Kim
PMCID: PMC11953269  PMID: 40155549

Abstract

This study confirmed the importance of AI education for fostering students’ AI literacy and derived the necessity of constructivist-oriented datasets that provide contextual relevance to students’ lives and real-world problem-solving experiences. By reconstructing the machine learning dataset development cycle through prior research, we developed datasets following each procedural step, then evaluated and refined them through expert panel interviews focusing on dataset quality metrics and characteristics of authentic activities. The datasets were deployed through educational programming platforms commonly used in AI education and designed for sustainable maintenance. To verify effectiveness, we analyzed usage metrics of the developed datasets and conducted comparative analysis of their impact on AI literacy through educational implementations. The research outcomes include development of four AI education datasets demonstrating potential to replace conventional materials like the Iris dataset. Implementation on major Korean AI education platforms confirmed high accessibility and utility, establishing these as crucial educational resources meeting classroom needs. Through application and effectiveness analysis, we verified that AI education datasets developed based on constructivism can: connect students’ prior knowledge with real-world experiences, deepen understanding of AI model learning processes, and provide authentic data-driven computing experiences - collectively contributing to comprehensive AI literacy enhancement.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-95802-4.

Keywords: Artificial intelligence education, Constructivism, Datasets, Authentic activity, Context

Subject terms: Computational science, Computer science, Information technology, Scientific data, Software

Introduction

The rapid advancement of Artificial Intelligence (AI) technology, particularly centered around Generative AI, is significantly impacting all domains, including industry and society. Consequently, there is growing emphasis on the importance of education to cultivate AI literacy among future generations, enabling students to maximize AI’s benefits while minimizing its negative impacts1–3. While various definitions of AI literacy have been explored, they remain as diverse as the engineering perspectives on AI itself2,4,5.

In response to social and educational demands, continuous research into diverse educational methods for developing students’ AI literacy has yielded meaningful outcomes. Previous studies in AI education primarily focused on analyzing existing machine learning research to explore educational applicability, or emphasized the need for pedagogical approaches that address differences from traditional computing education. Additionally, research findings on effective educational programs and methodologies for diverse student populations have been published3,6,7.

Notably, multiple studies emphasize constructivism-based computing activities in AI education, paralleling approaches in computing education6,8. They advocate for providing problem-solving experiences using visual programming tools appropriate to students’ levels, thereby enhancing conceptual understanding through real-world problem resolution9–11. Furthermore, research highlights the necessity of studying data—AI’s core component—to enable students to solve practical problems and engage in level-appropriate learning experiences12,13.

Despite this emphasis on practical problem-solving experiences and datasets in AI education, literature reviews analyzing current AI education practices reveal significant gaps: insufficient research focused on providing authentic problem-solving experiences for students, and particularly inadequate studies related to datasets14,15. While existing research emphasizes realistic problem-solving experiences incorporating students’ lifestyles, cultural contexts, and interdisciplinary information, it fails to provide comparative analyses detailing how these experiences actually impact students’ AI literacy16,17. Finally, even studies utilizing appropriate datasets for AI education face limitations including reliance on low-reusability data confined to single tasks18, and lack of systematic processes for dataset selection and restructuring, rendering them impractical for actual educational settings19.

To overcome these limitations, this study aims to develop datasets that provide authentic problem-solving experiences closely tied to students’ lives from a constructivist perspective, and analyze their effectiveness through educational implementation. Specifically, we introduce systematic dataset development processes that can be utilized in frontline educational settings for creating and refining datasets across various subjects and formats. By deploying developed datasets and implementing them in educational programs, we verify their practical applicability in real classrooms and conduct detailed analyses of their impact on students’ AI literacy, ultimately deriving concrete implications for data-driven AI education.

Related works

AI literacy and AI education

The advancement of computing technologies, including AI, has elevated the importance of AI literacy as a core competency for students navigating future society. Just as engineering perspectives offer diverse definitions of AI, conceptions of AI literacy also vary considerably1,5.

Multiple studies have inductively defined AI literacy through analyses of prior AI-related research. Key findings from major studies include:

AI literacy has been conceptualized as a subset of digital literacy encompassing understanding of AI’s basic functions and ethical application in daily life. This construct includes four dimensions: knowing/understanding AI, using/applying AI, evaluating/creating AI, and AI ethics2,13. Other research defines it as the ability to comprehend, utilize, critically assess, and effectively communicate/collaborate with AI. Notably, scholars emphasize that computer literacy and information literacy should precede AI literacy development4. Furthermore, research categorizes AI literacy components through different lenses based on five fundamental questions: What is AI? What can AI do? How does AI work? How should we use AI? How do humans perceive AI?5

In defining AI education for fostering AI literacy, early research primarily focused on curriculum design to establish educational guidelines. Starting with foundational studies proposing five big ideas for AI education (perception, representation/reasoning, learning, natural interaction, and social impact), definitions have evolved to encompass three key aspects aligned with AI literacy development: understanding AI concepts, applying AI technologies/tools, and considering ethical/societal implications20,21.

While existing research shows variations in specific definitions, scholars consistently identify three core AI literacy components: understanding/applying AI technologies, creating/evaluating AI systems, and considering ethical dimensions. Correspondingly, AI education is defined as an instructional process that develops these competencies through systematic engagement with relevant knowledge, skills, and tools.

AI education methods: constructivism-based AI education and authentic activities

A review of literature on AI education reveals that both AI education and computational thinking education primarily employ constructivism-based problem/project-centered instruction and practical exercises, utilizing Python or visual programming tools6.

Particularly, constructivist theory—which emphasizes connecting learners’ existing knowledge and skills, and posits that most effective learning occurs through active construction of subjective representations—has proven effective for understanding AI principles/concepts and enhancing computational thinking8. From this constructivist perspective, AI education emphasizes computing practice-centered pedagogical approaches where students apply acquired foundational knowledge to real-world contexts, making learning relevant and meaningful. Educational methods such as design thinking and project-based learning have been proposed as formats for suggesting solutions to social or personal problems8,14.

To achieve this, it becomes crucial to provide practical, real-world examples that enable students to design and develop integrated intelligent systems themselves, rather than merely using pre-made AI-enabled software22,23. Providing authentic examples requires consideration of contexts close to learners’ lives, where learner-centered contexts in teaching/learning can connect to authentic activities—the process of solving transferable problem situations relevant to students’ lives24. Particularly, utilizing public/open data reflecting students’ life challenges or real-world contexts could transform activities in AI education curricula into authentic practices. This approach helps students understand information within their living environments, promotes recognition/development of digital/data technologies, and facilitates idea generation for life improvement19. However, existing research on data-centered activities reveals limitations: either using arbitrarily created datasets for projects that fail to consider learners’ life contexts18, or providing real-world data without systematic processes for educational adaptation through selection, refinement, and utilization16.

AI education datasets are difficult for teachers to source independently, requiring essential preprocessing steps of purification and restructuring for educational purposes. Additionally, public data faces limitations as updates may alter content, potentially affecting validity, accuracy, and usability for educational objectives19. Therefore, establishing systematic processes for dataset development, refinement, and evaluation of accuracy/usability becomes imperative to provide diverse AI education datasets supporting authentic activities in educational settings.
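As a concrete illustration, the purification and restructuring steps described above might look like the following pandas sketch; the raw table, its cryptic public-data header, and all values are hypothetical, not drawn from the study's datasets.

```python
import pandas as pd

# Hypothetical raw public-data table; "msrDt" mimics the cryptic headers
# often found in public datasets. All values are illustrative.
raw = pd.DataFrame({
    "msrDt": ["2021-01-01", "2021-01-02", None, "2021-01-04"],
    "pm10": [45.0, None, 51.0, 38.0],
    "region": ["Seoul", "Seoul", "Busan", "Busan"],
})

# Purification: drop rows missing the key field, fill remaining numeric gaps.
clean = raw.dropna(subset=["msrDt"]).copy()
clean["pm10"] = clean["pm10"].fillna(clean["pm10"].mean())

# Restructuring: rename headers to student-friendly labels.
clean = clean.rename(columns={"msrDt": "date"})
print(clean.shape)  # → (3, 3)
```

Documenting such steps makes the restructured dataset reproducible when the underlying public data is updated.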

Development and evaluation of datasets for AI education

While no specific studies exist on dataset development for AI education, we can examine dataset development processes through software engineering research focused on AI and machine learning. A seminal study23 presents a detailed software engineering process emphasizing accountability in machine learning dataset development, proposing a framework that includes dataset development lifecycle and essential documentation as shown in Fig. 1.

Fig. 1.

Fig. 1

The dataset development lifecycle requires documentation for each stage.

The dataset development process comprises multiple phases. In the requirements analysis phase, stakeholders collaborate through deliberate intention examination and use case analysis to determine data requirements. The design phase involves research and consultation with subject matter experts to verify feasibility of meeting data requirements and determine optimal implementation strategies. During implementation, design decisions are translated into technical components including software systems, annotator guidelines, and labeling platforms, with potential recruitment and management of human expert evaluator teams. The testing phase involves dataset evaluation and usage decisions. Finally, the maintenance phase requires establishing various affordances including tools, policies, and designated ownership after dataset collection. The lifecycle in Fig. 1 demonstrates advantages for enhancing transparency and accountability in AI/ML ethics through its organic interconnection of documented, continuously evaluated stages. However, this framework requires pedagogical reconstruction for educational dataset development23.
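One lightweight way to keep the per-stage documentation the lifecycle requires is a simple record per stage; the field names below are illustrative, not taken from the cited framework.

```python
from dataclasses import dataclass, field

@dataclass
class StageRecord:
    """Minimal per-stage documentation record (illustrative sketch)."""
    stage: str                                       # lifecycle stage name
    decisions: list = field(default_factory=list)    # documented decisions
    open_issues: list = field(default_factory=list)  # items for re-evaluation

# One record per lifecycle stage named in the framework above.
lifecycle = [StageRecord(s) for s in
             ["requirements", "design", "implementation", "testing", "maintenance"]]
lifecycle[0].decisions.append("target use case: middle-school decision-tree lessons")
print([r.stage for r in lifecycle])
```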

Datasets for AI education must be refined/reconstructed according to educational objectives, with prior evaluation of validity, accuracy, and usability to ensure achievement of learning goals19. Research on quality management and evaluation primarily focuses on corporate data quality perspectives. Key studies propose that dataset quality assessment requires both subjective and objective evaluations, suggesting objective metrics including accessibility, data volume, and reliability. Similar approaches present evaluation elements and measurement methods for statistical quality assessment, incorporating importance, accuracy, timeliness, and punctuality25,26.

Research on educational dataset evaluation remains limited compared to engineering perspectives. Existing studies analyze prior research to establish evaluation criteria for AI education datasets, proposing standards from developmental, engineering, and instructional perspectives27. Table 1 summarizes key quality management and evaluation factors identified by researchers.

Table 1.

Key metrics for dataset evaluation and quality management by study.

Study Key metrics for dataset evaluation and quality management
Leo et al.36 Accessibility, Appropriate Amount of Data, Believability, Completeness, Concise Representation, Consistent Representation, Ease of Manipulation, Free-of-Error, Interpretability, Objectivity, Relevancy, Reputation, Security, Timeliness, Understandability, Value-Added
Bergdahl et al.37 Relevance, Accuracy, Timeliness and Punctuality, Accessibility and Clarity, Comparability, Coherence
Kim and Kim38 Data format, Data provision form, Data number system, Number of attributes, Accessibility, Accuracy, Fit of Size, Completeness, Conciseness, Consistency, Integrity, Objectivity, Relevancy, Timeliness, Subject goal, Teaching and learning environment, Copyright

While major indicators for dataset evaluation and quality management may use different terminology or present varying definitions/scopes for similar evaluation criteria, we consistently identify common elements including data accuracy, data consistency, dataset size, relevance to target problems, and dataset timeliness. These shared factors can serve as primary evaluation criteria for educational-purpose datasets.

Methods

Research procedure

This study aims to develop and validate AI education datasets that support authentic activities from a constructivist perspective to enhance students’ AI literacy. As educational materials provided to students, datasets for educational purposes require high accountability during development. Accordingly, we adapt and modify the machine learning dataset development lifecycle23. The machine learning dataset lifecycle serves as a framework that structures collaboration with experts to ensure educational validity and completeness while supporting developer responsibility and transparency. This approach offers the advantage of presenting considerations for the entire process from data collection to deployment, thereby preventing potential dataset flaws in advance. Furthermore, the proposed maintenance cycle enables periodic assessment and improvement of real-world data suitability, such as public data. For AI education dataset development, we employ a restructured process as shown in Fig. 2.

Fig. 2.

Fig. 2

Procedures for developing AI education dataset.

The key procedures for each stage are as follows.

First, we analyze the current status of datasets used in AI education to identify issues, examine modeling algorithms employed in AI education, and derive dataset requirements for AI education.

Second, we design fundamental dataset elements including educational objectives and key specifications while considering real-life contexts close to students’ experiences to provide meaningful problem-solving experiences from a constructivist perspective.

Third, we explore and identify datasets with appropriate themes and structures through public data platforms and various dataset repositories. Each dataset’s quality is evaluated and restructured according to educational purposes to develop preliminary AI education datasets.

Fourth, we conduct testing processes to systematically verify dataset suitability for AI education. Expert interviews are conducted using dataset quality criteria and authentic activity characteristics assessment, with subsequent dataset revisions based on expert feedback. The developed datasets are directly applied to data processing and AI modeling to verify results and assess applicability.

Fifth, the finalized datasets are distributed and maintained in accessible formats for AI education implementation. Finally, we analyze usage statistics of deployed datasets and compare their impact on students’ AI literacy improvement through actual educational applications to evaluate effectiveness.

All methods and procedures in this study were conducted with prior approval from the Institutional Review Board of Korea National University of Education (IRB No. 202206-SB-0178-01) and in accordance with the Bioethics and Safety Act as well as the Helsinki Declaration. We obtained informed consent from all participants after fully disclosing the research purpose, methodology, and related details. For minor participants involved in the educational case application process, we secured prior consent from legal guardians in accordance with relevant regulations.

Development of datasets for AI education

Research participants

The research participants for AI education dataset development consist of experts involved in the testing phase, including data scientists or adversarial testers who can evaluate dataset usability, requirement compliance, safety, and outcome predictability, as detailed in Table 2.23

Table 2.

Research participating experts details.

No. Occupation No. of years in the field Final education Major field
1 Elementary school teacher/Researcher 14 Ph.D. Computer science
2 Elementary school teacher 15 Master of Education Elementary school computer education
3 Middle school teacher 10 Master of Education Computer education
4 Middle school teacher/Researcher 17 Ph.D. Computer education
5 High school teacher 13 Master of Education Computer education
6 High school teacher 13 Master of Education Computer education
7 High school teacher 16 Master of Education Computer education

Experts selected for the role possess expertise in data analysis and AI education, have over 10 years of teaching experience, and have backgrounds in computer science and computer education. The group consists of 2 elementary school teachers, 2 middle school teachers, and 3 high school teachers, totaling 7 members.

Research tools

For the systematic development of datasets for AI education, during the dataset design phase we established four contextual domains from the mathematics domain of PISA 2022 as thematic criteria, as shown in Table 3. The mathematics domain contains items related to computational thinking alongside mathematical reasoning, demonstrating high relevance to AI education28. PISA 2022 explains context through the lens of an individual’s world where problems exist, emphasizing the importance of utilizing diverse contexts. Accordingly, this study employs four contextual domains for dataset theme exploration: personal, occupational, social, and scientific contexts.

Table 3.

Context in PISA 2022 mathematics.

Context Details
Personal Problems classified in the personal context category focus on activities of one’s self, one’s family, or one’s peer group. Personal contexts include (but are not limited to) those involving food preparation, shopping, games, personal health, personal transportation, sports, travel, personal scheduling, and personal finance
Occupational Problems classified in the occupational context category are centered on the world of work. Items categorized as occupational may involve (but are not limited to) such things as measuring, costing, and ordering materials for building, payroll/accounting, quality control, scheduling/inventory, design/architecture, and job-related decision-making
Societal Problems classified in the societal context category focus on one’s community (whether local, national, or global). They may involve (but are not limited to) such things as voting systems, public transport, government, public policies, demographics, advertising, national statistics, and economics
Scientific Problems classified in the scientific category relate to the application of mathematics to the natural world and issues and topics related to science and technology. Particular contexts might include (but are not limited to) such areas as weather or climate, ecology, medicine, space science, genetics, measurement, and the world of mathematics itself

Each context contains problem situations that students may encounter in class, with each problem’s background representing a distinct real-world scenario. We employ datasets encompassing diverse contextual problems as selection criteria to provide authentic experiential learning opportunities aligned with constructivist principles28.

In the dataset implementation phase, researchers reconstruct and refine each dataset using various quality evaluation indicators to ensure rigorous dataset composition. To establish objective criteria, we categorized common quality indicators from prior research - including size adequacy, accuracy, consistency, completeness, relevance, objectivity, and comprehensibility - into key evaluation metrics as shown in Table 4.25–27

Table 4.

Dataset review metrics—reconstructed dataset quality assessment.

Assessment metric Details
Suitability of Size(QA1) The dataset must contain an adequate amount of data necessary for AI modeling; it should neither be too small nor too large
Accuracy(QA2) The information represented by the data must be accurate
Consistency(QA3) Data should be presented in a consistent manner
Completeness(QA4) There should be no missing values, and all elements necessary for problem-solving must be included
Relevance(QA5) There should be a high relevance between the dataset and the problem to be solved
Objectivity(QA6) Information derived from the data should not be biased
Ease of Understanding(QA7) The data should be presented in a manner that is easy for the user to understand
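Several of the metrics in Table 4 (size suitability, completeness, consistency) lend themselves to an automated pre-screen before expert review. The following pandas sketch illustrates this; the threshold values are assumptions for illustration, not figures from this study.

```python
import pandas as pd

def quick_quality_screen(df: pd.DataFrame,
                         min_rows: int = 50, max_rows: int = 5000) -> dict:
    """Screen a candidate dataset against a few Table 4-style metrics.
    Thresholds are illustrative assumptions, not values from the study."""
    return {
        "size_ok": min_rows <= len(df) <= max_rows,          # suitability of size
        "complete": not df.isna().any().any(),               # no missing values
        "consistent": all(df[c].map(type).nunique() == 1
                          for c in df.columns),              # uniform cell types
    }

# Synthetic candidate table for demonstration.
sample = pd.DataFrame({"height": [150.2, 161.0, 158.7] * 20,
                       "grade": [1, 2, 3] * 20})
print(quick_quality_screen(sample))
```

Metrics such as relevance or ease of understanding still require the human expert review described next.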

First, the expert review process is conducted through interviews focusing on two main aspects. The first evaluation utilizes the quality assessment criteria from Table 4, which served as the foundation for dataset implementation, to systematically evaluate the developed datasets’ quality. The second evaluation employs criteria for assessing dataset appropriateness from a constructivist perspective, selecting authentic activity characteristics that can be applied to datasets as shown in Table 529. Each expert directly applies the developed datasets to visualization and AI modeling tasks over a designated period, then selects one of three response options (O: meets criteria, △: partially meets criteria requiring revision, X: fails to meet criteria) for each assessment item. Experts who select △ or X must provide detailed rationale for their judgments. After completing individual evaluations, all responses are synthesized through group discussions to derive consensus findings, which are then reviewed with researchers in a group interview format.
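The three-level expert responses could be tallied per assessment item as in the sketch below; the responses shown are hypothetical, not actual panel data.

```python
from collections import Counter

# Hypothetical responses of the seven experts for one assessment item,
# using the three-level scale from the review procedure: O / △ / X.
responses = ["O", "O", "△", "O", "O", "X", "O"]

tally = Counter(responses)
# Any △ or X response triggers a documented rationale and group discussion.
needs_revision = tally["△"] + tally["X"] > 0
print(tally, "revision discussion needed:", needs_revision)
```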

Table 5.

Dataset review metrics—reconstructed characteristics of authentic activities.

Characteristic of authentic activity Details
Relevance to the real world(AA1) Activities match as nearly as possible the real-world tasks of professionals in practice rather than decontextualized or classroom-based tasks.
Complex tasks requiring a sustained period of time(AA2) Activities are completed in days, weeks and months rather than minutes or hours. They require significant investment of time and intellectual resources.
Utilize diverse resources and investigate from multiple perspectives(AA3) The task affords learners the opportunity to examine the problem from a variety of theoretical and practical perspectives, rather than allowing a single perspective that learners must imitate to be successful
Integration with multiple disciplines(AA4) Activities encourage interdisciplinary perspectives and enable diverse roles and expertise rather than a single well-defined field or domain
Produce a finished artifact(AA5) Activities culminate in the creation of a whole product rather than an exercise or sub-step in preparation for something else.
Diversity of solutions and outcomes(AA6) Activities allow a range and diversity of outcomes open to multiple solutions of an original nature

Finally, the AI modeling results validate the accuracy by processing data or configuring AI models using the developed dataset according to its intended purpose. We systematically compare modeling outcomes through adjustments of key hyperparameters specific to each algorithm, focusing on accuracy metrics to verify alignment with both development objectives and educational goals.
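This validation step can be sketched with scikit-learn by sweeping one key decision-tree hyperparameter and comparing accuracy. Here the Iris data merely stands in for a developed dataset, and the parameter values are illustrative choices, not the study's actual settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; a developed AI education dataset would be loaded instead.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Compare accuracy while varying one key hyperparameter (tree depth).
for depth in (1, 2, 3, 5):
    model = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_tr, y_tr)
    print(f"max_depth={depth}: accuracy={model.score(X_te, y_te):.3f}")
```

A dataset whose accuracy stays implausibly low (or trivially perfect) across such sweeps would signal a mismatch with the intended educational goal.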

Analysis of the effectiveness of datasets for AI education

Design of effectiveness analysis

To systematically analyze the effectiveness of the developed dataset application, we implement quantitative evaluation of dataset usage and conduct experimental studies to assess educational effectiveness, followed by comprehensive results evaluation.

First, the dataset usage evaluation involves deploying the developed dataset and quantitatively comparing its utilization frequency over a designated period.

For analyzing educational effectiveness, we designed the experiment as shown in Table 6. Participants were divided into two groups: the experimental group engaged with problem scenarios and educational programs utilizing the dataset developed in this study, while the control group used educational programs employing the most widely adopted datasets in AI education. We administered identical pre-test and post-test assessments to measure AI literacy improvement, with subsequent comparative analysis of results.

Table 6.

Design of dataset application experiment.

G1 O1 X1 O2
G2 O3 X2 O4

G1: Experimental Group

G2: Control Group

O1, O3: Pre-Test AI Literacy Measurement Test

O2, O4: Post-Test AI Literacy Measurement Test

X1: AI class using the dataset developed through this study

X2: AI class using the most used dataset

The educational programs utilizing each dataset incorporate decision tree concept instruction, which is effective for developing both transparent understanding of AI modeling processes and comprehension of underlying principles18,19. A summary of the program implemented for educational purposes is presented in Table 7.

Table 7.

Experimental AI education program summary.

Session Learning objectives Learning concepts
1–2 Learners will be able to visualize and analyze data to solve real-life problems. Structure of dataset; Histogram, Scatterplot; Correlation analysis
3–4 Learners will understand and be able to apply the decision tree algorithm for classification. Understanding decision trees; Classification using decision trees; Practice of decision tree modeling methods
5–6 Learners will solve problems using decision trees and create, share, and improve their own program outputs. Decision tree modeling; Program development, sharing, improvement

The educational program was structured as six 45-min sessions organized in weekly blocks of two sessions over three weeks. Sessions 1–2 focused on understanding problem-solving using AI and data visualization. Sessions 3–4 consisted of understanding decision tree algorithms and hands-on practice with decision tree modeling. The final sessions 5–6 included problem solving through decision tree modeling, along with development, sharing, and refinement of program outputs as classroom products. Detailed information about all instructional activities and dataset utilization perspectives can be found in Appendix.
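The data-exploration work of Sessions 1–2 (dataset structure, correlation analysis) might look like the following sketch; the table is synthetic stand-in data, not one of the developed datasets.

```python
import pandas as pd

# Synthetic stand-in table for a Sessions 1-2 style exploration.
df = pd.DataFrame({
    "study_hours": [1, 2, 2, 3, 4, 5, 5, 6],
    "sleep_hours": [9, 8, 8, 7, 7, 6, 6, 5],
    "score":       [55, 60, 62, 70, 75, 82, 85, 90],
})

print(df.describe())   # structure of the dataset
print(df.corr())       # correlation analysis
# df.plot.scatter(x="study_hours", y="score")  # scatterplot, in a notebook
```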

Application experiment participants

The students participating in the educational program for analyzing application effectiveness were selected through purposive sampling from two middle schools located in City C, South Korea, as detailed in Table 8. The sampling criteria were established as follows: first, participants must be students receiving computing and AI education limited to the national curriculum; second, they must be able to participate in six instructional sessions for program implementation. Accordingly, 76 students (33 male, 43 female) from Middle School A participated in the study. For statistical analysis of results, we conducted a comparative study with a control group consisting of 75 students (39 male, 36 female) from Middle School B, located in the same region and receiving identical instruction from the same teacher following the same curriculum.

Table 8.

Application experiment participants details.

Group Grade Male(Ratio) Female(Ratio) Total
Experimental group Middle School 1st 33(43.42) 43(56.58) 76
Control group Middle School 1st 39(52.00) 36(48.00) 75

Research tools

For systematic analysis of educational effectiveness, we employed an AI literacy measurement tool for middle school students that enables quantitative comparison of learners’ AI literacy improvement30. As shown in Table 9, this measurement instrument categorizes AI literacy into six sub-competencies: social impact of AI, AI implementation planning, AI problem solving, understanding of AI, data literacy, and AI ethics, comprising a total of 30 items. The tool demonstrated reliability across all sub-domains (Cronbach’s α = 0.861–0.939), with validity as a measurement instrument established through exploratory and confirmatory factor analyses.

Table 9.

AI literacy measurement tools.

Sub-competency Definition Number
Social impact of AI Understanding the societal impact of AI and ethical issues and practices that may arise from AI 1, 2, 3, 4, 5, 6, 7, 8
Execution plan with AI Understanding the classification, definition, characteristics, and principle of AI to solve problems using AI 9, 10, 11, 12, 13
Problem solving with AI A competency to define a problem for problem solving using AI, understand the steps for solving the problem, assign roles and tasks according to the Machine learning steps, and establish and manage the problem-solving process 14, 15, 16, 17, 18
Understanding of AI A competency to generate ideas for problem solving, prepare data, select AI models, design and develop AI programs, test the performance of AI and evaluate efficiency. 19, 20, 21, 22, 23, 24
Data literacy A competence for understanding data, data exploration, collection, data interpretation and evaluation, data management and use to solve problems using AI 25, 26, 27, 28
AI Fairness Understanding misinformation, diversity, bias in AI, and AI fairness 29, 30

The survey responses were encoded on a 5-point Likert scale ranging from 5 (Strongly Agree) to 1 (Strongly Disagree). For systematic analysis of response patterns, we compared distribution characteristics across subdomains using quartile ranges (Q1, median, Q3) and modal values. Differences between experimental and control groups were analyzed using the Mann-Whitney U test, a non-parametric statistical method appropriate for small sample sizes or ordinal data measurement, which is particularly prevalent in behavioral science research31.
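With SciPy, the group comparison described above can be sketched as follows; the Likert scores are synthetic illustrations, not the study's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic post-test Likert scores (1-5) for illustration only.
experimental = np.array([4, 5, 4, 4, 5, 3, 4, 5, 4, 4])
control      = np.array([3, 3, 4, 2, 3, 4, 3, 3, 2, 3])

# Non-parametric comparison of the two independent groups.
u_stat, p_value = mannwhitneyu(experimental, control, alternative="two-sided")
print(f"U={u_stat}, p={p_value:.4f}")
```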

To quantitatively compare the effectiveness of the developed dataset, we employed Cliff’s Delta as a non-parametric effect size metric. Cliff’s Delta demonstrates minimal bias across diverse conditions and maintains consistent standard errors, making it suitable for objective comparison of effect magnitudes. This statistic ranges from −1 to 1, where 0 indicates identical distributions between compared groups. While effect size interpretation depends on research context and analytical subject characteristics, we adopted Cohen’s d thresholds as reference values: 0.20 for “small effect,” 0.50 for “medium effect,” and 0.80 for “large effect”32,33.
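Cliff's Delta has a direct definition that is easy to compute: the proportion of cross-group pairs where the first group's value is higher, minus the proportion where it is lower. A sketch with synthetic scores:

```python
import numpy as np

def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs.
    Ranges from -1 to 1; 0 means identical distributions."""
    a, b = np.asarray(a), np.asarray(b)
    greater = (a[:, None] > b[None, :]).sum()   # pairs where group a is higher
    less = (a[:, None] < b[None, :]).sum()      # pairs where group a is lower
    return (greater - less) / (a.size * b.size)

# Synthetic Likert-style scores for illustration only.
print(cliffs_delta([4, 5, 4, 4, 5], [3, 3, 4, 2, 3]))  # → 0.88
```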

Development of datasets for AI education

Analysis of requirements for AI education datasets

To analyze requirements for AI education datasets, we first investigated current dataset usage trends. The UCI ML Repository, which provides various types of datasets for AI modeling research, offers 664 different datasets. As shown in Fig. 3, users can check information such as appropriate modeling algorithms, number of variables, and access frequency. Based on access frequency, the most frequently used datasets were identified as ‘Iris’, ‘Dry Bean Dataset’, ‘Heart Disease’, ‘Rice’, and ‘Adult’34.

Fig. 3.

Fig. 3

Usage of key datasets from the UCI ML repository.

To analyze the usage of specific datasets for educational purposes, we examined dataset usage in "Entry," South Korea's representative educational programming language, which provides practical functions for AI education. As shown in Fig. 4, Entry is a visual programming language, which helps reduce the difficulties associated with learning syntax and maintains students' interest while they understand and learn the basic concepts of AI35,36. Additionally, Entry provides basic resources for AI education and offers 19 datasets through the "Data Analysis" feature. On the Entry platform (https://playentry.org), datasets can be viewed by selecting the menu options [Create] - [Analyze data] - [Load tables] - [Add tables] - [Select tables] in sequence. Entry also enables AI modeling practice such as data visualization, linear regression, binary classification, multi-class classification, and clustering, and provides datasets with specified purposes for AI modeling, such as 'Iris', 'Boston Housing', 'Palmer Penguins', and 'Titanic'37.

Fig. 4.

Fig. 4

Educational programming language Entry and datasets in the visual environment.

We analyzed program outputs utilizing AI modeling features and datasets in Entry between December 31, 2020, and December 31, 2021, deriving the total dataset usage count from these artifacts. To analyze overall usage patterns, we visualized the utilization status of the top 10 most frequently used datasets as shown in Fig. 5.

Fig. 5.

Fig. 5

Usage of AI education datasets in Entry.

The top datasets in Entry were 'Iris', 'Population by City', 'Boston Housing', and 'Consumer Price Index', in descending order of use. Detailed analysis of the visualized chart for AI modeling datasets reveals consistent usage of the Iris and Boston Housing datasets, with Iris used 7,499 times and Boston Housing 6,619 times during the study period. These counts are significantly higher than those of the other AI modeling datasets, which failed to rank within the top 10.

Both the UCI ML repository (a platform providing datasets for AI modeling) and Entry (an educational programming language platform) showed Iris as the most utilized dataset. The Iris dataset, composed of continuous independent variables and categorical dependent variables, is particularly suitable for multiclass classification tasks. Its high usage frequency suggests it serves as a representative dataset for AI modeling practice.

While widely used datasets for AI modeling practice offer easily accessible examples for various AI modeling and computing activities, they exhibit limitations: lack of relevance to students' daily lives, difficulty connecting to real-world contexts, and inability to provide authentic practical experiences. Notably, the Boston Housing dataset has been discontinued in major machine learning libraries such as scikit-learn due to ethical concerns, necessitating the development of alternative datasets37.

Design and implementation of AI education datasets

This study develops datasets for AI education by benchmarking Entry, a widely adopted educational programming platform with significant classroom impact. Through analysis of Entry’s technical specifications, we established requirements for datasets suitable for supervised and unsupervised learning implementations, as detailed in Table 10. The platform supports essential modeling algorithms including Linear Regression, Logistic Regression, k-Nearest Neighbors (kNN), Support Vector Machines (SVM), Classification and Regression Trees (CART), and Clustering. For model configuration, users can define up to 6 continuous variables as independent features, while dependent variables may incorporate either continuous or categorical data types depending on the learning task.

Table 10.

Requirements of modeling methods and variables available in Entry.

Modeling method | Independent variables | Dependent variables
Supervised learning: Linear regression | Continuous variables | Continuous variables
Supervised learning: Binary classification | Continuous variables | Categorical variables (2 categories)
Supervised learning: Multiple classification | Continuous variables | Categorical variables (more than 2 categories)
Unsupervised learning: Clustering | Continuous variables that allow clusters to form and meaning to be derived from the characteristics of the data, without the need for a dependent variable

To provide AI educational datasets contextualized to students’ daily lives, we explored and structured preliminary dataset drafts as shown in Table 11, incorporating contextual frameworks from PISA 2022 Mathematics. We investigated public data platforms and diverse dataset sources while verifying appropriate variable inclusion for respective modeling methods. All datasets explicitly specify applicable licenses for educational purposes, with particular emphasis on exploring publicly available data centered around daily life topics likely to engage student interest. For certain datasets requiring specialized context, researchers directly collected and structured original data to complete the draft dataset compositions.

Table 11.

Details of dataset development draft.

Modeling method | Raw dataset (context) | Source | Key variables | Size (rows*columns)
Linear regression | Mosquito activity index in Seoul (Societal) | Seoul Metropolitan Government website39 | Date, average, waterfront, residential, parks | 1127*5
Linear regression | Weather observations data (Scientific) | Weather Data Open Portal40 | Date, average temperature, minimum temperature, maximum temperature, daily precipitation, average wind speed, etc. | 1091*9
Binary classification | Baseball game results (Occupational) | Collected by researchers41 | Team name, opponent name, ballpark, number of hits, game result, etc. | 1369*28
Multiple classification | Body information and T-shirt size (Personal) | Collected by researchers | Height, weight, BMI, waist circumference, T-shirt size | 28*5
Clustering | Earthquake information (Scientific) | Korea Meteorological Administration Weather Nuri42 | Time of occurrence, magnitude, depth, maximum magnitude, latitude, longitude, location | 4958*7

The draft datasets were systematically restructured according to AI modeling methodologies and educational objectives to facilitate effective utilization in AI education, as detailed in Table 12.

Table 12.

Dataset reconstruction metrics and methods.

Modeling method | Dataset | Dataset review metrics | Reconfiguration methods | Size (rows*columns)
Linear regression | Mosquito activity index in Seoul | Completeness, Objectivity, Ease of understanding | Data join; extract and reconstruct key variables | 1090*13
Linear regression | Weather observations data | Ease of understanding | Data join (merged into the row above) | –
Binary classification | Baseball game results | Ease of understanding, Objectivity | Create synthetic data; extract and reconstruct key variables | 517*16
Multiple classification | Body information and T-shirt size | Suitability of size, Objectivity | Create synthetic data; data augmentation | 225*5
Clustering | Earthquake information | Suitability of size | Filtering key rows | 655*7

The datasets were primarily restructured according to AI modeling methods and educational objectives. For Linear Regression datasets, it was necessary to ensure completeness by specifying independent and dependent variables. The Seoul Mosquito Activity Index and Synoptic Meteorological Observation datasets were joined by date after establishing variable relationships through synthesis of prior research on meteorological environments and mosquito populations42,43. Key variables were extracted and reorganized to enhance student comprehension and provide successful modeling experiences16.
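The date-keyed join described above can be sketched with pandas; the values and column names here are hypothetical stand-ins for the two source datasets.

```python
import pandas as pd

# Hypothetical fragments of the two source datasets, keyed by date.
mosquito = pd.DataFrame({
    "date": ["2022-07-01", "2022-07-02", "2022-07-03"],
    "avg_mosquito_index": [61.2, 74.5, 80.1],
})
weather = pd.DataFrame({
    "date": ["2022-07-01", "2022-07-02", "2022-07-03"],
    "avg_temperature": [26.3, 27.1, 28.4],
    "daily_precipitation": [0.0, 12.5, 3.2],
})

# Inner join on the shared date key, mirroring the described restructuring.
joined = weather.merge(mosquito, on="date", how="inner")
print(joined.shape)  # → (3, 4)
```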

The baseball game results dataset was collected from sports information websites, then completely synthesized using statistical simulation methods based on original data to minimize team-specific bias and enhance objectivity44. The dataset was further restructured by extracting key variables influencing the dependent variable for student accessibility.

The body measurements and t-shirt size dataset, initially collected from students, was replaced with a fully synthetic version to address privacy concerns and improve size appropriateness through synthetic data generation techniques44. This approach enhanced objectivity while resolving issues with limited original data scale.

The earthquake location dataset was restructured by removing entries below magnitude 2, which are classified as non-impactful seismic events based on domain expertise, to improve size appropriateness41.

Testing the AI education datasets

Expert review

The evaluation results of the draft datasets through data quality assessment and authentic activity characteristics analysis, along with group interview findings, are summarized in Table 13.

Table 13.

Expert interview key comments.

Dataset | QA1 QA2 QA3 QA4 QA5 QA6 QA7 | AA1 AA2 AA3 AA4 AA5 AA6

Mosquito activity index | O Δ O O O O O | O O O O O O

Data entered as uniform values over certain ranges were identified; these could affect statistical outcomes, necessitating preprocessing before provision.

Baseball game results | O O O O O O Δ | O O O O O O

The dataset features a relatively large number of variables with complex names, indicating a need for simpler expressions.

While it offers a variety of variables for exploring different outcomes, making it a good dataset, providing examples for classroom use is deemed essential.

T-shirt sizes | O O O O O O O | O Δ O O O Δ

The simple composition of the dataset may not satisfy the 'Complex tasks' characteristic. However, it is considered very appropriate for students beginning to learn with accessible AI models such as decision trees. Deleting some columns to offer a simplified form to students is suggested.

The accuracy of models based on the dataset appears too high, and the clusters of data according to the dependent variable are too distinct, necessitating some modifications.

Earthquake information | O O O O O O Δ | O Δ O O O O

The structure of rows and columns seems overly simplistic, potentially resulting in lower usability.

The purpose of using the dataset for clustering should be clearly presented along with the dataset. Inferring it from the current rows and columns may be challenging for learners.

Regarding overall feedback on the datasets, experts frequently noted that the developed AI education datasets showed high applicability due to their relevance to students’ daily lives, while emphasizing the need to provide concrete usage examples. Several reviewers suggested intentionally incorporating elements like data preprocessing activities to encourage diverse approaches and outcome variations among students.

The detailed specifications of the finalized AI education datasets, reflecting expert interview outcomes, are presented in Table 14.

Table 14.

Finalized AI education dataset.

Modeling method | Name | Context | Key independent variables | Dependent variable | Size (rows*columns)
Linear regression | Mosquito activity index | Societal | Average temperature, daily precipitation, average wind speed, average relative humidity, total solar radiation | Average mosquito activity index | 1090*13
Binary classification | Baseball game results | Occupational | Runs batted in, triples, home runs, stolen bases, double plays, left on base | Wins and losses | 517*15
Multiple classification | T-shirt sizes | Personal | Height, weight | T-shirt size | 225*3
Clustering | Earthquake information | Scientific | Latitude, longitude | None (unsupervised) | 4,958*5

In the ‘Mosquito activity index’ dataset, some data fields were found to contain uniformly input values during the data collection process. While some experts recommended preprocessing these values before providing the dataset to students, we ultimately preserved the uniformly input values to facilitate practical data preprocessing exercises in educational settings44.

For the ‘Baseball game results’ dataset, we removed the ‘Team name’ column containing categorical information and revised variable names based on expert recommendations to enhance student comprehension.

The ‘T-shirt sizes’ dataset was recognized as particularly suitable for introductory AI education, especially for transparent understanding of decision tree models. Experts noted that variables like BMI showed high correlation with other factors, potentially causing multicollinearity issues if used as independent variables. Since addressing this through preprocessing might exceed students’ current capabilities, we simplified the dataset to essential ‘Height’ and ‘Weight’ variables. Additionally, we modified some data points to create overlapping size categories, addressing concerns about excessive model accuracy from overly distinct clusters.
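The multicollinearity concern around BMI can be illustrated by computing a variance inflation factor (VIF) by hand, regressing one predictor on the others and taking 1/(1 − R²). The body measurements below are synthetic; because BMI is derived from height and weight, its VIF comes out very large.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
height = rng.normal(170, 8, 200)           # cm, synthetic
weight = rng.normal(65, 9, 200)            # kg, synthetic
bmi = weight / (height / 100) ** 2         # derived, hence collinear

def vif(target, others):
    # VIF = 1 / (1 - R^2), regressing one predictor on the remaining ones.
    r2 = LinearRegression().fit(others, target).score(others, target)
    return 1.0 / (1.0 - r2)

predictors = np.column_stack([height, weight])
print(round(vif(bmi, predictors), 1))  # far above the common VIF > 10 threshold
```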

For the ‘Earthquake information’ dataset, we addressed structural simplicity concerns by reintroducing preprocessed data points below magnitude 2 (previously excluded) and structuring the dataset to demonstrate clustering differences through preprocessing activities.

Review of AI modeling accuracy

The AI education datasets developed through a constructivist lens, which are closely connected to students’ daily lives, must be effectively utilized for their intended educational purposes and should ultimately lead to the development of integrated intelligent systems as tangible outcomes of learners’ computational activities22. Prior to implementation, it is essential to evaluate the accuracy and usability of outputs - key factors that often hinder effective education using real-world data19. To address this, we conducted comprehensive testing of the developed datasets through modeling and evaluation using appropriate performance metrics including accuracy measures.

The ‘Mosquito activity index’ dataset is designed for linear regression analysis using continuous dependent and independent variables. In the Entry programming environment, setting one dependent and one independent variable allows visual confirmation of results, significantly enhancing students’ understanding of AI modeling principles. We selected ‘average mosquito activity index’ as the dependent variable and ‘average ground temperature’ as the independent variable based on their statistically significant correlation, implementing the model using Scikit-Learn’s LinearRegression. To validate the modeling results, we visualized the data and regression line as shown in Fig. 6, reserving 20% of the data for testing. We employed Mean Squared Error (MSE) and R-squared (R²) values, standard metrics for linear regression accuracy assessment, with results detailed in Table 15. Notably, we compared model accuracy between the original dataset containing uniformly input values and its preprocessed version to validate our initial dataset construction rationale.

Fig. 6.

Fig. 6

Visualization results of linear regression of mosquito activity index dataset.

Table 15.

Linear regression accuracy measurement results.

Accuracy metric Before pre-processing After pre-processing
MSE-Train 162.96 157.52
MSE-Test 169.17 134.36
R²-Train 0.79 0.76
R²-Test 0.78 0.81

The comparative analysis revealed enhanced performance on test data after preprocessing, demonstrating that the refined linear model exhibits greater generalizability and explanatory power (R²-Test = 0.81). This dataset’s structure allows for modeling with various combinations of two or more independent variables, enabling comparative analysis of results and encouraging diverse student outcomes through multiple analytical approaches. These characteristics confirm the dataset’s effectiveness for both linear regression applications and comprehensive education about regression techniques, including preprocessing considerations.
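A minimal sketch of the described workflow, assuming synthetic stand-in data for the mosquito dataset (one independent variable, a 20% test split, and MSE and R² as metrics):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: the activity index rises with average ground temperature.
rng = np.random.default_rng(42)
temp = rng.uniform(5, 30, 300).reshape(-1, 1)        # average ground temperature
index = 3.0 * temp.ravel() + rng.normal(0, 8, 300)   # average mosquito activity index

# Hold out 20% of the data for testing, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(temp, index, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
print("MSE-Test:", round(mean_squared_error(y_te, pred), 2))
print("R2-Test:", round(r2_score(y_te, pred), 2))
```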

The ‘Baseball game results’ dataset proves suitable for binary classification tasks using various dependent variables to predict game outcomes. Within the Entry programming environment, we implemented binary classification through TensorFlow, offering optional use of Adam Optimizer or SGD (Stochastic Gradient Descent) Optimizer. Through correlation analysis and variance inflation factor examination, we selected six independent variables (‘runs scored’, ‘triples’, ‘home runs’, ‘stolen bases’, ‘strikeouts’, and ‘double plays’) while excluding those showing multicollinearity. Using Keras framework, we constructed a neural network comprising a single fully connected layer with 32 neurons and a Sigmoid activation function. We evaluated both optimization approaches by reserving 20% of data for testing, visualizing accuracy/loss trajectories in Fig. 7. To ensure rigorous validation, we employed comprehensive metrics including accuracy, precision, recall, and F1-Score, supplemented by averaged results from 1,000 iterative modeling trials as detailed in Table 16.

Fig. 7.

Fig. 7

An accuracy and loss graph according to the optimizer of the binary classification model.

Table 16.

Accuracy according to the optimization function of the binary classification model.

Accuracy metric SGD optimizer Adam optimizer
Accuracy 0.89 0.94
Precision 0.78 0.90
Recall 0.97 0.95
F1-Score 0.87 0.92
1,000 Repeated Accuracy Mean 0.88 0.91
1,000 Repeated Accuracy Std 0.02 0.04

The analysis revealed consistently high accuracy across all available optimization functions in the Entry programming environment. The 1,000 iterative measurements demonstrated robust mean accuracy with low standard deviation, confirming the dataset’s effectiveness for teaching binary classification concepts while allowing students to freely configure independent variables and explore diverse modeling approaches.
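The study implemented the model with TensorFlow inside Entry; as a dependency-light sketch, scikit-learn's `MLPClassifier` exposes the same two optimizers (`sgd` and `adam`) and can approximate the described single 32-neuron sigmoid layer. The six features below are synthetic stand-ins for the baseball variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in: six numeric features, win/loss label driven by three of them.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 0.5 * X[:, 2] - 0.5 * X[:, 5] + rng.normal(0, 0.5, 500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

scores = {}
for solver in ("sgd", "adam"):  # the two optimizers offered in Entry
    clf = MLPClassifier(hidden_layer_sizes=(32,), activation="logistic",
                        solver=solver, max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)
    scores[solver] = clf.score(X_te, y_te)
    print(solver, "accuracy:", round(scores[solver], 3))
```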

The ‘T-shirt sizes’ dataset is optimized for multiclass classification using categorical dependent variables. The Entry environment implements CART (Classification and Regression Tree) methodology for this purpose, where we designate the categorical ‘t-shirt size’ variable as the dependent feature and reserve 20% of data for testing. Using Scikit-Learn’s DecisionTreeClassifier, we established modeling parameters by setting the minimum leaf node count to 5 (matching the unique category count in the dependent variable) and systematically increasing maximum tree depth from 1 to 10. For each depth configuration, we performed 1,000 modeling iterations to calculate mean, maximum, and minimum accuracy values, as detailed in Table 17.

Table 17.

Accuracy according to the decision tree maximum depth hyper parameter.

Max Depth Min. accuracy Max. accuracy Mean accuracy
1 0.42 0.42 0.42
2 0.75 0.75 0.75
3 0.84 0.84 0.84
4 0.84 0.87 0.85
5 0.84 0.87 0.85
6 0.84 0.87 0.85
7 0.84 0.87 0.85
8 0.84 0.87 0.85
9 0.84 0.87 0.85
10 0.84 0.87 0.85

Analysis of the decision tree models revealed that maximum tree depth plateaued at 7, with no further depth increases observed beyond this threshold. Accuracy evaluation demonstrated two distinct patterns: shallow trees (depth = 1) showed limited classification capability across all dependent variables (accuracy = 0.42), while deeper configurations (depth ≥ 4) achieved peak performance (accuracy = 0.87). This progression confirms the ‘T-shirt sizes’ dataset’s effectiveness for teaching decision tree principles and implementing multiclass classification models.
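The depth sweep can be sketched as follows; the height/weight/size data are synthetic stand-ins generated with deliberate overlap between neighbouring sizes, and `min_samples_leaf=5` is one plausible reading of the described "minimum leaf node count" setting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the T-shirt data: a size class derived from height
# and weight with added noise, so neighbouring sizes deliberately overlap.
rng = np.random.default_rng(1)
height = rng.normal(165, 10, 225)
weight = rng.normal(60, 10, 225)
score = 0.5 * height + weight + rng.normal(0, 6, 225)
size = np.digitize(score, np.quantile(score, [0.2, 0.4, 0.6, 0.8]))  # 5 classes

X = np.column_stack([height, weight])
X_tr, X_te, y_tr, y_te = train_test_split(X, size, test_size=0.2, random_state=0)

acc = {}
for depth in range(1, 11):  # sweep maximum depth from 1 to 10, as described
    tree = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=5,
                                  random_state=0).fit(X_tr, y_tr)
    acc[depth] = tree.score(X_te, y_te)
    print(depth, round(acc[depth], 2), "actual depth:", tree.get_depth())
```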

The ‘Earthquake information’ dataset serves as an unsupervised learning resource featuring magnitude estimates for seismic intensity and geospatial coordinates (latitude/longitude) for cluster analysis. Using the Entry platform’s k-Means implementation with Scikit-Learn, we conducted cluster modeling experiments with varying group quantities (2–9 clusters). To objectively determine optimal clustering, we calculated inertia values—the sum of squared distances between cluster centers and their member points. We performed comparative analysis using both raw data and a preprocessed subset containing only seismically significant events (magnitude ≥ 2.0), with visualization results shown in Fig. 8.

Fig. 8.

Fig. 8

Visualization of cluster and Inertia values through k-Means before and after preprocessing.

Visual analysis of inertia values revealed distinct clustering patterns and centroid positions between preprocessed and raw data when using 5–7 clusters. This demonstrates the dataset’s educational value for implementing AI-driven decision-making processes in classroom settings, as students can critically compare different clustering outcomes. The dataset’s effectiveness for cluster modeling education was thereby confirmed.
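The inertia comparison around the magnitude filter can be sketched with scikit-learn's k-Means; the coordinates below are synthetic stand-ins for the earthquake records.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the earthquake records: three loose geographic
# clusters of (latitude, longitude) points with uniform magnitudes.
rng = np.random.default_rng(3)
centers = np.array([[36.0, 128.0], [37.5, 126.9], [35.1, 129.0]])
coords = np.vstack([c + rng.normal(0, 0.3, (200, 2)) for c in centers])
magnitude = rng.uniform(0.5, 4.0, len(coords))

# Preprocessing step described in the study: keep only magnitude >= 2.0 events.
filtered = coords[magnitude >= 2.0]

inertias = []
for k in range(2, 10):  # vary cluster count from 2 to 9
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(filtered)
    inertias.append(km.inertia_)  # sum of squared distances to centroids
    print(k, round(km.inertia_, 2))
```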

Maintenance of AI education datasets

To enhance accessibility and educational utility of the developed datasets, we implemented distribution through the Entry programming platform following standardized procedures. As shown in Fig. 9, educators and students can access datasets through Entry’s practice interface using the workflow: [Table] → [Load Data Table] → [Add Table], ensuring consistency with other educational datasets available on the platform.

Fig. 9.

Fig. 9

Publishing datasets via Entry.

The dataset interface incorporates expert recommendations from the testing phase, particularly addressing dataset quality assessment and practical application requirements. As demonstrated in Fig. 10, each dataset includes: (1) Basic description, (2) Key variable explanations, (3) Column/row metadata, and (4) Usage examples—implementing expert guidance that “datasets should be easily understandable from a quality assessment perspective” and “must enable creation of complete outputs reflecting real-world activities”36.

Fig. 10.

Fig. 10

Dataset presented via the [Table] menu of Entry.

We established a maintenance framework featuring multiple feedback channels: an integrated bulletin board within Entry and a dedicated web portal with usage guides. This infrastructure allows users to submit improvement suggestions, which researchers can implement through collaborative review with the Connect Foundation (Entry’s governing organization). Approved modifications undergo immediate integration into the programming environment through automated deployment pipelines.

Results

Analyzing dataset usage

To analyze usage of the developed datasets, we summarized the total number of times each dataset was used for AI modeling from January 1, 2023, to December 31, 2023, as shown in Table 18.

Table 18.

Usage of datasets for AI modeling.

Modeling method (sum by method) | Dataset | Usage
Linear regression (5,809) | Boston Housing | 4,372
Linear regression (5,809) | Mosquito activity index | 1,437
Binary classification (2,666) | Titanic | 1,935
Binary classification (2,666) | Baseball game results | 731
Multiple classification (14,685) | T-shirt sizes | 6,752
Multiple classification (14,685) | Iris | 5,953
Multiple classification (14,685) | Palmer Penguins | 1,980
Clustering (6,332) | Middle school locations | 2,127
Clustering (6,332) | Elementary school locations | 1,916
Clustering (6,332) | High school locations | 1,722
Clustering (6,332) | Earthquake information | 567
Total sum: 29,492

The Entry platform recorded 29,492 total AI modeling instances, with multiclass classification being the most frequently used technique (14,685 instances), aligning with the needs analysis from our AI educational dataset development process. The ‘T-shirt sizes’ dataset dominated multiclass classification usage with 6,752 instances, followed by ‘Iris’ (5,953) and ‘Palmer Penguins’ (1,980), maintaining this order in overall utilization. Linear regression accounted for 5,809 modeling cases, featuring the ‘Boston housing’ dataset (4,372 uses) and our developed ‘Mosquito activity index’ dataset (1,437 uses), showing relatively strong adoption despite being newer. Binary classification models were implemented 2,666 times, primarily using the ‘Titanic’ dataset (1,935 instances) and our ‘Baseball game results’ dataset (731 instances). Cluster analysis accounted for 6,332 implementations, with the ‘Middle school locations’ dataset being most popular (2,127 uses) and our ‘Earthquake information’ dataset receiving the least usage (567 instances). The overall usage patterns of datasets employed in AI modeling are visualized in Fig. 11.

Fig. 11.

Fig. 11

Visualization dataset usage for AI modeling.

Following deployment through Entry, the ‘T-shirt sizes’ dataset demonstrated sustained dominance as the most frequently used resource for AI modeling, showing higher adoption rates than all comparable datasets. While the ‘Mosquito activity index’ dataset trailed the established ‘Boston housing’ dataset in usage numbers, it nevertheless achieved notable adoption levels. Both ‘Baseball game results’ and ‘Earthquake information’ datasets showed relatively lower engagement. Our usage analysis reveals that the ‘T-shirt sizes’ dataset effectively supplanted the previously dominant ‘Iris’ dataset for AI education purposes, demonstrating strong user preference and validating its design as a superior educational resource.

Results of applying the dataset education program

To validate the effectiveness of the developed datasets, we implemented an AI literacy enhancement program using the datasets as independent variables with experimental and control groups. A pre-test using standardized AI literacy assessments was administered to both groups to establish baseline equivalence through systematic verification. The encoded response data from both groups were analyzed across AI literacy subdomains, with Table 19 presenting quartile values (Q1, Median, Q3) and Mann-Whitney U test statistics for intergroup homogeneity assessment.

Table 19.

AI literacy pre-test results.

Sub-competency Group n Q1 Median Q3 Mann-Whitney U (z, p)
Social impact of AI Experimental 76 2.38 3.00 3.38 − 0.804 0.416
Control 75 2.50 3.00 3.56
Execution plan with AI Experimental 76 2.35 3.00 3.40 − 0.391 0.693
Control 75 2.40 3.00 3.80
Problem solving with AI Experimental 76 2.35 3.00 3.05 − 0.929 0.346
Control 75 2.50 3.00 3.60
Understanding of AI Experimental 76 2.33 3.00 3.13 − 0.588 0.552
Control 75 2.42 3.00 3.66
Data literacy Experimental 76 2.50 3.00 3.31 − 0.579 0.557
Control 75 2.50 3.00 3.50
AI Fairness Experimental 76 2.43 3.00 3.33 − 0.592 0.552
Control 75 2.47 3.00 3.55

Both control and experimental groups showed identical median scores of 'Neutral' across all sub-competencies (Median = 3.00). The first-quartile (Q1) values fell within the 'Disagree' to 'Neutral' range (2.33 ≤ Q1 ≤ 2.50), while the third-quartile (Q3) values spanned 'Neutral' to 'Agree'. The control group showed slightly higher Q3 values (3.50 ≤ Q3 ≤ 3.80) than the experimental group (3.05 ≤ Q3 ≤ 3.40), indicating a marginal difference in the upper score distribution.

The interquartile range analysis revealed comparable distributions, with the experimental group's quartiles spanning 2.33–3.40 and the control group's 2.40–3.80. Although the experimental group's range sat slightly lower, both groups maintained similar distribution patterns. Mann-Whitney U tests confirmed no statistically significant differences in any of the six AI literacy sub-competencies between groups (p > .05), establishing baseline equivalence. Response frequency histograms for pre-test results by sub-competency are shown in Fig. 12.

Fig. 12.

Fig. 12

Visualization Pre-Test response frequency by AI Literacy sub-competency.

All sub-competencies showed highest frequency in the ‘Neutral’ category, with ‘Disagree’ being the second most common response. This pattern confirms that most students initially perceived their AI literacy at ‘Disagree’ to ‘Neutral’ levels, with no substantial differences between control and experimental groups prior to intervention.

Following the research design, the experimental group received instruction using the ‘T-shirt sizes’ dataset with contextual problem scenarios, while the control group used the traditional ‘Iris’ dataset for multiclass classification training. Both groups completed six instructional sessions before post-test administration using identical assessment tools, with results detailed in Table 20.

Table 20.

AI literacy Post-test results.

Sub-competency Group n Q1 Median Q3 Mann-Whitney U (z, p)
Social impact of AI Experimental 76 4.00 4.38 4.50 3.556 0.000
Control 75 3.25 3.88 4.19
Execution plan with AI Experimental 76 4.00 4.30 4.60 3.301 0.000
Control 75 3.40 4.00 4.00
Problem solving with AI Experimental 76 4.00 4.40 4.60 3.863 0.000
Control 75 3.40 3.60 4.10
Understanding of AI Experimental 76 4.00 4.33 4.50 3.716 0.000
Control 75 3.33 3.67 4.25
Data literacy Experimental 76 4.00 4.25 4.50 4.081 0.000
Control 75 3.00 3.75 4.25
AI Fairness Experimental 76 3.63 4.00 4.45 3.035 0.000
Control 75 3.13 3.50 4.03

Analysis focusing on median values revealed that across all AI literacy sub-competencies, the experimental group's medians fell within the 'Agree' to 'Strongly Agree' range (4.00 ≤ Median ≤ 4.40), while the control group's medians remained in the 'Neutral' to 'Agree' range (3.50 ≤ Median ≤ 4.00). Examining the spread of the middle 50% of participants through Q1 and Q3 values, the experimental group ranged from 3.63 to 4.60, encompassing responses from 'Neutral' to 'Strongly Agree'. The control group exhibited a similar range (3.00–4.25) but generally lower response patterns concentrated around 'Neutral' and 'Agree'.

Mann-Whitney U tests on the post-test scores showed statistically significant differences across all sub-competencies (p < .001). The experimental group's overall higher distribution suggests significantly improved AI literacy compared to the control group. This outcome appears attributable to the dataset difference, indicating that the developed datasets provided authentic learning experiences that enabled students to contextualize knowledge and skills through real-world correspondences during cognitive processing45. Following the same analytical approach as the pre-test, the post-test response frequencies of both groups were visualized through histograms by sub-competency, as shown in Fig. 13.

Fig. 13.

Fig. 13

Visualization Post-Test response frequency by AI Literacy sub-competency.

In the experimental group, 'Agree' was the most frequent response across all sub-competencies, followed by 'Strongly Agree'. The control group demonstrated relatively lower effectiveness: 'Agree' was the most frequent response only for the 'Social impact of AI' and 'Execution plan with AI' sub-competencies, while 'Neutral' remained the most frequent response for the others. Comparative frequency analysis between groups empirically validates that constructivism-based authentic activities in AI education are more effective for learning AI principles and concepts and for enhancing thinking skills8,14. The hierarchical approach using purpose-built datasets appears to resolve the context-free limitations of arbitrary datasets, enabling expansion from personal experiences to community issues and global problems through multi-contextual problem-solving13,18.

To quantitatively compare the effectiveness of the developed datasets, we analyzed Cliff’s Delta values using post-test results from both groups. The Cliff’s Delta values for each AI literacy sub-competency are presented in Table 21.

Table 21.

Cliff’s Delta for AI literacy sub-competencies in the experimental and control groups.

Sub-competency            Cliff’s Delta
Social impact of AI       0.345
Execution plan with AI    0.304
Problem solving with AI   0.364
Understanding of AI       0.366
Data literacy             0.397
AI Fairness               0.288

Analysis of Cliff’s Delta values revealed positive effect sizes for the experimental group across all sub-competencies when compared to the control group. Detailed examination of individual sub-competencies showed that ‘Data literacy’ demonstrated the largest effect size (Cliff’s Delta = 0.397), followed by ‘Understanding of AI’ (Cliff’s Delta = 0.366) and ‘Problem solving with AI’ (Cliff’s Delta = 0.364). This empirically validates previous research indicating that activity-oriented datasets help students contextualize information from their living environments while promoting digital/data technology awareness and development19.
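Cliff’s Delta itself is straightforward to compute: it is the proportion of cross-group pairs in which the experimental value exceeds the control value, minus the reverse proportion. A minimal pure-Python sketch, using hypothetical Likert responses rather than the study’s data:

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(X > Y) - P(X < Y), estimated over all cross-group
    pairs; ranges from -1 to 1, with positive values favoring group x."""
    gt = sum(xi > yi for xi in x for yi in y)
    lt = sum(xi < yi for xi in x for yi in y)
    return (gt - lt) / (len(x) * len(y))

# Hypothetical post-test responses for one sub-competency (5-point Likert)
experimental = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]
control = [3, 3, 2, 4, 3, 3, 2, 3, 4, 3]
print(round(cliffs_delta(experimental, control), 3))
```

By commonly cited interpretation thresholds (roughly |δ| ≈ 0.147 small, 0.33 medium, 0.474 large), the reported values of 0.288–0.397 would fall in the small-to-medium range.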

‘AI Fairness’ showed the smallest effect size (Cliff’s Delta = 0.288), and the relatively large gap of 0.109 below the most effective ‘Data literacy’ component highlights both the importance of and current limitations in teaching the ethical aspects of AI, a crucial element of AI education, suggesting the need for specialized datasets. In particular, as various studies emphasize the importance of AI ethics, approaches to ethics education grounded in AI’s foundational principles, including addressing data bias, are needed11.

Discussion and implications

The constructivist perspective, which is closely tied to students’ real-life contexts, connectable to prior knowledge, and capable of providing authentic activities through problem-solving experiences, can be effectively applied to AI education6–8,46. Previous research has also established that in AI education it is crucial to utilize real-world datasets from students’ immediate environments to help them understand AI model learning processes, generalize from data, and gain computing experience through data utilization16–18. However, difficulties in exploring educational datasets have been identified: the datasets primarily used in AI education cannot connect to students’ prior knowledge and consequently fail to provide proper problem-solving experiences19,36.

This study holds significance in that it restructured the machine learning dataset development process to explore and develop AI educational datasets that enhance students’ problem-solving experiences and enable authentic activities in teaching-learning processes. The datasets were evaluated based on quality assessment metrics and characteristics of authentic activities, then restructured into the most educationally appropriate form. Particularly, usage analysis revealed that our developed datasets show potential to replace the Iris dataset, which was previously most widely used for teaching classification concepts36.

Despite the numerous advantages of using real-world data in dataset-based education, previous research has identified limitations in the difficulty of appropriately refining and manipulating datasets to fit educational content and environments, hindering their practical application14,19,47. The datasets developed in this study secured educational suitability through expert reviews during testing phases and verified applicability through AI modeling results. Furthermore, their implementation on Entry—the most widely used educational programming platform in South Korea—provides high accessibility for teachers and students.

Analysis of the educational program implementation results, using the developed datasets as the experimental variable, confirmed that constructivist-designed educational datasets have statistically significant impacts on all AI literacy sub-competencies. Moreover, this study is significant in that it practically verifies previous research findings through dataset development and application; these findings can effectively enhance students’ AI literacy14.

Limitations and future directions

This study has several limitations. The developed datasets were initially distributed through platforms predominantly used in South Korea, and some datasets (Seoul Mosquito, Earthquake Occurrence Status) contain region-specific contexts of Korean phenomena, presenting geographical constraints. Future research should generalize our dataset development process to collect and create AI education datasets with geographically neutral characteristics at scale. Additionally, we plan to develop libraries or web services that generate statistically similar synthetic datasets from bulk datasets to enhance the utility of AI educational datasets.
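The synthetic-dataset idea mentioned above can be sketched naively: fit per-column summary statistics from a source dataset and sample new rows from those fits. This is only an illustration under a strong independence assumption between columns (tools such as the cited synthpop package45 model inter-column structure properly), and all column names and values below are hypothetical:

```python
import random
import statistics

def synthesize(columns, n_rows, seed=0):
    """Naive synthetic data: sample each numeric column independently
    from a normal distribution fitted to its observed mean and stdev.
    Ignores correlations between columns (a deliberate simplification)."""
    rng = random.Random(seed)
    fits = {name: (statistics.mean(vals), statistics.stdev(vals))
            for name, vals in columns.items()}
    return [{name: rng.gauss(mu, sd) for name, (mu, sd) in fits.items()}
            for _ in range(n_rows)]

# Illustrative source columns (e.g., a simplified mosquito-activity dataset)
source = {"temperature_c": [21.0, 24.5, 26.0, 28.2, 30.1],
          "mosquito_index": [12.0, 30.0, 45.0, 70.0, 95.0]}
synthetic = synthesize(source, n_rows=100)
print(len(synthetic), sorted(synthetic[0]))
```

A production version would need to preserve joint distributions and categorical columns, which is precisely why a dedicated library or web service is proposed as future work.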

A methodological limitation exists in our use of PISA 2022 Mathematics contexts as the primary framework for systematic dataset theme exploration. While PISA 2022 Mathematics shares common ground with computational thinking and AI education, establishing effective contextual frameworks for AI education requires prior research into AI-specific contextual structures.

Further discussion is needed regarding optimal methods for dataset distribution to students and essential requirements for educational datasets. Although our developed datasets achieved suitability through expert interviews assessing variable complexity and individual data restructuring, establishing fundamental requirements for purpose-built educational datasets requires additional research to define optimal dataset formats and types.

Finally, while the developed datasets generally enhanced AI competencies, certain sub-competencies showed relatively lower effectiveness. In domains like AI fairness in particular, although dataset-based education is appropriate, challenges exist in locating suitable datasets and in addressing potential ethical issues when real datasets are used. The increasing prevalence of AI-related ethical dilemmas in society underscores the urgent need for specialized educational datasets optimized for AI ethics training.

Conclusion

This study focused on developing constructivist-aligned datasets for AI education to enhance students’ AI literacy. To address the limitations of conventional AI education datasets that fail to connect with students’ lived experiences, inhibit utilization of prior knowledge, and provide inadequate authentic activities, we restructured the machine learning dataset development process to create specialized educational datasets. Through rigorous selection of contextually relevant themes, evaluation using data quality assessment metrics and authentic activity characteristics, and subsequent restructuring, we developed four optimized AI education datasets.

The datasets were deployed through educational programming platforms to enhance accessibility. Usage metrics analysis revealed their potential to replace existing datasets commonly used in AI education. Comparative analysis of an AI literacy enhancement program utilizing our datasets demonstrated significantly greater effectiveness in improving students’ AI competencies compared to conventional datasets.

A key contribution lies in fulfilling the educational community’s demand for purpose-built datasets. Our datasets demonstrated superior educational suitability and achieved the highest utilization rate among various AI education datasets. By distributing these high-quality resources through a web-based programming platform (Entry), we provided practical teaching materials accessible to educators and institutions nationwide.

This study expands theoretical discussions in AI education by proposing a systematic dataset development methodology. While previous research emphasized the importance of life-connected problem-solving experiences in AI education, it lacked concrete methodological frameworks and tangible dataset examples. Our work pioneers a software engineering-informed development process featuring rigorous quality control measures and structured procedures specifically designed for educational resource creation.

The application of constructivist-aligned datasets empirically reaffirms the importance of AI education that connects with students’ lived experiences through problem-solving activities. Despite persistent emphasis on real-world relevance in AI education, few studies have conducted direct comparative effectiveness analyses. Our experimental research using dataset-driven educational interventions with life experience connections provides empirical validation through comparative outcome analysis.

Finally, our findings offer policy implications for educational authorities and researchers. As AI literacy becomes increasingly crucial in our rapidly evolving technological landscape, we advocate for parallel development of industry-focused and pedagogically oriented datasets. Establishing national curriculum standards and shared educational resources based on purpose-built datasets could substantially support school-level implementation of quality AI education.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (49.7KB, docx)

Author contributions

S. Kim and K. Kim oversaw the entire research process, conducted the study, wrote the manuscript, and led the development of research outputs. T. Kim collaborated with relevant institutions to officially service the research results, validated the findings, and reviewed the manuscript.

Data availability

The datasets from this study can be accessed in the “Create” - “Datasets” menu on the visual programming language platform Entry (https://playentry.org).

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Seul-Ki Kim, Email: tmfrlska85@gmail.com.

Kwihoon Kim, Email: kimkh@knue.ac.kr.

References

  • 1.Yim, I. H. Y. & Su, J. Artificial intelligence (AI) learning tools in K-12 education: A scoping review. J. Comput. Educ.10.1007/s40692-023-00304-9 (2024). [Google Scholar]
  • 2.Ng, D. T. K., Leung, J. K. L., Chu, S. K. W. & Qiao, M. S. Conceptualizing AI literacy: An exploratory review. Comput. Educ. Artif. Intell. 2, 100041. 10.1016/j.caeai.2021.100041 (2021).
  • 3.Samala, A. D. et al. Unveiling the landscape of generative artificial intelligence in education: A comprehensive taxonomy of applications, challenges, and future prospects. Educ. Inform. Technol.10.1007/s10639-024-12936-0 (2024). [Google Scholar]
  • 4.Druga, S., Vu, S. T., Likhith, E. & Qiu, T. Inclusive AI literacy for kids around the world. In Proceedings of FabLearn 2019, 104–111. (2019). 10.1145/3311890.3311904
  • 5.Long, D. & Magerko, B. What is AI literacy? Competencies and design considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–16. (2020). 10.1145/3313831.3376727
  • 6.Martins, R. M. & Gresse von Wangenheim, C. Findings on teaching machine learning in high school: A ten-year systematic literature review. Inf. Educ. 10.15388/infedu.2023.18 (2022). [Google Scholar]
  • 7.Yau, K. W. et al. A phenomenographic approach on teacher conceptions of teaching artificial intelligence (AI) in K-12 schools. Educ. Inform. Technol.28 (1), 1041–1064. 10.1007/s10639-022-11161-x (2023). [Google Scholar]
  • 8.Chow, W. A Pedagogy that uses a kaggle competition for teaching machine learning: An experience sharing. In 2019 IEEE International Conference on Engineering, Technology and Education (TALE), 1–5. (2019). 10.1109/TALE48000.2019.9226005
  • 9.Dağ, F. Prepare pre-service teachers to teach computer programming skills at K-12 level: Experiences in a course. J. Comput. Educ.6 (2), 277–313. 10.1007/s40692-019-00137-5 (2019). [Google Scholar]
  • 10.Estevez, J., Garate, G., Guede, J. L. & Graña, M. Using scratch to teach undergraduate students’ skills on artificial intelligence. IEEE Access.7, 179027–179036. 10.1109/ACCESS.2019.2956136 (2019). [Google Scholar]
  • 11.Tedre, M. et al. Teaching machine learning in K–12 classroom: Pedagogical and technological trajectories for artificial intelligence education. IEEE Access.9, 110558–110572. 10.1109/ACCESS.2021.3097962 (2021). [Google Scholar]
  • 12.Chiu, T. K. F. A holistic approach to the design of artificial intelligence (AI) education for K-12 schools. TechTrends65 (5), 796–807. 10.1007/s11528-021-00637-1 (2021). [Google Scholar]
  • 13.Yang, W. Artificial intelligence education for young children: Why, what, and how in curriculum design and implementation. Comput. Educ. Artif. Intell.3, 100061. 10.1016/j.caeai.2022.100061 (2022). [Google Scholar]
  • 14.Chiu, T. K. F. & Chai, C. Sustainable curriculum planning for artificial intelligence education: A self-determination theory perspective. Sustainability12 (14), 5568. 10.3390/su12145568 (2020). [Google Scholar]
  • 15.Marques, L. S., Gresse von Wangenheim, C. & Hauck, J. C. R. Teaching machine learning in school: A systematic mapping of the state of the art. Inf. Educ. 283–321. 10.15388/infedu.2020.14 (2020).
  • 16.Biehler, R. & Fleischer, Y. Introducing students to machine learning with decision trees using CODAP and Jupyter notebooks. Teach. Stat.43 (S1). 10.1111/test.12279 (2021).
  • 17.Vartiainen, H. et al. Machine learning for middle schoolers: Learning through data-driven design. Int. J. Child-Comput. Interact.29, 100281. 10.1016/j.ijcci.2021.100281 (2021). [Google Scholar]
  • 18.Evangelista, I., Blesio, G. & Benatti, E. Why are we not teaching machine learning at high school? A proposal. 2018 World Engineering Education Forum - Global Engineering Deans Council (WEEF-GEDC), 1–6. (2018). 10.1109/WEEF-GEDC.2018.8629750
  • 19.Bosnić, I., Čavrak, I. & Zuiderwijk, A. Introducing open data concepts to STEM students using real-world open datasets. In 2021 44th Int. Convention Inform. Communication Electron. Technol. (MIPRO), 1530–1535. 10.23919/MIPRO52101.2021.9596998 (2021). [Google Scholar]
  • 20.Touretzky, D. S. & Gardner-McCune, C. Artificial intelligence thinking in K–12. In (eds Kong, S. C. & Abelson, H.) Computational Thinking Education in K–12 (153–180). The MIT Press. 10.7551/mitpress/13375.003.0013 (2022).
  • 21.Sanusi, I. T., Oyelere, S. S., Vartiainen, H., Suhonen, J. & Tukiainen, M. A systematic review of teaching and learning machine learning in K-12 education. Educ. Inform. Technol.28 (5), 5967–5997. 10.1007/s10639-022-11416-7 (2023). [Google Scholar]
  • 22.Langley, P. An integrative framework for artificial intelligence education. Proc. AAAI Conf. Artif. Intell.33 (01), 9670–9677. 10.1609/aaai.v33i01.33019670 (2019). [Google Scholar]
  • 23.Hutchinson, B. et al. Towards accountability for machine learning datasets: Practices from software engineering and infrastructure. In Proc. 2021 ACM Conf. Fairness Account. Transpar., 560–575. 10.1145/3442188.3445918 (2021). [Google Scholar]
  • 24.Honebein, P. C., Duffy, T. M. & Fishman, B. J. Constructivism and the design of learning environments: Context and authentic activities for learning. In (eds Duffy, T. M., Lowyck, J., Jonassen, D. H. & Welsh, T. M.) Designing Environments for Constructive Learning (87–108). (Springer, 1993). 10.1007/978-3-642-78069-1_5. [Google Scholar]
  • 25.Bergdahl, M. et al. Handbook on Data Quality Assessment Methods and Tools (eds Ehling, M. & Körner, T.) (2007).
  • 26.Pipino, L. L., Lee, Y. W. & Wang, R. Y. Data quality assessment. Commun. ACM 45 (4), 211–218. 10.1145/505248.506010 (2002). [Google Scholar]
  • 27.Kim, S. & Kim, T. A study on educational dataset standards for K-12 artificial intelligence education. J. Korean Assoc. Comput. Educ.25 (1), 29–40. 10.32431/kace.2022.25.1.003 (2022). [Google Scholar]
  • 28.PISA 2022: Mathematics Framework. (2023). https://pisa2022-maths.oecd.org/
  • 29.Reeves, T. C., Herrington, J. & Oliver, R. Authentic activities and online learning. HERDSA 2002 Quality Conversations. (2002). https://researchportal.murdoch.edu.au/esploro/outputs/conferencePaper/Authentic-activities-and-online-learning/991005543775607891
  • 30.Kim, S. W. & Lee, Y. The artificial intelligence literacy scale for middle school students. J. Korea Soc. Comput. Inform.27 (3), 225–238. 10.9708/JKSCI.2022.27.03.225 (2022). [Google Scholar]
  • 31.Ruxton, G. D. The unequal variance t-test is an underused alternative to student’s t-test and the Mann–Whitney U test. Behav. Ecol.17 (4), 688–690 (2006). [Google Scholar]
  • 32.Cliff, N. Dominance statistics: Ordinal analyses to answer ordinal questions. Psychol. Bull.114 (3), 494 (1993). [Google Scholar]
  • 33.Hess, M. R. & Kromrey, J. D. Robust confidence intervals for effect sizes: A comparative study of Cohen’s d and Cliff’s delta under non-normality and heterogeneous variances. Annual Meeting of the American Educational Research Association, 1. (2004). https://www.academia.edu/download/53994708/cohen.pdf
  • 34.Datasets—UCI Machine Learning Repository. (n.d.). Retrieved February 28, 2023, from https://archive.ics.uci.edu/datasets
  • 35.Noone, M. & Mooney, A. Visual and textual programming languages: A systematic review of the literature. J. Comput. Educ. 5 (2), 149–174. 10.1007/s40692-018-0101-5 (2018). [Google Scholar]
  • 36.Sun, D. et al. Block-based versus text-based programming: A comparison of learners’ programming behaviors, computational thinking skills and attitudes toward programming. Educ. Tech. Res. Dev.72 (2), 1067–1089. 10.1007/s11423-023-10328-8 (2024). [Google Scholar]
  • 37.Entry. Entry. https://playentry.org/ (2023).
  • 38.Sklearn boston. Scikit-Learn. (2023). https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html
  • 39.Mosquito Activity Index. (2023). https://news.seoul.go.kr/welfare/mosquito
  • 40.Weather data open portal. (2023). https://data.kma.go.kr/data/grnd/selectAsosRltmList.do?pgmNo=36
  • 41.STATIZ. (2023). http://www.statiz.co.kr/main.php
  • 42.Korea Earthquake Information. (2023). https://www.weather.go.kr/w/eqk-vol/search/korea.do
  • 43.Jang, J. Association of mosquito density and climate factors: Mosquito surveillance data in goyang gyeonggi province during 2008–2012 [Master Thesis, Korea University]. (2014). https://www.riss.kr/link?id=T13542322&ssoSkipYN=Y
  • 44.Wilke, A. B. B., Medeiros-Sousa, A. R., Ceretti-Junior, W. & Marrelli, M. T. Mosquito populations dynamics associated with climate variations. Acta Trop.166, 343–350. 10.1016/j.actatropica.2016.10.025 (2017). [DOI] [PubMed] [Google Scholar]
  • 45.Nowok, B., Raab, G. M. & Dibben, C. Synthpop: Bespoke creation of synthetic data in R. J. Stat. Softw.74 (11). 10.18637/jss.v074.i11 (2016).
  • 46.Anderson, J. R., Reder, L. M. & Simon, H. A. Situated learning and education. Educ. Res. 25 (4), 5–11. 10.3102/0013189X025004005 (1996). [Google Scholar]
  • 47.Machmud, M. T., Wattanachai, S. & Samat, C. Constructivist gamification environment model designing framework to improve ill-structured problem solving in learning sciences. Educ. Tech. Res. Dev.71 (6), 2413–2429. 10.1007/s11423-023-10279-0 (2023). [Google Scholar]
  • 48.Giannakoulas, A. & Xinogalos, S. Studying the effects of educational games on cultivating computational thinking skills to primary school students: A systematic literature review. J. Comput. Educ.10.1007/s40692-023-00300-z (2023). [Google Scholar]



Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
