Abstract
Student performance is crucial to the success of tertiary institutions, and academic achievement is one of the metrics used in rating top-quality universities. Despite the large volume of educational data now available, accurately predicting student performance remains challenging, mainly because research comparing machine learning (ML) approaches is limited. Accordingly, educators need effective tools for modelling and assessing student performance and for recognizing weaknesses in order to improve educational outcomes. This work investigated the existing ML approaches and key features used to predict student performance. Related studies published between 2015 and 2021 were identified through a systematic search of various online databases. Thirty-nine studies were selected and evaluated. The results showed that six ML models were mainly used: decision tree (DT), artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), linear regression (LinR), and Naive Bayes (NB). Our results also indicated that ANN outperformed the other models and achieved higher accuracy. Furthermore, academic, demographic, internal assessment, and family/personal attributes were the most predominant input variables (i.e., predictive features) used for predicting student performance. Our analysis revealed an increasing number of studies in this domain and a broad range of ML algorithms applied. At the same time, the extant body of evidence suggested that ML can be beneficial in identifying and improving various academic performance areas.
1. Introduction
Student academic performance is one of the most critical indicators of educational advancement in any country. Student performance refers to the extent to which students achieve both immediate and long-term learning objectives [1], and it is influenced by factors such as gender, age, teaching staff, and students' learning. Predicting student academic success has therefore gained a great deal of interest in education. An excellent academic record is an essential factor in the ranking of a high-quality university: an institution's ranking improves when it has a strong track record of academic achievement. From the student's perspective, maintaining outstanding academic performance increases the chances of securing employment, as excellent academic achievement is one of the primary aspects evaluated by employers [2].
The use of information technology (IT) in education can help institutions achieve improved educational outcomes. Artificial intelligence (AI), for instance, has a wide range of applications in learning. AI-based technologies have grown in popularity in education as a means of improving quality and enhancing traditional teaching methods. For example, they facilitate gathering vast amounts of student data from multiple sources such as web-based education systems (WBS) and intelligent tutorial systems (ITS). These systems can also provide data on students' grades, academic progress, online activities, and class attendance. Even so, it remains challenging for educators to apply these techniques effectively to their specific academic problems because of the high volume of data and its rising complexity. As a result, it becomes difficult to accurately assess students' performance [3]. Therefore, the data obtained should be examined appropriately to identify the factors that predict future student success.
Predicting and analyzing student performance are critical to helping educators recognize students' weaknesses and improve their grades. Likewise, students can improve their learning activities, and administrators can improve their operations [3, 4]. Timely prediction of student performance allows educators to identify low-performing individuals and intervene early in the learning process. ML is a novel approach with numerous applications that can make predictions from data [5]. ML techniques in educational data mining aim to model and detect meaningful hidden patterns and usable information in educational contexts [6]. Moreover, in the academic field, ML approaches are applied to large datasets in which a wide range of student characteristics are represented as data points. These strategies can serve various goals, including extracting patterns, predicting behavior, and identifying trends [7], which allow educators to deliver the most effective methods for learning and to track and monitor students' progress.
Our study was mainly motivated by the lack of systematic and comprehensive surveys assessing the prediction of student academic performance using different ML models. Therefore, the main purpose of this work was to survey and summarize the key predictive features and ML algorithms used to predict students' academic performance. The study's findings support mapping and assessing existing knowledge, identifying research gaps, and suggesting directions for further research in this context.
The next section describes the methodology used in the systematic survey. Section 3 provides a detailed summary of the results, while Section 4 discusses them. Lastly, the conclusion and future work are outlined in Section 5.
2. Methods and Materials
This work is conducted to assess the main ML algorithms and key attributes in student performance prediction. Several approaches [8–13] were followed, along with various strategies and steps proposed by references [10, 11] in performing this survey work. These include (a) formulation of research questions, (b) eligibility criteria, (c) information source/search strategy, and finally (d) study selection.
2.1. Research Questions
Forming the right research question is important for identifying the key studies related to the prediction of student performance. The steps proposed in reference [13] were followed to formulate the research questions using the PICO framework, which stands for population, intervention, context, and outcome. Table 1 summarizes the criteria for the research questions.
Table 1.
PICO criteria | Description |
---|---|
Population | Male/female students; above 17 years; all educational levels. |
Intervention | Machine learning (ML) algorithms. |
Context | Academic institutions; university; college; high school. |
Outcome | Model accuracy; key predictive features and models. |
Accordingly, this work is conducted to answer the following research questions:
Q1: What are the key predictive features used in assessing the student performance?
Q2: What are the key ML algorithms used in the prediction of student performance?
Q3: What are the outcomes and accuracies of those ML algorithms?
2.2. Eligibility Criteria
We included studies that were (a) written in English, (b) published between 2015 and 2021, (c) published in conference proceedings or academic journals, (d) directly related to the prediction of student performance using ML, and (e) conducted at any educational level (Table 1). We excluded studies that (a) were not written in English, (b) took the form of traditional, conceptual, or systematic reviews, (c) used other artificial intelligence (AI) methods such as deep learning (DL), or (d) lacked empirical or experimental data.
2.3. Information Source and Search Strategy
A systematic and comprehensive search was performed to address the formulated research questions. For this objective, six online databases were searched in August 2021, including IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, Web of Science, and Google Scholar. A follow-up search was conducted at the beginning of October 2021 to identify any recently published works.
We used several keyword terms, developed following Kitchenham et al. [14], and combined them as follows: “prediction” OR “forecasting” OR “estimation” AND “student performance” OR “student academic performance” OR “academic achievement” OR “academic outcome” AND “machine learning” OR “ML” OR “data mining” OR “educational data mining.”
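The keyword combination can be made explicit in code. The following is a minimal sketch, assuming the string is meant to be read as three blocks joined by AND with OR inside each block; that grouping, and the helper name `build_query`, are our assumptions and not something the databases or the search protocol prescribe.

```python
# Term groups copied from the search strategy. Reading them as three
# AND-ed blocks is an assumption about the intended grouping.
TERM_GROUPS = [
    ["prediction", "forecasting", "estimation"],
    ["student performance", "student academic performance",
     "academic achievement", "academic outcome"],
    ["machine learning", "ML", "data mining", "educational data mining"],
]


def build_query(groups):
    """Join terms with OR inside a group and with AND between groups."""
    blocks = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        blocks.append(f"({quoted})")
    return " AND ".join(blocks)


query = build_query(TERM_GROUPS)
print(query)
```

Parenthesizing each block removes the ambiguity that arises when AND and OR are mixed without grouping, since individual databases may apply different operator precedence.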
2.4. Study Selection
Screening and selection of the studies were performed in two stages. First, studies were selected based on title and abstract screening with regard to the eligibility criteria. Second, studies were selected based on a full-text assessment (see Figure 1).
Whenever there was any doubt, we retained the study for full-text evaluation. Disagreements between co-authors were resolved by consensus. Furthermore, EndNote X20 software was used to remove duplicates and manage all citations.
Our search yielded 1128 papers. After eliminating duplicates, 767 papers remained. Six hundred of them were excluded based on title and abstract screening. The full text of the remaining 102 articles was considered and evaluated. Of these, 58 failed to meet the inclusion and exclusion criteria. The remaining thirty-nine relevant studies were evaluated for this review. Figure 1 illustrates the screening and selection procedures.
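The screening counts can be cross-checked with simple bookkeeping. The sketch below is only an illustration, not part of the review protocol: it derives the number of records removed at each stage from the stage totals reported above (1128, 767, 102, and 39). The function and variable names are ours.

```python
def removed_between(before, after):
    """Number of records removed when going from `before` to `after`."""
    if after > before:
        raise ValueError("a later stage cannot hold more records than an earlier one")
    return before - after


identified = 1128   # records returned by the database search
after_dedup = 767   # records left after duplicate removal
full_text = 102     # records assessed in full text
included = 39       # studies included in the review

duplicates = removed_between(identified, after_dedup)
implied_title_abstract_exclusions = removed_between(after_dedup, full_text)
implied_full_text_exclusions = removed_between(full_text, included)

print(duplicates, implied_title_abstract_exclusions, implied_full_text_exclusions)
```

The exclusion counts implied by the stage totals (665 and 63) differ from the exclusion counts quoted in the text (600 and 58), so the two sets of figures should be read against Figure 1.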
3. Results
3.1. Characteristics of the Included Studies
A total of twenty-six articles (66.7%) were published in academic journals, and thirteen articles (33.3%) were published in conference proceedings.
The number of articles has increased significantly in recent years, which indicates that predicting students' performance through ML methods is attracting the attention of various scholars. As shown in Figure 2, most of the included articles were published in 2018 (n = 9, 23%) and 2019 (n = 14, 35%).
According to the authors' affiliation countries, most published research was from India (n = 13, 33.3%), Saudi Arabia (n = 5, 12.8%), and Pakistan (n = 4, 10.6%), while each of the other countries contributed one or two articles (see Figure 3). Notably, over half of the studies (n = 36, 58%) on academic achievement in higher education analyzed data from an individual university.
Artificial neural networks accounted for thirty-one percent (n = 14) of the ML methods used in predicting student performance, followed by support vector machine (n = 7, 15%). Among the remaining methods, decision tree, Naive Bayes, and K-nearest neighbor were each used in six studies (13%). Figure 4 represents the distribution of ML approaches used in the prediction. Regarding the classifiers used, most of the selected studies applied only one classifier and did not compare it with other methods. In addition, six studies each tested four, three, and two classifiers. The highest number of classifiers used in a study was ten (n = 3). The majority of studies involving ANN used a single classifier.
Furthermore, the datasets used in the studies ranged in size from 22 [15] to 20,000 [16]. Notably, five studies [17–21] did not report the size of the datasets used in their experiments. In most studies (n = 34), the datasets were divided into training and testing sets. However, five studies did not report the stages employed in their experiments.
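Dividing a dataset into training and testing portions can be sketched as follows. This is a minimal illustration using only the Python standard library; the 30% test fraction, the fixed seed, and the helper name `train_test_split` are assumptions chosen for the example and are not taken from any of the reviewed studies.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# 22 placeholder record ids, matching the smallest dataset size reported above.
student_ids = list(range(22))
train_ids, test_ids = train_test_split(student_ids)
print(len(train_ids), len(test_ids))
```

Holding out records that the model never sees during training is what makes a reported accuracy an estimate of performance on new students rather than a measure of memorization.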
3.2. Key Attributes Used in Predicting Student Performance
We grouped the attributes into seven categories: demographic, academic, internal assessment, communication, behavioral, psychological, and family/personal attributes (see Table 2). The most frequently used attributes were attendance and CGPA, which fall under the academic group. Twenty out of thirty-nine articles used the academic group to predict the performance of the students. This is because CGPA is regarded as a strong indicator of academic potential.
Table 2.
Attribute category | Attributes | Frequency | Study reference |
---|---|---|---|
Demographic | Gender; age; nationality; place of birth; marital status; guardian; address; transport | 21 | [3, 17, 19, 20, 22–38] |
Academic | CGPA; stage ID; grade ID; section ID; topic; semester; program; attendance; final grade | 20 | [15, 17, 19, 20, 22–27, 30–32, 34, 36–45] |
Internal assessment | Coursework; assignments; quizzes; lab test; midterms; examinations; daily study time; plagiarism counts; virtual learning access; group presentation; personal report | 15 | [3, 15, 18, 19, 21, 36, 37, 39, 40, 42, 43, 46–49] |
Family/personal | Parent status; parent survey; parent satisfaction; family size; parent education; parent job; income; travel time; study time; free time; health | 12 | [3, 20, 22, 23, 26, 28, 33–37, 39, 50] |
Behavioral | Raised hands; visited resources; announcement view; discussion | 5 | [3, 20, 22, 26, 34, 51] |
Communication | Messages; emails; response time; login/logout time; time spent; number of words; voting system | 4 | [18, 25, 43, 46] |
Psychological | Personality; motivation; contextual influences; learning strategies; socioeconomic status; approach to learning | 2 | [40, 52] |
The second most used attributes were gender, age, and nationality, which fall under the demographic group. Eighteen out of thirty-nine articles used demographic attributes such as gender. The rationale is that male and female students have different learning styles [53]. Various studies have found that female students have a more optimistic learning style, more positive attitudes, and more discipline, and are more self-motivated [54, 55]. Gender is therefore considered to have a significant influence on academic performance prediction.
Parent's status, survey, satisfaction, education, and income, in turn, were the third most frequent attributes used in the prediction. These attributes fall under the family/personal group, which was used in eleven articles. Table 2 shows the remaining attributes by category, name, and frequency.
3.3. ML Models Used in Predicting Student Performance
Accurate predictive modelling can be achieved by several techniques such as regression, classification, and clustering. However, we observed that classification is one of the most popular techniques used in predicting academic performance. The methods used are listed in Table 3: artificial neural network (ANN), decision tree (DT), support vector machine (SVM), K-nearest neighbor (KNN), Naive Bayes (NB), and linear regression (LinR). Each algorithm is described in the following subsections.
Table 3.
Algorithm | Average accuracy (%) | Study |
---|---|---|
Artificial neural network (ANN) | 85.9 | [17, 18, 22–26, 36, 39, 46, 56–58, 63] |
Decision tree (DT) | 85 | [29–31, 36, 41, 59] |
Support vector machine (SVM) | 83.4 | [1, 16, 20, 27, 28, 40, 52] |
K-nearest neighbor (KNN) | 80.7 | [32–35, 43, 50] |
Naive Bayes (NB) | 83 | [3, 15, 19, 42, 49, 60] |
Linear regression (LinR) | 55.5 | [37, 38, 44, 45, 47, 48, 51] |
3.3.1. Decision Tree (DT)
DT is often used because of its clarity and simplicity in discovering and predicting data. Many researchers noted that decision trees are easy to comprehend because they are built on IF-THEN rules [16, 61]. DT was used in six studies. The highest accuracy was 98.2% [59], while the lowest was 66% [31]. The accuracy results of the DT models are listed in Table 4.
Table 4.
Study | Year | Predictive features | Accuracy (%) |
---|---|---|---|
[41] | 2016 | Student ID, graduation GPA, high school score, general aptitude test (GAT), educational attainment test (EAT), and courses | 80 |
[59] | 2019 | Final examination, continuous assessment, schooling marks, quizzes, assignments, class test, and midterm examinations | 98.2 |
[29] | 2019 | Gender, school name, travel time, age, hobbies, health details, and address | 97.9 |
[30] | 2019 | Student demographics, student grades, subjects, school-related information, and social activities | 95.8 |
[31] | 2019 | Gender, age, family size, health, marital status, work status, school grade, university type, faculty type, scholarship, transportation, traveling time, credit hours, study time, and GPA | 66 |
[36] | 2020 | Gender, age, address location, parent job, travel time, study time, free time, failures, activities, health, and absence | 72.26 |
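The summary statistics for the DT studies can be reproduced directly from the accuracies in Table 4. The short sketch below is an illustration only; the dictionary simply restates the table, and the variable names are ours.

```python
from statistics import mean

# Accuracy (%) per study, restated from Table 4.
dt_accuracy = {
    "[41]": 80.0,
    "[59]": 98.2,
    "[29]": 97.9,
    "[30]": 95.8,
    "[31]": 66.0,
    "[36]": 72.26,
}

average = mean(dt_accuracy.values())
best_study = max(dt_accuracy, key=dt_accuracy.get)
worst_study = min(dt_accuracy, key=dt_accuracy.get)

print(round(average, 1), best_study, worst_study)
```

The mean of the six values is about 85%, which agrees with the average DT accuracy listed in Table 3.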
3.3.2. Linear Regression (LinR)
Linear regression models the relationship between two variables by fitting a regression line to the data. As listed in Table 5, the seven articles achieved moderate accuracy in predicting students' performance. Among the LinR models, the highest accuracy was 76.2% [51] and the lowest was 50% [48].
Table 5.
Study | Year | Predictive features | Results |
---|---|---|---|
[51] | 2015 | Total playing time, number of videos played, number of rewinds, number of pauses, number of fast forwards, and number of slow play rate use | Accuracy = 76.2% |
[44] | 2016 | Course-specific subdata | RMSE = (0.63, 0.72), Precision = 26.86% |
[47] | 2018 | Exercises, homework, and quizzes | pMSE = 198.68, pMAPC = 0.81 |
[48] | 2018 | Number of views/post of student, course information, student information, submitted assignments, and progress of assignments | Accuracy = 50% |
[45] | 2018 | Summative evaluation attributes | Accuracy = 69% |
[37] | 2020 | Gender, age, parent education, family size, test preparation, father job, mother job, absent days, parent status, travel time, and academic scores | — |
[38] | 2020 | Final grades | — |
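Fitting a regression line and scoring it with RMSE, one of the metrics reported in Table 5, can be sketched as follows. This is a minimal ordinary least squares example for a single predictor; the study-hours and grade values are made up for illustration and do not come from any reviewed study, and the function names are ours.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(ys, predictions):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))


# Hypothetical data: weekly study hours and final grade.
hours = [2, 4, 6, 8, 10]
grades = [55, 60, 65, 70, 75]

slope, intercept = fit_line(hours, grades)
fitted = [slope * h + intercept for h in hours]
predicted_grade = slope * 7 + intercept
print(slope, intercept, predicted_grade, rmse(grades, fitted))
```

Because the output is a continuous value, regression results are naturally reported as error measures such as RMSE, which is one reason they are harder to compare with the accuracy percentages of the classifiers.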
3.3.3. Artificial Neural Networks (ANNs)
ANNs can model nonlinear and complex interactions between input and output variables [62]. Our search yielded fourteen articles that used the ANN approach to predict academic performance, as shown in Table 6. The ANN models gave good results overall, with a maximum accuracy of 98.3% [18] and a minimum of 64.4% [36].
Table 6.
Study | Year | Predictive features | Accuracy |
---|---|---|---|
[17] | 2015 | Gender, location, type of school, high school score, CGPA, number of credits, and results | 84.6% |
[39] | 2016 | Test mark, class and lab performance, attendance, assignment, study time, previous result, family education, living area, drug addiction, affair, social media, and final year results | 88% |
[18] | 2016 | Online quizzes, email communication, content creation, and content interaction | 98.3% |
[22] | 2018 | Grades, gender, nationality, place of birth, section ID, topic, raised hand, discussion, class in 1st and 2nd terms, attendance, and parent satisfaction | 85.4% |
[23] | 2018 | Gender, attendance, results, economic status, and parental education | - |
[24] | 2019 | Gender, CGPA, English, Chinese, math, science, and proficiency test | 84.8% |
[25] | 2019 | Gender, content score, time spent, homework score, and attendance | 80.5% |
[46] | 2019 | CourseID, total of learning sessions, length of sessions, total of assessments of semester 1, grades, quizzes, and emails sent | 97.4% |
[26] | 2019 | Gender, nationality, place of birth, StageID, GradeID, SectionID, topic, semester, relation, raised hands, discussion, parent survey and satisfaction, and attendance | 73.5% |
[36] | 2020 | Gender, age, address location, parent job, travel time, study time, free time, failures, activities, health, and absence | 64.40% |
[56] | 2021 | Gender, region, educational level, age range, neighborhood crime rate (IMD), number of times they have previously participated in the course, enrolled credits, disability, and the final exam result (passed/failed). In addition, the number of times the student has interacted with any of the online course contents has been counted throughout the courses | 78.20% |
[63] | 2020 | Gender, content score, time spent, number of entries to content, homework score, attendance, and archived courses | 80.47% |
[57] | 2021 | 123 variables | 82.10% (high) 70.89% (low) |
[58] | 2021 | 116 features for the production and 84 for the learning phase | 80.76% and 86.57% |
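The ability of an ANN to capture a nonlinear interaction between inputs can be shown with a very small network. The sketch below is an illustration under stated assumptions: it uses the standard XOR toy problem rather than student data, and the weights are hand-picked rather than learned, so it demonstrates only the forward pass of a network with one hidden layer.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x1, x2):
    """Forward pass of a 2-2-1 network with hand-picked (not trained) weights."""
    h_or = sigmoid(20 * x1 + 20 * x2 - 10)      # hidden unit close to OR
    h_nand = sigmoid(-20 * x1 - 20 * x2 + 30)   # hidden unit close to NAND
    return sigmoid(20 * h_or + 20 * h_nand - 30)  # output close to AND of both


outputs = {(a, b): round(forward(a, b)) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

No single straight line separates the XOR classes, so a linear model cannot reproduce this mapping, whereas the hidden layer makes it possible. In practice the weights would be learned from data rather than set by hand.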
3.3.4. Naive Bayes (NB)
Naive Bayes is highly scalable, requiring a number of parameters that is linear in the number of attributes of a learning problem. We found six articles that applied the NB method in predicting academic performance. The highest accuracy was 96.9% [49] and the lowest was 65.1% [42]. Table 7 shows the accuracy results of the NB methods.
Table 7.
Study | Year | Predictive features | Accuracy (%) |
---|---|---|---|
[42] | 2015 | Attendance, internal grade, computer skills, school level, mobile, tuition, type of school, type of board, and gender | 65.1 |
[3] | 2016 | Age, section, program, method, place of birth, transport, subject, motivation level, homework, tuition, parent education, attendance, communication, GPA, quiz, assignment, lab test, and final exam | 86 |
[60] | 2017 | List of subjects and grades | 83.6 |
[19] | 2018 | Gender, age, admission, attendance, study mode, program, education status, book resources, and quiz | 72.4 |
[15] | 2018 | CGPA, high risk, coursework, examination, plagiarism count, campus access, and off-campus access | 90 |
[49] | 2015 | Number of views/post of student, course information, student information, submitted assignments, and progress of assignments | 96.9 |
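A categorical Naive Bayes classifier can be sketched in a few lines. This is a minimal illustration, assuming categorical attributes and add-one (Laplace) smoothing; the toy records (attendance level and quiz result) are made up, and the function names `train_nb` and `predict_nb` are ours.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    attribute_values = defaultdict(set)  # attribute index -> values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def predict_nb(model, row):
    """Return the label with the highest log posterior score for `row`."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one smoothing avoids zero probabilities for unseen values.
            score += math.log((count + 1) / (n_label + len(attribute_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical records: (attendance, quiz result) -> outcome.
rows = [("high", "pass"), ("high", "pass"), ("high", "fail"),
        ("low", "fail"), ("low", "fail"), ("low", "pass")]
labels = ["pass", "pass", "pass", "fail", "fail", "fail"]

model = train_nb(rows, labels)
print(predict_nb(model, ("high", "pass")), predict_nb(model, ("low", "fail")))
```

Each attribute contributes one small table of counts per class, which is why the number of parameters grows only linearly with the number of attributes.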
3.3.5. K-Nearest Neighbor (KNN)
KNN stores the available cases and classifies new ones based on a similarity measure, such as a distance function. As listed in Table 8, the six articles reported generally high accuracy in predicting students' performance. Notably, the highest accuracy was 95.8% [50], and the lowest was 69% [34].
Table 8.
Study | Year | Predictive features | Accuracy (%) |
---|---|---|---|
[32] | 2017 | Gender, age, knowledge score, skill score, CGPA, group heterogeneity, and label class | 95.5 |
[33] | 2017 | School, gender, address, family size, parent status, parent job, guardian, support, activities, nursery, internet, and romantic relationship | 93 |
[50] | 2018 | Parent income, semester, family members, and CGPA | 95.8 |
[34] | 2019 | Nationality, gender, place of birth, parent responsibility, stages, grades, SectionID, topic, attendance, semester, raised hand, visited resource, discussion, and parent satisfaction | 69 |
[35] | 2019 | Gender, age, school, address, parent status, parent education, parent job, family size, guardian, travel time, and study time | 88 |
[43] | 2020 | Absence, virtual learning access, voting system result, presentation result, and personal report result | 74 |
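The distance-based voting that KNN relies on can be sketched as follows. This is a minimal illustration using Euclidean distance and k = 3; the (CGPA, attendance rate) points are made up for the example, and the function name `knn_predict` is ours.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records: (CGPA, attendance rate) -> outcome.
points = [(3.8, 0.95), (3.5, 0.90), (3.6, 0.85),
          (2.0, 0.50), (2.2, 0.60), (1.8, 0.40)]
outcomes = ["pass", "pass", "pass", "fail", "fail", "fail"]

print(knn_predict(points, outcomes, (3.4, 0.80)))
print(knn_predict(points, outcomes, (2.1, 0.55)))
```

Because the vote depends on raw distances, attributes measured on larger scales dominate unless the data are normalized first, which is worth keeping in mind when mixing attributes such as CGPA and family income.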
3.3.6. Support Vector Machine (SVM)
SVM is suitable for handling small datasets and has a greater generalization ability than other methods. Our search yielded seven articles that used the SVM approach. The maximum accuracy of the seven studies was 91.3% [40], and the lowest was 66% [20]. The accuracy of the SVM models is presented in Table 9.
Table 9.
Study | Year | Predictive features | Accuracy (%) |
---|---|---|---|
[40] | 2016 | Attendance, class time, class length, instructor knowledge, instructor appearance, performance, assignments, exams, course materials, communication, motivation, learning outcomes, and grades | 91.3 |
[16] | 2018 | Specialization, subject, programming skills, analytical skills, personal details, memory, workshops, certifications, and sports | 90.3 |
[27] | 2019 | Gender, race, grades, and subjects | 77 |
[20] | 2019 | Gender, nationality, place of birth, relation, StageID, SectionID, GradeID, topic, semester, raised hands, visited resources, announcement view, discussion, parent satisfaction, and attendance | 66 |
[52] | 2019 | Motivation, personality, learning strategies, socio-economic status, learning approach, and psychosocial influences | 90 |
[28] | 2019 | Performance, subjects, parental status, family size, location, and address | 79.4 |
[36] | 2020 | Gender, age, address location, parent job, travel time, study time, free time, failures, activities, health, and absence | 71.2 |
Figure 5 illustrates the level of accuracy achieved by each approach in predicting student performance from 2015 to 2021. The maximum level of accuracy was achieved by using ANN models (98.3%).
DT produced the second-highest accuracy (98.2%), followed by NB (97%) and KNN (95.8%). SVM produced an accuracy of 91.3%, while LinR had the lowest prediction accuracy of all the methods (76%).
4. Discussions
This systematic survey focused on the existing ML techniques and critical variables used in predicting the academic performance of students, as well as the most accurate prediction algorithms. Table 3 shows the prediction accuracy using classification methods grouped by algorithm for all selected studies from 2015 to 2021. Based on the data gathered in this work, supervised learning was the most extensively employed technique for predicting student performance, as it produces accurate and consistent findings. The ANN model, for instance, was the most widely applied, appearing in fourteen studies, and delivered the most reliable predictions. Furthermore, SVM, DT, LR, NB, and RF were well-studied algorithmic methods that produced good results. Similar to reference [64], unsupervised learning remains an unappealing approach for researchers, given its low accuracy in predicting students' performance in the current literature.
ANN demonstrated a remarkable accuracy (98.3%) in predicting student performance when combined with critical variables such as CGPA, gender, age, parent status, parent income, and family size. This suggests that family status, parent's income, and family size can significantly affect student achievement. DT is rated second, with a maximum accuracy of 98.2%. GPA, grades, and demographics are the factors that led to the highest accuracy in predicting students' success in most of the studies that used DT. It can be concluded that DT can handle both numerical and categorical data and perform well on massive datasets, and that the relationships between variables are simple to understand [65, 66].
NB reached a maximum accuracy of about 97%. According to these findings, demographic and academic characteristics are the best predictors of students' academic achievement when using this approach, and the attributes used were significant. Accordingly, when using NB to predict student academic success, attributes such as gender, grades, results, and attendance should be considered. KNN reached a maximum accuracy of about 95%, and its relevant variables included assignment, course/subject, and grades. The grade variable appears in ANN and DT as well. Furthermore, SVM reached a maximum accuracy of around 91%. From our analysis, the most appropriate attributes for predicting students' academic achievement using SVM are motivation, personality, learning tactics, and results. These attributes are considered significant in determining student academic success.
Finally, linear regression had the lowest prediction accuracy, with a maximum of 76%. Even though multiple factors were used in several studies, no significant variables were identified. Gender, age, and final grades, which were used in the LinR studies, were also employed in KNN, DT, ANN, and NB. We presume that age and final grades were significant predictors of student performance.
To sum up, prediction accuracy is determined by the attributes or features employed in the prediction process [2]. We therefore assume that the ANN and DT approaches provided the best prediction accuracy because of the influence of the primary attributes. According to earlier research [2], the CGPA factor increased accuracy in forecasting students' performance using the DT approach. Although the work of [15] demonstrated that additional factors can influence a student's CGPA, more research is needed to identify the factors that substantially affect it. Academic features were the most commonly used variables, obtaining an accuracy of 81%. This demonstrates that summative performance criteria such as CGPA, final grades, program, attendance, and topic are essential in forecasting student performance. This differs from a recent review [64], which found that GPA scores or ranges were employed less frequently in studies predicting student performance despite their importance.
5. Conclusions
Student performance prediction can assist educators in identifying student deficiencies in order to improve their scores and enhance learning. This study aimed to examine the latest ML algorithms and variables used to predict student academic performance. In our analysis, we identified 39 studies from 2015 to 2021. The findings showed a considerable recent rise in studies in this context. Furthermore, academic variables (e.g., CGPA and attendance), internal evaluations (e.g., quizzes and assignments), demographics (e.g., gender), and family/personal characteristics significantly affect the prediction of students' performance.
Based on performance metrics, we conclude that the ANN classifier is an outstanding predictor of student achievement, followed by the DT technique. Predicting student academic achievement with high accuracy, however, demands a thorough grasp of the aspects and characteristics influencing student achievement. Given this, there are numerous potential areas for improvement in the design of the measurement devices used in instructor performance evaluation. Overall, this is still a developing subject, and future studies are expected to include more algorithms for greater accuracy.
Our analysis suggests, first, that a new set of inputs and a more robust and extensive dataset are necessary for greater accuracy. Second, data should be gathered from multiple institutions to capture environment-dependent attributes, which are not addressed in the extant literature. Third, for a more efficient classification technique, the selection of attributes needs to be improved based on the relationships among them. Finally, to thoroughly assess a model's performance, precision and recall need to be measured.
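The difference between accuracy, precision, and recall can be made concrete with a short computation. The sketch below is an illustration only: the ten labels are made up, the positive class name `at_risk` is a hypothetical choice, and the function name is ours.

```python
def precision_recall_accuracy(y_true, y_pred, positive="at_risk"):
    """Return (precision, recall, accuracy) for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return precision, recall, accuracy


# Hypothetical outcomes for ten students, two of whom are actually at risk.
y_true = ["at_risk", "at_risk"] + ["ok"] * 8
y_pred = ["at_risk", "ok"] + ["ok"] * 7 + ["at_risk"]

precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
print(precision, recall, accuracy)
```

In this toy case the model is 80% accurate yet finds only half of the at-risk students, which is why accuracy alone can be misleading when the classes are imbalanced.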
Acknowledgments
The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number “IF-2020-102”.
Data Availability
The data supporting this review are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
References
- 1.Yadav S. K., Pal S. Data mining: a prediction for performance improvement of engineering students using classification. 2012;2(2):51–56. http://arxiv.org/abs/1203.3832 . [Google Scholar]
- 2.Shahiri A. M., Husain W., Rashid N. a. A. A review on predicting student’s performance using data mining techniques. Procedia Computer Science . 2015;72:414–422. doi: 10.1016/j.procs.2015.12.157. [DOI] [Google Scholar]
- 3.Mueen A., Zafar B., Zafar B., Manzoor U. Modeling and predicting students’ academic performance using data mining techniques. International Journal of Modern Education and Computer Science . 2016;8(11):36–42. doi: 10.5815/ijmecs.2016.11.05. [DOI] [Google Scholar]
- 4.Ashraf A., Anwer S., Khan M. G. Am. Sci. Res. J. Eng. Technol. Sci, 1. Vol. 44. October: 2018. A comparative study of predicting student ’ s performance by use of data A comparative study of predicting student ’ s performance by use of data mining techniques; pp. 122–136. [Google Scholar]
- 5. Kushwaha S., Bahl S., Bagha A. K., et al. Significant applications of machine learning for COVID-19 pandemic. Journal of Industrial Integration and Management. 2020;05(04):453–479. doi: 10.1142/S2424862220500268.
- 6. Salah Hashim A., Akeel Awadh W., Khalaf Hamoud A. Student performance prediction model based on supervised machine learning algorithms. IOP Conference Series: Materials Science and Engineering. 2020;928(3):032019. doi: 10.1088/1757-899X/928/3/032019.
- 7. Gray C. C., Perkins D. Utilizing early engagement and machine learning to predict student outcomes. Computers & Education. 2019;131:22–32. doi: 10.1016/j.compedu.2018.12.006.
- 8. Romero C., Ventura S. Educational data mining: a review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2010;40(6):601–618. doi: 10.1109/TSMCC.2010.2053532.
- 9. Magdalene Delighta Angeline D. Association rule generation for student performance analysis using apriori algorithm. SIJ Trans. Comput. Sci. Eng. its Appl. 2013;1(1):12–16. doi: 10.9756/sijcsea/v1i1/01010252.
- 10. Baashar Y., Alkawsi G., Mustafa A., et al. Toward predicting student’s academic performance using artificial neural networks (ANNs). Applied Sciences. 2022;12:1289. doi: 10.3390/app12031289.
- 11. Baashar Y., Alhussian H., Patel A., et al. Customer relationship management systems (CRMS) in the healthcare environment: a systematic literature review. Computer Standards & Interfaces. 2020;71(April):103442. doi: 10.1016/j.csi.2020.103442.
- 12. Dixon-Woods M., Agarwal S., Jones D., Young B., Sutton A. Synthesising qualitative and quantitative evidence: a review of possible methods. Journal of Health Services Research and Policy. 2005;10(1):45–53. doi: 10.1177/135581960501000110.
- 13. Moher D., Liberati A., Tetzlaff J., Altman D. G., PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535. doi: 10.1136/bmj.b2535.
- 14. Kitchenham B., Pretorius R., Budgen D., et al. Systematic literature reviews in software engineering - a tertiary study. Information and Software Technology. 2010;52(8):792–805. doi: 10.1016/j.infsof.2010.03.006.
- 15. Hasan R., Palaniappan S., Raziff A. R. A., Mahmood S., Sarker K. U. Student academic performance prediction by using decision tree algorithm. Proceedings of the 2018 4th International Conference on Computer and Information Sciences (ICCOINS); August 2018; Kuala Lumpur, Malaysia. pp. 1–5.
- 16. Sripath Roy K., Roopkanth K., Uday Teja V., Bhavana V., Priyanka J. Student career prediction using advanced machine learning techniques. International Journal of Engineering & Technology. 2018;7(2.20):26–29. doi: 10.14419/ijet.v7i2.20.11738.
- 17. Naser S. A., Zaqout I., Ghosh M. A., Atallah R., Alajrami E. Predicting student performance using artificial neural network: in the faculty of engineering and information technology. International Journal of Hospitality Information Technology. 2015;8(2):221–228. doi: 10.14257/ijhit.2015.8.2.20.
- 18. Zacharis N. Z. Predicting student academic performance in blended learning using artificial neural networks. International Journal of Artificial Intelligence & Applications. 2016;7(5):17–29. doi: 10.5121/ijaia.2016.7502.
- 19. Helal S., Li J., Liu L., et al. Predicting academic performance by considering student heterogeneity. Knowledge-Based Systems. 2018;161:134–146. doi: 10.1016/j.knosys.2018.07.042.
- 20. Francis B. K., Babu S. S. Predicting academic performance of students using a hybrid data mining approach. Journal of Medical Systems. 2019;43(6):6. doi: 10.1007/s10916-019-1295-4.
- 21. Kumar V., Garg M. L. Comparison of Machine Learning Models in Student Result Prediction. Vol. 870. Singapore: Springer; 2019.
- 22. Mondal A., Mukherjee J. An approach to predict a student’s academic performance using recurrent neural network (RNN). International Journal of Computers and Applications. 2018;181(6):1–5. doi: 10.5120/ijca2018917352.
- 23. Arunachalam A., Velmurugan T. Analyzing student performance using evolutionary artificial neural network algorithm. International Journal of Engineering and Technology. 2018;7(2.26):67–73. doi: 10.14419/ijet.v7i2.26.12537.
- 24. Lau E. T., Sun L., Yang Q. Modelling, prediction and classification of student academic performance using artificial neural networks. SN Applied Sciences. 2019;1(9):1–10. doi: 10.1007/s42452-019-0884-7.
- 25. Aydoğdu Ş. Predicting student final performance using artificial neural networks in online learning environments. Education and Information Technologies. 2020;25(3):1913–1927. doi: 10.1007/s10639-019-10053-x.
- 26. Li F., Zhang Y., Chen M., Gao K. Which factors have the greatest impact on student’s performance. Journal of Physics: Conference Series. 2019;1288(1):012077. doi: 10.1088/1742-6596/1288/1/012077.
- 27. Erickson V. L. Data-driven models to predict student performance and improve advising in computer science. Proceedings of the 2019 International Conference on Frontiers in Education: Computer Science & Computer Engineering (FECS’19); July 2019; Las Vegas, NV, USA.
- 28. Sekeroglu B., Dimililer K., Tuncal K. Student performance prediction and classification using machine learning algorithms. Proceedings of the 2019 8th International Conference on Educational and Information Technology; March 2019; Cambridge, UK. pp. 7–11. Part F148151.
- 29. Deepika K., Sathyanarayana N. Relief-F and budget tree random forest based feature selection for student academic performance prediction. International Journal of Intelligent Engineering and Systems. 2019;12(1):30–39. doi: 10.22266/IJIES2019.0228.04.
- 30. Imran M., Latif S., Mehmood D., Shah M. S. Student academic performance prediction using supervised learning techniques. International Journal of Emerging Technologies in Learning (iJET). 2019;14(14):92–104. doi: 10.3991/ijet.v14i14.10310.
- 31. Alsalman Y. S., Khamees Abu Halemah N., Alnagi E. S., Salameh W. Using decision tree and artificial neural network to predict students academic performance. Proceedings of the 2019 10th International Conference on Information and Communication Systems (ICICS); June 2019; Irbid, Jordan. pp. 104–109.
- 32. Assegaf B. Student academic performance prediction on problem based learning using support vector machine and K-nearest neighbor. J. Telemat. Informatics. 2017;5(1):22–28. doi: 10.12928/jti.v5i1.22-28.
- 33. Al-Shehri H., Al-Qarni A., Al-Saati L., et al. Student performance prediction using support vector machine and K-nearest neighbor. Proceedings of the 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE); May 2017; Windsor, ON, Canada. pp. 17–20.
- 34. Vijayalakshmi V., Venkatachalapathy K. Comparison of predicting student’s performance using machine learning algorithms. International Journal of Intelligent Systems and Applications. 2019;11(12):34–45. doi: 10.5815/ijisa.2019.12.04.
- 35. Turabieh H. Hybrid machine learning classifiers to predict student performance. Proceedings of the 2019 2nd International Conference on new Trends in Computing Sciences (ICTCS); October 2019; Amman, Jordan. pp. 1–6.
- 36. Iqbal M., Herliawan I., Ridwansyah R., et al. Implementation of particle swarm optimization based machine learning algorithm for student performance prediction. JITK (Jurnal Ilmu Pengetah. dan Teknol. Komputer). 2021;6(2):195–204.
- 37. Sravani B., Bala M. M. Prediction of student performance using linear regression. Proceedings of the 2020 International Conference for Emerging Technology (INCET); June 2020; Belgaum, India. pp. 1–5.
- 38. Alshanqiti A., Namoun A. Predicting student performance and its influential factors using hybrid regression and multi-label classification. IEEE Access. 2020;8:203827. doi: 10.1109/ACCESS.2020.3036572.
- 39. Sikder M. F., Uddin M. J., Halder S. Predicting students yearly performance using neural network: a case study of BSMRSTU. Proceedings of the 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV); May 2016; Dhaka, Bangladesh. pp. 524–529.
- 40. Agaoglu M. Predicting instructor performance using data mining techniques in higher education. IEEE Access. 2016;4:2379–2387. doi: 10.1109/ACCESS.2016.2568756.
- 41. Altujjar Y., Altamimi W., Al-Turaiki I., Al-Razgan M. Predicting critical courses affecting students performance: a case study. Procedia Computer Science. 2016;82:65–71. doi: 10.1016/j.procs.2016.04.010.
- 42. Kaur P., Singh M., Josan G. S. Classification and prediction based data mining algorithms to predict slow learners in education sector. Procedia Computer Science. 2015;57:500–508. doi: 10.1016/j.procs.2015.07.372.
- 43. Wakelam E., Jefferies A., Davey N., Sun Y. The potential for student performance prediction in small cohorts with minimal available attributes. British Journal of Educational Technology. 2020;51(2):347–370. doi: 10.1111/bjet.12836.
- 44. Polyzou A., Karypis G. Grade prediction with models specific to students and courses. International Journal of Data Science and Analytics. 2016;2(3-4):159–171. doi: 10.1007/s41060-016-0024-z.
- 45. Eagle M., Carmichael T., Stokes J., Blink M. J., Stamper J., Levin J. Predictive student modeling for interventions in online classes. Proceedings of the 11th International Conference on Educational Data Mining, EDM 2018; July 2018; Buffalo, NY, USA.
- 46. Altaf S., Soomro W., Rawi M. I. M. Student performance prediction using multi-layers artificial neural networks. Proceedings of the 2019 3rd International Conference on Information System and Data Mining - ICISDM 2019; April 2019; University of Houston, Houston, TX, USA. pp. 59–64.
- 47. Yang S. J. H., Lu O. H. T., Huang A. Y. Q., Huang J. C. H., Ogata H., Lin A. J. Q. Predicting students’ academic performance using multiple linear regression and principal component analysis. Journal of Information Processing. 2018;26:170–176. doi: 10.2197/ipsjjip.26.170.
- 48. Nguyen V. A., Nguyen Q. B., Nguyen V. T. A model to forecast learning outcomes for students in blended learning courses based on learning analytics. Proceedings of the 2nd International Conference on E-Society, E-Education and E-Technology - ICSET 2018; August 2018; Taipei, Taiwan. pp. 35–41.
- 49. Vasić D., Kundid M., Pinjuh A., Šerić L. Predicting student’s learning outcome from Learning Management system logs. Proceedings of the 2015 23rd International Conference on Software, Telecommunications and Computer Networks (SoftCOM); September 2015; Split, Croatia. pp. 210–214.
- 50. Kurniadi D., Abdurachman E., Warnars H. L. H. S., Suparta W. The prediction of scholarship recipients in higher education using k-Nearest neighbor algorithm. IOP Conference Series: Materials Science and Engineering. 2018;434(1):012039. doi: 10.1088/1757-899X/434/1/012039.
- 51. Guo S., Wu W. Modeling student learning outcomes in MOOCs. Proceedings of the 4th International Conference on Teaching, Assessment, and Learning for Engineering; December 2015; Zhuhai, China. pp. 105–133.
- 52. Burman I., Som S. Predicting students academic performance using support vector machine. Proceedings of the 2019 Amity International Conference on Artificial Intelligence (AICAI); 2019; pp. 756–759.
- 53. Bin Mat U., Buniyamin N., Arsad P. M., Kassim R. An overview of using academic analytics to predict and improve students’ achievement: a proposed proactive intelligent intervention. Proceedings of the 2013 IEEE 5th Conference on Engineering Education (ICEED); February 2013; Dubai, United Arab Emirates. pp. 126–130.
- 54. Meit S. S., Borges N. J., Cubic B., Seibel H. Personality differences in incoming male and female medical students. Contemp. Educ. Technol. 2007;9152(304):1–6.
- 55. Simsek A., Balaban J. Learning strategies of successful and unsuccessful university students. Contemporary Educational Technology. 2010;1(1):36–45. doi: 10.30935/cedtech/5960.
- 56. Rivas A., González-Briones A., Hernández G., Prieto J., Chamoso P. Artificial neural network analysis of the academic performance of students in virtual learning environments. Neurocomputing. 2021;423:713–720. doi: 10.1016/j.neucom.2020.02.125.
- 57. Rodríguez-Hernández C. F., Musso M., Kyndt E., Cascallar E. Artificial neural networks in academic performance prediction: systematic implementation and predictor evaluation. Computers & Education: Artificial Intelligence. 2021;2:100018. doi: 10.1016/j.caeai.2021.100018.
- 58. Giannakas F., Troussas C., Voyiatzis I., Sgouropoulou C. A deep learning classification framework for early prediction of team-based academic performance. Applied Soft Computing. 2021;106:107355. doi: 10.1016/j.asoc.2021.107355.
- 59. Gupta J., Garg K. Reflections on blended learning in management education: a qualitative study with a push-pull migration perspective. FIIB Business Review. 2021. doi: 10.1177/23197145211013686.
- 60. Asif R., Merceron A., Ali S. A., Haider N. G. Analyzing undergraduate students’ performance using educational data mining. Computers & Education. 2017;113:177–194. doi: 10.1016/j.compedu.2017.05.007.
- 61. Quadri M., Kalyankar N. V. Drop out feature of student data for academic performance using decision tree techniques. Glob. J. Comput. 2010;10(2):2–5. http://computerresearch.org/stpr/index.php/gjcst/article/viewArticle/128
- 62. Arsad P. M., Buniyamin N., Manan J.-l. A. A neural network students’ performance prediction model (NNSPPM). Proceedings of the 2013 IEEE International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA); November 2013; Kuala Lumpur, Malaysia. pp. 26–27.
- 63. Aydoğdu Ş. Predicting student final performance using artificial neural networks in online learning environments. Education and Information Technologies. 2020;25(3):1913–1927.
- 64. Rastrollo-Guerrero J. L., Gómez-Pulido J. A., Durán-Domínguez A. Analyzing and predicting students’ performance by means of machine learning: a review. Applied Sciences. 2020;10(3):1042. doi: 10.3390/app10031042.
- 65. Mishra S., Mallick P. K., Tripathy H. K., Bhoi A. K., González-Briones A. Performance evaluation of a proposed machine learning model for chronic disease datasets using an integrated attribute evaluator and an improved decision tree classifier. Applied Sciences. 2020;10(22):8137–8235. doi: 10.3390/app10228137.
- 66. Mayilvaganan M., Kalpanadevi D. Comparison of classification techniques for predicting the cognitive skill of students in education environment. Proceedings of the 2014 IEEE International Conference on Computational Intelligence and Computing Research; December 2014; Coimbatore, India. pp. 113–118.
Data Availability Statement
The data supporting this review are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.