Abstract
This study employs supervised machine learning algorithms to examine factors that negatively impacted academic performance among college students on probation (underperforming students). We used the Knowledge Discovery in Databases (KDD) methodology on a sample of N = 6514 college students spanning 11 years (2009 to 2019) provided by a major public university in Oman. We used the Information Gain (InfoGain) algorithm to select the most effective features and ensemble methods to compare accuracy against more robust algorithms, including Logit Boost, Vote, and Bagging. The algorithms were evaluated using performance metrics such as accuracy, precision, recall, F-measure, and the ROC curve, and then validated using 10-fold cross-validation. The study revealed that the main factors affecting student academic achievement include study duration in the university and previous performance in secondary school. Based on the experimental results, these features were consistently ranked as the top factors that negatively impacted academic performance. The study also indicated that gender, estimated graduation year, cohort, and academic specialization significantly contributed to whether a student was under probation. Domain experts and other students were involved in verifying some of the results. The theoretical and practical implications of this study are discussed.
Keywords: Data Mining, Education Data Mining, Predictive models, Supervised learning, Higher education, Student Academic performance, Academic under probation, Oman
Introduction
One of the strategic directions of Oman’s 2040 vision is to develop national competencies and capabilities by preparing citizens with high-level scientific and practical competencies (Belwal et al., 2020; Al-Kindi & Al-Khanjari, 2020; Oman 2040 vision, 2020; Al-Busaidi et al., 2022). The Oman 2040 vision aims to equip students with the required knowledge and skills so that they develop the confidence to face global challenges in light of rapid technological developments in various aspects of life (Hammad & Al-Harthi, 2021; Anil & Batdi, 2022; Al Muqarshi, 2022; Shoyukhi et al., 2022). To achieve this goal, the challenges of the educational sector should be identified, studied, and analyzed to improve the quality of university graduates. One of the key issues facing the educational sector is the rise in the number of students on academic probation (Bowman & Jang, 2022). Underperforming students on probation have stimulated several researchers to document the implications for students, higher educational institutions, decision-makers, and overall national strategies (Al Hamdi & Edakkalayil, 2022). Thus, it is crucial to study and evaluate the academic performance of students to highlight and predict the significant factors that cause poor academic performance. While several studies have used machine learning (ML) to predict academic performance, few studies have explored the struggling and underperforming students categorized as on probation. Moreover, previous studies that employed ML in educational data mining (EDM) did not employ ensemble algorithms. They considered only individual algorithms and ignored the strength of combining multiple algorithms to tackle a given problem. In this study, we examine the prediction power of ensemble algorithms.
Educational Data Mining (EDM) is a relatively new field that studies the application of data mining techniques in the educational field (Wook et al., 2017; Nilashi et al., 2022; Kumar & Sharma, 2017). The EDM approach is characterized by four main themes, including prediction of students’ performance, aiding instructors and students to support their decisions based on analyzed data, defining the learning progress of the students, and differentiating different algorithms used to achieve optimized results (Hussain et al., 2021; Du et al., 2020; Al-Emran et al., 2022). One of the key issues already addressed within EDM research is student retention. This is because student retention reflects the quality, reputation, and overall performance of higher educational institutions. College dropouts may be due to the major selection, probation for more than one semester, marriage, leaving the higher education institution, financial challenges (Shyamala, 2008), and language challenges (Al-Mahrouqia & Karadsheh, 2016).
Other documented factors include educational background, student interaction with instructors and peers, and prior academic and family-related issues (Zhu et al., 2022). The current study aims to identify the most critical factors affecting the performance of students under probation to help educators prioritize interventions that help students graduate on time. The justification for focusing on students under probation is the increase in their number in 2019 compared to the previous four years (see Fig. 1). Fig. 1 presents statistics obtained from the Deanship of Admission and Registration related to the business college at Sultan Qaboos University between 2009 and 2019. In the first three years, the number of students on probation increased rapidly, from 2735 cases in 2009 to 3269 in 2011. This number reached its peak in 2012, at 3432 students. From 2013 to 2015, the number of students on probation fluctuated around 3300 cases. In 2019, after staying steady between 2015 and 2018, the number of students on probation increased again to 3115.
Fig. 1.
Students on probation for academic years of 2009–2019 (SQU Annual Statistics Book, 2019–2020)
Students who fall on academic probation must return to their normal status within two academic semesters to avoid academic dismissal (SQU Academic Procedure, 2020). Thus, this paper highlights the factors that lead to poor academic performance. Recognizing the common predictors of academic probation might help decision-makers to take the necessary steps to help students engage in the proper academic recovery process. Further, identifying the most critical factors that reduce academic performance might increase student awareness of these factors to maintain their progress toward graduation.
Whether machine learning (ML) can predict student performance effectively is no longer a question of debate: machine learning models have shown significant prediction power, and the role of ML is undeniable (Sekeroglu et al., 2021). However, ML models require further development, especially in data selection and analysis (Sekeroglu et al., 2021). Calls in the literature to further develop procedural steps in ML modeling and data selection were proposed due to the shortcomings in previous studies.
In reviewing the literature, we noticed that earlier studies that used ML to predict student performance relied on only a few algorithms, limited sample sizes, single-point data collection (e.g., one year), or limited methods for model evaluation. For example, previous work used a limited set of algorithms such as linear support vector machines (SVM) (e.g., Naicker et al., 2020), Artificial Neural Networks (ANN) (Al-Sharafi et al., 2022) and SVM (Hussain et al., 2019), J48 DT (Imran et al., 2019), DNN (Waheed et al., 2020), transfer learning and DNN (Tsiakmaki et al., 2020), RNN and SVM (Wang et al., 2020), and Fast Correlation Based Filter (FCBF) with SVM (Zaffar et al., 2020). Moreover, previous studies relied on limited model evaluation estimates such as the ROC area under the curve (AUC) (e.g., Naicker et al., 2020; Xing & Du, 2018) or accuracy and recall (Imran et al., 2019; Naicker et al., 2020; Tsiakmaki et al., 2020; Wang et al., 2020).
Due to these limitations, Yakubu and Abubakar (2022) called on future researchers to use multiple algorithms for comparison purposes, such as logistic regression and other ML algorithms. They further suggested that future studies include more variables, such as demographic features. To address these gaps, the current study contributes to the body of literature by using several variables (e.g., gender, specialization, study duration, age, cohort, graduation, etc.) along with multiple algorithms for comparison (e.g., J48, Random Forest, Random Tree, Naïve Bayes, Multi Perceptron, SVM, k-nearest neighbors (K-NN), etc.) and several evaluation parameters (e.g., precision, recall, ROC).
The contribution of this study is twofold: theoretical and practical. Theoretically, this study contributes to the educational data mining field by identifying the factors associated with academic probation of undergraduates at SQU from fall 2009 to summer 2019. Practically, four essential contributions are proposed. First, the identified factors can guide policymakers in improving the design of educational policies. Second, the study provides recommendations for university faculty members to aid students at risk of low academic performance, and those already under academic probation, in overcoming the situation swiftly and efficiently. Third, it provides recommendations for students to prevent low academic achievement at the university. Finally, a reusable model is presented that the university can use to predict student probation status at any point in time. Based on the model's prediction outputs, the associated educational stakeholders can make the required decisions accordingly.
The structure of the paper is as follows. The following section reviews the literature-related work. The third section examines the algorithms used in the experiments, followed by the research methodology and dataset description. We will then describe the results, discussion, and research contributions. Finally, we conclude the study with research limitations and further research directions.
Related work
Vast technological advances have changed our daily activities. In all sectors, many processes and systems have been automated and digitalized to cope with these transformations, including the educational field (Maqableh et al., 2021). The implementation of such technologies has generated vast amounts of data. These data cannot be easily analyzed with traditional statistical approaches, which rest on many distributional and statistical assumptions, but data mining techniques are robust against these assumptions and limitations. Hence, applying data mining techniques in the educational field will help discover hidden factors within complex datasets and help decision-makers act accordingly.
Students’ performance was analyzed through machine learning to support decision-makers in the admission process. Ibrahim and Al-Barwani (1993) investigated the validity of using secondary school certifications to predict student academic performance in higher education. The study targeted 606 students from the 1988–1989 cohort and 625 students from the 1989–1990 cohort. The study revealed that factors like gender, college, major, motivation, and tendencies affect the overall student academic achievement. However, the researchers insisted that it is unfair to consider secondary school certification as the only predictor for academic achievements.
AlGhanboosi and Kadhim (2004) studied the problems of academic supervision from the instructors’ and students’ perspectives. The study’s sample size was 300 students and 105 faculty members from different colleges. The study revealed that the students face many difficulties in the university, most crucial of which are related to academic advising. Moreover, the research claimed that the faculty members of social colleges faced challenges in academic advising and other difficulties compared to their colleagues in science colleges.
Moosa and Ibrahim (2008) focused on the causes behind falling under probation from students’ perspectives, highlighting the difficulties they face, their reactions, and how to overcome these challenges. The study sample consisted of 61 male and female students. Moosa and Ibrahim (2008) found that the most crucial factors leading to probation were the inability of students to acclimatize to a new university lifestyle, a lack of proper learning style, absenteeism, a failure to manage time effectively, and an inability to assume responsibility. The researchers claimed that these factors stemmed from students’ families not raising their children to take responsibility or act independently. In addition, academic supervision for those students was not conducted effectively. Therefore, they suggested implementing awareness sessions for students, academic faculty, and academic supervisors to discuss these factors and how to overcome them.
Another study conducted by AlHarthi et al. (2011) predicted students’ difficulties outside the university campus, considering some demographic variables. The study targeted a random sample of 597 students living outside a university campus. It concluded that the most significant challenges were high rent prices, a student allowance insufficient to cover necessities, transportation and commuting to and from the university, as well as other health, academic, and social challenges.
With a sample size of 620, Al-Mahrouqia and Karadsheh (2016) investigated personal and academic reasons, university challenges, and students’ justification for falling under probation. English language difficulties, an inability to study well before the exams, lack of time management skills and difficulties in focusing and comprehension during the lectures were the main reasons leading to probation (Al-Mahrouqia & Karadsheh, 2016).
Abdul-Wahab et al. (2019) studied the reasons behind the unwillingness of students to attend faculty office hours. The study concluded that students spent limited hours meeting faculty during office hours, and some did not visit faculty at all during office hours. The authors explained that this was due to busy student schedules, conflict in availability between students and their academic advisors, finding the required information without needing the academic advisor, and academic faculties’ discouragement about students attending set office hours.
Sarfra et al. (2022) conducted a study to identify the association between the Unified Theory of Acceptance and Use of Technology (UTAUT) constructs and students’ academic performance in terms of student attitude. The study’s sample size was 1050 students. The results revealed that trust in technology strengthens the relationship between UTAUT and academic performance; in fact, student attitude facilitates the association between the UTAUT constructs and academic performance. To predict students’ performance, Deeva et al. (2022) employed iBCM (interesting Behavioral Constraint Miner) to build predictive models for early identification of students’ behavior (De Smedt et al., 2019). The authors believed that using demographic data such as age, gender, previous education, achievements, and personality might lead to biased results, as much of this data is missing. Thus, the authors limited their study to the e-learning platform to identify underperforming students early in their courses. The proposed model was found to be predictive, with an accuracy of 90%, and it managed to discover informative patterns about students’ performance on the e-learning platform. Furthermore, Jiao et al. (2022) employed AI predictive models to explore the best prediction model for student academic performance. They found that the prediction model was critical for evaluating students’ learning performance, albeit with some limitations.
The student population increased by almost 7.6% between 2010 and 2020 at Sultan Qaboos University (SQU). This increment put additional pressure on management to ensure the efficiency and effectiveness of the education quality, student retention, and academic achievement. Additionally, several student-related challenges have increased, which affected the students’ academic performance and resulted in them falling under probation or withdrawing from the university. According to SQU regulation, to fall under probation, one of the following three conditions should apply (SQU Academic Procedure, 2019): (a) Grade Point Average (GPA) is less than 2.00; (b) GPA is 2.00 or more, but the semester GPA is less than 1.00; (c) GPA is 2.00 or more, but the semester GPA is less than 2.00 for two consecutive semesters.
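The three SQU probation conditions above can be expressed as a simple rule check. The sketch below is purely illustrative; the function and parameter names are ours, not part of the SQU regulations:

```python
def under_probation(cgpa, sem_gpa, prev_sem_gpa):
    """Return True if a student meets any of the three SQU probation
    conditions (an illustrative encoding of the stated rules)."""
    if cgpa < 2.00:                               # condition (a)
        return True
    if cgpa >= 2.00 and sem_gpa < 1.00:           # condition (b)
        return True
    # condition (c): semester GPA below 2.00 for two consecutive semesters
    if cgpa >= 2.00 and sem_gpa < 2.00 and prev_sem_gpa < 2.00:
        return True
    return False
```

For example, a student with a CGPA of 2.5 and semester GPAs of 1.8 and 1.9 in consecutive semesters would fall under condition (c).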
Having looked into the studies conducted earlier at SQU, it can be noticed that the college student challenges covered so far can be categorized under academic, social, economic, or psychological factors. These studies offered a descriptive analysis of the factors affecting student academic achievement in general. Some relied on statistical data analyzed using the Statistical Package for the Social Sciences (SPSS). However, none of the studies conducted at SQU utilized a data mining approach by developing a model to predict a student’s academic situation based on the available historical data. In addition, none studied the degree to which the identified factors affect the probability of falling under probation. Thus, further investigation is needed to highlight the factors affecting students’ academic performance. With this aim in mind, we employed advanced analytics approaches such as machine learning to effectively extract hidden patterns from the available data to support decision-makers.
Supervised learning
Data mining (DM) is defined as the process of employing machine learning (ML) to extract useful patterns and knowledge from large, complex datasets (Mellor et al., 2018; Kulin et al., 2021). There are three main types of machine learning algorithms: supervised, unsupervised, and semi-supervised. This study focuses on supervised machine learning, as it is considered the most suitable technique for prediction (Sekeroglu et al., 2021). Supervised learning is a predictive technique that trains a model to classify the dataset and predict outcomes accurately, extracting hidden insights from the given data. ML develops algorithms based on historical data split into training and testing subsets (Khan & Ghosh, 2018; Kulin et al., 2021). The training subset is used to build a model to predict the target class (Khan, 2019). Over time, and as more data is fed to the built model, ML can refine its internal program to perform tasks more efficiently (Kulin et al., 2021). There are several supervised machine learning algorithms; the most common and powerful ones are described below:
Artificial Neural Network is one of the most popular supervised algorithms used in data mining (illustrated in Fig. 2). It is designed to solve complex problems by simulating the human brain (Khanna et al., 2016; Mengash, 2020). It consists of several connected units, which are called artificial neurons. These units input, process, and output the data for further processing (Shah et al., 2019). The output of each unit is calculated based on the weighted sum of its inputs. Therefore, these weights are used to determine whether the neuron signal is strong or weak (Tomasevic et al., 2020).
Fig. 2.

Artificial Neural Network Layers (Chugh et al., 2019)
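The weighted-sum computation of a single artificial neuron described above can be sketched as follows. This is a minimal illustration; the sigmoid activation is our choice for the example, and real networks may use other activations:

```python
import math

def neuron_output(inputs, weights, bias):
    """Compute one neuron's output: a weighted sum of its inputs
    plus a bias, passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid squashes z into (0, 1)
```

Large weights amplify a neuron's signal, while small or negative weights dampen it, which is how the weights determine whether the signal is strong or weak.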
Decision Tree is considered one of the simplest visualized classification algorithms (Mengash, 2020). It uses a tree-shaped graph to demonstrate the relationships between the features in the dataset and their corresponding outcomes (Tomasevic et al., 2020). Each node represents a feature, and each branch represents a possible value of that node (Mengash, 2020). As shown in Fig. 3, the branches are opened based on the relevant frequencies (Rivas et al., 2020). The leaf of the tree represents the target variable to be achieved. One of the major disadvantages of this algorithm is overfitting; however, this issue is accounted for by the random forest algorithm (Tomasevic et al., 2020).
Fig. 3.

Decision Tree graph demonstration (Akbari and Solnik, 2021)
J48 Decision Tree is an open-source Java implementation of C4.5 algorithm in a tool called Waikato Environment for Knowledge Analysis (WEKA), (Nahar et al., 2021; Jalota & Agrawal, 2019). It is an extension of the ID3 algorithm (Du et al., 2020). It aims to build the decision tree based on information entropy, where each attribute is split into subsets based on information gain (Jia, 2013). The final decision is represented on a leaf based on the highest information gain calculated.
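The entropy and information-gain quantities that C4.5/J48 uses to choose split attributes can be sketched in plain Python. This is a minimal sketch of the calculation, not WEKA's implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Entropy reduction from splitting `labels` on `feature_values`;
    C4.5/J48 picks the attribute with the highest such gain."""
    n = len(labels)
    split_entropy = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        split_entropy += len(subset) / n * entropy(subset)
    return entropy(labels) - split_entropy
```

A feature that perfectly separates the classes yields a gain equal to the full entropy, while a feature independent of the class yields a gain of zero.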
Support Vector Machine (SVM), depicted in Fig. 4, is a supervised learning method that separates the classes by finding the boundary with the widest possible margin between them (Tomasevic et al., 2020). It can be used on small datasets and requires less time to build the model than other classification algorithms (Mengash, 2020). SVM can be used for linear and non-linear classification by using the “kernel trick” to map the inputs into a high-dimensional space (Jia, 2013).
Fig. 4.

Support Vector Machine (García et al., 2016)
Naïve Bayes is one of the simplest probabilistic supervised algorithms, based on Bayes’ theorem (Yang, 2019; Mengash, 2020). This classifier computes the probability of each class given the features and assigns the instance to its most likely class (Tomasevic et al., 2020). Simplicity, scalability, and applicability to real-world problems are common characteristics of the Naïve Bayes algorithm: see Figs. 5 and 6.
Fig. 5.

Naïve Bayes Classifier (S)
Fig. 6.

Naïve Bayes Equation (Gamal, 2020)
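The per-class probability computation behind Naïve Bayes can be sketched for categorical features as below. This is a minimal sketch with Laplace smoothing, not WEKA's NaiveBayes implementation:

```python
from collections import Counter

def nb_predict(train_rows, train_labels, row):
    """Classify `row` with a categorical Naive Bayes: pick the class
    maximizing P(class) * product of P(x_i | class), with Laplace
    smoothing to avoid zero probabilities."""
    n = len(train_labels)
    best, best_p = None, -1.0
    for cls, cls_count in Counter(train_labels).items():
        p = cls_count / n                        # prior P(class)
        for i, value in enumerate(row):
            match = sum(1 for r, l in zip(train_rows, train_labels)
                        if l == cls and r[i] == value)
            n_values = len({r[i] for r in train_rows})
            p *= (match + 1) / (cls_count + n_values)  # P(x_i | class)
        if p > best_p:
            best, best_p = cls, p
    return best
```

The "naïve" assumption is that features are conditionally independent given the class, which is why the per-feature probabilities are simply multiplied.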
K-Nearest Neighbors (k-NN), also called a lazy learner (Wang, 2011), is a simple but effective supervised algorithm in which an object is assigned to the most common class based on its neighbors’ votes, as depicted in Fig. 7. It uses distance to predict the class, and better accuracy is obtained once the data is normalized. It has many advantages: it can be implemented quickly, it supports nonparametric analysis (Ni & Nguye, 2009), and the time required to build the model depends on k (Wang, 2011). However, the algorithm is ineffective if the data contains many outliers or missing values (Triguero et al., 2019).
Fig. 7.

Classifying New Data Using K-NN Algorithm (Sang, 2022)
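The neighbor-voting rule described above can be sketched as follows (a minimal illustration using Euclidean distance; any distance measure could be substituted):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Assign `query` the majority class among its k nearest
    training points, measured by Euclidean distance."""
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Because all distance computation happens at query time rather than during training, the method earns its "lazy learner" name.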
Methodology
The KDD methodology is employed in this study to conduct the experiments (see Fig. 8). This method is widely used by scientific researchers in the data mining field (Kalavathy et al., 2007; Rahman, 2014; Thonnard & Dacier, 2008; Rahman et al., 2016; Mariscal et al., 2010; Orriols-Puig et al., 2013). It is an iterative process that consists of eight interactive phases: problem specification, resourcing, data cleansing, preprocessing, data mining, evaluation, interpretation, and exploitation (Debuse et al., 2000). It is always possible to navigate between the phases and return to the previous one to fine-tune the parameters to optimize the result.
Fig. 8.

KDD Stages (Du et al., 2020)
Dataset
The dataset used in this study was provided by the Deanship of Admission and Registration databases at SQU. As shown in Table 1, it is a structured dataset that has mixed type features. Initially, the sample consisted of 33 features and 37,599 students, and the data represented student cohorts from fall 2009 to summer 2019. This large sample size facilitated the development of better prediction models with higher accuracy percentages. Due to the COVID-19 pandemic and online learning, data from 2020 onwards were excluded from the study.
Table 1.
Data Dictionary
| Attribute Name | Type | Missing Values% | Unique |
|---|---|---|---|
| Student No. | Numeric | 0% | 100% |
| Cohort | Numeric | 0% | 0% |
| College | Nominal | 0% | 0% |
| College code | Nominal | 0% | 0% |
| Major | Nominal | 8565 (23%) | 0% |
| Major code | Nominal | 8565 (23%) | 0% |
| Minor | Nominal | 34,251 (91%) | 0% |
| Spec | Nominal | 34,923 (93%) | 0% |
| Degree | Numeric | 0% | 0% |
| Status | Nominal | 0% | 0% |
| Load status | Nominal | 1 (0%) | 0% |
| Gender | Nominal | 0% | 0% |
| Country | Nominal | 2 (0%) | 0% |
| Governorate | Nominal | 526 (1%) | 0% |
| Wellayah | Nominal | 287 (1%) | 0% |
| CGPA | Numeric | 145 (0%) | 0% |
| Estimated Graduation year | Numeric | 0% | 0% |
| From HEAC | Nominal | 1480 (4%) | 0% |
| Admission category | Nominal | 36,930 (98%) | 0% |
| Birth date | Nominal | 5 (0%) | 2% |
| Actual Graduation date | Nominal | 20,158 (54%) | 1% |
| Withdrawal | Nominal | 33,535 (89%) | 2% |
| Marital status | Nominal | 0% | 0% |
| SQU hostel | Nominal | 0% | 0% |
| Percentage (secondary school score) | Nominal | 0% | 0% |
| Probation student | Nominal | 0% | 0% |
Data cleansing
To achieve the best results from the provided data, several data cleansing steps were performed. All irrelevant parameters which contributed no value for prediction of the target class were removed. This included:
- Student IDs: a unique attribute for each individual.
- All student information after summer 2019.
- "Degree": all students were studying towards a bachelor's degree.
- "From HEAC": almost all Omani students come from HEAC (Higher Education Admission Center).
- "SQU Hostel": due to data unavailability.
- "Admission category": 98% missing data.
- "College name": duplicated by "College code".
- "Major name": duplicated by "Major code".
- "Actual Graduation date": 54% missing data.
- "CGPA": highly correlated with whether a student falls under probation.
- "Status": highly correlated with whether a student falls under probation.
- "Load status" and "GPA": highly correlated with whether a student falls under probation.
- "Withdrawal": probation applies to students still studying at the university, not those who have withdrawn.
- "Graduation": probation applies to students still studying at the university, not those who have graduated.
Moreover, a years-of-study feature was added to the dataset, calculated as the number of years between the student's cohort and graduation year. For students yet to graduate, the number of years was calculated by deducting the student's cohort from 2021. This feature helped identify whether study duration affected student academic achievement regardless of the year a student registered at the university or the graduation year.

In addition, the values of attributes such as major, minor, specialization, and probation were recoded as either YES or NO, because these attributes initially had substantial missing data due to their inapplicability to all students; recoding them as YES/NO contributed more to building the prediction model. The "Birth date" feature was replaced with a new feature called "Age", since each student's birth date is unique, whereas many students share the same age and can be grouped into ranges.

It is worth mentioning that a few cases contained inaccurate data. For example, in some cases a student's CGPA indicated that the student was under probation, but the "Probation" status was set to NO; such class values were amended manually. Some other attributes, such as Study Years (8%) and Secondary School Percentage (11%), also had missing data, so missing-value replacement was applied to autofill them based on existing patterns within the dataset. Following this, extreme values and outliers were identified for the target class and removed accordingly. This cleansing resulted in a dataset with 14 features and N = 6154.
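The derived-feature and recoding steps above can be sketched as follows. The helper names are hypothetical, and the current-year cutoff of 2021 follows the text:

```python
def years_of_study(cohort_year, graduation_year, current_year=2021):
    """Study duration: graduation year minus cohort year, or the
    current year minus cohort year for students yet to graduate."""
    end = graduation_year if graduation_year is not None else current_year
    return end - cohort_year

def to_yes_no(value):
    """Recode a sparsely populated attribute (major, minor, spec)
    into a binary YES/NO flag instead of leaving it mostly missing."""
    return "NO" if value in (None, "") else "YES"
```

Recoding sparse attributes as binary flags preserves the signal "this student has a declared major" without forcing the model to handle thousands of missing category values.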
Feature selection
The Information Gain (InfoGain) algorithm was used to highlight the most effective features in the given dataset. Figure 9 illustrates the result of this algorithm. This study considered the following most effective parameters based on InfoGain: number of study years (study duration), secondary school percentage, cohort, major, and estimated graduation year. The remaining features had minimal or no impact on the prediction process. Therefore, they were eliminated.
Fig. 9.
Significant Features Using InfoGain
Validation
Several validation methods are available to validate the developed model, including a simple split of the dataset into train and test subsets, and cross-validation. For this research, 10-fold cross-validation was used, as it yields less biased and lower-variance results (Gareth et al., 2013; Brownlee, 2018). Cross-validation consists of four main steps. First, the dataset is shuffled randomly. Second, it is split into k groups (folds) of almost equal size; one fold is used as the testing sample and the remaining folds as training samples. Third, in each iteration a model is built on the training data, evaluated on the test fold, and the evaluation scores are retained. Fourth, once all models have been evaluated k times, a validation summary is produced (Brownlee, 2018). Hence, each fold is used as the testing dataset in exactly one iteration and as part of the training dataset in the remaining iterations (Refaeilzadeh et al., 2009). In the current study, we used WEKA to run 10-fold cross-validation, to determine the accuracy rate of J48, and to identify the accuracy of the Bagging algorithm. An equal split of the sample into training and testing datasets allowed us to apply 10-fold cross-validation to the training data.
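The four cross-validation steps can be sketched in plain Python. This is an illustrative sketch only; the study itself relied on WEKA's built-in cross-validation:

```python
import random

def k_fold_indices(n, k=10, seed=42):
    """Step 1-2: shuffle indices 0..n-1 and split them into k
    nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(n, k=10):
    """Steps 3-4: yield (train, test) index lists so that each
    fold serves as the test set exactly once."""
    folds = k_fold_indices(n, k)
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each index appears in exactly one test fold across the k iterations, which is what makes the k evaluation scores a low-variance estimate of model performance.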
Evaluation
A crucial issue in predictive analysis is to measure and evaluate the supervised algorithms’ quality. Five evaluation methods were used to assess the generated model and select the optimal algorithm, including accuracy, precision, recall, F-measure (F1), and ROC Curve.
Accuracy: the ratio of correctly classified instances to the total number of instances (shown in Eq. (1)). However, it is not a reliable measure if the dataset is highly imbalanced (Vidiyala, 2020).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
Precision: the fraction of predicted positive cases that are actually positive (Powers, 2020). The precision formula is presented in Eq. (2).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
Recall: the fraction of actual positive cases that were correctly predicted as positive, shown in Eq. (3) (Powers, 2020).
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
F-measure: used when precision and recall are equally important to the model (Vidiyala, 2020). F-measure is illustrated in Eq. (4).
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
ROC Curve: plots the true positive rate on the y-axis against the false positive rate on the x-axis (see Fig. 10). The area under the curve (AUC) is then used to assess model performance.
Fig. 10.

ROC Curve
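The first four measures (Eqs. (1)–(4)) can be computed directly from confusion-matrix counts; a minimal sketch:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-measure from
    confusion-matrix counts, as defined in Eqs. (1)-(4)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

For instance, a classifier with 8 true positives, 2 false positives, 2 false negatives, and 8 true negatives scores 0.8 on all four measures.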
Experimental results and discussion
After selecting the most effective features, several supervised machine learning algorithms were used to construct predictive models: seven individual and three ensemble algorithms. Ensemble algorithms combine more than one algorithm so that the strengths of one algorithm compensate for the weaknesses of another. The individual algorithms included J48, Random Forest, Random Tree, Naïve Bayes, K-NN, SVM, and ANN. The ensemble algorithms included Vote, Logit Boost, and Bagging. For the Vote algorithm, the results of the seven individual algorithms were aggregated through a voting mechanism. Fig. 11 and Table 2 present the results of the experiments. The optimal individual algorithm was J48, with an accuracy of 82.4%. The best ensemble algorithm, Bagging, scored a close second overall (82.2%).
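The Vote ensemble's aggregation step amounts to a per-instance majority vote over the individual classifiers' predictions. The sketch below is a simplified illustration; WEKA's Vote meta-classifier also supports other combination rules such as probability averaging:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common prediction among the classifiers'
    outputs for a single instance."""
    return Counter(predictions).most_common(1)[0][0]

def vote_ensemble(classifier_outputs):
    """Combine per-classifier prediction lists (one prediction per
    instance per classifier) into a single voted prediction list."""
    return [majority_vote(preds) for preds in zip(*classifier_outputs)]
```

With three classifiers predicting ["yes", "no"], ["yes", "yes"], and ["no", "no"] for two instances, the voted result is ["yes", "no"].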
Fig. 11.
Accuracy Results of the Selected Algorithms
Table 2.
Accuracy of the Algorithms
| Algorithm | Type | Accuracy |
|---|---|---|
| J48 | Individual | 82.4% |
| Random Forest | Individual | 78.3% |
| Random Tree | Individual | 76.1% |
| Naïve Bayes | Individual | 71.4% |
| Multilayer Perceptron | Individual | 80.9% |
| SVM | Individual | 70.8% |
| K-NN (IBK) | Individual | 76.1% |
| Vote | Ensemble | 81.3% |
| Logit Boost | Ensemble | 80.7% |
| Bagging | Ensemble | 82.2% |
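The Vote mechanism described above can be sketched as a simple per-instance majority vote over the base models' predictions. This is illustrative only; the per-model predictions below are hypothetical.

```python
# Minimal sketch of majority-vote aggregation as used by the Vote
# ensemble (illustrative; the per-model predictions are hypothetical).
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model prediction lists by majority vote per instance."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]

# Three hypothetical base models predicting probation status for four students:
p1 = ["Yes", "No", "Yes", "No"]
p2 = ["Yes", "Yes", "No", "No"]
p3 = ["No", "No", "Yes", "No"]
print(majority_vote([p1, p2, p3]))  # ['Yes', 'No', 'Yes', 'No']
```

With an even number of voters, `Counter.most_common` breaks ties by first occurrence; the study's Vote ensemble aggregated all seven individual algorithms, so ties are rarer.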
Tables 3 and 4 list the validation results of the J48 and Bagging algorithms, respectively. The average Precision, Recall, F-Measure, and ROC Area are high for both algorithms, indicating that the accuracy of the winning algorithms is valid.
Table 3.
Accuracy of J48
| Class | Precision | Recall | F-Measure | ROC Area |
|---|---|---|---|---|
| No | 0.801 | 0.811 | 0.806 | 0.862 |
| Yes | 0.843 | 0.834 | 0.838 | 0.862 |
| Weighted Avg. | 0.824 | 0.824 | 0.824 | 0.862 |
Table 4.
Accuracy of Bagging
| Class | Precision | Recall | F-Measure | ROC Area |
|---|---|---|---|---|
| No | 0.802 | 0.805 | 0.803 | 0.882 |
| Yes | 0.839 | 0.836 | 0.838 | 0.882 |
| Weighted Avg. | 0.822 | 0.822 | 0.822 | 0.882 |
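Bagging, the runner-up above, trains each base model on a bootstrap resample of the training data and aggregates the models' votes. A minimal pure-Python sketch follows; it is illustrative only, and `train` is a hypothetical stand-in for any base learner (in the study, tree learners were among the candidates).

```python
# Illustrative sketch of bagging: train each base model on a bootstrap
# resample, then aggregate predictions by majority vote.
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw len(data) rows with replacement (a bootstrap resample)."""
    return [rng.choice(data) for _ in data]

def bagging_predict(data, train, x, n_models=10, seed=0):
    """Train n_models base learners on bootstrap resamples and vote."""
    rng = random.Random(seed)
    votes = [train(bootstrap(data, rng))(x) for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical base learner: predicts the majority class of its sample.
def train(sample):
    yes = sum(1 for _, label in sample if label == "Yes")
    return lambda x: "Yes" if 2 * yes >= len(sample) else "No"

data = [(1, "Yes"), (2, "Yes"), (3, "Yes"), (4, "Yes"), (5, "Yes"), (6, "No")]
print(bagging_predict(data, train, x=7))  # 'Yes'
```

Averaging over resamples reduces the variance of unstable base learners, which may explain why Bagging edged out most of the individual algorithms in Table 2.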
Based on the experimental results, the main identified factors for falling under probation are major, cohort, estimated graduation year, and secondary school score. At the College of Economics and Political Science, students are classified as major and non-major students. Major students have already specialized in one of the college’s eight specializations, whereas non-major students are still completing the required pre-major courses. It was found that non-major students from cohorts 2017 and earlier, and from cohorts after 2019, have fallen under probation. Further, non-major students from cohorts 2017 onwards who are estimated to graduate after 2023 and scored more than 86.1% in secondary school are expected to fall under probation. Major students are likely to fall under probation if: (1) they are in their fourth academic year and expected to graduate in 2023 or earlier; (2) they have studied for five years or less and are estimated to graduate after 2023; (3) they have studied for five to six years in the university and scored 93% or less in high school; (4) they are from cohorts 2009 and earlier, have studied for five to six years in the university, and scored more than 93% in high school; or (5) they have a known major and have studied in the university for more than six years.
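For illustration, the five conditions for major students can be transcribed literally as a boolean check. The function and field names are ours; the thresholds come from the text above.

```python
# Literal transcription of the five probation conditions for major
# students (illustrative; field names are ours, thresholds from the text).

def major_student_at_risk(academic_year, years_studied, est_grad_year,
                          hs_score, cohort):
    return (
        (academic_year == 4 and est_grad_year <= 2023) or                  # (1)
        (years_studied <= 5 and est_grad_year > 2023) or                   # (2)
        (5 <= years_studied <= 6 and hs_score <= 93) or                    # (3)
        (cohort <= 2009 and 5 <= years_studied <= 6 and hs_score > 93) or  # (4)
        (years_studied > 6)                                                # (5)
    )

print(major_student_at_risk(4, 4, 2023, 88, 2019))  # True, via rule (1)
```

Rules of this disjunctive form are what tree learners such as J48 produce, with each clause corresponding to one root-to-leaf path.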
As per the KDD process, the obtained experimental results should be reviewed and validated by domain experts. Following this, the results explained above were discussed with five academic faculty members and ten male and female students chosen randomly from different colleges from SQU. Some of the faculty members chosen currently hold or have previously held managerial positions in SQU, such as Dean of Student Affairs and Head of the Center for Community Services and Continuing Education. This process ensured that the feedback provided was not purely from an academic perspective but also from managerial and administrative points of view. The interview focused on the top three factors identified as per DM analysis: study duration, secondary school percentage, and gender.
The experimental results revealed that students who achieved low total marks in high school are likely to fall under academic probation. This is because university-level education builds on an essential foundation laid in secondary school subjects. On the other hand, some students with very high scores still fall under probation. This finding aligns with the study by Ibrahim and Al-Barwani (1993), which highlighted the secondary school issue. Thus, the secondary school grade should not be the only criterion for the admission and allocation of students to the different university colleges. According to the domain experts, the cause lies with the students themselves, who become overconfident: regardless of the difficulty of the courses, they tend to believe that minimum effort will yield high scores. The interviewed students confirmed this, having experienced it themselves. Another factor contributing negatively to student academic performance is the system for allocating students to university colleges and majors. Because of limited seat availability, students are asked to rank their preferred colleges, and the system then allocates them competitively based on their secondary school scores. Some students therefore do not get their first, or even second, preference and end up in a college they do not like. The same approach applies to the majoring procedures: students rank their preferences, and the system assigns them to majors based on their pre-major marks, so some students may find themselves in a major they do not prefer.
Academic achievement is also affected by gender. This confirms the finding of Ibrahim and Al-Barwani (1993), which raised the gender issue; the employment of machine learning, however, provides further insight into this parameter. We found that male students are more susceptible, as they constituted the majority of students who fell under probation. Several reasons account for this gender difference in academic failure.
First and foremost is the English language barrier, especially for students who completed their secondary education in public schools, where the language of instruction is mainly Arabic: alongside eight courses taught in Arabic, there is only a single English course. Students believe that such a course is insufficient to equip them with adequate English fluency.
Concerning the study duration, students generally fall under academic probation in their first or second year, or if they exceed the number of study years prescribed by the university. This finding is aligned with the study by Khan and Ghosh (2018). First-year students often register for courses without consulting their academic advisors, instead seeking advice from senior students. Consequently, they miss pre-requisite courses and, in some cases, fail to pass some courses, which may result in falling behind the academic schedule. Students justify this by arguing that some academic advisors are new and hence lack adequate up-to-date knowledge of the degree plan, major requirements, degree audit, transfer policy, and other procedures. They also added that although the university provides faculty members with several workshops, these are not effective because they are limited to the university’s basic rules and regulations. This finding is consistent with results found by AlGhanboosi and Kadhim (2004) and Moosa and Ibrahim (2008), who highlighted concerns about academic advisors’ ability to address students’ academic issues and provide appropriate advice. The study of Abdul-Wahab et al. (2019) raised the same concern from a different perspective: the researchers found that timetabling conflicts are the main obstacle to visiting the academic advisor, and that some academic advisors are not friendly.
Psychological instability is another factor that domain experts have raised. They claimed that male students face this issue more than female students for several reasons. On-campus female students are provided with food, entertainment facilities, study rooms, and other cleaning and health services. For cultural background reasons, off-campus female students spend most of their time at home. Such time can be utilized for family-related issues or studying. Hence, females generally have a better academic life.
On the other hand, male students exhibit psychological instability in their academic life due to living in off-campus accommodation. Because of financial commitments, students rent shared rooms, which are normally overcrowded and hence an unhealthy environment for studying. Daily transportation, paying water and electricity bills, and cooking meals are further challenges for male students. This result aligns with similar findings in the literature (Moosa & Ibrahim, 2008). AlHarthi et al. (2011) also addressed the concern of off-campus accommodation; however, the machine learning approach used in our study provides more comprehensive detail on this issue.
Implications of the study
Data mining techniques are powerful in identifying hidden knowledge from huge datasets. When implemented correctly in the educational sector, they can reveal huge benefits for the higher educational institution, students, education policymakers, and the country’s prosperity.
This study contributed positively to the educational data mining field by identifying the factors associated with academic probation of undergraduates at SQU from fall 2009 to summer 2019. The findings can help policymakers, experts, and educators better understand the reasons behind the academic failure that can lead to probation status. This can help management improve strategic decision-making about admission requirements to Sultan Qaboos University by accounting for factors besides high school performance. Additionally, the study can help identify the most appropriate approach for colleges’ admission and pre-major criteria. Academic advisors could also benefit from the findings: for example, they can increase their awareness of the factors that negatively impact students’ performance, based on large-scale data, and tailor their advising strategies accordingly. Similarly, students can understand and avoid the factors that cause poor academic performance. On top of this, the proposed model can be used to predict the academic progress of students once they are enrolled, and the associated educational stakeholders can make the necessary decisions based on its prediction outputs.
University policymakers are recommended to revisit the acceptance criteria, such as the current threshold of secondary school scores. Based on the experiments conducted in this study, a secondary school score of at least 86% should be used as an admission criterion for the College of Economics and Political Science to mitigate the number of probation cases. The Deanship of Student Affairs and the Deanship of Admission and Registration at SQU should collaborate with the concerned internal and external parties to resolve the issues of students who cannot continue their undergraduate studies due to social barriers. Students under probation with a GPA of 1.90 or above should be given one more chance to improve their academic standing. Faculty members should emphasize the importance of attending academic advising sessions for students who are at risk or under probation. Faculty should monitor their advisees’ academic performance, meet them twice per semester, and set action plans to prevent obstacles to academic progress. They should dedicate continuous follow-up sessions to students who are under probation or at risk of falling under it; this will help students exit the probation period efficiently. Further, parents should be involved, when possible, in their child’s academic situation. Adequately informed parents can collaborate and provide students with a much better studying environment.
Comparative analysis
To highlight the novelty and robustness of our proposed work compared to others, Table 5 presents a comparative analysis of our work against previous works. According to this analysis, our proposed work outperforms the existing works in terms of novelty, the implementation of machine learning (ML) algorithms, and the sophistication of the evaluation methods. The ML algorithms include J48, Random Forest, Random Tree, Naïve Bayes, Multilayer Perceptron, SVM, K-NN (IBK), Vote, Logit Boost, and Bagging, whereas the evaluation methods comprise accuracy, precision, recall, F-measure (F1), and ROC Curve. Our results confirmed that, among the ML algorithms considered, the optimal result was obtained by the J48 algorithm.
Table 5.
Comparative Analysis of our Work with Existing Works
| Authors | Title | Analysis / Study Performed | Methodology / Parameters | Results |
|---|---|---|---|---|
| Ibrahim and Al-Barwani (1993) | A study of Omani Secondary School Certificate Examination as a Predictor of Academic Performance of Sultan Qaboos University | Analyzing Students’ performance | Data collected from the admission and registration database | The study revealed that factors like gender, college, major, motivation, and tendencies affect the overall student academic achievement |
| AlGhanboosi and Kadhim (2004) | Problems of Academic Supervision at Sultan Qaboos University from Professors and Students Perspectives | Studying the problems of academic supervision | Survey | The study revealed that the students face many difficulties in the university, most crucial of which are related to academic advising |
| Moosa and Ibrahim (2008) | Academic Observation as Perceived by Students: Causes, Reactions, and Remedies. | Identifying the causes of falling under probation | Survey and Interview | The most crucial factors leading to probation were the inability of the students to acclimatize to a new university lifestyle, a lack of proper learning style, absenteeism, the failure to manage the time effectively, and an inability to assume responsibility. |
| AlHarthi et al. (2011) | Predicting the difficulties faced by students living outside the university campus in light of some demographic variables | Predicting students’ difficulties outside the university campus | Survey | The most significant challenges were high rent prices; student allowances insufficient to cover necessities; transportation and commuting to and from the university; and other health, academic, and social challenges. |
| Al-Mahrouqia and Karadsheh (2016) | Sultan Qaboos University Students Reasons of Being under Observation. | Investigating factors that cause falling under probation | Survey and web-based questionnaire | English language difficulties, an inability to study well before the exams, lack of time management skills and difficulties in focusing and comprehension during the lectures were the main reasons leading to probation |
| Abdul-Wahab et al. (2019) | Students’ Reluctance to Attend Office Hours: Reasons and Suggested Solutions | Studying the reasons behind students’ unwillingness to attend office hours. | Survey and web-based questionnaire | Students have busy schedules; availability conflicts arise between students and their academic advisors; students can find the required information without the academic advisor; and some faculty discourage students from attending office hours. |
| Sarfra M., Khawaja K. F., Ivascu, L. (2022) | Factors affecting business school students’ performance during the COVID-19 pandemic: A moderated and mediated model | Identifying the factors that affect students’ performance | Survey and web-based questionnaire | The results indicate that student attitude mediates the relationship between UTAUT constructs and student academic performance, with trust in technology strengthening the relationship. |
| Deeva G. et al. (2022) | Predicting student performance using sequence classification with time-based windows | Building predictive models for early identification of students’ behavior | Data obtained from an e-learning platform. | The proposed model has an accuracy of 90% and discovered informative patterns about students’ performance in the e-learning platform. |
| Our proposed work | Using Machine Learning to Predict Factors Affecting Academic Performance: The Case of College Students on Academic Probation | Identifying factors that negatively impacted academic performance among college students on probation | Machine learning techniques, KDD methodology, and evaluation methods based on accuracy, precision, recall, F-measure (F1), and ROC Curve. | The study revealed that gender, estimated graduation year, cohort, and academic specialization are the main factors determining students’ academic performance. |
Conclusion
In this research, different data mining algorithms were used to identify the factors contributing to students falling under academic probation. Ensemble algorithms were included to optimize the overall prediction accuracy. Feature selection identified study duration, secondary school percentage, cohort, major, and estimated graduation year as the most significant factors, and the J48 algorithm proved optimal for this task. The experimental results revealed that study duration, secondary school score, and gender are the main identified factors resulting in undergraduate students falling under probation at SQU from fall 2009 to summer 2019. Probation cases primarily fall into two groups: major and non-major students. The proportion of male students under probation is higher than that of female students, and the English language has been highlighted as a barrier to students’ performance. Domain experts were interviewed to validate the obtained results. It is recommended that the university provide effective workshops for faculty members on advisory techniques. Domain experts suggested including soft skills, such as emotional intelligence, for academic advisors; these skills will not only help faculty members conduct better advisory sessions but also support professional academic advising. Students are advised to meet their academic advisors regularly to obtain appropriate advice and ensure that they are on schedule.
Limitations and Future Research Direction
Although the aim of this study was achieved, some limitations should be acknowledged. Firstly, data for the two most recent years were not included due to the COVID-19 pandemic. Secondly, there were missing values in some of the provided data. Future work should include more features/attributes not covered in this study, such as transfers from/to colleges, courses taken and grades received, background data on students and their families, and any other factors not covered in this research. In addition, more algorithms can be used to optimize the predictive results, and the associated rules could be combined to identify the highest probability of falling under probation. Moreover, identifying the reasons behind students failing common courses may constitute another contributor to probation. Finally, we recommend that future studies develop machine learning models that can predict the quality of education, educational inclusiveness, and accessibility for all students regardless of their socio-economic background, to derive better policies and interventions.
Biographies
Lamees Al-Alawi
has over 6 years of working experience as a systems analyst at Petroleum Development Oman, a leading oil and gas company in the Sultanate. She earned her master’s degree with honors in Information Systems, specializing in Data Mining, in 2021. Her dissertation is entitled: Applying Data Mining Techniques to Predict the Factors Associated with Academic Probation of Undergraduates at Sultan Qaboos University. She has a published paper entitled Applying NIST SP 800-161 in Supply Chain Processes Empowered by Artificial Intelligence, presented at the 2021 22nd International Arab Conference on Information Technology (ACIT). Lamees won first place in the Student Excellence Award of Information Technology in Oman organized by EY in 2015. Her research interests include Artificial Intelligence, Machine Learning, Data Mining, and Data Science.
Dr. Jamil Al-Shaqsi
has over 19 years of teaching and research experience in academia in Oman and the UK. Dr. Jamil won a Covid-19 research grant funded by the Ministry of Higher Education, Scientific Research and Innovation in 2021, entitled: Predicting and Strengthening Antibodies that Boost Immunity Against Covid-19 Disease to Develop Self-therapies by Using Machine Learning Techniques. He also won a research grant funded by The Research Council Oman in 2015, entitled: Mobile Speed Radar. He was awarded by HE Sheikh Tamim, the Ameer of Qatar, for Excellence in the Field of Computer Science in the Gulf, 2015. Dr. Jamil won the first prize in a regional competition organized by the Institute of Engineering and Technology in the United Kingdom in 2010. His research interests include Artificial Intelligence, Machine Learning, Data Mining, and Application Development. He conducts research on machine learning and ensemble methods for data mining; he proposed a novel 3-staged clustering algorithm and an ensemble clustering algorithm in the area of data mining, as well as a new similarity measure for calculating the similarity between data samples in a given dataset. He was part of the Mathematics and Algorithms (MAG) research group at the University of East Anglia (UEA), where he gave master’s-level labs and seminars in the School of Computer Science. He was a member of the Institute of Engineering and Technology (IET) in the UK and has published and presented papers at many local and international conferences.
Dr. Ali Tarhini
is an Associate Professor in the Department of Information Systems at Sultan Qaboos University, Sultanate of Oman. He holds a Master’s degree in E-commerce from the University of Essex and a PhD in Information Systems from Brunel University London, UK. His research interests include consumer behavior, social media, AI-enabled customer service, cutting-edge technology, knowledge management, and cross-cultural issues in IT at the individual and national levels. Dr. Tarhini has published over 60 research papers in international journals, including Computers in Human Behavior, Information Technology & People, Journal of Enterprise Information Management, Information Systems Frontiers, Journal of Management Development, Management Research Review, Production Planning & Control, British Journal of Educational Technology, Journal of Global Information Management, Journal of Computing in Higher Education, Interactive Learning Environments, and Educational Technology Research and Development.
Dr. Adil S. Al-Busaidi
is an assistant professor of organizational communication, Director of Innovation and Technology Transfer Center at Sultan Qaboos University, Project Manager of the transition to the Entrepreneurial University project, and former Director of Research & Innovation at Smart City Platform. His interests revolve around research methodologies on computational social science, causal modeling, scale development, and psychometric testing. He developed R programming-based applications for social media opinion mining, sentiment analysis, social network analysis, and sample size calculation.
Data Availability
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Declarations
Conflict of interest
None
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Lamees Al-Alawi, Email: lamis.Alalawi@gmail.com.
Jamil Al Shaqsi, Email: alshaqsi@squ.edu.om.
Ali Tarhini, Email: ali.tarhini@hotmail.co.uk.
Adil S. Al-Busaidi, Email: abusaid@squ.edu.om
References
- Abdul-Wahab SA, Salem NM, Yetilmezsoy K, Fadlallah SO. Students’ reluctance to attend Office hours: Reasons and suggested solutions. Journal of Educational and Psychological Studies [JEPS] 2019;13(4):715–732. doi: 10.53543/jeps.vol13iss4pp715-732. [DOI] [Google Scholar]
- Akbari A, Ng L, Solnik B. Drivers of economic and financial integration: A machine learning approach. Journal of Empirical Finance. 2021;61:82–102. doi: 10.1016/j.jempfin.2020.12.005. [DOI] [Google Scholar]
- Al-Busaidi, A. S., Dauletova, V., & Al-Wahaibi, I. (2022). The role of excessive social media content generation, attention seeking, and individual differences on the fear of missing out: a multiple mediation model. Behaviour & Information Technology, 1–21.
- Al-Emran, M., Al-Nuaimi, M. N., & Arpaci, I. (2022). Towards a wearable education: Understanding the determinants affecting students’ adoption of wearable technologies using machine learning algorithms. Education and Information Technologies, 1–20.
- Al Hamdi, S. S. N., & Edakkalayil, L. A. (2022). Measuring Students’ Performance in Face To Face and Online Learning-An Empirical Evidence From Oman in the Pre and During the Covid-19 Pandemic Period. Proceedings of the fourth international conference on teaching, learning and Education, Berlin, Germany, 11–13 March 2022.
- AlHarthi H, Kadhim A, et al. Predicting the difficulties faced by students living outside the university campus in light of some demographic variables. Journal of Qualitative Educational Research. 2011;18(3):306–430. [Google Scholar]
- Al-Kindi, I., & Al-Khanjari, Z. (2020, August). A Novel Architecture of SQU SMART LMS: The New Horizon for SMART City in Oman. In 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT) (pp. 751–756). IEEE.
- Al Muqarshi, A. (2022). Outsourcing, national diversity and transience: the reality of social identity in an ELT context in Omani higher education. International Journal of Qualitative Studies in Education, 1–17.
- Al-Mahrouqia R, Karadsheh MA. Sultan Qaboos University students reasons of being under Observation. Humanities and social sciences. 2016;43(3):2343–2360. [Google Scholar]
- Al-Sharafi, M. A., Al-Emran, M., Iranmanesh, M., Al-Qaysi, N., Iahad, N. A., & Arpaci, I. (2022). Understanding the impact of knowledge management factors on the sustainable use of AI-based chatbots for educational purposes using a hybrid SEM-ANN approach. Interactive Learning Environments, 1–20.
- AlGhanboosi S, Kadhim A. Problems of Academic Supervision at Sultan Qaboos University from Professors and students perspectives. Journal of Education. 2004;10(2):39–75. [Google Scholar]
- Anil, Ö., & Batdi, V. (2022). Use of augmented reality in science education: A mixed-methods research with the multi-complementary approach. Education and Information Technologies, 1–39.
- Belwal R, Belwal S, Sufian AB, Al Badi A. Project-based learning (PBL): Outcomes of students’ engagement in an external consultancy project in Oman. Education + Training. 2020;63(3):336–359. doi: 10.1108/ET-01-2020-0006. [DOI] [Google Scholar]
- Bowman, N. A., & Jang, N. (2022). What is the Purpose of Academic Probation? Its Substantial Negative Effects on Four-Year Graduation. Research in Higher Education, 1–27.
- Brownlee, J. (2018, updated August 3, 2020). A Gentle Introduction to k-fold Cross-Validation. Online resource.
- Chugh S, Gulistan A, Ghosh S, Rahman BMA. Machine learning approach for computing optical properties of a photonic crystal fiber. Optics express. 2019;27(25):36414–36425. doi: 10.1364/OE.27.036414. [DOI] [PubMed] [Google Scholar]
- De Smedt J, Deeva G, De Weerdt J. Mining behavioral sequence constraints for classification. IEEE Transactions on Knowledge and Data Engineering. 2019;32(6):1130–1142. doi: 10.1109/TKDE.2019.2897311. [DOI] [Google Scholar]
- Debuse JCW, Iglesia B, Howard CM, Rayward-Smith VJ. Industrial Knowledge Management. London: Springer; 2000. Building the KDD Roadmap: A methodology for Knowledge Discovery; pp. 179–196. [Google Scholar]
- Du X, Yang J, Hung JL, Shelton B. Educational data mining: A systematic review of research and emerging trends. Information Discovery and Delivery. 2020;48(4):225–236. doi: 10.1108/IDD-09-2019-0070. [DOI] [Google Scholar]
- Deeva, G., De Smedt, J., Saint-Pierre, C., Weber, R., & De Weerdt, J. (2022). Predicting student performance using sequence classification with time-based windows. Expert Systems with Applications, 209. [DOI] [PMC free article] [PubMed]
- Gamal, B. (2020). Naïve Bayes Algorithm. Retrieved from https://medium.com/analytics-vidhya/na%C3%AFve-bayes-algorithm-5bf31e9032a2.
- Gareth, J., Daniela, W., Trevor, H., & Robert, T. (2013). An introduction to statistical learning: with applications in R. Spinger, London, UK.
- Hammad W, Al-Harthi ASA. Internationalisation of Educational Administration and Leadership Curriculum. Bingley: Emerald Publishing Limited; 2021. Aligning ‘international’standards with ‘national’educational leadership preparation needs: The case of a master’s programme in Oman; pp. 117–138. [Google Scholar]
- Hussain S, Gaftandzhieva S, Maniruzzaman M, et al. Regression analysis of student academic performance using deep learning. Educ Inf Technol. 2021;26:783–798. doi: 10.1007/s10639-020-10241-0. [DOI] [Google Scholar]
- Hussain M, Zhu W, Zhang W, Abidi SMR, Ali S. Using machine learning to predict student difficulties from learning session data. Artificial Intelligence Review. 2019;52(1):381–407. doi: 10.1007/s10462-018-9620-8. [DOI] [Google Scholar]
- Ibrahim A, Al-Barwani TA. A study of Omani secondary school Certificate Examination as a predictor of academic performance of Sultan Qaboos University. Research in college Teaching Practicum Research in Sultan Qaboos University. 1993;1:1–29. [Google Scholar]
- Imran, M., Latif, S., Mehmood, D., & Shah, M. S. (2019). Student Academic Performance Prediction using Supervised Learning Techniques. International Journal of Emerging Technologies in Learning, 14(14).
- Jalota, C., & Agrawal, R. (2019, February). Analysis of educational data mining using classification. In 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon) (pp. 243–247). IEEE.
- Jiao, P., Ouyang, F., Zhang, Q., & Alavi, A. H. (2022). Artificial intelligence-enabled prediction model of student academic performance in online engineering education. Artificial Intelligence Review, 1–24.
- Jia, J. W. (2013). Machine learning algorithms and predictive models for undergraduate student retention at an HBCU (Doctoral dissertation, Bowie State University).
- Kalavathy, R., Suresh, R. M., & Akhila, R. (2007, December). KDD and data mining. In 2007 IET-UK International Conference on Information and Communication Technology in Electrical Sciences (ICTES 2007) (pp. 1105–1110). IET.
- Khan, F. (2019). Design Thinking humanizes Data Science & more. retrieved from https://medium.com/technicity/design-thinking-humanizes-data-science-more-5a666119c8b1.
- Khan, A., & Ghosh, S. K. (2018). Data mining based analysis to explore the effect of teaching on student performance. Education and Information Technologies, 23, 1677–1697. https://doi.org/10.1007/s10639-017-9685-z
- Khanna, L., Singh, S. N., & Alam, M. (2016, August). Educational data mining and its role in determining factors affecting students' academic performance: A systematic review. In 2016 1st India International Conference on Information Processing (IICIP) (pp. 1–7). IEEE.
- Kulin, M., Kazaz, T., De Poorter, E., & Moerman, I. (2021). A survey on machine learning-based performance improvement of wireless networks: PHY, MAC and network layer. Electronics, 10(3), 318. https://doi.org/10.3390/electronics10030318
- Kumar, R., & Sharma, A. (2017). Data mining in education: A review. International Journal of Mechanical Engineering and Information Technology, 5(1), 1843–1845. https://doi.org/10.18535/ijmeit/v5i1.02
- Mariscal, G., Marban, O., & Fernandez, C. (2010). A survey of data mining and knowledge discovery process models and methodologies. The Knowledge Engineering Review, 25(2), 137–166. https://doi.org/10.1017/S0269888910000032
- Maqableh, M., Jaradat, M., & Azzam, A. (2021). Exploring the determinants of students' academic performance at university level: The mediating role of internet usage continuance intention. Education and Information Technologies, 26, 4003–4025. https://doi.org/10.1007/s10639-021-10453-y
- Mellor, J. C., Stone, M. A., & Keane, J. (2018). Application of data mining to "big data" acquired in audiology: Principles and potential. Trends in Hearing, 22, 233–250. https://doi.org/10.1177/2331216518776817
- Mengash, H. A. (2020). Using data mining techniques to predict student performance to support decision making in university admission systems. IEEE Access, 8, 55462–55470. https://doi.org/10.1109/ACCESS.2020.2981905
- Moosa, S. M., & Ibrahim, A. M. (2008). Academic observation as perceived by students: Causes, reactions, and remedies. Journal of Higher Education in the Arab World, 11(2), 15–28.
- Sarfraz, M., Khawaja, K. F., & Ivascu, L. (2022). Factors affecting business school students' performance during the COVID-19 pandemic: A moderated and mediated model. The International Journal of Management Education, 20(2).
- Nahar, K., Shova, B. I., Ria, T., et al. (2021). Mining educational data to predict students' performance. Education and Information Technologies, 26, 6051–6067. https://doi.org/10.1007/s10639-021-10575-3
- Naicker, N., Adeliyi, T., & Wing, J. (2020). Linear support vector machines for prediction of student performance in school-based education. Mathematical Problems in Engineering, 2020.
- Nilashi, M., Abumalloh, R. A., Zibarzani, M., et al. (2022). What factors influence students' satisfaction in massive open online courses? Findings from user-generated content using educational data mining. Education and Information Technologies.
- Oman 2040 Vision. (2020). [Online]. Available: https://www.2040.om/wp-content/uploads/2019/02/190207-Preliminmy-Vision-Docunent-English.pdf.
- Orriols-Puig, A., Martínez-López, F. J., Casillas, J., & Lee, N. (2013). Unsupervised KDD to creatively support managers' decision making with fuzzy association rules: A distribution channel application. Industrial Marketing Management, 42(4), 532–543. https://doi.org/10.1016/j.indmarman.2013.03.005
- Powers, D. M. W. (2020). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061.
- Rahman, F. A., Desa, M. I., Wibowo, A., & Haris, N. A. (2014). Knowledge discovery database (KDD)-data mining application in transportation. Proceeding of the Electrical Engineering Computer Science and Informatics, 1(1), 116–119.
- Rahman, F. A., Desa, M. I., & Wibowo, A. (2016, June). A review of KDD-data mining framework and its application in logistics and transportation. In The 7th International Conference on Networked Computing and Advanced Information Management (pp. 175–180). IEEE.
- Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-validation. Encyclopedia of Database Systems, 5, 532–538. https://doi.org/10.1007/978-0-387-39940-9_565
- Rivas, A., Gonzalez-Briones, A., Hernandez, G., Prieto, J., & Chamoso, P. (2021). Artificial neural network analysis of the academic performance of students in virtual learning environments. Neurocomputing, 423, 713–720. https://doi.org/10.1016/j.neucom.2020.02.125
- Sang (2022). K-Nearest Neighbor (KNN) algorithm for machine learning. Retrieved from https://www.javatpoint.com/k-nearest-neighbor-algorithm-for-machine-learning.
- Sekeroglu, B., Abiyev, R., Ilhan, A., Arslan, M., & Idoko, J. B. (2021). Systematic literature review on machine learning and student performance prediction: Critical gaps and possible remedies. Applied Sciences, 11(22), 10907. https://doi.org/10.3390/app112210907
- Shah, M. B., Kaistha, M., & Gupta, Y. (2019, November). Student Performance Assessment and Prediction System using Machine Learning. In 2019 4th International Conference on Information Systems and Computer Networks (ISCON) (pp. 386–390). IEEE.
- Shoyukhi, M., Vossen, P. H., Ahmadi, A. H., Kafipour, R., & Beattie, K. A. (2022). Developing a comprehensive plagiarism assessment rubric. Education and Information Technologies. https://doi.org/10.1007/s10639-022-11365-1
- Shyamala, K. (2008). A study on data mining techniques using higher educational system for efficient prediction (Doctoral dissertation, Department of Computer Science, Mother Teresa Women's University).
- SQU Academic Procedure. (2019, February 24). Retrieved from https://www.squ.edu.om/Portals/14/Users/027/27/27/Academic%20Procedure%20Electronic%20Booklet%202019%20.pdf.
- SQU Annual Statistics Book 2019–2020. (2020). Retrieved May 2021 from https://www.squ.edu.om/Portals/0/DNNGalleryPro/uploads/2020/9/3/AnnualStatisticsBOOK_2019-2020_compressed.pdf.
- Thonnard, O., & Dacier, M. (2008, December). Actionable knowledge discovery for threats intelligence support using a multi-dimensional data mining methodology. In 2008 IEEE International Conference on Data Mining Workshops (pp. 154–163). IEEE.
- Tomasevic, N., Gvozdenovic, N., & Vranes, S. (2020). An overview and comparison of supervised data mining techniques for student exam performance prediction. Computers & Education, 143, 1–15. https://doi.org/10.1016/j.compedu.2019.103676
- Triguero, I., García-Gil, D., Maillo, J., Luengo, J., García, S., & Herrera, F. (2019). Transforming big data into smart data: An insight on the use of the k-nearest neighbors algorithm to obtain quality data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(2), e1289.
- Tsiakmaki, M., Kostopoulos, G., Kotsiantis, S., & Ragos, O. (2020). Transfer learning from deep neural networks for predicting student performance. Applied Sciences, 10(6), 2145. https://doi.org/10.3390/app10062145
- Vidiyala, R. (2020). Performance Metrics for Classification Machine Learning Problems. Retrieved from https://towardsdatascience.com/performance-metrics-for-classification-machine-learning-problems-97e7e774a007
- Yang, S. (2019). An introduction to Naïve Bayes classifier: From theory to practice, learn underlying principles of Naïve Bayes. Retrieved from https://towardsdatascience.com/introduction-to-na%C3%AFve-bayes-classifier-fa59e3e24aaf.
- Waheed, H., Hassan, S. U., Aljohani, N. R., Hardman, J., Alelyani, S., & Nawaz, R. (2020). Predicting academic performance of students from VLE big data using deep learning models. Computers in Human Behavior, 104, 106189. https://doi.org/10.1016/j.chb.2019.106189
- Wang, X., Yu, X., Guo, L., Liu, F., & Xu, L. (2020). Student performance prediction with short-term sequential campus behaviors. Information, 11(4), 201. https://doi.org/10.3390/info11040201
- Wang, X. (2011, July). A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality. In The 2011 International Joint Conference on Neural Networks (pp. 1293–1299). IEEE.
- Wook, M., Yusof, Z. M., & Nazri, M. Z. A. (2017). Educational data mining acceptance among undergraduate students. Education and Information Technologies, 22, 1195–1216. https://doi.org/10.1007/s10639-016-9485-x
- Yakubu, M. N., & Abubakar, A. M. (2022). Applying machine learning approach to predict students' performance in higher educational institutions. Kybernetes, 51(2), 916–934. https://doi.org/10.1108/K-12-2020-0865
- Zaffar, M., Hashmani, M. A., Savita, K. S., Rizvi, S. S. H., & Rehman, M. (2020). Role of FCBF feature selection in educational data mining. Mehran University Research Journal of Engineering & Technology, 39(4), 772–778. https://doi.org/10.22581/muet1982.2004.09
- Zhu, Y., Xu, S., Wang, W., Zhang, L., Liu, D., Liu, Z., & Xu, Y. (2022). The impact of online and offline learning motivation on learning performance: The mediating role of positive academic emotion. Education and Information Technologies, 1–18.
Associated Data
Data Availability Statement
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.