2024 Nov 8;32(6):4947–4963. doi: 10.3233/THC-232054

Visual analysis and interactive interface design of students’ abnormal behavior introducing clustering algorithm

Xiaoqian Wu 1,*, Cheng Chen 1, Lili Quan 1
Editors: Chi Lin, Chang Wu Yu, Ning Wang
PMCID: PMC11612963  PMID: 38875056

Abstract

BACKGROUND:

Traditional methods for analyzing students' abnormal behavior suffer from low accuracy and are inconvenient to operate. A more intuitive, flexible, and user-friendly visualization tool is therefore needed to help users understand student behavior data.

OBJECTIVE:

This study designs and examines a visual analysis method and an interactive interface for students' abnormal behavior based on a clustering algorithm.

METHODS:

First, this paper reviews the development of traditional methods for analyzing students' abnormal behavior and of visualization technology, and discusses their limitations. Then, the K-means clustering algorithm is selected to find potential abnormal patterns and groups in students' behavior. A large volume of student behavior data is collected and preprocessed to extract relevant features, and K-means is applied to cluster the data and obtain clusters of abnormal behavior. To display the clustering results visually and help users analyze students' abnormal behavior, a visual analysis method and an interactive interface are designed. The interface provides interactive functions such as filtering, zooming in and out, and correlation analysis to support in-depth exploration of the data. Finally, an experimental evaluation using real student behavior data obtained with big data technology verifies the effectiveness and practicability of the proposed method.

RESULTS:

The experimental results show that this method can accurately detect and visualize students’ abnormal behaviors and provide intuitive analysis results.

CONCLUSION:

This paper makes full use of big data to understand students' behavior patterns more comprehensively and provides a new solution for student management and behavior analysis in education. Future research can extend and refine the method to handle more complex student behavior data and requirements.

Keywords: Clustering algorithm, student behavior, big data, visual analysis, interactive interface

1. Introduction

With the development of education and the progress of technology, the analysis of students' abnormal behavior has become an important task in school management and educational intervention [1]. However, traditional methods suffer from low accuracy and inconvenient operation when analyzing students' abnormal behavior. To better understand students' behavior data and provide valuable insight, a more intuitive, flexible, and user-friendly visualization tool is needed [2]. This paper therefore studies and designs a visual analysis method and an interactive interface for students' abnormal behavior based on a clustering algorithm.

Visualization technology plays an important role in behavior analysis [3]. When analyzing students' anomalous behavior, it significantly improves how clustering results are presented. It supports pattern recognition, communication, interpretability, temporal analysis, anomaly detection, cluster analysis, and interactive exploration. These capabilities enable analysts to recognize behavioral trends, make sound decisions, and carry out effective interventions that promote students' academic achievement and general well-being. By transforming data into visual elements, visualization technology helps users understand data more intuitively and find hidden patterns and trends [4]. In the analysis of students' abnormal behavior, it can present clustering results visually, helping users observe and explain abnormal behavior patterns [5]. In addition, interactive interfaces make it easier for users to explore and analyze the data [6].

The innovation of this paper lies in applying a clustering algorithm to the analysis of students' abnormal behavior and designing a corresponding visual analysis method and interactive interface. The paper is structured as follows. First, the research background and problem statement are introduced, and the state of research in related fields is summarized. Then, the research methods are described in detail, including data acquisition and preprocessing and clustering-based discovery of abnormal behavior. Next, the visual presentation of the results and the design of the interactive interface are shown. Finally, the innovations and conclusions of the research are summarized. The aim is to provide a visual analysis and interactive interface method that helps users understand students' abnormal behavior data and supports decision-making in school management and educational intervention. As education and technology advance, examining unusual student behavior has become increasingly important in school administration. Developments in data analytics enable data-driven decision-making, help address student well-being, support tailored learning, and improve student safety protocols. They also help teachers understand their students' needs, create a supportive learning environment, and encourage student achievement.

The paper traces the evolution of conventional approaches to analyzing students' abnormal behavior and the advances in visualization technology, and addresses their limitations. It covers historical approaches, advances in visualization, challenges, the integration of methods, case studies, discussion, implications, and conclusions.

2. Research status

Researchers have begun to apply visualization technology to the analysis of students' abnormal behavior [7]. Visualization transforms complex student behavior data into visual form so that researchers and educators can observe and analyze it intuitively and discover potential abnormal behavior patterns and trends [8]. Niu et al. found that visualization technology provides not only intuitive visual presentation but also interactive functions that let users explore and analyze data in depth [9]. Researchers have also explored other methods and techniques for visualizing students' abnormal behavior [10]. For example, the heat-map-based method of Ghadi et al. displays student behavior data in the spatial dimension and encodes the frequency or intensity of behavior through color [11]; this helps researchers identify abnormal behavior in specific areas or scenes. To identify students exhibiting anomalous behavior, the features and patterns of each cluster are analyzed to interpret the clustering results. This includes comparing clusters with established behavioral norms and analyzing cluster centroids, densities, and separations. Data points that deviate significantly from the average behavior of most students in a cluster are treated as anomalies because they may indicate abnormal behavior. Comparing different clustering techniques further improves the identification of abnormal behavior: it offers multiple viewpoints, allows robustness and parameter sensitivity to be assessed, opens the way to ensemble approaches, and helps refine feature selection, which increases detection accuracy and yields more thorough insights.

In addition to visualization itself, researchers are working to provide interactive interfaces and functions that enhance users' ability to explore and analyze students' abnormal behavior data [12, 13]. Visualization technologies have greatly improved behavior analysis by offering user-friendly ways to examine large, complex datasets and by enabling a deeper understanding of students' behavioral patterns, trends, and anomalies. The main goals of visual analysis and interactive interface design are to create an intuitive tool for analyzing students' abnormal behavior, to improve user interaction and engagement, and to support informed decision-making by presenting clustering results clearly and facilitating meaningful exploration of behavior data. Developing a user-friendly tool for visualizing student behavior data requires input from end users: besides helping determine user needs, it informs usability, data interpretation, feature customization, engagement, and alignment with educational objectives. Developers who involve stakeholders in the design process help educators and administrators use student data to make well-informed decisions and improve student outcomes. For example, the work of Nguyen et al. provides a data filtering function that lets users screen data by specific conditions or student groups of interest so that particular behavior patterns can be analyzed more precisely [14]. Correlation analysis is another such function: the work of Li et al. helps users find correlations between students' behaviors and better understand the causes and influencing factors of abnormal behavior [15].

In summary, by combining visualization technology with interactive functions, research on visualizing students' abnormal behavior provides educators and researchers with more intuitive, flexible, and operable analysis tools. These methods help discover patterns and trends in students' abnormal behavior and support in-depth data analysis and interactive exploration. This paper builds on this work, improving visualization methods and tools to increase the accuracy and practicability of abnormal behavior analysis.

3. Research method

3.1. Data acquisition and preprocessing

The research subjects were 300 college students at a university, all over 18 years old. The collected data include smart card data, data from the school's educational administration system, library borrowing data, and shower data. All data for all subjects were anonymized.

Anonymization is essential for complying with privacy standards when managing student data. Personally identifiable information must be deleted or encrypted to prevent identification; common techniques include pseudonymization, generalization, and masking. Adherence to regulations such as FERPA and GDPR safeguards student privacy and builds trust. Anonymized data remain useful for analysis and decision-making while reducing privacy risks, and regular audits help maintain data security and compliance.

Each source captures a different aspect of student activity. Smart card data track campus activities, while educational administration, library, and shower data provide insight into academic performance, reading habits, and daily routines. Analyzing these sources supports students' academic progress and well-being through well-informed decisions and targeted interventions.

The accuracy problems of traditional methods for analyzing students' deviant behavior reduce the reliability of the resulting insights and, in turn, of educational decisions. These problems include subjective judgment, low predictive capacity, underuse of available data, incomplete or skewed data, and the risk of misinterpretation. Addressing them requires improving data quality, applying advanced analytics, and encouraging evidence-based decision-making to improve student outcomes.
Using big data technology [16], a total of 280,000 valid smart card consumption records and 130,000 network log records were obtained. The consumption locations in the smart card data include printing centers, canteens, and supermarkets [17]. The fields of the smart card consumption data are listed in Table 1.

Table 1.

Field attributes of smart card consumption data

Field attribute | Attribute description
StudentID | Unique identifier of the student
Date | Date of the transaction
Time | Time of the transaction
Location | Place of the transaction (e.g., printing center, canteen, supermarket)
Amount | Transaction amount
Category | Transaction type (e.g., catering, school supplies, bathing)
Item | Specific item purchased (e.g., printing and copying fees, lunch, stationery)

The list of field attributes of bathroom usage data is shown in Table 2.

Table 2.

Field attributes of bathroom usage data

Field attribute | Attribute description
StudentID | Unique identifier of the student
Date | Date of bathroom use
StartTime | Start time of bathroom use
EndTime | End time of bathroom use
Duration | Length of bathroom use
Building | Building where the bathroom is located
Floor | Floor where the bathroom is located
ShowerType | Shower type (e.g., ordinary shower, massage shower)

The list of field attributes of library lending data is shown in Table 3.

Table 3.

Library borrowing data field attributes

Field attribute | Attribute description
StudentID | Unique identifier of the student
Date | Date the book was borrowed
BookTitle | Title of the borrowed book
Author | Author of the book
Category | Classification of the book (e.g., literature, science, history)
ReturnDate | Date the book was returned

During preprocessing, to protect students' privacy, the collected data are anonymized by deleting or encrypting personal identity information, which keeps the data secure and confidential. The data are then cleaned as shown in Fig. 1 [18, 19, 20].

Figure 1.

Content of data cleaning.

As shown in Fig. 1, data cleaning generally includes handling missing values, abnormal values, and duplicate records to ensure data accuracy and consistency. When filling in missing values, the choice of interpolation technique should take into account data properties, temporal and spatial correlations, smoothness, the treatment of outliers, computational complexity, and accuracy; the chosen technique should yield accurate estimates, control outliers, preserve smoothness and existing relationships, and be computationally efficient.

The consumption data of the 300 students contain a "time period" attribute. Missing time-period data can lead to incomplete analysis, erroneous assessments, weaker predictive modeling, less personalization, biased decisions, difficulties in program evaluation, and concerns about data quality and integrity. Cleaning revealed the following issues in the dataset:
Missing values: the time periods of five students are missing.
Abnormal value: one student's time period is -100, which lies outside the normal range.
Duplicate records: two students have identical time data.

The data were cleaned as follows. Missing values: either delete the records of the five students with missing periods, or fill in the missing values with a suitable method such as the median or linear interpolation. Linear interpolation is preferred because it is straightforward, computationally efficient, and preserves data trends: a straight line is drawn between the known data points on either side of a gap, and each missing value is read off that line. The approach is flexible, suits a range of data types, and preserves linear trends in the dataset, yielding accurate and trustworthy data for further analysis. Abnormal values: the period of -100 is treated as an abnormal value and replaced with a value within a reasonable range. Duplicate records: duplicates are deleted so that each student appears only once.
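The cleaning rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the records, student IDs, and the `clean` helper are hypothetical; the -100 value is treated as missing and then filled (one way of replacing it "with a value within a reasonable range"); and interpolation runs along record order, which is only meaningful if the records are ordered, for example by time. The median is used only as a fallback at the ends of the series, where linear interpolation has no neighbour on one side.

```python
import statistics

# Hypothetical records: (student_id, time_period); None marks a missing value.
records = [
    ("s01", 12.0), ("s02", None), ("s03", 15.0), ("s03", 15.0),
    ("s04", -100.0), ("s05", 18.0),
]


def clean(records, valid_min=0.0):
    """Drop duplicate students, treat out-of-range values as missing, fill gaps."""
    # Duplicate records: keep the first occurrence of each student.
    seen, deduped = set(), []
    for sid, value in records:
        if sid not in seen:
            seen.add(sid)
            deduped.append([sid, value])
    # Abnormal values: anything below valid_min (e.g. -100) becomes missing.
    for row in deduped:
        if row[1] is not None and row[1] < valid_min:
            row[1] = None
    values = [row[1] for row in deduped]
    known = [i for i, v in enumerate(values) if v is not None]
    median = statistics.median(values[i] for i in known)
    # Missing values: linear interpolation between the nearest known neighbours,
    # falling back to the median when a gap sits at either end of the series.
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None or right is None:
            values[i] = median
        else:
            t = (i - left) / (right - left)
            values[i] = values[left] + t * (values[right] - values[left])
    return [(row[0], val) for row, val in zip(deduped, values)]


print(clean(records))
```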

After combining domain knowledge and expert advice, the characteristic indicators determined in this paper are shown in Fig. 2 [21].

Figure 2.

Characteristic indicators.

As shown in Fig. 2, ten characteristic indicators are determined. Among them, the entropy of activity places, the entropy of activity time, the size of the personal social network, the frequency of personal social interaction, and the number of personal library loans are indicators proposed in this paper. The entropy of activity places is calculated as shown in Eq. (1) [22].

$$H_p = -\sum_{i=1}^{n} \frac{f_i}{F} \log \frac{f_i}{F} \tag{1}$$

H_p represents the entropy index of activity places, n the number of distinct locations, f_i the frequency of the i-th location, and F the total frequency of all locations.

The calculation of activity time entropy value is shown in Eq. (2) [23].

$$H_t = -\sum_{i=1}^{m} \frac{t_i}{T} \log \frac{t_i}{T} \tag{2}$$

H_t represents the entropy index of activity time, m the number of distinct time periods, t_i the frequency of the i-th time period, and T the total frequency of all time periods.
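Eqs (1) and (2) share the same form, so a single function covers both. A minimal sketch, assuming the natural logarithm (the paper does not state the base) and hypothetical place and time-period labels for one student:

```python
import math
from collections import Counter


def visit_entropy(events):
    """Entropy -sum (f_i/F) * log(f_i/F) over the distinct values in `events`.

    Serves Eq. (1) when `events` are activity places and Eq. (2) when they
    are time-period labels. Natural log is an assumption.
    """
    counts = Counter(events)              # f_i (or t_i) per distinct value
    total = sum(counts.values())          # F (or T)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical smart card records for a single student.
places = ["canteen", "canteen", "library", "supermarket"]
periods = ["morning", "morning", "noon", "noon"]

h_p = visit_entropy(places)   # place entropy, Eq. (1)
h_t = visit_entropy(periods)  # time entropy, Eq. (2)
print(f"H_p = {h_p:.3f}, H_t = {h_t:.3f}")
```

A student who always visits the same place at the same time scores 0; more varied routines score higher.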

The calculation of personal social network size is shown in Eq. (3) [24].

$$SN = \sum_{i=1}^{n} \frac{F_i}{T} \times \frac{I_i}{\max(I_1, I_2, \ldots, I_n)} \times \frac{A_i}{\max(A_1, A_2, \ldots, A_n)} \tag{3}$$

SN represents the size of the personal social network. n is the number of connections (friendship relationships) the individual has established. F_i is the count for the i-th connection, and T is the total number of connections or friendship relationships in the whole network. I_i is the composite index of the interest factors involved in the i-th connection, calculated from the interest information obtained from the educational administration system data and library borrowing data. A_i is the composite index of the activity factors involved in the i-th connection, calculated from the activity information in the smart card data, shower data, and so on. max(I_1, I_2, ..., I_n) and max(A_1, A_2, ..., A_n) are the maximum interest and activity indices over all connections.
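Eq. (3) multiplies three normalized factors per connection and sums them. The sketch below assumes the interest and activity indices I_i and A_i have already been computed; the text does not specify how they are derived from the administration, library, card, and shower data, so all numbers here are hypothetical.

```python
def social_network_size(conn_counts, interest, activity, total_connections):
    """Eq. (3): SN = sum_i (F_i / T) * (I_i / max I) * (A_i / max A).

    conn_counts[i] -> F_i, interest[i] -> I_i, activity[i] -> A_i,
    total_connections -> T (connections in the whole network).
    """
    max_interest = max(interest)
    max_activity = max(activity)
    return sum(
        (f / total_connections) * (i / max_interest) * (a / max_activity)
        for f, i, a in zip(conn_counts, interest, activity)
    )


# Hypothetical values for a student with two connections.
sn = social_network_size(conn_counts=[4, 2], interest=[0.8, 0.4],
                         activity=[0.5, 1.0], total_connections=20)
print(round(sn, 4))
```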

The calculation of personal social frequency is shown in Eq. (4) [25].

$$F = \frac{\sum_{i=1}^{n} C_i}{T_s} \times 100 \tag{4}$$

F is the personal social frequency index, n the number of social interactions, C_i the duration of the i-th social interaction, and T_s the total observation time.
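Eq. (4) reduces to the share of observed time spent interacting, expressed as a percentage. A minimal sketch with hypothetical durations; the only requirement is that C_i and T_s use the same time unit:

```python
def social_frequency(durations, observation_time):
    """Eq. (4): F = (sum_i C_i / T_s) * 100.

    durations -> C_i, observation_time -> T_s (same unit, e.g. minutes).
    """
    return sum(durations) / observation_time * 100


# Hypothetical: three interactions (minutes) within a 600-minute window.
f = social_frequency([30, 45, 15], observation_time=600)
print(f)
```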

The calculation of the individual library borrowing quantity index is shown in Eq. (5) [26].

$$B = \frac{\sum_{i=1}^{n} \left( D_i / T_i \right)}{n} \tag{5}$$

B is the personal library loan index, n the number of students, D_i the number of valid library loans of the i-th student, and T_i the total number of semesters of the i-th student.
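Eq. (5) averages each student's loans per semester over the n students. A minimal sketch with hypothetical counts:

```python
def borrowing_index(loans, semesters):
    """Eq. (5): B = (sum_i D_i / T_i) / n over n students.

    loans[i] -> D_i (valid loans), semesters[i] -> T_i (semesters enrolled).
    """
    n = len(loans)
    return sum(d / t for d, t in zip(loans, semesters)) / n


# Hypothetical: three students with 8, 4 and 0 loans over 2, 4 and 1 semesters.
b = borrowing_index(loans=[8, 4, 0], semesters=[2, 4, 1])
print(round(b, 3))
```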

The data are also transformed and standardized. Specifically, a logarithmic transformation is applied to the average number of student card uses. This transformation is chosen because it handles skewed distributions: it brings the data closer to a normal distribution, stabilizes variance, reduces sensitivity to extreme values, improves interpretability, and better satisfies the assumptions of statistical models.

The transaction amount, peak consumption, normal consumption times, activity place entropy value, and activity time entropy value of the student card are normalized [27].

A commonly used normalization method is the z-score, calculated as shown in Eq. (6) [28].

$$z = \frac{x - \mu}{\sigma} \tag{6}$$

x is the original data. μ is the average of the original data. σ is the standard deviation of the original data. z is the normalized data.
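The two preprocessing steps can be chained: a log transform to reduce the skew in card-usage counts, followed by the z-score of Eq. (6). A minimal standard-library sketch; the use of `log1p` (log of 1 + x, which keeps zero counts finite) and of the population standard deviation are assumptions, since the paper specifies neither:

```python
import math
import statistics


def log_transform(values):
    """Logarithmic conversion for right-skewed counts.

    log1p is an assumed choice; the paper only says "logarithmic conversion".
    """
    return [math.log1p(v) for v in values]


def z_score(values):
    """Eq. (6): z = (x - mu) / sigma, using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # a constant feature gives 0 and fails here
    return [(v - mu) / sigma for v in values]


# Hypothetical average card uses per day for eight students (right-skewed).
card_uses = [2, 3, 3, 4, 4, 5, 9, 40]
features = z_score(log_transform(card_uses))
print([round(z, 2) for z in features])
```

After the transform each feature has mean 0 and standard deviation 1, so no single feature dominates the Euclidean distances that K-means uses. A constant column has σ = 0 and would raise `ZeroDivisionError`, so such columns should be dropped first.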

Appropriate transformation and standardization of the feature indicators eliminate scale differences between features and ensure that they can be used properly in the clustering algorithm. Standardization can also reduce the influence of outliers on the clustering results and improve clustering accuracy and stability. To help users explore and understand the clustered data, the interface provides interactive features such as filtering by specific criteria, zooming for closer examination, correlation analysis for identifying relationships, drill-down for deeper exploration, highlighting to focus on key insights, and annotation to add context to visualizations.

3.2. Visualization research method of abnormal behavior

When studying visualization methods for students' abnormal behavior, suitable methods must be chosen and design principles followed so that the results are intuitive, easy to explain, comparable, and interactive. The choice of visualization method depends on the data type and the analysis goal. The design principles comprise four points, as shown in Fig. 3 [29].

Figure 3.

Visual design principles.

As shown in Fig. 3, the visualization principles for students' abnormal behavior are intuitiveness, interpretability, comparability, and interactivity. Important considerations when choosing a clustering algorithm for visualizing abnormal behavior include efficiency, scalability, the ability to handle non-linear relationships, robust outlier detection, interpretability, flexibility, validation metrics, support for visualization, and resilience to noise and missing data. Taking these into account helps ensure that the selected technique recognizes and depicts abnormal behavior patterns in large, complex datasets in a way that supports interpretation and decision-making.

Data visualization technology and tools play an important role in studying students' abnormal behavior. They aid pattern recognition, anomaly detection, and understanding of context, and they support interactive exploration, communication, teamwork, and predictive analytics, while ethical considerations must still be respected. User feedback is crucial when developing and deploying such tools: it helps identify user requirements, improve usability and data interpretation, customize features, involve stakeholders, and drive iterative improvement. By choosing appropriate charts, such as histograms, line charts, and scatter plots, and by encoding information through color, size, and shape, the differences, trends, and relationships in student behavior data can be presented effectively. Tools such as Tableau, D3.js, and Power BI let researchers flexibly create interactive visualizations that help them understand and analyze students' abnormal behavior and provide more accurate decision support for educators [30]. Their customizable dashboards, interactive data exploration, real-time updates, and integration with various data sources add flexibility and interactivity. These tools make it easy to handle and display complex statistics, helping administrators, researchers, and educators understand trends in student behavior, spot anomalies, and make data-driven decisions that promote academic success and well-being.
However, existing visualization tools also make it difficult to understand and extract insights from student behavior data because of limited customization options, rigid visualization types, poor performance on large datasets, complex data integration, lack of support for advanced analytics, and lack of collaboration features. Overcoming these restrictions requires solutions that offer better performance, more customization, richer interaction, greater flexibility, support for advanced analytics, and collaborative features.

Visual interface design is an important part of the research method. First, user requirements analysis investigates users' needs and expectations for abnormal behavior visualization and clarifies their usage scenarios and goals. Analyzing students' abnormal behavior serves broader educational goals: it enables early intervention and personalized learning, promotes equity and inclusion, fosters a positive school climate, guides data-driven decisions, helps prevent at-risk behavior, and prepares students for college and careers, helping to build inclusive, safe, and supportive learning environments in which every student can succeed. Second, based on user needs, the interaction modes and interaction design principles are designed, including operations such as clicking, dragging, and zooming, as well as user feedback and navigation elements such as information prompts and status indicators [31]. Finally, the interface layout and navigation are designed: the position and size of each chart are determined, and an appropriate way of organizing and presenting information is selected. Creating an interface for researchers and educators involves usability testing, information architecture, wireframing, visual design, responsive design, accessibility, and layout and navigation design. This process ensures clarity, ease of navigation, accessibility, and usability across devices, all of which improve the user experience. Navigation functions such as zooming in and out, panning, and filtering are designed to provide a user-friendly interface and operating experience.

3.3. Discovery of students’ abnormal behavior based on clustering algorithm

Clustering is a commonly used data analysis technique that groups similar data samples together. It aids pattern recognition, data summarization, anomaly detection, feature engineering, decision support, and visual data exploration, and it improves computational efficiency by reducing complexity, which makes it valuable for extracting insights from diverse datasets. In this paper, the K-means clustering algorithm is chosen as the main method for analyzing students' abnormal behavior. K-means groups data points by similarity: it finds cluster centroids and assigns each point to the nearest centroid, so anomalous behavior lies farther from the cluster centers and stands out as outliers. The principle of K-means is as follows. Assume there are n data samples, each represented by a D-dimensional feature vector; K-means divides these samples into k clusters, where k is specified by the user. The D-dimensional feature vector describes each student across multiple features at once, which makes it possible to capture complex patterns and relationships between variables and to spot clusters and anomalies in high-dimensional data. The algorithm aims to minimize the sum of squared distances between all data samples and the centers of the clusters to which they belong. The specific steps are shown in Table 4 [32].

Table 4.

K-means clustering algorithm steps

Step | Description
1 | Select k initial cluster centers, either randomly or with another heuristic.
2 | For each data sample, compute its distance to each cluster center and assign it to the nearest cluster.
3 | Update each cluster center to the mean of all samples in that cluster.
4 | Repeat steps 2 and 3 until the cluster centers no longer change significantly or a predetermined number of iterations is reached.

The mathematical expression of the K-means algorithm and the equations involved in the optimization problem are shown in Eqs (7)–(9). The equation for measuring the distance between the cluster center and the sample using Euclidean distance is shown in Eq. (7) [33].

$$\lVert x_i - \mu_j \rVert = \sqrt{\sum_{k} \left( x_{ik} - \mu_{jk} \right)^2} \tag{7}$$

x_{ik} and μ_{jk} are the values of sample x_i and cluster center μ_j on the k-th feature.

The optimization goal of the K-means algorithm is to minimize the sum of squares of the distances between all data samples and the center of the cluster to which they belong, which is expressed as Eq. (8).

$$\text{minimize} \sum_{j} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2 \tag{8}$$

i stands for sample index and j stands for cluster index.

The equation for updating the cluster center is shown in Eq. (9).

$$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i \tag{9}$$

|C_j| is the number of samples in cluster C_j.
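The four steps of Table 4 and Eqs (7) to (9) map directly onto a short Lloyd-style loop. A minimal pure-Python sketch, not the authors' implementation: initialization is plain random sampling with a fixed seed (heuristics such as k-means++ are not shown), and the six feature vectors are hypothetical.

```python
import math
import random


def euclidean(x, mu):
    """Eq. (7): Euclidean distance between sample x and centre mu."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))


def kmeans(samples, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means following the four steps of Table 4."""
    rng = random.Random(seed)
    # Step 1: pick k distinct samples as the initial centres.
    centers = [list(c) for c in rng.sample(samples, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every sample to its nearest centre.
        labels = [min(range(k), key=lambda j: euclidean(x, centers[j]))
                  for x in samples]
        # Step 3: move each centre to the mean of its members (Eq. 9).
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the centre of an empty cluster
        # Step 4: stop when no centre moves by more than tol.
        shift = max(euclidean(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels


# Hypothetical standardized feature vectors for six students.
students = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(students, k=2)
print(labels, centers)
```

In practice a library implementation such as scikit-learn's `KMeans` would typically be used; its `inertia_` attribute is the quantity minimized in Eq. (8).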

For anomaly extraction, K-means is first used to cluster the behavior data. The main findings combine quantitative evidence from the clustering results and anomaly detection rates with qualitative insights into abnormal behavior patterns, such as irregular attendance; together they provide a thorough picture of students' anomalous behavior for targeted interventions.

Second, the relative anomaly of each cluster is judged, and the degree of anomaly of the unsupervised clusters is assessed with the relative outlier cluster factor (ROCF), calculated as shown in Eq. (10). ROCF is essential for K-means-based anomaly identification because it measures how outlying a cluster is relative to the others. This context helps distinguish genuine abnormalities from noise, prioritizes anomaly detection efforts, and improves overall detection performance.

$$ROCF(Q_i) = 1 - e^{-\frac{TL(Q_i)}{|Q_i|}} = 1 - e^{-\frac{|Q_{i+1}|}{|Q_i|^2}} \tag{10}$$

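Q_i denotes the i-th cluster (category) and |Q_i| its number of samples. TL(Q_i) = |Q_{i+1}| / |Q_i| is the relative rate of change in size from cluster Q_i to the next cluster Q_{i+1}.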
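A minimal sketch of Eq. (10). The text does not say how the clusters Q_i are ordered; this sketch assumes they are sorted by size in ascending order, so that a small cluster followed by a much larger one scores close to 1. The cluster sizes are hypothetical, and no decision threshold is implied.

```python
import math


def rocf(cluster_sizes):
    """Eq. (10): ROCF(Q_i) = 1 - exp(-TL(Q_i) / |Q_i|), TL(Q_i) = |Q_{i+1}| / |Q_i|.

    Assumption: clusters are sorted by size, smallest first. The largest
    cluster has no successor and receives no score. Returns (size, score) pairs.
    """
    sizes = sorted(cluster_sizes)
    scores = []
    for small, nxt in zip(sizes, sizes[1:]):
        tl = nxt / small                      # relative rate of change TL(Q_i)
        scores.append((small, 1 - math.exp(-tl / small)))
    return scores


# Hypothetical cluster sizes from a K-means run on 100 students.
for size, score in rocf([57, 3, 40]):
    print(f"cluster of {size:>2} students: ROCF = {score:.3f}")
```

Here the 3-student cluster scores near 1 and would be the candidate abnormal category passed on to the LOF step.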

On this basis, the Local Outlier Factor (LOF) algorithm is introduced to detect abnormal individuals within the abnormal categories; its calculation is shown in Eq. (11). LOF takes several factors into account: neighborhood density, the relative density ratio, k-nearest-neighbor distance, local reachability distance, the cumulative density distribution, data distribution characteristics, and feature scaling and normalization.

$$LOF(p) = \frac{\sum_{q \in N(p)} \frac{LRD(q)}{LRD(p)}}{|N(p)|} \tag{11}$$

LRD(q) is the local reachability density of a neighboring point q, p is the data point being scored, and N(p) is the neighborhood of p.
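Eq. (11) relies on the k-distance, reachability distance, and local reachability density (LRD) of the standard LOF definition. The brute-force sketch below computes all three for small inputs; the neighbourhood size k = 2 and the points are hypothetical. For larger data, scikit-learn's `sklearn.neighbors.LocalOutlierFactor` computes the same score and exposes its negation as `negative_outlier_factor_`.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor of every point (Eq. 11); brute force, O(n^2).

    Assumes no duplicate points (duplicates can make a reachability sum zero)
    and breaks distance ties by input order.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # N(p): the k nearest neighbours of each point, excluding the point itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]
    # LRD: inverse of the mean reachability distance max(k_dist(o), d(p, o)).
    lrd = [
        len(neighbours[i]) / sum(max(k_dist[j], dist[i][j]) for j in neighbours[i])
        for i in range(n)
    ]
    # LOF(p): mean of LRD(q) / LRD(p) over q in N(p).
    return [
        sum(lrd[j] / lrd[i] for j in neighbours[i]) / len(neighbours[i])
        for i in range(n)
    ]


# Hypothetical members of one abnormal category: a tight group plus one straggler.
members = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(members, k=2)
print([round(s, 2) for s in scores])
```

Scores near 1 indicate density similar to the neighbours; the straggler's score is well above 1.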

Data from 100 students, covering all types of network data over one year, were selected for experimental verification. The Sum of Squared Errors (SSE) is used to quantify the cohesion of the K-means clustering results; its calculation is shown in Eq. (12) [34].

$$SSE = \sum_{j} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2} \tag{12}$$

Here i indexes the data points and j the cluster centers; x_i is the i-th data point, μ_j is the j-th cluster center, C_j is the set of data points assigned to cluster j, and ||x_i − μ_j|| is the Euclidean distance between data point x_i and cluster center μ_j.

The Sum of Squares Between (SSB) is introduced to quantify the separation between clusters; it is calculated as in Eq. (13).

$$SSB = \sum_{i} N_i \lVert \mu_i - \mu \rVert^{2} \tag{13}$$

N_i is the number of samples in the i-th category, μ_i is the centroid (cluster center) of the i-th category, μ is the overall centroid of all samples, and ||μ_i − μ||² is the squared Euclidean distance between the centroid of the i-th category and the overall centroid.
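Both indices can be computed directly from a data matrix and a vector of cluster labels. The sketch below is a minimal NumPy version of Eqs. (12) and (13); the function name and the four-point example are illustrative, not taken from the study.

```python
import numpy as np

def sse_ssb(X, labels):
    """Cohesion (SSE, Eq. 12) and separation (SSB, Eq. 13) of a clustering."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)                       # overall centroid of all samples
    sse = ssb = 0.0
    for j in np.unique(labels):
        members = X[labels == j]              # points assigned to cluster j
        mu_j = members.mean(axis=0)           # centroid of cluster j
        sse += float(((members - mu_j) ** 2).sum())
        ssb += len(members) * float(((mu_j - mu) ** 2).sum())
    return sse, ssb

X = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(sse_ssb(X, [0, 0, 1, 1]))
```

For these four points the call returns SSE = 4 and SSB = 100, which add up to the total sum of squares about the overall centroid (104). That identity, SSE + SSB = const for a fixed dataset, is why a lower SSE across different k generally comes with a higher SSB.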

Finally, the optimal student association scheme is analyzed on the basis of modularity, and three similarity operators are introduced to measure the correlation between individual students [35]: a similarity operator based on movement patterns (FeaSim), one based on spatio-temporal fusion (SpaSim), and one based on characteristic regularities (ActSim).
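The source does not give the modularity formula it uses. A common choice is Newman's modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, m the number of edges, and δ is 1 when i and j share a community. The sketch below assumes that definition and a symmetric, possibly weighted adjacency matrix (for example, one built from student similarity scores); `modularity` is a hypothetical helper name.

```python
import numpy as np

def modularity(A, communities):
    """Newman modularity of a partition (assumed definition, see lead-in).

    A: symmetric adjacency matrix (weights allowed).
    communities: community label of each node.
    """
    A = np.asarray(A, dtype=float)
    c = np.asarray(communities)
    k = A.sum(axis=1)                    # (weighted) degree of each node
    two_m = k.sum()                      # 2m = total degree
    same = c[:, None] == c[None, :]      # delta(c_i, c_j)
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)
```

Higher Q means more edge weight falls inside communities than a random graph with the same degrees would predict, so comparing Q across candidate groupings is one way to pick an "optimal" association scheme. For two triangles joined by a single edge, splitting them into the two triangles gives Q = 5/14, while putting every node in one community gives Q = 0.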

4. Results

4.1. Cluster result analysis

K-means clustering was run several times with different numbers of clusters k, and the resulting indices are summarized in Table 5. The values in Table 5 are plotted in Fig. 4 for k = 3, 4, 5, 6, 7, 8, 9, and 10 and in Fig. 5 for k = 11, 12, 13, 14, 15, 16, and 17.
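A sweep of this kind can be sketched as below. The source does not state the initialization, iteration limit, or feature matrix, so this sketch assumes plain Lloyd iterations with a deterministic farthest-first initialization; the function names and toy data are hypothetical. Because every returned center is the mean of its members, SSB can be obtained as the total sum of squares minus SSE.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-first initialisation."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance from every point to its nearest chosen center
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assignment step: each point joins its nearest center
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # update step: each center moves to the mean of its members
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    return labels, centers

def sweep(X, ks):
    """Map each k to (SSE, SSB); SSB = SST - SSE since centers are cluster means."""
    X = np.asarray(X, dtype=float)
    sst = float(((X - X.mean(axis=0)) ** 2).sum())
    results = {}
    for k in ks:
        labels, centers = kmeans(X, k)
        sse = float(((X - centers[labels]) ** 2).sum())
        results[k] = (sse, sst - sse)
    return results

# Hypothetical toy data: three well-separated groups of three points.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [20, 1], [21, 0]]
print(sweep(X, [2, 3, 4]))
```

K-means only finds a local optimum, so with random initialization SSE need not decrease monotonically in k; several restarts per k (or a library implementation such as scikit-learn's `KMeans`, whose `inertia_` attribute is the SSE) make such a table more stable.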

Table 5.

Summary of cohesion (SSE) and separation (SSB) indices

k SSE (× 10⁹) SSB (× 10⁹)
3 0.5 5.3
4 2.6 3.1
5 1.8 4.0
6 2.1 3.8
7 1.0 4.7
8 3.6 2.0
9 1.4 4.5
10 0.2 5.5
11 3.5 2.2
12 3.8 2.0
13 1.1 4.8
14 1.0 4.9
15 2.4 3.4
16 1.4 4.4
17 2.7 3.1

Figure 4.

Summary of the clustering indices (SSE and SSB) at k = 3, 4, 5, 6, 7, 8, 9, and 10.

Figure 5.

Summary of the clustering indices (SSE and SSB) at k = 11, 12, 13, 14, 15, 16, and 17.

As Table 5 and Figs 4 and 5 show, SSE is smallest at k = 10 (0.2 × 10⁹), meaning the clustering is most compact at this number of clusters, and SSB is also largest at k = 10 (5.5 × 10⁹), meaning the clusters are most clearly separated; k = 3 ranks second on both indices (SSE 0.5 × 10⁹, SSB 5.3 × 10⁹). To understand students' abnormal behavior patterns more comprehensively, this paper therefore chooses k = 10 for verification analysis, and k = 9, k = 11, and k = 12 for abnormal class analysis. On this basis, the relative anomaly of each category is listed in Table 6 and plotted as a histogram in Fig. 6.
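Reading the extremes of both indices directly off Table 5 is a short check. The sketch below simply transcribes the table (the variable names are illustrative) and is not part of the authors' method.

```python
# Table 5 transcribed: k -> (SSE, SSB), both in units of 10^9.
table5 = {
    3: (0.5, 5.3), 4: (2.6, 3.1), 5: (1.8, 4.0), 6: (2.1, 3.8), 7: (1.0, 4.7),
    8: (3.6, 2.0), 9: (1.4, 4.5), 10: (0.2, 5.5), 11: (3.5, 2.2), 12: (3.8, 2.0),
    13: (1.1, 4.8), 14: (1.0, 4.9), 15: (2.4, 3.4), 16: (1.4, 4.4), 17: (2.7, 3.1),
}

best_cohesion = min(table5, key=lambda k: table5[k][0])    # smallest SSE
best_separation = max(table5, key=lambda k: table5[k][1])  # largest SSB
print(best_cohesion, best_separation)
```

Both criteria point to k = 10, the number of clusters used for the verification analysis.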

Table 6.

Results of the relative anomaly of various categories

Category k = 9 k = 10 k = 11 k = 12
0 0 0 0 0
1 0.040 0.040 0.060 0.040
2 0.008 0.008 0.008 0.009
3 0.004 0.004 0.006 0.006
4 0.001 0.001 0.008 0.015
5 0.002 0.002 0.002 0.008
6 0 0 0.002 0.001
7 0 0 0 0.002
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
11 0 0 0 0

Figure 6.

Comparison of relative anomalies of various categories.

In Fig. 6, the relative anomaly of categories 0, 8, 9, 10, and 11 is 0 for every number of clusters, which means that none of the data points in these categories is marked as anomalous in the corresponding clustering results. For the other categories, the proportion of relative anomalies changes with the number of clusters. For example, the relative anomaly of category 1 is 0.040 at k = 9, 10, and 12 but rises to 0.060 at k = 11. Categories 2 and 3 change only slightly (from 0.008 to 0.009 and from 0.004 to 0.006, respectively), whereas category 4 reaches 0.015 at k = 12 but stays at or below 0.008 for the other cluster numbers. These fluctuations arise from differences in the chosen number of clusters, in cluster density and separation, and in cluster-boundary effects, as well as from intrinsic properties of the dataset such as dimensionality and noise level; they can also indicate that the data are being overfitted or underfitted, so effective anomaly detection requires careful attention to these factors. Because categories 3 to 6 change across cluster numbers, the categories obtained with k = 10 (labels 0 to 9) are analyzed in detail, and their feature means are summarized in Table 7.
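Statements like these can be checked mechanically against Table 6. The sketch below transcribes the table (names are illustrative), lists the categories that are never anomalous, and finds the category whose relative anomaly moves most as k changes.

```python
# Table 6 transcribed: category -> relative anomaly at k = 9, 10, 11, 12.
table6 = {
    0: (0, 0, 0, 0),
    1: (0.040, 0.040, 0.060, 0.040),
    2: (0.008, 0.008, 0.008, 0.009),
    3: (0.004, 0.004, 0.006, 0.006),
    4: (0.001, 0.001, 0.008, 0.015),
    5: (0.002, 0.002, 0.002, 0.008),
    6: (0, 0, 0.002, 0.001),
    7: (0, 0, 0, 0.002),
    8: (0, 0, 0, 0),
    9: (0, 0, 0, 0),
    10: (0, 0, 0, 0),
    11: (0, 0, 0, 0),
}

# Categories that are never flagged, whatever the number of clusters.
never_anomalous = [c for c, v in table6.items() if max(v) == 0]
# Spread: how far each flagged category's score moves as k changes.
spread = {c: round(max(v) - min(v), 3) for c, v in table6.items() if max(v) > 0}
print(never_anomalous)
print(max(spread, key=spread.get))
```

Category 1 has the largest spread (0.020), followed by category 4 (0.014); the remaining flagged categories move by less than 0.01.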

Table 7.

Summary of feature mean values of each category

Label 0 1 2 3 4 5 6 7 8 9
Avg Carduse 129.9 163.5 183.4 98.58 192.3 29.8 128.0 78.1 214.7 123.8
Trans Money 1,949.6 2,153.1 2,695.2 1,489.6 2,405.0 476.4 1,782.6 1,197.5 4,885.7 1,865.8
Space Entropy 792.6 27.8 38.5 103.4 29.5 7.4 25.3 16.4 5,565.9 23.5
Time Entropy 20.6 13.9 22.1 1,008.3 12.2 4.1 14.5 9.2 12.1 12.8
Percount 60.4 78.6 90.6 38.8 91.5 11.4 65.5 37.1 87.0 62.0
Hot Fre 137.0 184.0 205.2 92.6 215.8 29.8 149.7 84.0 219.0 138.1
Social length 13.2 157.2 16.9 5.4 420.5 2.0 31.0 5.6 19.0 11.7
Social avgfre 0.7 8.8 0.9 0.2 21.3 0.1 1.6 0.4 1.0 0.7
Borrowing Number 1.1 3.4 1.2 4.4 5.3 0.5 26.9 0.9 11.0 1.2

Table 7 shows some differences in characteristics between Category 0 and Category 6. Category 8 contains few individuals but has markedly higher values, notably the largest transaction amount (4,885.7) and space entropy (5,565.9), so its members can be regarded as abnormal individuals. Category 3 is also small and has low values for campus activities (for example, a Percount of 38.8), so its members are likewise considered abnormal. Students in categories 5 and 7 have low means on most features, so they may also be regarded as showing abnormal behavior. The feature values of the remaining categories are within the normal range.

4.2. Result analysis of similarity operator

Figure 7 shows the results of the simple group of similarity operators. In the student codes in Fig. 7, the first two digits give the undergraduate year level, the third and fourth digits the college code, the fifth and sixth digits the class code, and the last two letters encode specific personal information. FeaSim, the similarity based on movement patterns, ranges from 0.12 to 0.45; SpaSim, the similarity based on spatio-temporal fusion, ranges from 0.69 to 0.94; and ActSim, the similarity based on characteristic regularities, ranges from 0.78 to 0.95. Partial results of the weighted group are shown in Fig. 8.
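The student-code layout described above can be expressed as a small parser. This is a hypothetical sketch: it assumes codes are exactly eight characters (six digits followed by two letters), which the source implies but does not state, and the class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class StudentCode:
    year_level: str  # digits 1-2: undergraduate year level
    college: str     # digits 3-4: college code
    class_code: str  # digits 5-6: class code
    personal: str    # last two letters: personal information

def parse_student_code(code: str) -> StudentCode:
    """Split a student code laid out as described for Fig. 7.

    Assumption: the code is exactly 6 digits followed by 2 letters.
    """
    if len(code) != 8 or not code[:6].isdigit() or not code[6:].isalpha():
        raise ValueError(f"unexpected student code: {code!r}")
    return StudentCode(code[:2], code[2:4], code[4:6], code[6:])

# Hypothetical code, not taken from the study's data.
print(parse_student_code("210305AB"))
```

Grouping pairs by `college` or `class_code` before comparing similarity scores is one way to check whether high similarity simply reflects shared timetables.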

Figure 7.

Simple group results of similarity operators.

Figure 8.

Weighted group results of similarity operators.

In Fig. 8, SpaActSim, the weighted combination of spatio-temporal and characteristic-regularity similarity, ranges from 0.59 to 0.79; SpaFeaSim, the weighted combination of spatio-temporal and movement-pattern similarity, ranges from 0.64 to 0.88; and ActFeaSim, the weighted combination of characteristic-regularity and movement-pattern similarity, ranges from 0.62 to 0.82. Based on the results in Figs 7 and 8, ActSim is selected as the individual similarity operator because it yields the most associated individuals. Using ActSim, the relationships among students are calculated, and the 20 students with the highest correlation are selected to construct the relationship network; the resulting network meets the requirements of the analysis.
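The source gives neither the weights of the weighted groups nor the exact rule for picking the 20 most correlated students. The sketch below assumes a convex combination of two similarity matrices with a hypothetical weight `w` (0.5 by default), and ranks students by their total similarity to everyone else before linking the top n. Both helper names are hypothetical.

```python
import numpy as np

def weighted_similarity(s1, s2, w=0.5):
    """Weighted group of two similarity matrices (w is an assumed weight)."""
    return w * np.asarray(s1, dtype=float) + (1 - w) * np.asarray(s2, dtype=float)

def top_n_network(sim, n=20):
    """Pick the n students with the highest total similarity to others and
    return them with the weighted edges among them (assumed selection rule)."""
    sim = np.asarray(sim, dtype=float)
    strength = sim.sum(axis=1) - np.diag(sim)   # ignore self-similarity
    chosen = np.argsort(strength)[::-1][:n]     # indices of the n strongest students
    edges = [(int(a), int(b), float(sim[a, b]))
             for i, a in enumerate(chosen) for b in chosen[i + 1:]]
    return chosen, edges
```

The chosen indices and edge list can be passed to any graph or plotting library to draw the relationship network; in practice a similarity threshold would usually prune weak edges so that only strong ties are displayed.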

5. Conclusion

By introducing a clustering algorithm, this paper achieves visual analysis of students' abnormal behavior and lays the groundwork for a user-friendly experience in the subsequent, more detailed design of the interactive interface. Experiments show that the method can accurately detect and visualize students' abnormal behavior and provides technical support for presenting intuitive analysis results. The paper makes full use of big data to understand students' behavior patterns more comprehensively and offers new solutions for student management and behavior analysis in education.

This study uses K-means as its clustering algorithm. Although it achieved some success in the experiments, other clustering algorithms could be explored to further improve the accuracy and visualization of abnormal behavior detection. In addition, feature extraction and representation rely on predefined feature sets, which may limit a comprehensive understanding of students' abnormal behavior; future research could explore more flexible feature extraction methods, such as deep learning, to mine further latent patterns of abnormal behavior. Finally, the empirical evaluation is based only on big data; future research could add field studies and questionnaire surveys to obtain more qualitative and quantitative data on students' abnormal behavior and so analyze their behavior patterns more comprehensively.

Funding

This work was supported by the 2020 Natural Science Research Project of Anhui Educational Committee: Design and Implementation of a Smart Campus Visualization Platform Based on Data Mining (KJ2019A1109).

Data availability statement

No datasets were generated or analyzed during the current study.

Author contributions

All authors contributed to the design and methodology of this study, the assessment of the outcomes, and the writing of the manuscript.

Conflict of interest

The authors do not have any conflicts of interest to report.

References

  • [1]. Kui X, Liu N, Liu Q, Liu J, Zeng X, Zhang C. A survey of visual analytics techniques for online education. Visual Informatics. 2022; 2022: 122.
  • [2]. Guo Y, Guo S, Jin Z, Kaul S, Gotz D, Cao N. Survey on visual analysis of event sequence data. IEEE Transactions on Visualization and Computer Graphics. 2021; 28(12): 5091-5112.
  • [3]. Mubarak AA, Cao H, Zhang W, Zhang W. Visual analytics of video-clickstream data and prediction of learners’ performance using deep learning models in MOOCs’ courses. Computer Applications in Engineering Education. 2021; 29(4): 710-732.
  • [4]. Mansoor H, Gerych W, Alajaji A, Buquicchio L, Chandrasekaran K, Agu E, Rundensteiner E. ARGUS: Interactive visual analysis of disruptions in smartphone-detected Bio-Behavioral Rhythms. Visual Informatics. 2021; 5(3): 39-53.
  • [5]. Rosar M, Weidlich J. Creative students in self-paced online learning environments: An experimental exploration of the interaction of visual design and creativity. Research and Practice in Technology Enhanced Learning. 2022; 17(1): 8.
  • [6]. Zhang H, Dong J, Lv C, Lin Y, Bai J. Visual analytics of potential dropout behavior patterns in online learning based on counterfactual explanation. Journal of Visualization. 2022; 24: 1-19.
  • [7]. Guo H, Zou S, Xu Y, Yang H, Wang J, Zhang H, Chen W. DanceVis: toward a better understanding of online cheer and dance training. Journal of Visualization. 2022; 25(1): 159-174.
  • [8]. Abreu FH, Soares A, Paulovich FV, Matwin S. A trajectory scoring tool for local anomaly detection in maritime traffic using visual analytics. ISPRS International Journal of Geo-Information. 2021; 10(6): 412.
  • [9]. Niu Z, Wu J, Liu X, Huang L, Nielsen PS. Understanding energy demand behaviors through spatio-temporal smart meter data analysis. Energy. 2021; 226: 120493.
  • [10]. Himeur Y, Ghanem K, Alsalemi A, Bensaali F, Amira A. Artificial intelligence based anomaly detection of energy consumption in buildings: A review, current trends, and new perspectives. Applied Energy. 2021; 287: 116601.
  • [11]. Ghadi YY, Akhter I, Aljuaid H, Gochoo M, Alsuhibany SA, Jalal A, Park J. Extrinsic Behavior Prediction of Pedestrians via Maximum Entropy Markov Model and Graph-Based Features Mining. Applied Sciences. 2022; 12(12): 5985.
  • [12]. Novoseltseva D, Lelardeux CP, Jessel N. Examining Students’ Behavior in a Digital Simulation Game for Nurse Training. International Journal of Serious Games. 2022; 9(4): 3-24.
  • [13]. Pareek P, Thakkar A. A survey on video-based human action recognition: recent updates, datasets, challenges, and applications. Artificial Intelligence Review. 2021; 54: 2259-2322.
  • [14]. Nguyen THD, El-Nasr MS, Canossa A. Glyph: A visualization tool for understanding problem-solving strategies in puzzle games. arXiv preprint arXiv:2106.13742. 2021.
  • [15]. Li Q, Kumar P, Alazab M. IoT-assisted physical education training network virtualization and resource management using a deep reinforcement learning system. Complex & Intelligent Systems. 2022; 8: 1-14.
  • [16]. Avola D, Cascio M, Cinque L, Foresti GL, Pannone D. Machine learning for video event recognition. Integrated Computer-Aided Engineering. 2021; 28(3): 309-332.
  • [17]. Abdullah F, Ghadi YY, Gochoo M, Jalal A, Kim K. Multi-person tracking and crowd behavior detection via particle gradient motion descriptor and improved entropy classifier. Entropy. 2021; 23(5): 628.
  • [18]. Radwan TM, Al Abachy S, Al-Araji AS. A One-Decade Survey of Detection Methods of Student Cheating in Exams (Features and Solutions). Journal of Optoelectronics Laser. 2022; 41(4): 355-366.
  • [19]. Kong SC, Wang YQ. Item response analysis of computational thinking practices: Test characteristics and students’ learning abilities in visual programming contexts. Computers in Human Behavior. 2021; 122: 106836.
  • [20]. Alhalabi W, Jussila J, Jambi K, Visvizi A, Qureshi H, Lytras M, Adham RS. Social mining for terroristic behavior detection through Arabic tweets characterization. Future Generation Computer Systems. 2021; 116: 132-144.
  • [21]. Joshva Devadas T. A survey on agent learning architecture that adopts the Internet of things and wireless sensor networks. International Journal of Wavelets, Multiresolution and Information Processing. 2022; 20(2): 2030002.
  • [22]. Meuschke M, Preim B, Lawonn K. Aneulysis – A system for the visual analysis of aneurysm data. Computers & Graphics. 2021; 98: 197-209.
  • [23]. Malhotra M, Chhabra I. Student Invigilation Detection Using Deep Learning and Machine After COVID-19: A Review on Taxonomy and Future Challenges. Future of Organizations and Work After the 4th Industrial Revolution: The Role of Artificial Intelligence, Big Data, Automation, and Robotics. 2022; 311-326.
  • [24]. Bobek S, Kuk M, Brzegowski J, Brzychczy E, Nalepa GJ. KnAC: an approach for enhancing cluster analysis with background knowledge and explanations. Applied Intelligence. 2022; 13: 1-24.
  • [25]. Samani H, Yang CY, Li C, Chung CL, Li S. Anomaly detection with vision-based deep learning for epidemic prevention and control. Journal of Computational Design and Engineering. 2022; 9(1): 187-200.
  • [26]. Hussein F, Al-Ahmad A, El-Salhi S, Alshdaifat EA, Al-Hami MT. Advances in Contextual Action Recognition: Automatic Cheating Detection Using Machine Learning Techniques. Data. 2022; 7(9): 122.
  • [27]. Bai J, Zhang H, Qu D, Lv C, Shao W. FGVis: visual analytics of human mobility patterns and urban areas based on F-GloVe. Journal of Visualization. 2021; 24: 1319-1335.
  • [28]. Rao S, Verma AK, Bhatia T. A review on social spam detection: Challenges, open issues, and future directions. Expert Systems with Applications. 2021; 186: 115742.
  • [29]. Ma H, Zhang Z, Li W, Lu S. Unsupervised human activity representation learning with multi-task deep clustering. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies. 2021; 5(1): 1-25.
  • [30]. Goundar S, Deb A, Lal G, Naseem M. Using online student interactions to predict performance in a first-year computing science course. Technology, Pedagogy and Education. 2022; 31(4): 451-469.
  • [31]. Danaditya A, Ng LHX, Carley KM. From curious hashtags to polarized effect: profiling coordinated actions in Indonesian Twitter discourse. Social Network Analysis and Mining. 2022; 12(1): 105.
  • [32]. Zhou Y, Zhao J, Zhang J. Prediction of learners’ dropout in E-learning based on the unusual behaviors. Interactive Learning Environments. 2023; 31(3): 1796-1820.
  • [33]. Benabbes K, Housni K, Hmedna B, Zellou A, Mezouary AE. Explore the influence of contextual characteristics on the learning understanding of LMS. Education and Information Technologies. 2023; 22: 1-39.
  • [34]. Chang SC, Chang KL. Cheating Detection of Test Collusion: A Study on Machine Learning Techniques and Feature Representation. Educational Measurement: Issues and Practice. 2023; 23: 118.
  • [35]. Wang Y, Liu J, Liu RW, Liu Y, Yuan Z. Data-driven methods for detection of abnormal ship behavior: Progress and trends. Ocean Engineering. 2023; 271: 113673.


Articles from Technology and Health Care are provided here courtesy of IOS Press
