Student Behavior Analysis using YOLOv5 and OpenPose in Smart Classroom Environment

Xiang Li; Yucheng Ji; Jiayi Yang; Mingyong Li

. 2025 May 22;2024:674–683.

Student Behavior Analysis using YOLOv5 and OpenPose in Smart Classroom Environment

Xiang Li ¹, Yucheng Ji ¹, Jiayi Yang ¹, Mingyong Li ^1*

PMCID: PMC12099366 PMID: 40417576

Abstract

In the classroom, artificial intelligence techniques help automate student behavior analysis, and teachers are able to understand students’ class status more effectively. We developed an intelligent method for classroom behavior analysis by building a CQStu datasets and annotating 6,687 images through active learning. OpenPose was used to detect the key points of the student’s body, and the key points of the key parts of the body were utilized to generate representative points of the student, and the idea of coordinates was used to assign the student’s position. Using YOLOV5 to recognize students’ classroom behaviors and count the number of times, our experimental results show that the average classroom behavior recognition accuracy is 84.23%, and the overall location accuracy is about 79.6%. In addition, we introduced a nonlinear weighting factor to evaluate the effectiveness of teaching and constructed corresponding classroom behavior weights based on different classroom scenarios. A method for student classroom behavior identification and analysis is provided, and a framework for future intelligent classroom teaching evaluation methods is established, providing objective data support for student performance analysis.

Introduction

With the rapid development of the field of deep learning, the classroom as a medium of educational communication has naturally attracted the attention of many scholars. The traditional classroom relies on the teacher’s observation of each student’s level of classroom engagement, which is susceptible to individual subjective factors and may not accurately capture the level of engagement of each student. Cheng jun et al. ¹ proposed to combine “human-computer collaboration and data integration” to explore the innovative path of high-quality classroom assessment in the new era. Therefore, it is necessary to establish a Classroom behavior recognition method that combines subjective and objective factors.

The age of intelligent information has accelerated the trend of integrating intelligent technologies into all walks of life, and the education industry is no exception. The classroom, as the main battleground of schooling, encompasses the entire teaching and learning process, and has been the focus of educational assessment research². To achieve efficient and high quality classroom teaching, students and teachers need to work closely together. For students, it is necessary to achieve such things as concentration and active participation in classroom activities. For teachers, it is necessary to pay attention to the students’ classroom status³. Nowadays, students’ attention is often shown in Looking up rate, and classroom activities are shown in various forms, including peer discussion, Raising hand to answer questions, standing up to answer questions and so on. Therefore, we focus on studying and analyzing students’ common classroom behaviors: “Raising hand, Looking down, Looking up, Standing up, and Turning around.”

At the same time, students’ classroom behavior is an important component of classroom quality. In the past, teachers usually relied on personal recollections to determine students’ classroom performance, while parents who wanted to understand their children’s learning could only rely on communication with their children or teachers to get it. These methods are based on subjective impressions and lack the support of objective data. Therefore, this paper presents a method for analyzing students’ classroom behavior by statistically and analytically examining Classroom behavior recognition from the perspective of classroom attention and classroom activity. By processing, recognizing and analyzing the data collected in the smart classroom, objective data on students’ Classroom behavior is obtained, which provides data support for classroom quality and promotes the improvement of classroom teaching quality.

After attending this course, learners should have a deeper understanding of the field of student Classroom behavior recognition, specifically:

We constructed a CQStu datasets of students’ classroom behaviors in an active learning approach, including five common classroom behaviors (Raising hand, Looking down, Looking up, Turning around, and Standing) containing a total of 6,687 images. Learners can adopt an active learning approach when constructing the datasets in their own practice.
We propose a method for analyzing student classroom behavior in a smart classroom environment. By introducing coordinates, each student is assigned a unique number, and the classroom behavior of students during the classroom period is counted and analyzed. Learners can use this method to conduct behavioral analysis studies on classroom students.
We propose an approach to classroom evaluation that introduces nonlinear weighting coefficients that learners can apply to their own practice, even if it is not in the education industry.

Related Work

Over the past few years, extensive research has been conducted in the education and computer industries on student Classroom behavior recognition^4-6. Unlike traditional target detection and gesture estimation, counting and analyzing the classroom behavior of each student in a lesson requires continuous recognition and a certain degree of accuracy. This section reviews recent advances in Classroom behavior recognition.

Target Detection in Student Classrooms

Earlier applications of target detection in classroom teaching usually relied on the grayscale values of the image pixels and utilized manually set features to achieve detection results. In recent years, the field of target detection has evolved rapidly with the continuous advancement of Convolutional Neural Networks (CNNs). M Sajjad et al.⁷ used gradient histogram features and support vector machine classifiers for facial recognition. Some systems use correlation methods to visualize and analyze student behavior, which is crucial to determine student performance in the classroom⁸. Several intelligent methods have been developed based on face detection⁹, body gaze direction¹⁰ , student head movement¹¹ and facial expression recognition methods¹². Goto, Masaaki and Tanaka¹³ analyzed the relationship between head and blinking. However, this method requires a large amount of data and computational power. Lina Li, Yupeng Li et al.¹⁴ enhanced the clarity of classroom images using ESRGAN (Enhanced Super Resolution Generative Adversarial Network) and improved the classroom recognition accuracy by using an enhanced YOLOv5s model with a small object detection module. Fan Yang and Tao Wang¹⁵ constructed and tested a classroom behavior datasets of SCB students. Huayi Zhou and Fei Jiang¹⁶ proposed a Status system that can observe student classroom behavior videos in a timely manner.

Behavior Recognition in Student Classrooms

Human body pose estimation in classrooms, due to scene and subject complexity, has received widespread attention in recent years. ^17,18 Cao et al. ¹⁹ proposed a real-time method (OpenPose) to detect 2D poses of multiple people in images. This method uses a nonparametric representation called the Part Affinity Field (PAF) to learn the connections between body parts and individuals in the image. Dai et al. ²⁰ proposed a relationship based skeleton graph network (RsgNet) for multi-person pose recognition in crowded places, achieving precise posture estimation in joint reasoning. Joze,Shaban et al.²¹ used a 3D backbone to extract RGB, then applied two types of constraints—classification consistency and spatiotemporal consistency, where temporal consistency consists of time consistency and gradient smoothness. These methods leverage the temporal continuity of actions in videos.

While skeleton coordinates have the advantage of being independent of the background, they lack important background cues. Especially in classroom scenarios with significant occlusion, relying solely on body keypoints to determine students’ classroom behavioral judgments may not be accurate enough. Inspired by Gueter et al.²² who used the OpenPose framework to extract keypoints from images for human activity recognition, we use the OpenPose framework to extract body keypoints and generate representative points for each person based on the keypoints in order to assign numbers to each student.

Teaching Effectiveness Evaluation

In the field of education, the evaluation of teaching effectiveness²³ is a crucial component of teaching improvement and student performance analysis. Traditional evaluation methods often rely on subjective judgment or simple counting methods, making it challenging to comprehensively and accurately assess students’ behavior in the classroom. In computer vision research in classrooms, by observing changes in student behavior and expressions, understanding their psychological state, and providing teachers with more information about students’ learning processes—such as focus on key points, understanding, and interest—computer vision can assist in teaching adjustments to improve teaching quality. In this regard, some scholars and research teams have explored the potential applications of computer vision in education and made certain progress: Cai Hongmei et al. ²⁴ designed a university classroom teaching quality evaluation index system through questionnaire surveys, introducing classroom activity levels into the indicators of teaching effectiveness. Ruiyao Zhang²⁵ used an RBF neural network model to assess the effectiveness of digital media teaching methods. Tongqing Yuan²⁶ proposed an improved Markov Chain-based teaching quality evaluation model, applied it to a university blended teaching quality evaluation system, and designed a corresponding blended teaching quality model. Wenyan Feng and Fan Feng²⁷ established a multi-modal digital teaching quality data evaluation model based on a fuzzy BP neural network, optimizing the initial weights and thresholds of multi-modal digital teaching.Li Shan et al. ²⁸ Established a new facial expression recognition method and summarized relevant existing methods.RR Adyapady et al.²⁹ A comprehensive review of the latest developments in facial expression recognition algorithms was conducted, and their advantages and limitations were discussed.

These studies show that the use of computer vision for teaching effectiveness evaluation is a highly emphasized area in education. This technology is expected to provide more comprehensive and accurate information to support the teaching and learning process and ultimately improve the quality of teaching and learning

Proposed Methodology

The proposed method for student behavior analysis in the smart classroom environment using YOLOv5 and OpenPose is illustrated in Figure 1. It comprises three main stages. Firstly, the learning behaviors of students in the classroom are captured by a high-definition dual RGB camera. The recorded footage is saved locally and then uploaded to the school’s information center. Secondly, we generate representative points for each student to determine their position information. The front view is transformed into a bird’s-eye view for an overall visual inspection of student classroom behaviors. Thirdly, the recorded video is processed frame by frame to identify and statistically analyze students’ learning behaviors during the class.

System Overview

The system for student behavior analysis in the smart classroom environment using YOLOv5 and OpenPose includes five aspects: camera data collection, model training, model execution, result analysis, and generation of classroom quality scores. Utilizing a high-definition dual RGB camera recommended by iFlytek in the smart classroom at Chongqing Normal University, the system has two main advantages. Firstly, data collected in the smart classroom environment is more effective, leading to more accurate analysis results. Secondly, in today’s rapidly advancing technology, a camera with a resolution of 1920×1080, priced at approximately $330, can meet the requirements of this research. To ensure timely data processing, we equipped the system with a storage array and GPU. Regarding model computation, a single 3090 GPU is sufficient for the operations

Data Collection

The CQStu dataset we built relies on real instructional video data managed by the Department of Collective Information Science at Chongqing Normal University. Prior to conducting the study, we signed an experimental participation agreement with the many students who participated in this study to ensure their right to know. The data collected by the high-definition dual RGB cameras in the smart classrooms were used strictly for scientific research purposes.

From Figure 2, we can see the real teaching scene and the number of students attending classes. We can see that this is a relatively crowded classroom, which poses a certain challenge to traditional classroom behavior recognition. We use high-definition dual RGB cameras to ensure the collection of higher quality images. A total of 30 types of videos were collected, with an average duration of 45 minutes and a frame rate of 30fps for each type of video. For the collected videos, the first step is to preprocess them using OpenCV to extract frames from the recorded videos. The frame extraction interval is set to 15fps to better match the classroom behavior of students. In addition, we used abelimg annotation software to add a new classroom behavior, turn, to the existing four classroom behaviors (head up, head down, hand up, and stand) in the CQStu student classroom behavior dataset.

The annotated results of behaviors (x, y, w, h) and categories were stored in txt files. To ensure the accuracy and precision of the annotation results, scholars in the field of object detection were engaged as annotators. Some annotation examples are shown in Figure 2.

Position Allocation and Behavior Recognition

Due to the specific nature of classroom teaching, students generally do not change their positions within a single class. Therefore, before recognizing student classroom behaviors, this study needs to perform skeleton detection on students in the classroom. By utilizing skeleton data to generate representative points for students, the recognition of student classroom behaviors can then be carried out.

As students enter the classroom and take their seats, traditional classroom behavior recognition needs to know where they are. Therefore, this research introduces the concept of coordinates to represent the distribution of student positions. This approach allows us to determine the area where students’ classroom behaviors occur while respecting the privacy of students participating in the experimental study. Specifically, the localization of student positions is divided into four steps.

Firstly, the initial position of student classroom locations is determined using the first photo captured by the camera at the beginning of the class , as shown in the lower left corner of Figure 3. OpenPose is employed to detect the body keypoints of students in the class, and the horizontal coordinate of each student’s nose and the vertical coordinates of the left and right elbow are used as representative points.
After generating the representative points for students, to perform an affine transformation of the 2D points in the camera view, a bird’s-eye view (BEV) is created using a projection transformation. The target points for the inverse perspective transformation can be more universally represented as a matrix containing four sets of coordinates. The coordinates of each point consist of the horizontal coordinate (x) and the vertical coordinate (y). The formula for the perspective transformation is as follows:
$P_{1} = (\frac{W}{2} - \frac{W}{2 \times N}, 0)$ (1)

$P_{3} = (\frac{W}{2} - \frac{W}{2 \times N}, H)$ (2)

$P_{2} = (\frac{W}{2} + \frac{W}{2 \times N}, 0)$ (3)

$P_{4} = (\frac{W}{2} + \frac{W}{2 \times N}, H)$ (4)
In this formula, W represents the width of the image after the inverse perspective transformation, H represents the height of the image after the inverse perspective transformation, N is a predefined constant, and represent the coordinates of the four target points.
Following this, clustering of coordinates is performed for each row and column in the classroom. In this case, we choose to use the simple and stable K-means clustering method for this clustering.The specific results are shown in Figure 4.
Finally, based on the two clustering results for each point in the bird’s-eye view, unique coordinate numbers are assigned to them. In Figure 5, we can assign location coordinates to each student participating in the class.
For the recognition of student classroom behaviors, this study chooses to employ the stable and high-precision YOLOv5 18 algorithm. It has maintained a stable status in terms of stability and accuracy in the industry. Figure 5 illustrates detection examples, and the recognized behaviors include raising hand, standing, turning head, lowering head, and raising head, totaling five behavior patterns. and in Figure 6, we can recognize the student standing at R2C3. Three students bowing their heads are in R1C1, R2C1, and R1C3.

Figure 3. — The camera records the first picture of the student in class. Display in the form of a scatter plot using the OpenPose method of centroid of key parts.

Figure 4. — The above image is the clustering result graph. The left image in Figure 4 represents the clustering results of each row in the classroom, while the right image represents the clustering results of each column in the classroom.

Figure 5. — In the left image, it is the result of numbering the students participating in the class based on the clustering results in step D. In the right image, it is the result of classroom behavior detection for another class scene. In the right image, it is the result of classroom behavior detection for another class scene.

Figure 6. — The statistics of the classroom video screenshots on the left side of Figure 5 are shown in Figure 6, where the corresponding colors indicate different behaviors.

Behavior Tracking and Visualization

Elimination of Repetitive Behaviors

In our study, a single Classroom behavior recognition of a student’s classroom behavior was not sufficient to accomplish the task of classroom behavior analysis. An accurate count of each student’s classroom behavior for is necessary. by the lack of temporal information in the collected figure data after being processed. However, true student classroom behavior can persist over a continuous frame. For example, a student may be Standing for about 10 seconds while answering a question in class. Therefore, it is necessary to suppress repetitive values using the intersection/union (IOU) method to eliminate repetitive behaviors in real-world testing. Specifically, in this experiment it was found that setting the overlap threshold to 0.3 achieved good results. If the matching result of a behavioral frame is different from the matching result of the next frame, the system considers that the student has completed a classroom behavior. In order to avoid the undesirable monitoring effect caused by students performing a single action for a long period of time in the classroom, the time threshold T was set to 8. This means that the counts will be reset after about 4 seconds if a student maintains the head-down state for a long period of time.

We counted student Classroom behavior by identifying student body key points to determine student classroom location, Classroom behavior recognition and suppression of repetitive classroom behaviors to contribute to the analysis of classroom quality.

Visualization Interface

In order to make the purpose of system easily understandable for teachers and parents outside the computer field, this study created a visualization interface for student classroom behavior monitoring. As shown in Figure 6, taking students in the classroom as an example, their coordinates are visually displayed on the right. Each grid represents the corresponding coordinates assigned to each student, and the five states in each grid will be counted every three seconds as the class progresses, without the need for additional intervention. This can help teachers provide some objective data support when summarizing after class. Thresholds for certain states can be set, for example, setting the threshold for raising hand to 6. If a student raises their hand more than 6 times in one class, it indicates that the student actively answers questions and performs well in class. At the same time, it provides some data support for teaching effectiveness evaluation.

Teaching Effectiveness Evaluation

The evaluation results of teaching effectiveness are influenced by classroom atmosphere, students’ mastery of knowledge, etc. Therefore, it is necessary to construct corresponding weights for different scenarios in real classrooms. For example, a theory-oriented class may require higher weights for behaviors such as raising the head or lowering the head, while a practical class may place a higher demand on students discussing with each other between classes. Influenced by non-linear weighting coefficients, it is considered to use non-linear weighting coefficients to adjust the impact of action counts. For instance, for low-frequency actions, larger non-linear weights can be applied to increase their impact, while relatively smaller non-linear weights can be applied for high-frequency actions. The introduction of this method aims to address the limitations of traditional evaluation methods and provide more accurate data support for teaching improvement and student performance analysis.

Formula (5) represents the final scoring calculation method after introducing non-linear weighting coefficients^30,31, where represents the vector weight set for classroom behaviors originally configured, represents the count of a certain behavior within a certain time period, and b is a constant representing a bias term. An enhanced function and an exponential function are used, and depending on the value of the different frequency levels will make the increase in the value of vary.

F i n a l S c o r e = \sum_{i = 1}^{5} W_{i} \times g (F r e q u e n c y_{i}) \times S_{i} + b

(5)

g (F r e q u e n c y_{i}) = {\begin{array}{l} e^{- α \times F r e q u e n c y_{i}} & (i \in [2, 4]) \\ 1 + {(F r e q u e n c y_{i})}^{β} & (i \in [0, 1]) \end{array}}

(6)

This method is introduced to overcome the limitations of traditional evaluation methods and provide more accurate data support for teaching improvement and student performance analysis.

Experiment

In order to verify the application value of the proposed method in this paper, an 8-week tracking experiment was conducted on the computer science students of the class of 2022 at Chongqing Normal University University, and the smart classroom of the university was selected as the research site. The students in the video signed the relevant experimental protocols, and each session was 45 minutes long with the frame rate set to 60 frames per second. Among them, in the first to sixth weeks, we refined the method proposed in this paper and constructed the CQStu student classroom behavior datasets, and in the seventh and eighth weeks, we applied our method to real classrooms. to verify whether it can describe students’ classroom performance more accurately.

Student Classroom Behavior Identification and Location Distribution Application

In the education field, complete classroom behavior datasets are rarely used publicly due to privacy protection and permissions. to verify the effectiveness of the proposed method in this study, we compared it with four commonly used target detection methods on the CQStu dataset, and verified the accuracy, checking rate, and mAP of the different methods in comparison. As shown in Table 1.

Table 1.

Compared with other models, the first list of experiments is different methods, and the other list is experimental data

Models	Precision%	Recall%	mAP@0.50%
Fast R-CNN	71.7	76.9	78.3
SSD	69.8	72.1	73.2
YOLOV5	75.0	83.2	84.23

Open in a new tab

In order to verify the accuracy of student location assignment, we randomly selected the seat information of four students in the classroom in a video session, recorded and observed the localization results of each seat (named A, B, C, and D, respectively). The results are shown in Table 2. Due to the effect of resolution, the resolution decreases when the students are far away from the camera. The pixel location information of four students was calculated and compared with the location information generated in this paper. The accuracy of the students’ personal location information can be roughly calculated by dividing the machine-generated information by the real information. After an extensive empirical study, the overall location accuracy of students is about 79.6%. Although some students were located at the edge of the camera or in the back rows were not precise enough, most of their location information was relatively accurate, providing accurate and objective data to support teachers’ observation of each student’s classroom performance.

Table 2.

The following table is a location information table, with the main purpose of testing the accuracy of coordinate information.

Id	Location	F_l	F_c	ACC
A	R3C3	710	662	93.24%
B	R2C2	694	486	70.03%
C	R1C1	842	715	84.92%
D	R3C3	785	536	68.28%

Open in a new tab

Nonlinear Weighting Coefficient Enhanced Classroom Behavior Evaluation Results

To validate the effectiveness and rationality of the teaching effectiveness evaluation model introduced in this study with the inclusion of non-linear weighting coefficients, a comparison was made between the teacher’s evaluation of students and the student classroom behavior scores based on non-linear weighting coefficients. This was done to observe the impact of the introduced non-linear weighting coefficients on teaching effectiveness evaluation.

We chose a classroom section on the principles of e-commerce for comparison. As mentioned earlier, in instructional evaluations using nonlinear weighting factors, both linear and nonlinear weighting factors need to be used to calculate scores for the entire class. Therefore, when calculating the score, different behavioral weights can be set depending on what the teacher is asking for in the classroom for that subject. For example, a theoretical course might place a higher demand on increasing head improvement rates, while a discussion-oriented practical course might emphasize classroom activities. The table below shows statistics for four students as individual entities, counting the number of times they were Looking up, Turning, Raising hand, Standing and Looking down. The teacher referred to the statistical counts of classroom behaviors of the four students and incorporated our proposed nonlinear weighting factor formula to assess them.

Conclusion

We propose a method for student behavior analysis in a smart classroom environment based on YOLOv5 and OpenPose. By recording students’ classroom activity data through smart classroom devices, we constructed a CQStu datasets and applied our method to achieve Classroom behavior recognition and analysis.

Our research contributes to the field of student classroom behavior detection. The findings highlight the potential of using deep learning methods to identify and analyze classroom dynamics, which contributes to educational research and student Classroom behavior recognition.

Acknowledgment

This work was supported by the Chongqing Social Science Planning Project (Grant No. 2023BS085), “Research on Learning State Analysis and Regulation of Learners Based on Deep Neural Networks”, and the Chongqing Graduate Student Research and Innovation Program (Grant No. CYS240393), “Research on Improving Students’ Behavior in the Classroom and Constructing Teacher Evaluation System Based on YOLOV7”. (Grant No. CYS240393), “Research on Improving Student Behavior in Classroom Based on YOLOV7 and Constructing Evaluation System for Teachers”, and “Optimization of Learners’ Online Learning Paths in the Context of Artificial Intelligence” under the Special Project on Intelligent Education of Chongqing Normal University (Grant No. YZH24013).

Figures & Tables

Table 3.

The following table describes the statistics of the number of classroom behaviors of four students selected, and the evaluation obtained based on the suggestions of the teaching teacher.

Id	Look down	Rise head	Swivel head	Standing	Raise hands	Evaluate
A-counts	15	62	12	3	8	A+
B-counts	32	49	8	1	3	B
C-counts	19	54	7	0	0	C
D-counts	18	28	27	2	6	B+

Open in a new tab

References

1.Chengjun X.Y.G.L.Q.Y.P.Z.L. Intelligent Technology Empowers Evaluation Innovation in High Quality Classrooms. Research on Electronic Education. 44:73. [Google Scholar]
2.Xianghe Z.Y.G. The Necessary Turn of Education Evaluation Standards in China under the Background of Smart Education Distance Education in China
3.Valiente C., Swanson J., DeLay D., Fraser A.M., Parker J.H. Emotion-related socialization in the classroom: Considering the roles of teachers, peers, and the classroom context. Developmental psychology. 2020;56:578. doi: 10.1037/dev0000863. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Zhiming W.Y.M. Research on Student Classroom Behavior Recognition Based on Deep Learning. Software Engineering. 26:40. [Google Scholar]
5.in the Kingdom;, S.C. Research on Student Behavior Recognition Based on Improved OpenPose. Computer Application Research. 38:3183. [Google Scholar]
6.Lin F.C., Ngo H.H., Dow C.R., Lam K.H., Le H.L. Student behavior recognition system for the classroom environment based on skeleton pose estimation and person detection. Sensors. 2021;21:5314. doi: 10.3390/s21165314. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Sajjad M., Zahir S., Ullah A., Akhtar Z., Muhammad K. Human behavior understanding in big multimedia data using CNN based facial expression recognition. Mobile networks and applications. 2020;25:1611–1621. [Google Scholar]
8.Brophy J.E., Good T.L. Teachers’ communication of differential expectations for children’s classroom performance: Some behavioral data. Journal of educational psychology. 1970;61:365. [Google Scholar]
9.Kumar A., Kaur A., Kumar M. Face detection techniques: a review. Artificial Intelligence Review. 2019;52:927–948. [Google Scholar]
10.Nonaka S., Nobuhara S., Nishino K. Dynamic 3d gaze from afar: Deep gaze estimation from temporal eye-head-body coordination. Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. pp. pp. 2192–2201.
11.Sharma P., Joshi S., Gautam S., Maharjan S., Khanal S.R., Reis M.C., Barroso J., de Jesus Filipe V.M. Proceedings of the International Conference on Technology and Innovation in Learning, Teaching and Education. Springer; 2022. Student engagement detection using emotion analysis, eye tracking and head movement with machine learning; pp. pp. 52–68. [Google Scholar]
12.Alexandre G.R., Soares J.M., Thé G.A.P. Systematic review of 3D facial expression recognition methods. Pattern Recognition. 2020;100:107108. [Google Scholar]
13.Goto M., Tanaka T., Matsumoto K. Estimating attention level from blinks and head movement. Proceedings of ISCA 30th International Confer. 2021;77:52–59. [Google Scholar]
14.Li L., Liu M., Sun L., Li Y., Li N. ET-YOLOv5s: toward deep identification of students’ in-class behaviors. IEEE Access. 2022;10:44200–44211. [Google Scholar]
15.Yang F., Wang T. SCB-Dataset3: A Benchmark for Detecting Student Classroom Behavior. arXiv preprint arXiv:2310.02522. 2023.
16.Zhou H., Jiang F., Si J., Xiong L., Lu H. Proceedings of the ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) IEEE; 2023. Stuart: Individualized Classroom Observation of Students with Automatic Behavior Recognition And Tracking; pp. pp. 1–5. [Google Scholar]
17.Jin S., Liu W., Xie E., Wang W., Qian C., Ouyang W., Luo P. Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16. Springer; 2020. Differentiable hierarchical graph grouping for multi-person pose estimation; pp. pp. 718–734. [Google Scholar]
18.Insafutdinov E., Pishchulin L., Andres B., Andriluka M., Schiele B. Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI 14. Springer; 2016. Deepercut: A deeper, stronger, and faster multi-person pose estimation model; pp. pp. 34–50. [Google Scholar]
19.Cao Z., Simon T., Wei S.E., Sheikh Y. Realtime multi-person 2d pose estimation using part affinity fields. Proceedings of the Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. pp. pp. 7291–7299.
20.Duan H., Zhao Y., Chen K., Lin D., Dai B. Revisiting skeleton-based action recognition. Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. pp. pp. 2969–2978.
21.Joze H.R.V., Shaban A., Iuzzolino M.L., Koishida K. MMTM: Multimodal transfer module for CNN fusion. Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020. pp. pp. 13289–13299.
22.Faure G.J., Chen M.H., Lai S.H. Holistic interaction transformer network for action detection. Proceedings of the Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023. pp. pp. 3340–3350.
23.Zhou Z.B.X.X.L. Teaching Elements and Evaluation of Teaching Effectiveness in Micro Courses in Universities. Modern Educational Technology. 25:30. [Google Scholar]
24.Libo L.H.C.J.Z.X.F.W.L. Construction of Evaluation Index System for Classroom Teaching Quality in Higher Education Institutions. Continuing Medical Education in China. 12:85. [Google Scholar]
25.Lv L.T., Ji N., Zhang J.L. Proceedings of the 2008 International conference on wavelet analysis and pattern recognition. Vol. 1. IEEE; 2008. A RBF neural network model for anti-money laundering; pp. pp. 209–215. Version March 16, 2024 submitted to Journal Not Specified 13 of 13. [Google Scholar]
26.Yuan T. Algorithm of classroom teaching quality evaluation based on Markov chain. Complexity. 2021;2021:1–12. [Google Scholar]
27.Feng W., Feng F. Research on the multimodal digital teaching quality data evaluation model based on fuzzy BP neural network. Computational Intelligence and Neuroscience. 2022;2022 doi: 10.1155/2022/7893792. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Li S., Deng W. Deep facial expression recognition: A survey. IEEE transactions on affective computing. 2020;13:1195–1215. [Google Scholar]
29.Adyapady R.R., Annappa B. A comprehensive review of facial expression recognition techniques. Multimedia Systems. 2023;29:73–103. [Google Scholar]
30.Lei G.X.T.S.Z.Z.S. Research on Nonlinear Weighted Coefficients in the Center of Gravity Method. Laser and Infrared. 40:109. [Google Scholar]
31.Zhao Y., Liang D., Tao Z. Calculating apparatus and method for nonlinear weighting coefficient. US Patent 9,998,223. 2018.
32.Lin T.Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollár P., Zitnick C.L. Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer; 2014. Microsoft coco: Common objects in context; pp. pp. 740–755. [Google Scholar]

[r1-4834] 1.Chengjun X.Y.G.L.Q.Y.P.Z.L. Intelligent Technology Empowers Evaluation Innovation in High Quality Classrooms. Research on Electronic Education. 44:73. [Google Scholar]

[r2-4834] 2.Xianghe Z.Y.G. The Necessary Turn of Education Evaluation Standards in China under the Background of Smart Education Distance Education in China

[r3-4834] 3.Valiente C., Swanson J., DeLay D., Fraser A.M., Parker J.H. Emotion-related socialization in the classroom: Considering the roles of teachers, peers, and the classroom context. Developmental psychology. 2020;56:578. doi: 10.1037/dev0000863. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r4-4834] 4.Zhiming W.Y.M. Research on Student Classroom Behavior Recognition Based on Deep Learning. Software Engineering. 26:40. [Google Scholar]

[r5-4834] 5.in the Kingdom;, S.C. Research on Student Behavior Recognition Based on Improved OpenPose. Computer Application Research. 38:3183. [Google Scholar]

[r6-4834] 6.Lin F.C., Ngo H.H., Dow C.R., Lam K.H., Le H.L. Student behavior recognition system for the classroom environment based on skeleton pose estimation and person detection. Sensors. 2021;21:5314. doi: 10.3390/s21165314. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r7-4834] 7.Sajjad M., Zahir S., Ullah A., Akhtar Z., Muhammad K. Human behavior understanding in big multimedia data using CNN based facial expression recognition. Mobile networks and applications. 2020;25:1611–1621. [Google Scholar]

[r8-4834] 8.Brophy J.E., Good T.L. Teachers’ communication of differential expectations for children’s classroom performance: Some behavioral data. Journal of educational psychology. 1970;61:365. [Google Scholar]

[r9-4834] 9.Kumar A., Kaur A., Kumar M. Face detection techniques: a review. Artificial Intelligence Review. 2019;52:927–948. [Google Scholar]

[r10-4834] 10.Nonaka S., Nobuhara S., Nishino K. Dynamic 3d gaze from afar: Deep gaze estimation from temporal eye-head-body coordination. Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. pp. pp. 2192–2201.

[r11-4834] 11.Sharma P., Joshi S., Gautam S., Maharjan S., Khanal S.R., Reis M.C., Barroso J., de Jesus Filipe V.M. Proceedings of the International Conference on Technology and Innovation in Learning, Teaching and Education. Springer; 2022. Student engagement detection using emotion analysis, eye tracking and head movement with machine learning; pp. pp. 52–68. [Google Scholar]

[r12-4834] 12.Alexandre G.R., Soares J.M., Thé G.A.P. Systematic review of 3D facial expression recognition methods. Pattern Recognition. 2020;100:107108. [Google Scholar]

[r13-4834] 13.Goto M., Tanaka T., Matsumoto K. Estimating attention level from blinks and head movement. Proceedings of ISCA 30th International Confer. 2021;77:52–59. [Google Scholar]

[r14-4834] 14.Li L., Liu M., Sun L., Li Y., Li N. ET-YOLOv5s: toward deep identification of students’ in-class behaviors. IEEE Access. 2022;10:44200–44211. [Google Scholar]

[r15-4834] 15.Yang F., Wang T. SCB-Dataset3: A Benchmark for Detecting Student Classroom Behavior. arXiv preprint arXiv:2310.02522. 2023.

[r16-4834] 16.Zhou H., Jiang F., Si J., Xiong L., Lu H. Proceedings of the ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) IEEE; 2023. Stuart: Individualized Classroom Observation of Students with Automatic Behavior Recognition And Tracking; pp. pp. 1–5. [Google Scholar]

[r17-4834] 17.Jin S., Liu W., Xie E., Wang W., Qian C., Ouyang W., Luo P. Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16. Springer; 2020. Differentiable hierarchical graph grouping for multi-person pose estimation; pp. pp. 718–734. [Google Scholar]

[r18-4834] 18.Insafutdinov E., Pishchulin L., Andres B., Andriluka M., Schiele B. Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI 14. Springer; 2016. Deepercut: A deeper, stronger, and faster multi-person pose estimation model; pp. pp. 34–50. [Google Scholar]

[r19-4834] 19.Cao Z., Simon T., Wei S.E., Sheikh Y. Realtime multi-person 2d pose estimation using part affinity fields. Proceedings of the Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. pp. pp. 7291–7299.

[r20-4834] 20.Duan H., Zhao Y., Chen K., Lin D., Dai B. Revisiting skeleton-based action recognition. Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. pp. pp. 2969–2978.

[r21-4834] 21.Joze H.R.V., Shaban A., Iuzzolino M.L., Koishida K. MMTM: Multimodal transfer module for CNN fusion. Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020. pp. pp. 13289–13299.

[r22-4834] 22.Faure G.J., Chen M.H., Lai S.H. Holistic interaction transformer network for action detection. Proceedings of the Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023. pp. pp. 3340–3350.

[r23-4834] 23.Zhou Z.B.X.X.L. Teaching Elements and Evaluation of Teaching Effectiveness in Micro Courses in Universities. Modern Educational Technology. 25:30. [Google Scholar]

[r24-4834] 24.Libo L.H.C.J.Z.X.F.W.L. Construction of Evaluation Index System for Classroom Teaching Quality in Higher Education Institutions. Continuing Medical Education in China. 12:85. [Google Scholar]

[r25-4834] 25.Lv L.T., Ji N., Zhang J.L. Proceedings of the 2008 International conference on wavelet analysis and pattern recognition. Vol. 1. IEEE; 2008. A RBF neural network model for anti-money laundering; pp. pp. 209–215. Version March 16, 2024 submitted to Journal Not Specified 13 of 13. [Google Scholar]

[r26-4834] 26.Yuan T. Algorithm of classroom teaching quality evaluation based on Markov chain. Complexity. 2021;2021:1–12. [Google Scholar]

[r27-4834] 27.Feng W., Feng F. Research on the multimodal digital teaching quality data evaluation model based on fuzzy BP neural network. Computational Intelligence and Neuroscience. 2022;2022 doi: 10.1155/2022/7893792. [DOI] [PMC free article] [PubMed] [Google Scholar]

[r28-4834] 28.Li S., Deng W. Deep facial expression recognition: A survey. IEEE transactions on affective computing. 2020;13:1195–1215. [Google Scholar]

[r29-4834] 29.Adyapady R.R., Annappa B. A comprehensive review of facial expression recognition techniques. Multimedia Systems. 2023;29:73–103. [Google Scholar]

[r30-4834] 30.Lei G.X.T.S.Z.Z.S. Research on Nonlinear Weighted Coefficients in the Center of Gravity Method. Laser and Infrared. 40:109. [Google Scholar]

[r31-4834] 31.Zhao Y., Liang D., Tao Z. Calculating apparatus and method for nonlinear weighting coefficient. US Patent 9,998,223. 2018.

[r32-4834] 32.Lin T.Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollár P., Zitnick C.L. Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer; 2014. Microsoft coco: Common objects in context; pp. pp. 740–755. [Google Scholar]

PERMALINK

Student Behavior Analysis using YOLOv5 and OpenPose in Smart Classroom Environment

Xiang Li, Graduate Student

Yucheng Ji, Graduate Student

Jiayi Yang, Graduate Student

Mingyong Li, Ph.D.

Abstract

Introduction

Related Work

Target Detection in Student Classrooms

Behavior Recognition in Student Classrooms

Teaching Effectiveness Evaluation

Proposed Methodology

Figure 1.

System Overview

Data Collection

Figure 2.

Position Allocation and Behavior Recognition

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Behavior Tracking and Visualization

Elimination of Repetitive Behaviors

Visualization Interface

Teaching Effectiveness Evaluation

Experiment

Student Classroom Behavior Identification and Location Distribution Application

Table 1.

Table 2.

Nonlinear Weighting Coefficient Enhanced Classroom Behavior Evaluation Results

Conclusion

Acknowledgment

Figures & Tables

Table 3.

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases