Scientific Reports. 2025 Apr 16;15:13164. doi: 10.1038/s41598-025-95395-y

A fine-grained course session recommendation method based on knowledge point pruning

Yiwen Zhang 1, Xiaolan Cao 1, Wangjian Li 2, Li Zhang 2
PMCID: PMC12003713  PMID: 40240510

Abstract

Course recommendation represents a significant research avenue within the educational domain. Presently, it predominantly employs collaborative filtering techniques to generate recommendations based on users’ historical learning behaviors, such as their past rating information. However, course information encompasses a plethora of knowledge point information. Relying exclusively on historical behavior for recommendations leads to a constrained information scope and an inability to capture more granular user interests, thereby resulting in suboptimal interpretability. To address these challenges, this paper introduces a fine-grained course session recommendation approach grounded in knowledge point pruning, which refines the set of candidate knowledge points and enhances dialogue quality. Initially, a multitude of candidate knowledge points are identified leveraging the constraint properties inherent in the graph structure. Subsequently, by assessing the similarity between the set of candidate knowledge points and the learner’s current preferred knowledge points, those knowledge points with a high degree of irrelevance to the learner’s preferences are pruned. Ultimately, a deep residual Q-network is utilized to process the learner’s feedback knowledge points, historical dialogues, and candidate knowledge points, yielding an action output for inquiry or recommendation. The proposed method has been validated on the MOOCcube dataset, with improvements observed in both HR and NDCG metrics. This approach effectively mitigates the issues of poor interpretability of recommendation outcomes in the realm of online course recommendation and the inability to dynamically capture learners’ granular interests.

Keywords: Knowledge graph, Session recommendation, Course recommendations, Knowledge point clipping

Subject terms: Computational science, Computer science, Information technology, Scientific data, Statistics

Introduction

In the wake of the advancement of Internet technology and the escalating demand for personalized education among learners, there has been a notable increase in both the quantity of online courses available and the size of the online education audience1. Unbound by temporal and geographical constraints, online education furnishes learners with convenient and varied modes of learning, while also facilitating a platform for the sharing of educational resources2. Online learning has thus emerged as a pivotal avenue through which individuals can acquire knowledge, broaden their skill sets, and engage in academic research3. Nevertheless, learners are confronted with an extensive array of online courses, and it is important to recognize that different learners possess distinct learning interests, objectives, and capabilities. Consequently, the customization of personalized learning trajectories and the provision of course recommendations are deemed essential.

In the evolution of course recommendation systems, a plethora of methodologies and techniques have been proffered by researchers. Initially, content-based recommendation approaches4 predominated, leveraging course-related information such as descriptions and tags. Subsequent advancements in data mining and machine learning precipitated the ascendance of collaborative filtering-based methodologies5 as the de facto standard in recommendation algorithms. These collaborative filtering techniques harness learners’ historical behavioral data to discern their interests and predilections, subsequently generating recommendations aligned with analogous learner behaviors. However, the reliance on historical behavior data in collaborative filtering methods often obfuscates the extraction of latent information within user behavior datasets. Consequently, neural network-based recommendation strategies have garnered increasing attention. Such neural network-driven approaches6 are adept at unearthing concealed information within learners’ behavior data, thereby enhancing the precision and personalization of recommendations. The aforementioned techniques generally operate under the assumption that learners exhibit static preferences, with modeling predicated on explicit feedback mechanisms (e.g., ratings) or implicit feedback (e.g., clicks) between learners and educational content.

The burgeoning field of deep learning has catalyzed the integration of deep learning algorithms7 into educational resource recommendations, facilitating a more profound exploration of user interests. For instance, literature8,9 elucidates the manner in which deep learning techniques, grounded in collaborative filtering, mitigate issues of cold start and data sparsity within recommendation systems. Literature10 demonstrates the use of deep learning to preemptively comprehend learner and course dynamics, thereby achieving efficacious intelligent recommendations and refining collaborative filtering methodologies. Amidst the shifting landscape of data content and form, and the burgeoning diversity of data modalities such as text and images, scholars have begun amalgamating natural language processing techniques11,12 and image processing technologies13 with recommendation algorithms, thereby augmenting the efficacy of recommendation systems.

The above methods mainly rely on learners’ historical learning behaviors, such as course scores, for recommendations. However, this behavioral information is too one-dimensional to fully describe learners’ interest characteristics, resulting in poor interpretability. While NLP-based methods also incorporate course and knowledge point information to enrich user interest descriptions, the large number of courses and knowledge points makes it challenging to accurately capture users’ fine-grained interests.

Therefore, this paper proposes a fine-grained course session recommendation method based on knowledge point clipping, optimizing the set of candidate knowledge points and improving dialogue quality. Firstly, the inference module utilizes the constraint properties of the graph structure to obtain a large number of candidate knowledge points. Then, by calculating the similarity between the candidate knowledge point set and the learner’s current preferred knowledge point, it cuts out knowledge points that are highly unrelated to the current learner’s preferences. Finally, the learner’s positive-feedback knowledge points, historical dialogue, and candidate knowledge point set are simultaneously input as states into the deep residual Q-network to obtain the action output of inquiry or recommendation. When learners provide positive feedback on the queried knowledge points over multiple rounds, the system makes a recommendation. The proposed method was validated on a real Massive Open Online Course (MOOC) dataset, on which Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) are improved; the method addresses the problems of poor interpretability of recommendation results and the inability to dynamically capture learners’ fine-grained interests in the field of online course recommendation.

The main contributions of this article are:

  1. Knowledge graph construction. Build a knowledge graph of students, courses, and knowledge points to enrich the semantic relationships of the model.

  2. Conversational recommendation. Through multiple rounds of dynamic conversational interaction between learners and the system, a reinforcement learning model is used to dynamically capture learners’ interest drift and deeply mine their fine-grained interests.

  3. Optimization of the candidate set of learner interests. Calculate the similarity of the conversation knowledge points that learners interact with, trim the candidate knowledge point set, retain knowledge points of higher learner interest, and optimize the candidate set.

Related research

An interpretable method for analyzing student learning behavior and recommending MOOCs was proposed in prior work, using a multi-attention network model to explore various kinds of unstructured information. However, because a learner’s course data is sparse compared to the large amount of course information on the platform, and only historical learning information is used, the interpretability of the recommendation results is poor.

Some scholars have also introduced graph models and convolutional networks into educational resource recommendation. For example, Wang et al.14 treated learners as sets of courses in a hypergraph and transformed the task of learning learner representations into inducing embeddings of hyperedges. They proposed a hyperedge graph attention network that considers both the long-term and short-term sequence relationships of courses, leading to improved recommendation effectiveness by acknowledging the significance of long-term and short-term course sequence patterns in course recommendations. Zhu et al.15 proposed a learning object recommendation model based on heterogeneous learning behaviors and knowledge graphs, first introducing an Attention-Consolidated Graph Convolutional Network (ACGCN). By introducing an attention mechanism, the influence of noise is eliminated and the robustness of the model is improved. The model uses Dense Feature based Operation Aware Networks (DFOAN) to capture implicit and complex learner interaction behaviors and make recommendations, improving recommendation precision, recall, F1, and accuracy. Ma et al.16 introduced an Attention Decay Network based on Contrastive Learning and Graph Convolutional Networks (CLGADN) model, which enhances recommendation fairness by learning from the knowledge backgrounds of different learners. Ullah et al.17 proposed a new deep neural collaborative filtering method for educational service recommendations, composed of an input layer, multi-layer perceptron, and output layer. This method not only learns the N-dimensional and non-linear interactions between user identifiers and book identifiers but also significantly alleviates issues such as cold start, data sparsity, and inaccuracy. Gao et al.18 proposed a personalized course recommendation model that combines convolutional neural networks with negative sequence pattern mining. By establishing negative sequence patterns representing courses that students should not select or have made errors in, this model enhances recommendation accuracy. Zhang et al.19 proposed a learning path recommendation algorithm based on knowledge graphs: build a multi-dimensional knowledge graph of computer domain courses, then use graph convolutional network-based methods to model high-order correlations on the knowledge graph to more accurately capture learner preferences.

Scholars have also incorporated graph-based conversation patterns into recommendation systems. Knowledge graph models can mine more semantic information, while dialogue models can learn more fine-grained user interests for personalized recommendations. For example, Zhao et al.20 proposed a knowledge graph-enhanced sampling recommendation model, which integrates the dynamic graph of user interaction data with external knowledge to form a heterogeneous knowledge graph as a contextual information environment. By sampling ambiguous text, the model’s ability to learn from knowledge is enhanced, mitigating the impact of data sparsity and noise on the model. Zhang et al.21 proposed a knowledge-rich session recommendation system that develops a Bag-of-Entity (BOE) loss and an infusion loss to better integrate the Knowledge Graph (KG) with the Conversational Recommender System (CRS), generating more diverse and informative responses, preventing duplicate items from affecting the recommendation results, and improving recommendation accuracy. Yao et al.22 proposed a multi-turn conversation recommendation method based on dynamic heterogeneous encoding. They construct a dynamic heterogeneous graph with three types of nodes (user, item, and attribute) to describe the preceding dialogue. They design an encoder that adaptively updates node weights based on the heterogeneous graph, mining and representing dynamic higher-order semantic relationships between users, items, and attributes, thereby enhancing the recommendation results.

Due to the vast amount of knowledge information in the education field and the interactive knowledge information between users and knowledge, in the past two years scholars have been discussing whether conversational recommendation can be applied to the education field. For example, Guruge et al.23 surveyed the current state of methodology used in Recommender System (RS) courses, as well as the types of data sources used to evaluate these techniques, noting the potential role that conversational recommendation may play in the education field. Dehbozorgi et al.24 proposed an architectural model for active learning design pattern retrieval, using an NLP model to extract the context of the problem solved by each pattern; the user interacts with the system through the “view” layer to provide solutions for teachers’ teaching. Rossi et al.25 proposed an inference-and-suggestion architecture that facilitates the construction of knowledge from educational resources for students, tutors, and teachers through self-directed learning; the architecture infers the moments when a student needs help and, through a dialogic recommender system, gives the student the opportunity to modify his or her own point of view.

In summation, beyond the enhancement of content-based collaborative filtering algorithms for course recommendations via deep learning and machine learning techniques, an increasing number of scholars have discerned that the substantial knowledge information and the intricate semantic relationships inherent in the educational domain significantly aid in optimizing recommendation outcomes. Accordingly, certain scholars have employed the methodology of knowledge graphs to discern user interests and augment the interpretability of recommendation results. Concurrently, dialogue-based approaches, predicated on reinforcement learning, have been integrated into recommendation systems, offering a dynamic representation of user interests. However, given the expansive nature of the course knowledge graph within education, an overabundance of knowledge points within the recommendation candidate set can inundate the system with extraneous information, thereby obscuring the users’ granular interests. As a result, this paper introduces a fine-grained course session recommendation approach founded on knowledge point pruning. Initially, this approach constructs a knowledge graph utilizing course-specific knowledge points, thereby enriching the model’s semantic content. Subsequently, it dynamically encapsulates user interests through a session pattern informed by reinforcement learning. Ultimately, by leveraging attribute pruning to delineate the users’ fine-grained interests, the proposed method elevates the caliber of the recommendations.

Proposed solution

Problem definition

The course recommendation adopts the Multi-round Conversational Recommendation (MCR)26 method. This method asks learners about their preferences for certain knowledge points and makes recommendations multiple times during the conversation. In the CRS27, first establish the course set V = {v1, v2, …, vn}. Each course v is associated with a set of course knowledge points P_v, where P_v is the set of knowledge points of course v. At the beginning of each learner’s conversation with the system, the knowledge point p0 is the initial preference of learner u, that is, the knowledge point for which the learner explicitly expresses a preference for the first time. Subsequently, the CRS can freely inquire about the learner’s preferences for knowledge points in the candidate knowledge point set P_cand or recommend courses from the candidate course set V_cand. Learners respond to knowledge point inquiries or recommended courses based on their own preferences. Following the method adopted by Lei et al.28, it is assumed that learners maintain clear preferences for all knowledge points and courses, and recommendations proceed in the form of question-and-answer interaction. The process of system inquiry and learner response continues until the CRS successfully recommends or reaches the maximum number of recommendation rounds.
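The interaction loop described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `system` and `learner` objects and all method names (`decide`, `likes`, `accepts_any`) are hypothetical placeholders standing in for the CRS and the simulated learner.

```python
# Illustrative sketch of the multi-round conversational recommendation (MCR)
# loop: ask about knowledge points or recommend courses, until success or
# the round limit. All object interfaces here are hypothetical.

MAX_TURNS = 15  # maximum number of conversation rounds (T in the paper)

def mcr_session(system, learner, p0):
    """Run one conversation; p0 is the learner's initial preferred knowledge point."""
    preferred = {p0}          # knowledge points with positive feedback so far
    for turn in range(MAX_TURNS):
        action, payload = system.decide(preferred)   # 'ask' a knowledge point, or 'recommend' courses
        if action == "ask":
            if learner.likes(payload):               # payload: a candidate knowledge point
                preferred.add(payload)
        else:                                        # payload: a Top-N list of candidate courses
            if learner.accepts_any(payload):
                return True, turn + 1                 # successful recommendation
    return False, MAX_TURNS                           # stop after the round limit
```

The session terminates either on the first accepted recommendation or when the maximum number of rounds is exhausted, matching the problem definition above.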

The Attribute Clipping based on Dynamic Graph (ACDG) model proposed in this article is shown in Fig. 1.

Fig. 1.

Overall framework of ACDG algorithm.

ACDG mainly consists of four components: offline representation learning, inference module, knowledge point pruning module, and deep residual Q-learning network. The following will introduce the functions of these four components separately.

Offline representation learning

Offline representation learning can help recommendation systems effectively learn feature representations of learners and courses. Extracting feature representations from learner behavior and course information can better describe learner interests and course characteristics, improving the accuracy of the recommendation system. This article selects Factorization Machines (FM)29 as the offline representation learning model. FM is used to obtain embedded representations of learners, courses, and knowledge points, and user preferences for courses and predicted scores for knowledge points are calculated from these embedded representations.

  1. Course preference prediction. The main task is to capture learners’ preferences for the course, as shown in formula (1):

f(u, v) = uᵀv + Σ_{p ∈ P_v} uᵀp  (1)

Among them, f(u, v) represents the preference of learner u for course v, and u, v, and p respectively represent the embedded representations of the learner, course, and knowledge point. uᵀv represents the learner’s interest in the course, while uᵀp represents the learner’s interest in the knowledge points of the course. P_v represents the set of knowledge points of the course. FM uses a pairwise loss for optimization, as shown in formula (2):

L_v = Σ_{(u,v,v′) ∈ D₁∪D₂} −ln σ(f(u, v) − f(u, v′)) + λ‖Θ‖²  (2)

where L_v represents the training loss value for course preference prediction.

D₁ := {(u, v, v′) | v ∈ V⁺, v′ ∈ V − V⁺}
D₂ := {(u, v, v′) | v ∈ V⁺, v′ ∈ V_cand − V⁺}

Among them, v and v′ represent training instances, V − V⁺ represents the set of courses that the learner has not interacted with, V⁺ represents the set of courses that the learner has interacted with, σ represents the sigmoid function, and λ is the regularization parameter to prevent overfitting. V_cand represents the collection of candidate courses.
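A minimal sketch of the preference score of formula (1) and a pairwise (BPR-style) loss in the spirit of formula (2). Embeddings are plain Python lists and the regularization term is passed in precomputed; the function names are illustrative assumptions, not the authors' implementation.

```python
import math

def preference(u, v, P_v):
    """f(u, v) = u·v + sum over p in P_v of u·p, as in formula (1)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return dot(u, v) + sum(dot(u, p) for p in P_v)

def pairwise_loss(u, v_pos, v_neg, P_pos, P_neg, reg=0.0):
    """-ln sigma(f(u, v) - f(u, v')) for one (u, v, v') instance, plus regularization."""
    diff = preference(u, v_pos, P_pos) - preference(u, v_neg, P_neg)
    return -math.log(1.0 / (1.0 + math.exp(-diff))) + reg
```

The loss shrinks as the interacted course v is scored above the non-interacted course v′, which is exactly the pairwise ranking objective described above.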

  2. Knowledge point preference prediction. The role of knowledge point preference prediction is to determine the knowledge points to inquire about at the next moment during the conversation, as shown in formula (3):

ĝ(u, p) = Σ_{p′ ∈ P_u} g(u, p′) · sim(p, p′)  (3)

Among them, ĝ(u, p) represents the learner’s predicted rating of a knowledge point, g(u, p′) represents the learner’s rating of a certain knowledge point, and sim(p, p′) represents the similarity between the two knowledge points p and p′.

Similar to the prediction of f(u, v), a pairwise loss is used for knowledge point prediction:

L_p = Σ_{(u,p,p′) ∈ D_p} −ln σ(ĝ(u, p) − ĝ(u, p′)) + λ‖Θ‖²  (4)

where L_p represents the training loss value for knowledge point prediction.

D_p := {(u, p, p′) | p ∈ P_v, p′ ∉ P_v}

P_v represents the set of knowledge points connected to course v; therefore p and p′ represent instances of knowledge points included and not included in the course, respectively.

  3. Multi-task learning. Linear fusion of course interest and knowledge point interest, as shown in formula (5).

L = L_v + L_p  (5)

where L represents the sum of the L_v and L_p loss values: L_v represents the loss for courses, and L_p represents the loss for knowledge points. The lower the loss value L, the better the FM model.

Inference module

The inference module is essentially an undirected heterogeneous graph constructed from learners, courses, and knowledge points. Based on the inquiry or recommendation actions provided by the system, it performs random walks on the graph to obtain learners’ preferences.

  1. Construct an undirected heterogeneous graph G, which includes a set of courses V, a set of knowledge points P, and a set of learners U. Each P_v is a subset of P, namely the set of knowledge points corresponding to course v, and can be represented as P_v ⊆ P. Each course v is connected to part of the knowledge points p, and each learner u is connected to part of the courses v. There are two types of triplet relationships in the graph, namely (learner, chooses, course) and (course, includes, knowledge point).

  2. Initialize the learner’s preferred knowledge point: before the conversation starts, select knowledge point p0 as the learner’s initial preferred knowledge point, randomly select a course v from the courses that the learner has interacted with, and then randomly select a knowledge point p from that course as the knowledge point to be asked about in the next conversation turn.

  3. The system first asks about the learner’s preference for a certain knowledge point p at time step t. If the learner provides positive feedback on knowledge point p, the role of the knowledge point p is changed to the target knowledge point p*. Using p* as the seed node and neighboring nodes as intermediate nodes, the 2-hop attribute nodes are saved as the candidate knowledge point set; the types of intermediate nodes are learners and courses. A dynamic subgraph of graph G, named G_t, is constructed, which includes the visited learners, courses, and knowledge points, such as {p*–v1–a1–v2–a2}, {p*–v1–u1–v2–a1}, etc. G_t is a subgraph of graph G.
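The candidate-collection step above can be sketched as follows: starting from the target knowledge point (the seed), walk through intermediate learner/course nodes and save the knowledge points reached within 2 hops. The adjacency-dict graph representation and the `node_type` map are illustrative assumptions.

```python
# Sketch of step 3's candidate collection on the heterogeneous graph G.
# `graph` maps a node to its set of neighbors; `node_type` maps a node to
# 'learner', 'course', or 'kp' (knowledge point). Both are assumptions.

def two_hop_candidates(graph, seed, node_type):
    """Collect knowledge-point nodes two hops from `seed`; intermediate nodes
    are learners or courses, per the heterogeneous graph structure."""
    candidates = set()
    for mid in graph.get(seed, ()):          # hop 1: a learner or course node
        for nbr in graph.get(mid, ()):       # hop 2: possibly a knowledge point
            if node_type.get(nbr) == "kp" and nbr != seed:
                candidates.add(nbr)
    return candidates
```

The visited nodes along these walks are exactly what the dynamic subgraph G_t records.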

Knowledge point clipping module

The function of the knowledge point pruning module is to obtain the pruned set of candidate knowledge points P_cand at time step t, where P_cand contains the candidate knowledge points at time step t. The inference module produces P_cand, which then enters the knowledge point pruning stage. Walking on the graph structure helps the model filter out some knowledge points, but the resulting knowledge point space P_cand is still large, including many knowledge points that learners are not interested in. By clipping knowledge points, knowledge points of high interest to the user are retained. Firstly, find the set U_{p*} of learners who like the target knowledge point p*, and then collect the knowledge points P_U that the learners in U_{p*} prefer. To avoid noise in P_U, the score f(u, uᵢ) is used to mark the importance of each learner uᵢ in U_{p*} to the current learner u, as shown in formula (6). The similarity between each knowledge point in P_U and p* is marked using the score f(p, p*), and the knowledge point nodes are cropped based on the similarity value, as shown in formula (7). The calculation of P_cand is shown in formula (8).

f(u, uᵢ) = Σ(e_u ⊙ e_{uᵢ}) / (‖e_u‖ · ‖e_{uᵢ}‖)  (6)
f(p, p*) = Σ(e_p ⊙ e_{p*}) / (‖e_p‖ · ‖e_{p*}‖)  (7)
P_cand = {p ∈ P_U | f(p, p*) ≥ τ}  (8)

Among them, V_cand represents the set of candidate courses at time step t, f(u, uᵢ) represents the similarity between the current user u and a learner uᵢ in the set U_{p*} of learners who like the target knowledge point p*, and P_cand represents the final set of candidate knowledge points. e_u and e_{uᵢ} represent the learner embeddings obtained through offline representation learning, while e_p and e_{p*} represent the knowledge point embeddings obtained through offline representation learning. ⊙ represents the Hadamard product, i.e., the element-wise multiplication of two matrices or vectors: elements at the same position in the two matrices or vectors are multiplied to obtain a new matrix or vector. ‖·‖ represents the second norm of a vector.
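The pruning step can be sketched as follows: each candidate knowledge point is scored against the target knowledge point by a normalized similarity of their embeddings (element-wise product summed, divided by the second norms), and low-similarity points are clipped. The keep-threshold value here is an illustrative assumption, as are the function names.

```python
import math

def similarity(a, b):
    """Sum of the Hadamard (element-wise) product of two embeddings,
    normalized by their second norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def prune_candidates(candidate_embeddings, target_embedding, threshold=0.5):
    """Keep candidate knowledge points similar enough to the target p*;
    the threshold value is an assumption for illustration."""
    return {p for p, e in candidate_embeddings.items()
            if similarity(e, target_embedding) >= threshold}
```

Candidates with similarity below the cutoff are removed, shrinking the knowledge point space toward the learner's current preference.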

Deep residual Q-learning network

The deep residual Q-learning network is mainly used to learn action generation and action selection during the dialogue between the system and learners. The network is designed with three vectors to represent the dialogue state: the number of candidate courses, the number of knowledge points with positive learner feedback, and the dialogue history information. The input of the deep residual Q-learning network is the state, and the output is an action: inquire about knowledge points or recommend courses. The state s_t is shown in formula (9).

s_t = s_his ⊕ s_can ⊕ s_pos  (9)

s_his represents the history of the conversation, recording the reward values received by the system after each turn and guiding the agent to make more accurate decisions and generate smoother conversations. s_can records the number of candidate courses; as the conversation progresses, the number of candidate courses is continuously adjusted based on learner feedback, and when the number of candidate courses is small, it may be the best time to make a recommendation. s_pos records the number of knowledge points with positive learner feedback. Since learner feedback on knowledge points is direct and clear, the number of knowledge points in the candidate set can reflect the breadth of the learner’s interest.

When the output action is inquiry, the system selects the most valuable knowledge point from the set of candidate knowledge points P_cand to ask about. When the output action is recommendation, the system selects the Top-N items from the candidate course set V_cand to recommend. The inner product of the learner and course embeddings is used as the learner’s rating for the course, and the next courses to recommend are selected based on this score; the learner’s rating for a course is represented as f(u, v, P_u), and the calculation process is shown in formula (10).

f(u, v, P_u) = e_uᵀe_v + Σ_{p ∈ P_u} e_vᵀe_p  (10)

e_u, e_v, and e_p represent the embeddings of learners, courses, and knowledge points, respectively, while P_u represents the set of knowledge points preferred by the learner. The selection of knowledge points for inquiry is based on weighted entropy, as shown in formulas (11) and (12).

prob(p) = Σ_{v ∈ V_cand ∩ V_p} σ(f(u, v, P_u)) / Σ_{v ∈ V_cand} σ(f(u, v, P_u))  (11)
ent(p) = −prob(p) · log₂ prob(p)  (12)

Among them, σ represents the sigmoid function, V_cand represents the set of candidate courses, and V_p represents the set of courses that contain knowledge point p.
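The weighted-entropy question selection can be sketched as follows: each candidate course is weighted by the sigmoid of its predicted score, and a knowledge point is scored by the entropy of its weighted coverage. Function names and data shapes here are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weighted_entropy(course_scores, courses_with_p):
    """prob(p): sigmoid-weighted share of candidate courses containing p;
    -prob * log2(prob) is largest for points whose answer best splits the
    candidate course set (peaking at prob = 1/e)."""
    total = sum(sigmoid(s) for s in course_scores.values())
    hit = sum(sigmoid(s) for v, s in course_scores.items() if v in courses_with_p)
    prob = hit / total if total else 0.0
    return -prob * math.log2(prob) if prob > 0 else 0.0

def pick_question(candidates, course_scores, contains):
    """Ask about the candidate knowledge point with maximum weighted entropy."""
    return max(candidates, key=lambda p: weighted_entropy(course_scores, contains[p]))
```

A knowledge point contained in every candidate course has entropy 0 (asking about it is uninformative), so the selector favors points that partition the candidate courses.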

Experimental evaluation

Dataset and preprocessing

The experiment used the publicly available dataset MoocData30, which was provided by the MOOC team of the Knowledge Engineering Group (KEG) at Tsinghua University. MOOCCube is an open data warehouse designed for researchers in fields such as natural language processing, knowledge graphs, and data mining in large-scale online education. It includes 706 online courses, 38,181 teaching videos, and 114,563 concepts, as well as a large number of course selection and video watching records from 199,199 MOOC learners. The MOOCCube dataset contains 8 entity types and 12 relationships, with entities such as concept, course, paper, and teacher, and relationships such as concept field, concept paper, course concept, and course video.

The dataset used in the experiment comes from three files in MOOCCube: course, course concept, and user_video-act. The course file contains information such as the course ID, course name, a brief introduction to the course, and videos related to the course, as shown in Table 1.

Table 1.

Course field description.

Field name Description
Id Unique identification of courses in the dataset
Name Course name
Prerequisites Whether prerequisite courses exist
About Course introduction
Video_order Order of videos (ID number)
Display_name Video name (corresponding to the video ID number on the same line)

The content of the course concept file is the correspondence between the course ID and the concept ID, as shown in Table 2.

Table 2.

Course concept field description.

Field name Description
CourseId Unique identification of courses in the dataset
ConceptId Unique identification of concepts in the dataset

The user_video-act file contains video viewing behavior data for 48,640 learners. To avoid data sparsity, only learners who had studied at least 4 courses and watched more than 10 videos were retained in the experiment, as shown in Table 3.

Table 3.

Description of the User_video-act field.

Field name Description
Id Student id
Activity Course_id Course ID
Video_id Video ID
Watching_count Number of times learners watch videos
Video_duration Duration of the video
Local_watching_time Actual viewing time of learners
Video_progress_time The duration of video playback by learners (including playback at multiple speeds)
Video_start_time Learner’s starting position for viewing
Video_end_time The position where learners end their viewing
Local_start_time The earliest start time for learners
Local_end_time The latest end time for learners

Dataset processing steps:

Because the dataset lacks explicit ratings by learners for courses, this article takes the ratio of the learner’s actual viewing time to the actual video duration as the learner’s rating for a course, as shown in formula (13).

grade = Local_watching_time / Video_duration  (13)

Video ratings with a grade greater than or equal to 0.5 are taken as the learner’s rating for the course, as shown in Table 4. The processed data is saved as user_dict.json.

Table 4.

Learners-video rating.

Interval Score
Grade ≤ 0.2 1
0.2 < grade ≤ 0.4 2
0.4 < grade ≤ 0.6 3
0.6 < grade ≤ 0.8 4
Grade > 0.8 5
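The preprocessing above can be sketched in a few lines: compute the grade of formula (13) as the ratio of actual viewing time to video duration, then bin it to a 1-5 score following Table 4. Function names are illustrative.

```python
def grade(local_watching_time, video_duration):
    """Formula (13): ratio of the learner's actual viewing time to video length."""
    return local_watching_time / video_duration

def score(g):
    """Map a grade to a 1-5 rating following the intervals in Table 4."""
    if g <= 0.2:
        return 1
    if g <= 0.4:
        return 2
    if g <= 0.6:
        return 3
    if g <= 0.8:
        return 4
    return 5
```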

Evaluation indicators

The experiment used the Top-N31 recommendation task. The Top-N recommendation task is a common task in recommendation systems, aimed at providing learners with a list of the N courses most relevant to them as predicted by the system. This article recommends N courses for target learners.

This article follows the evaluation indicator settings of references32,33. In the experiment, Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) were selected as the two evaluation indicators. HR evaluates whether the recommendation list includes courses of actual interest to the learner. NDCG measures the ranking quality of the recommendation results, taking into account the position of courses in the recommendation list and learner satisfaction with the courses. The calculation of HR and NDCG is shown in formulas (14) and (15).

HR@N = NumberOfHits / |U|  (14)

NDCG@N = (1 / |U|) Σ_i 1 / log2(p_i + 1)  (15)

Here, NumberOfHits is the number of recommended courses that appear in the test set, |U| is the number of learners, and p_i is the position of the i-th hit course in the recommendation list. Recommendation lists with N set to 5, 10, and 15 were used in the experiment.
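Under a single held-out course per learner, the two metrics can be computed as in this sketch (the data layout and names are assumptions, not the authors’ implementation):

```python
import math

def hit_ratio(rec_lists, test_items):
    """HR@N: fraction of learners whose held-out course appears in their list."""
    hits = sum(1 for u, items in rec_lists.items() if test_items[u] in items)
    return hits / len(rec_lists)

def ndcg(rec_lists, test_items):
    """NDCG@N with one relevant course per learner: gain 1/log2(p + 1) at hit position p."""
    total = 0.0
    for u, items in rec_lists.items():
        if test_items[u] in items:
            p = items.index(test_items[u]) + 1  # 1-based rank of the hit
            total += 1.0 / math.log2(p + 1)
    return total / len(rec_lists)
```

A hit at the top of the list contributes a full gain of 1, so NDCG rewards ranking relevant courses early, while HR only counts whether they appear at all.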

Experiment and result analysis

  1. Experimental parameter settings. The dataset is split into training, validation, and test sets in a ratio of 7:1.5:1.5. The experiment performs a Top-N recommendation task with recommendation list lengths N of 5, 10, and 15 and a maximum conversation count T of 15. The reward values for the decision network are set out in Table 5.

Table 5.

Reward value setting for decision networks.

Reward type                     Reward score   Meaning
Inquiry successful              +0.01          The learner accepts the inquired knowledge point
Inquiry failed                  −0.1           The learner rejects the inquired knowledge point
Recommendation successful                      The learner accepts the recommended course
Recommendation failed           −0.1           The learner rejects the recommended course
Maximum conversations reached   −0.3           The maximum number of conversation turns is exceeded

The parameter settings for the deep Q-learning network are: experience replay capacity of 50,000, sample batch size of 128, and discount factor γ = 0.999. The policy network is optimized with the RMSProp optimizer, and the target network is updated every 20 steps. Results are obtained after 20,000 training iterations.
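The hyperparameters above can be collected into one configuration object, as in this illustrative sketch (field names are hypothetical, not the authors’ code):

```python
from dataclasses import dataclass

@dataclass
class DQNConfig:
    replay_capacity: int = 50_000    # experience replay buffer size
    batch_size: int = 128            # sample batch size
    gamma: float = 0.999             # discount factor
    target_update_every: int = 20    # sync the target network every 20 policy updates
    iterations: int = 20_000         # total training iterations
    max_turns: int = 15              # maximum conversation count T
```

Keeping the settings in one place makes the ablation and comparison runs reproducible with a single changed field.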

  2. Experimental results. The loss values of ACDG compared to SCPR are shown in Fig. 2, and the changes in the evaluation indicators HR and NDCG are shown in Figs. 3 and 4. The HR and NDCG comparison results of the five experiments are shown in Table 8.

Fig. 2. Comparison of loss values between SCPR and ACDG.

Fig. 3. Comparison of HR values between SCPR and ACDG.

Fig. 4. Comparison of NDCG values between SCPR and ACDG.

Table 8.

Experimental results.

Model     K = 5            K = 10           K = 15
          HR      NDCG     HR      NDCG     HR      NDCG
UserCF    0.1687  0.1836   0.2196  0.1998   0.2425  0.2052
ItemCF    0.3177  0.3119   0.4300  0.3385   0.4999  0.3494
MLP       0.4380  0.3001   0.5996  0.3553   0.7135  0.3860
SCPR      0.4980  0.6784   0.6337  0.6355   0.7275  0.6255
ACDG      0.5091  0.7000   0.6858  0.6381   0.7229  0.6272

Figure 2 shows how the loss function values change during training of the SCPR and ACDG models with recommendation list lengths of 5, 10, and 15, respectively. In all three graphs the loss decreases with fluctuations, a common phenomenon in deep reinforcement learning. One reason is the exploration-exploitation trade-off: during training, the agent must try different actions (asking or recommending) to explore the learner’s interests and preferences and learn better preference representations, and this exploration causes fluctuations in the loss. Another reason is that during the dialogue between the system (agent) and the learner, the agent receives reward signals only in some states, and this reward sparsity can also make the loss fluctuate during training. The longer the recommendation list, the more stable the loss values of SCPR and ACDG become, because a longer list is more likely to contain courses the learner is interested in.

Figure 3 shows the variation process of HR values during model training for SCPR and ACDG models with recommended list lengths N of 5, 10, and 15, respectively. Combining Fig. 3a and Table 6, it is evident that the value of HR during the training process of the ACDG model is significantly higher than that of SCPR, and the training process is more stable.

Table 6.

Comparison of standard deviations of HR.

K = 5 K = 10 K = 15
SCPR 9.99% 5.98% 8.65%
ACDG 7.48% 4.46% 4.60%

Table 6 shows the comparison of the standard deviation of HR values between SCPR and ACDG models when N is set to 5, 10, and 15 with 20,000 training iterations.

Figure 4 shows the changes in NDCG values during training of the SCPR and ACDG models with recommendation list lengths of 5, 10, and 15, respectively. Combining Fig. 4 and Table 7, it can be seen that the NDCG value of the ACDG model during training is higher than that of SCPR, and the training process is more stable.

Table 7.

Comparison of standard deviations of NDCG.

K = 5 K = 10 K = 15
SCPR 2.14% 1.43% 1.65%
ACDG 2.06% 1.14% 2.68%

Table 7 shows the comparison of the standard deviation of NDCG values between SCPR and ACDG models when N is set to 5, 10, and 15 with 20,000 training iterations.

Analysis of experimental results:

  1. The values highlighted in bold in Table 8 are the optimal results. It can be clearly seen that the proposed method ACDG performs better overall than all baseline methods. This indicates that the method proposed in this article for optimizing the candidate knowledge point set can to some extent improve the performance of recommendation algorithms.

  2. Figs. 3 and 4 respectively show the continuous changes of HR and NDCG values as the model trains, where ACDG fluctuates less than SCPR. From the standard deviations of HR and NDCG in Tables 6 and 7, it can be concluded that the ACDG model exhibits more stable HR and NDCG values on the test set during training. This indicates that the third state vector introduced in this paper (the number of knowledge points receiving positive learner feedback) captures learners’ finer-grained interests and improves the stability of the algorithm to a certain extent.

  3. Another reason for the more stable fluctuations of the ACDG model shown in Figs. 3 and 4 is the role of the knowledge point pruning module in the algorithm, which filters out many knowledge points irrelevant to the learner and reduces unnecessary inquiries and recommendations by the agent during the conversation.

  4. From the comparison of NDCG standard deviations in Table 7, it can be seen that when N is set to 15, the standard deviation of ACDG is slightly higher than that of SCPR. This may be due to the relatively simple design of the knowledge point pruning module in this article, which does not account for “long tail” knowledge points. Some “long tail” knowledge points may therefore be filtered out, so the model’s recommendation performance may decrease as the recommendation list grows longer.

Fig. 5. Comparison of HR values in ACDG ablation experiments.

Ablation experiment

To verify the effectiveness of ACDG’s knowledge point pruning module, the state design of the deep residual Q-network, and the residual structure of the deep residual Q-network, three ablation experiments were conducted on the MOOCCube dataset. (1) The -attribute clip module model removes the knowledge point pruning module, while the other modules remain consistent with the ACDG model. (2) The -positive state model removes the number of positive-feedback knowledge points from the policy network’s state input, while the other modules remain consistent with the ACDG model. (3) The -residual module model removes the residual block of the deep residual Q-learning network, while the other modules remain consistent with the ACDG model. Figures 5 and 6 show the changes in HR and NDCG on the test set during training. Table 9 shows the standard deviations of HR and NDCG on the test set after training. The results of the ablation experiments are shown in Table 10.
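The three ablations amount to switching off one component of the full model at a time; they can be expressed as feature toggles, as in this sketch (the flag names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class AblationVariant:
    """One ACDG configuration with individual components enabled or disabled."""
    use_pruning: bool = True         # knowledge point pruning module
    use_positive_state: bool = True  # positive-feedback count in the state input
    use_residual: bool = True        # residual block in the deep Q-network

# Full model plus the three single-component ablations.
ABLATIONS = {
    "ACDG": AblationVariant(),
    "-attribute clip module": AblationVariant(use_pruning=False),
    "-positive state": AblationVariant(use_positive_state=False),
    "-residual module": AblationVariant(use_residual=False),
}
```

Each variant differs from the full model in exactly one flag, so any performance gap is attributable to that single component.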

Fig. 6. Comparison of NDCG values in ACDG ablation experiments.

Table 9.

Standard deviation results (%).

Models                  K = 5          K = 10         K = 15
                        HR     NDCG    HR     NDCG    HR     NDCG
ACDG                    5.34   1.01    7.61   2.62    5.54   1.82
-Attribute clip module  12.68  4.59    12.09  4.33    7.16   4.57
-Positive state         11.99  2.10    11.86  3.16    7.98   3.87
-Residual module        7.88   1.45    5.70   1.70    5.34   1.01

Table 10.

Results of ablation experiment.

Models                  K = 5            K = 10           K = 15
                        HR      NDCG     HR      NDCG     HR      NDCG
UserCF                  0.1687  0.1836   0.2196  0.1998   0.2425  0.2052
ItemCF                  0.3177  0.3119   0.4300  0.3385   0.4999  0.3494
MLP                     0.4380  0.3001   0.5996  0.3553   0.7135  0.3860
SCPR                    0.4980  0.6784   0.6337  0.6355   0.7275  0.6255
ACDG                    0.5091  0.7000   0.6858  0.6381   0.7229  0.6272
-Attribute clip module  0.4141  0.6749   0.5699  0.6104   0.6910  0.5878
-Positive state         0.4519  0.6787   0.6779  0.6135   0.7023  0.6162
-Residual module        0.4904  0.6808   0.6794  0.6155   0.7203  0.5843

Figure 5 shows the changes in HR values during the training of ACDG and the three ablation models. It is evident from Fig. 5a that the HR values during ACDG training are generally higher than those of the three ablations. From Fig. 5b,c, it can be seen that the HR value of the -attribute clip module ablation fluctuates the most, and its training process is the least stable.

Figure 6 shows the changes in NDCG values during the training of ACDG and the three ablation models. It can be clearly seen that the NDCG values of ACDG are generally higher than those of the three ablations, and their fluctuation is relatively stable.

Table 9 shows the standard deviation of HR and NDCG on the test set after training. The bold values in the table represent the optimal values. From the table, it can be seen that the ACDG model outperforms the -attribute clip module model and the -positive state model, and its performance is close to that of the -residual module model.

Analysis of experimental results:

  1. This article evaluates the impact of the knowledge point pruning module on the ACDG model, as shown in Table 10. The -attribute clip module model discards the knowledge point pruning module while keeping the other parts consistent with ACDG. Without the pruning module to filter the candidate knowledge point set, many knowledge points the learner dislikes remain in the candidate set; these interfere with the agent’s decisions and supply incorrect rewards, reducing recommendation accuracy. The largest standard deviations of the -attribute clip module model in Table 9 and its lowest HR and NDCG values in Table 10 corroborate this conclusion.

  2. This article evaluates the state design of the deep residual Q-network, i.e., the impact of learner feedback on the ACDG model. Table 10 indicates that when the state input of the deep residual Q-network lacks the learner’s fine-grained preference for knowledge points, the recommendation results become less accurate. The large standard deviation of the -positive state model in Table 9 and its HR and NDCG values in Table 10 verify the influence of fine-grained preference on the ACDG model.

  3. This article evaluates the residual structure design of the deep residual Q-network, namely the impact of the residual block in Fig. 1 on the ACDG model. As shown in Table 10, the -residual module model removes the residual structure of the Q-network while keeping the other parts consistent with ACDG. Its HR and NDCG standard deviations in Table 9 are close to those of ACDG, because the knowledge point pruning module and the state design of the deep residual Q-network have the larger impact on the ACDG model. The main role of the residual block in the ACDG model is to increase training stability and improve the accuracy of the recommendation results.

Summary

If the candidate knowledge point set correlates poorly with the knowledge points the learner is currently interested in, many of the candidates will fail to engage the learner, which degrades dialogue quality and leads to inaccurate recommendations.

Consequently, this study introduces a fine-grained course session recommendation approach grounded in knowledge point pruning. The proposed method incorporates an offline representation learning module, which is designed to derive feature representations for learners, courses, and knowledge points. Subsequently, the knowledge point pruning module is employed to excise those knowledge points in which the learner has no interest, and the inference module is utilized to ascertain a fine-grained interest representation of the learner. Ultimately, the deep residual Q-learning network module directs the system to render more astute decisions regarding course recommendations or inquiries about knowledge points. Experimental results have demonstrated that the method presented herein effectively enhances the congruence between the candidate knowledge point set and learners to a certain extent, while also diminishing the magnitude of the candidate knowledge point set.
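A minimal sketch of the similarity-based pruning step described above, assuming knowledge points are represented by embedding vectors and using cosine similarity with a hypothetical threshold (the paper’s exact similarity measure and threshold are not reproduced here):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def prune_candidates(candidates, preferred_vecs, threshold=0.5):
    """Keep only candidate knowledge points whose similarity to at least one
    currently preferred knowledge point reaches the threshold."""
    return [kp for kp, vec in candidates.items()
            if any(cosine(vec, p) >= threshold for p in preferred_vecs)]
```

Candidates unrelated to every preferred knowledge point are removed before the Q-network chooses an action, shrinking the action space the agent must consider.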

Nonetheless, extant methodologies may inadvertently prune knowledge points characterized by “long tail” attributes from the candidate knowledge point pool. Consequently, a salient avenue for advancing this research entails the development of strategies for handling knowledge points with “long tail” characteristics. This will involve further refinement of the knowledge point pruning module’s functionality to enhance its rationality and, in turn, augment the precision of the recommendations proffered.

Acknowledgements

This research is supported by the Intelligent Information Processing Research and Innovation Team of the Anhui Provincial Department of Education (2024AH010012) and the Key Research Project of the Anhui Provincial Department of Education (2022AH051867).

Author contributions

Y.Z. is mainly responsible for the project’s primary management, proposal development, and initial draft writing. L.Z. is the corresponding author, responsible for project proposal discussions, experimental design, and implementation. X.C. is involved in data collection and processing. W.L. is responsible for data collection and manuscript revisions.

Data availability

This data was provided by Tsinghua University, and below is a link to the data set used in the manuscript: https://github.com/THU-KEG/MOOCCubeX. Data is provided within the manuscript or supplementary information files.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Lei, W. et al. Estimation-action-reflection: Towards deep interaction between conversational and recommender systems. In Proceedings of the 13th International Conference on Web Search and Data Mining, 304–312 (2020).
  • 2. Yu, J. et al. MOOCCube: A large-scale data repository for NLP applications in MOOCs. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 3135–3142 (2020).
  • 3. Steinberg, L. E-Learning effectiveness and cognitive load. J. Educ. Psychol. 56(3), 425–439 (2022).
  • 4. Zhou, K. et al. Towards topic-guided conversational recommender system. In Proceedings of the 28th International Conference on Computational Linguistics, 4128–4139 (2020).
  • 5. Liu, K. et al. MEGCF: Multimodal entity graph collaborative filtering for personalized recommendation. ACM Trans. Inform. Syst. 41(1), 1–27 (2023).
  • 6. Ong, K., Haw, S. C. & Ng, K. W. Deep learning based-recommendation system: An overview on models, datasets, evaluation metrics, and future trends. In 2nd International Conference on Computational Intelligence and Intelligent Systems, 128–135 (2019).
  • 7. Batmaz, Z. et al. A review on deep learning for recommender systems: Challenges and remedies. Artif. Intell. Rev. 52, 1–37 (2019).
  • 8. Torkashvand, A., Jameii, S. M. & Reza, A. Deep learning-based collaborative filtering recommender systems: A comprehensive and systematic review. Neural Comput. Appl. (2023).
  • 9. Wei, J. et al. Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst. Appl. 69, 29–39 (2016).
  • 10. Fu, M. et al. A novel deep learning-based collaborative filtering model for recommendation system. IEEE Trans. Cybern. 1–13 (2018).
  • 11. Wang, C. et al. A light heterogeneous graph collaborative filtering model using textual information. Knowl. Based Syst. 234, 10760212 (2021).
  • 12. Geng, S., Tan, J., Liu, S., Fu, Z. & Zhang, Y. VIP5: Towards multimodal foundation models for recommendation. Assoc. Comput. Linguist. 9606–9620 (2023).
  • 13. Pazzani, M. J. A framework for collaborative, content-based and demographic filtering. Artif. Intell. Rev. 13(5), 393–408 (1999).
  • 14. Li, Y., Du, Q., Yue, B. et al. Survey of reinforcement learning based recommender systems. Comput. Sci. 48(10), 1–18 (2021).
  • 15. Wang, X. et al. HGNN: Hyperedge-based graph neural network for MOOC course recommendation. Inf. Process. Manag. 59(3), 102938 (2022).
  • 16. Zhu, Y. et al. Recommending learning objects through attentive heterogeneous graph convolution and operation-aware neural network. IEEE Trans. Knowl. Data Eng. 35(4), 4178–4189 (2021).
  • 17. Ma, W. et al. Integrating learners’ knowledge background to improve course recommendation fairness: A multi-graph recommendation method based on contrastive learning. Inf. Process. Manag. 61(4), 103750 (2024).
  • 18. Ullah, F. et al. DeepEdu: A deep neural collaborative filtering for educational services recommendation. IEEE Access 8, 110915–110928 (2020).
  • 19. Gao, M., Luo, Y. & Hu, X. Online course recommendation using deep convolutional neural network with negative sequence mining. Wirel. Commun. Mob. Comput. 2022(1), 9054149 (2022).
  • 20. Zhang, X., Liu, S. & Wang, H. Personalized learning path recommendation for e-learning based on knowledge graph and graph convolutional network. Int. J. Softw. Eng. Knowl. Eng. 33(1), 109–131 (2023).
  • 21. Zhao, M., Huang, X., Zhu, L. & Yu, J. Knowledge graph-enhanced sampling for conversational recommendation system. IEEE Trans. Knowl. Data Eng. (2022).
  • 22. Zhang, T. et al. KECRS: Towards knowledge-enriched conversational recommendation system. arXiv preprint arXiv:2105.08261 (2021).
  • 23. Yao, H., Yao, H. & Ye, D. DHGECON: A multi-round conversational recommendation method based on dynamic heterogeneous encoding. Knowl. Based Syst. 273, 110607 (2023).
  • 24. Guruge, D. B., Kadel, R. & Halder, S. J. The state of the art in methodologies of course recommender systems: A review of recent research. Data 6(2), 18 (2021).
  • 25. Dehbozorgi, N. & Norkham, A. An architecture model of recommender system for pedagogical design patterns. In 2021 IEEE Frontiers in Education Conference (FIE), 1–4 (2021).
  • 26. Rossi, D. S. V. et al. CAERS: A conversational agent for intervention in MOOCs’ learning processes. In Innovations in Learning and Technology for the Workplace and Higher Education: Proceedings of ‘The Learning Ideas Conference’, 371–382 (2021).
  • 27. Yao, H., Yao, H. & Ye, D. DHGECON: A multi-round conversational recommendation method based on dynamic heterogeneous encoding. Knowl. Based Syst. 273, 110607 (2023).
  • 28. Cordero, P. et al. A conversational recommender system for diagnosis using fuzzy rules. Expert Syst. Appl. 154, 113449 (2020).
  • 29. Liu, D. et al. Adaptive hierarchical attention-enhanced gated network integrating reviews for item recommendation. IEEE Trans. Knowl. Data Eng. 99, 1 (2020).
  • 30. Wen, P. et al. Neural attention model for recommendation based on factorization machines. Appl. Intell. 51, 1829–1844 (2021).
  • 31. Yu, J. et al. MOOCCube: A large-scale data repository for NLP applications in MOOCs. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 3135–3142 (2020).
  • 32. Zhang, R., Zhu, X. & Zhu, W. Improved sample efficiency by episodic memory hit ratio deep Q-networks. J. Appl. Numer. Optim. 3(3) (2021).
  • 33. Vilakone, P., Xinchang, K. & Park, D. S. Movie recommendation system based on users’ personal information and movies rated using the method of k-clique and normalized discounted cumulative gain. J. Inform. Process. Syst. 16(2), 494–507 (2020).



Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
