Skip to main content
PLOS One logoLink to PLOS One
. 2023 Dec 22;18(12):e0295808. doi: 10.1371/journal.pone.0295808

Knowledge relation rank enhanced heterogeneous learning interaction modeling for neural graph forgetting knowledge tracing

Linqing Li 1, Zhifeng Wang 1,2,*
Editor: Ayesha Maqbool3
PMCID: PMC10745179  PMID: 38134033

Abstract

Knowledge tracing models have gained prominence in educational data mining, with applications like the Self-Attention Knowledge Tracing model, which captures the exercise-knowledge relationship. However, conventional knowledge tracing models focus solely on static question-knowledge and knowledge-knowledge relationships, treating them with equal significance. This simplistic approach often succumbs to subjective labeling bias and lacks the depth to capture nuanced exercise-knowledge connections. In this study, we propose a novel knowledge tracing model called Knowledge Relation Rank Enhanced Heterogeneous Learning Interaction Modeling for Neural Graph Forgetting Knowledge Tracing. Our model mitigates the impact of subjective labeling by fine-tuning the skill relation matrix and Q-matrix. Additionally, we employ Graph Convolutional Networks (GCNs) to capture intricate interactions between students, exercises, and skills. Specifically, the Knowledge Relation Importance Rank Calibration method is employed to generate the skill relation matrix and Q-matrix. These calibrated matrices, alongside heterogeneous interactions, serve as input for the GCN to compute exercise and skill embeddings. Subsequently, exercise embeddings, skill embeddings, item difficulty, and contingency tables collectively contribute to an exercise relation matrix, which is then fed into an attention mechanism for predictions. Experimental evaluations on two publicly available educational datasets demonstrate the superiority of our proposed model over baseline models, evidenced by enhanced performance across three evaluation metrics.

Introduction

In the age of rapid technological advancement, the ubiquity of networks has ushered in a wealth of information and convenience. This surge in data, commonly referred to as big data, plays a pivotal role in interconnecting diverse sectors such as education, healthcare, and transportation. The fusion of education with information science is an inexorable trend, catalyzing an expansive realm of research in online education technologies. Notably, the application of artificial intelligence and related technologies to analyze substantial educational data yields invaluable insights and enhances the educational landscape [14].

Online educational platforms, like edX, Coursera, and Udacity, have gained widespread adoption for tracking, reporting, and delivering online courses [3, 5, 6]. These platforms offer students a spectrum of advantages, including diverse online courses and personalized learning resources [68]. Moreover, they provide a valuable alternative to traditional classroom teaching, circumventing geographical or weather-related limitations. Teachers leverage these platforms to tailor remedial materials to suit individual student needs [4, 811].

Central to the effectiveness of online education is the task of tracking students’ performance based on their past interactions [12, 13]. This task, known as Knowledge Tracing [14, 15], seeks to gauge students’ knowledge states by analyzing their responses to exercises. Essentially, Knowledge Tracing (KT) assesses students’ mastery of knowledge. Given a set of practice questions Xt = (x1, x2, …, xt) and response logs (e.g., correct or incorrect), KT predicts the likelihood of correctly anticipating students’ knowledge states in subsequent interactions (p(rt+1 = 1|et+1, X)) based on past interactions and responses. Here, xt is presented as the tuple (qt, at), where qt represents the student’s question and at denotes the corresponding response. Refer to Fig 1 for a visual representation of the knowledge tracing model’s framework.

Fig 1. Illustration of the knowledge tracing model’s process.

Fig 1

Numerous knowledge tracing models have emerged to tackle the challenge of monitoring students’ knowledge states, such as Deep Knowledge Tracing (DKT) [16], Dynamic Key-Value Memory Network (DKVMN) [17], Graph-based Knowledge Tracing (GKT) [18], and Self-Attentive Model for Knowledge Tracing (SAKT) [19]. These models exhibit improved predictive performance across several educational datasets. DKT incorporates deep neural networks to forecast student performance. However, it overlooks knowledge point information and student abilities. DKVMN employs nonlinear transformations to learn representations and master levels for each concept, but disregards the similarity between knowledge concepts (KCs) when making predictions. SAKT delves into the KC-exercise relationship for mastery level prediction but ignores the potential organizational structure of past interactions as a graph.

To address these limitations, we present the Knowledge Relation Rank Enhanced Heterogeneous Learning Interaction Modeling for Neural Graph Forgetting Knowledge Tracing (NGFKT). This model rectifies the shortcomings of existing models when predicting student performance from past interactions. The model encompasses four key steps: (1) Skill relation matrix and Q-matrix calibration using the Knowledge Relation Importance Rank Calibration method (KRIRC), which considers the students-exercise-KCs relationship; (2) Inputting the calibrated matrices into a Graph Convolutional Network (GCN) to yield skill and exercise embeddings; (3) Incorporating heterogeneous interactions, item difficulty, skill embeddings, and exercise embeddings to generate an exercise relation matrix; (4) Applying the Position-Relation-Forgetting attention mechanism, inspired by student forgetting behavior, to predict student performance. We validate our model on two datasets, demonstrating its superiority over traditional models.

The contributions of this paper encompass:

  1. Calibration Method for Enhanced Skill Relation Matrix and Q-Matrix: By considering the intricate relationship between knowledge concepts (KCs) within heterogeneous interactions—namely students, exercises, and KCs—the calibration method ensures a more accurate representation. This calibrated skill relation matrix and Q-matrix, coupled with the heterogeneous interactions, serve as inputs for a Graph Convolutional Network (GCN). This integration facilitates the generation of comprehensive exercise embeddings and skill embeddings, effectively capturing the intricate interplay between students, exercises, and KCs.

  2. Exercise Relation Modeling with Enhanced Forgetting Mechanism: In contrast to the GKT model [18], our proposed approach goes beyond surface-level prediction by incorporating the nuanced forgetting behavior of students and relative distance representations. This enhancement is realized through the utilization of the Position-Relation-Forgetting attention mechanism. This attention mechanism is designed to predict student performance, and its integration demonstrates the model’s ability to capture deeper insights into learning dynamics.

  3. Comprehensive Experimental Evaluation: Our evaluation strategy addresses three key aspects. First, we measure the predictive efficacy of the NGFKT model against baseline models using three established metrics. Second, we assess the effectiveness of the NGFKT model even when dealing with limited records. Finally, we employ radar diagrams to visually depict and understand the knowledge tracing outcomes. This meticulous experimental evaluation bolsters the robustness of our proposed model and showcases its superiority in various scenarios.

The rest of this paper unfolds as follows: The next section, “Related Works,” delves into knowledge tracing models, graph neural networks, relation modeling, and attention mechanism. Subsequently, the “Methods” section provides a detailed exposition of NGFKT’s structure. The “Experiments” section presents implementation details and experimental outcomes. Finally, we conclude and highlight avenues for future research in the “Conclusions and Future Work” section.

Related works

Knowledge tracing

Knowledge Tracing (KT) aims to gauge students’ comprehension levels based on their historical interactions. The effectiveness of deep learning techniques in fields like speech processing (e.g., [2023]) and computer vision (e.g., [2427]) has inspired the development of deep learning-based KT models. One pioneering model is the Deep Knowledge Tracing (DKT) [16], which employs neural networks to capture intricate educational processes. The DKT+ model [28] extends DKT by introducing regularization items. However, these approaches overlook the influence of exercise embeddings. The Enhanced Knowledge Tracing (EKT) model [29] utilizes exercise embedding modules for predicting students’ future performance. The Memory Augmented Neural Network (MANN) [30] employs Key and Value metrics to model exercise embeddings and make predictions. Yet, these methods often neglect relation modeling to predict students’ knowledge state. To address this gap, the Self-Attentive Knowledge Tracing (SAKT) model [19] explores relationships between students, exercises, and skills in past interactions. Additionally, knowledge tracing can be framed as a graph-based task. The Graph-based Knowledge Tracing (GKT) model [18] formulates knowledge tracing as a time series node-level classification problem within a graph. However, the GKT model does not take exercise relation modeling and the heterogeneous interactions into consideration. Therefore, the our model is proposed to handle these limitations and further improve the prediction performance of the knowledge tracing model.

Graph neural networks

Graph structures, which can represent connections and entities in the real world, are more intricate than simple tree structures [31]. Graph Convolutional Networks (GCNs), a unique type of neural network, operate directly on graph-structured data. A GCN updates node representations by considering the nodes themselves and their neighbors [32]. By incorporating multiple graph convolutional layers, GCNs ensure that updated nodes accurately capture both higher-order neighbor features and neighboring node features.

Several knowledge tracing models explore the utility of knowledge graph structures. Chanaa et al. [33] employ a dynamic graph structure in which nodes represent students and edges evolve over time. However, this method overlooks the heterogeneous interactions among students, exercises, and knowledge concepts (KCs). The Graph-Infused Knowledge Tracing (GIKT) model [13] treats the exercise-skill relationship graph as a bipartite graph and utilizes embedding propagation within GCNs to incorporate exercise-skill correlations. Nevertheless, GIKT doesn’t fully account for student behavior. Our proposed Neural Graph Forgetting Knowledge Tracing (NGFKT) model uses skill relation matrices and Q-matrices to model exercise-KC relationships, employing an attention mechanism enhanced with a forgetting layer to predict performance.

Relation modeling

Relation modeling, including exercise relation modeling, is crucial in knowledge tracing tasks [34]. When two exercises are connected through shared knowledge concepts or students, their relation becomes significant [35]. Additionally, the Q-matrix can reveal relationships between exercises and skills, unveiling connections between exercise pairs [36]. Intuitively, if two problems share similar difficulty and past practice, they can be considered related [37, 38]. Building on these concepts, NGFKT integrates item difficulty, exercise embedding, skill embedding, and a contingency table to formulate a comprehensive exercise relation matrix.

Attention mechanism

The attention mechanism is a powerful tool for sequence modeling tasks [3941]. It enhances predictions by focusing on crucial input elements. The mechanism determines attention weights for input vectors, enabling it to extract relevant words in machine translation tasks [4246]. In our model, we introduce a novel attention mechanism to predict student performance. This mechanism incorporates exercise relation modeling along with a forgetting behavior component, further enhancing prediction accuracy.

Motivation

This paper aims to provide an innovative knowledge tracing model: NGFKT to estimate the student knowledge state based on the heterogeneous interactions between students, exercises, and skills. Specifically, the first motivation is to design a calibration method to generate the Q matrix and skill relation matrix. The second motivation is to develop an exercise relation modeling method to obtain an exercise relation matrix. Finally, incorporating the attention mechanism with the relation matrix predicts student performance.

Preliminaries

In this section, on the one hand, we summarize the important mathematical notation covered in this paper in Table 1, and on the other hand, we give important terminology and problem definitions.

Table 1. Important notations.

Notations Description
KT The knowledge tracing model
KC Knowledge concepts
Q^ The calibrated Q-matrix
S^ The calibrated skill reltion matrix
M The calibrated matrix
a rank The knowledge relation importance rank of the element: “a”
b rank The knowledge relation importance rank of the element: “b”
K The number of the knowledge concepts
N The number of the questions
σ The standard deviation
e^ Skill-exercise embedding
q One question in the question set
s One student in the student set
ψ s,q,t the cognitive difficulty of the question: q for the student: s
R E The exercise relation matrix
I i The input element of the Position-Relation-Forgetting attention mechanism

Terminologies

Heterogeneous interactions

There are numerous interactions between students, exercises, and skills referring to the Fig 2. Heterogeneous interaction (H) is formed up of an object set, V, and a link set, E. Students, exercises, and skills are all object types in V. E is a set of relational types of the type E = (rA,rC). The rA indicates the response of exercises and rC means the relation it involved.

Fig 2. The heterogeneous interaction between student, exercise, and skills.

Fig 2

Knowledge tracing

Knowledge Tracing (KT) evaluates learners’ knowledge mastery. KT predicts the likelihood of effectively estimating students’ knowledge states in coming meets given a set of test questions and answer logs (e.g., correct or incorrect).

Student modeling

At the same time, based on the results of knowledge tracking, we can model student groups and individual students. Specifically, if the input is a student group, then the result of our knowledge tracking is the modeling of the student group. If the input is an individual student, then the knowledge tracking is the modeling of a single student.

Exercise modeling

In this paper, we model the exercises based on the exercise relationship on the heterogeneous interaction graph. Specifically, we apply the graph neural network to process the exercises and discuss the difficulty of the exercises to generate the final exercise embedding.

Knowledge modeling

In order to comprehensively model the knowledge in this paper, we apply the knowledge relation importance rank calibration method to model the knowledge contained in the exercises and generate the skill relation matrix. Then this matrix is treated as the input of GCN to generate the hidden embedding of exercises.

Problem definition

The computerized system generates the student’s responding records given a question set of n exercises (e.g., e0, e1, e2en) for the student in the smart educational system from Timestamps 1 to t. Those interactions are designated as S = (s1, s2, s3…‥ sn), and each interaction si is presented as a tuple: si = (ei,ri,ti), where ei is the exercise that this student attempted, ri ∈ {0, 1} is the student’s answer, and ti is the time at which si happens. Our goal is to estimate the knowledge state of students based on previous information. Specifically, we input the heterogeneous interaction graph and previous student records: S, the output of the knowledge tracing model is the knowledge state of students.

Methods

In this section, the NGFKT model is developed based on the exercise relation matrix and Position-Relation-Forgetting attention mechanism to predict the performance of students. There exist three steps including the skill relation matrix and Q-matrix modeling, the extraction of the exercise relation matrix, and the predictions based on the Position-Relation-Forgetting attention mechanism. First, the skill relation matrix(S^) and the calibrated Q-matrix (Q^) are designed based on the KRIRC method and serve as the input of the model. Then, the skill relation matrix S^, and the calibrated Q-matrix (Q^) are treated as the inputs of the GCN to output the embedding of exercises and skills in the heterogeneous interactions. The similarity of exercises can be computed based on the exercise embedding, skill embedding, item difficulty, and contingency table to generate the corresponding exercise relation matrix(RE). Finally, the exercise relation matrix(RE) serves as the input of the Position-relation-Forgetting attention mechanism to track the student knowledge state. The overall structure can refer to Fig 3.

Fig 3. The overall structure of the Neural Graph Forgetting Knowledge Tracing (NGFKT).

Fig 3

Firstly, according to previous interactions, the past interactions of students and knowledge levels can be extracted from the inputs. Then KRIRC method is designed to generate the calibrated skill relation matrix and Q-matrix. The skill relation matrix, Q-matrix, and the cognitive difficulty of each student are treated as the inputs of the GCN to generate the skill-exercise embedding: e^. Secondly, the e^, the item difficulty of each question, and the contingency table are combined to output the exercise relation matrix. Finally, the Position-Relation-Forgetting attention mechanism is utilized to process the inputs to make predictions.

Skill relation matrix and Q matrix modeling

We design this part to extract the heterogeneous information between students, exercises, and skills. The basic idea of this part is that the related skill can be considered as the skill that is covered by the same exercise. Considering the hierarchical knowledge levels of skill, the related skill also can be defined as the parent nodes and child nodes of the skill. For instance, the knowledge concept: the “Triangle” is viewed as the “Right Triangle”’s parent node, and the “Pythagorean Theorem” is the child node of the “Right Triangle”. Therefore, the related skills of the “Right Angle” are the concepts: “Triangle” and “Pythagorean Theorem”. Inspired by these two ideas, the importance of the skills is ranked based on the following partial order, and the calibrated method called the knowledge Relation Importance Rank Calibration method is proposed(KRIRC). A pairwise Bayesian treatment is as follows. For convenience, we define a partial order >i+ as:

a>i+b>i+>c>i+d (1)

where “a”, “b”, “c”, and “d” are the neighbors of the skill. Here, “a” implies the skill of the parent node, knowledge level is 0, in the knowledge level graph. “b” denotes the skill of the child node, knowledge level is 1, in the knowledge level graph. “c” can be interpreted in a similar way. “d” is the skill that is covered by the same exercise. Along this line, neighbor: “a” is more important than neighbor: “b” in extracting the neighbor of the skill. The rank of the skill: “a”, “b”, “c”, and “d” are 0, 1, 2, and 3 respectively. Thereby, according to the Eq 1, the partial order relationship set can be defined as DKRIRC={(i,a,b)|a>i+b,i=1,2,3K} where K is the number of knowledge concepts. Based on the traditional Bayesian method, we assume the calibration matrix: M^ such as Q^ or S^ uniforms the Gaussian distribution with the standard deviation of each dimension. To give the calibration matrix labels higher confidence, we define p(a>i+b|M^) as follows:

p(a>i+b|M^)=11+eλ(arank-brank) (2)

where λ controls the discrimination of relevance values of different knowledge levels. And the arank and brank means the importance of the element a or b. The log posterior over DKRIRC on M^ can be eventually computed as:

lnp(M^|DKRIRC)=ln(i,a,b)DKRIRC1p(a>i+b|Mi^)p(Mi^)=i=1Na=1Kb=1KI(a>i+b)ln11+eλ(arank-brank)+C-i=1Nj=1KM^ij22σ2 (3)

where I(*) is the judgment function when the function’s condition is met and the function output 1. The σ is the standard deviation in Eq 2 and C, the constant variable, can be ignored to train the matrix. Finally, a calibrated matrix M^ estimated by the KRIRC can be calculated. The Q-matrix and the skill relation matrix can apply the KRIRC method to obtain the calibrated Q-matrix and the calibrated skill relation to reflect the hierarchical knowledge levels between different knowledge points. The specific algorithm can be referenced as follows:

Algorithm 1: Knowledge Relation Importance Rank Calibration Method.

Input: Students’ historical response dataset: D = s1, s2, …sN, si = (ei, si, ti);

   The knowledge level graph G; The heterogeneous relation graph: τ;

   Task-learning rate: α; Hyper-parameter λ;

Output: The calibrated Q matrix: Q^; The calibrated skill relation matrix: S^.

1 initialization learning rate α and hyper-parameter λ randomly

2 while element in G and τ do

3  Extract hierarchical knowledge levels and related skills of each element based on G and τ

4  Generate the corresponding knowledge rank using Eq 1

5 end

6 while not converged do

7  while element in Q^ and S^ do

8   Evaluate calibrated skill element qij using Eq 2;

9   Replace the element in Q^ and S^ with a calibrated element

10end

11  Update parameters learning rate α and hyper-parameter λ

12 end

Exercise relation modeling

The exercise matrix modeling is designed based on two processes. Firstly, skill-exercise embedding:e^ is calculated by applying the GCN. Secondly, the item difficulty is modeled to estimate the difficulty of each question. And e^, item difficulty, and the contingency table are incorporated to generate the final exercise relation matrix:RE.

The basic idea of the GCN model is to extract the neighbor information to generate the target node embedding. For the GCN model, we apply it to process the relationship of exercises and skills to generate the hidden exercise embedding. In this case, the heterogeneous information can be extracted from the skill relation matrix and the Q-matrix based on previous matrix modeling. Then, based on the knowledge tracing results, we can model the student states to explore the student information in the heterogeneous interactions. These matrices are served as the input of GCN. The GCN model consists of numerous convolutional layers, and each layer can be updated by the states of itself and the neighbors of nodes. The ith node in the graph donated as nodei indicating the skill state si or exercise state ei. The neighboring nodes of nodei are denoted as the node set: Node(i). As a result, the ı th layer of the graph convolutional network can be updated as follows:

nodeiı=RELU(jiNode(i)wiınodejı-1+biı) (4)

where wı and bı infer as the weighted matrix and bias of the GCN layer and the RELU() indicates the activation function accepted in the GCN model. The output of the GCN model:e^ is used for estimating the implicit relations among questions by calculating the inner product of questions:

simii,j=ei^·ej^|ei^||ej^| (5)

Then, in order to evaluate the exercises’ similarity, we further incorporate the item difficulty with the previous students’ performance to generate an exercise relation matrix: RE. The basic idea is that we apply the student incorrect interactions to estimate the item difficulty and seven metrics, which are used to measure the performance of different variables, to estimate the exercises relation.

For the item difficulty modeling, the students’ incorrect interactions can intuitively represent the item difficulty of the exercises involved in the student interactions. Specifically, the students repeat the same questions by utilizing their skills in different timestamps, which can demonstrate the cognitive difficulty of these exercises. In order to model this situation, the cognitive difficulty for the students: s can be defined as follows:

Ψs,q,t={[|{Rs==0}|0:t|Qq|0:t*4]if|Qq|0:t55otherwise (6)

where Ψs,q,t indicates each student’s cognitive difficulty of the question set at timestamp t. The cognitive difficulty is divided into 5 levels including very hard, hard, medium difficulty, relatively easy, and easy. The number ranging from 0 to 5 is accepted to indicate the corresponding levels. The |Q| denotes the set of questions containing the question q before timestamp t and Rs refers to the student’s response to the same questions. A zero in the Rs indicates that the student provides a wrong answer for a question. If a learner attempts to answer a question fewer than five times, the cognitive difficulty of this question is directly quantified into 5. Then according to the different learners’ cognitive difficulty of questions, the average cognitive difficulty for different learners on the same question is defined as the item difficulty: φ(q) after processing the cognitive difficulty for different learners.

φ(q)=1|Ψs,q,t|Ψ(si,q,t)Ψ(s,q,t)Ψ(si,q,t) (7)

Then, according to the item difficulty: φq, the similarity of question difficulty: diff can be modeled as follows:

diff(qi,qj)=11+φ(qi)-φ(qj) (8)

In order to incorporate the previous interaction, the student’s performance on question pair qi and qj is summarized in the contingency table. The students’ correct and incorrect responses are interpreted as the mastery indicators of the questions referring to Table 2. When a question pair appears more than once in the previous student interactions, the latest occurrence is taken into consideration. According to the contingency table, seven evaluation metrics, measuring the association between two variables, are developed to measure the relationship between the question pair: ei and ej referring to Table 3. A threshold is imposed to control the sparsity of relations of exercises. The output of the contingency table is denoted as WR, R ∈ {SK, Kappa, Kappa′, Phi, Yule, Ochiai, Sokal, Jaccard.}.

{Wi,jR=max(Wi,jR,Wj,iR),Wj,iR=0Wi,jRWj,iRWj,iR=max(Wj,iR,Wi,jR),Wi,jR=0otherwise (9)

Table 2. The contingency table for exercise i and exercise j.

The labels: “F” and “T” present the student answering the exercise incorrectly or correctly.

exercise i
F T total
exercise j F a b a+b
T c d c+d
total a+c b+d a+b+c+d

Table 3. Seven evaluation metrics.

These metrics are designed to explore the association between two variables.

Metrics Formulation
Cohen’s Kappa Wei,ejKappa=2(ad-bc)/(a+b)(b+d)+(a+c)(c+d)
Adjusted Kappa Wei,ejKappa=2(ad-bc)/(a+c)(c+d)
Phi coefficient Wei,ejPhi=(ad-bc)/a+bb+da+cc+d
Yule coefficient Wei,ejYule=(ad-bc)/(ad+bc)
Ochiai coefficient Wei,ejOchiai=a/a+ba+c
Sokal coefficient Wei,ejSokal=(a+d)/a+b+c+d=(a+d)/(a+b+c+d)
Jaccard coefficient Wei,ejJaccard=a/(a+b+c)

Finally, the relation of exercise: i with exercise: j is calculated as follows:

Ai,j={μ1simii,j+μ2diff(qi,qj)+μ3Wi,jRifμ1simii,j+μ2diff(qi,qj)+μ3Wi,jRΘ0otherwise (10)

where Θ is a threshold to control the sparsity of the exercise relation matrix. Then Given the past exercises: (e1, e2, …en−1) and the next exercise: en, the exercise relation matrix is defined as RE=[Aen,e1, Aen,e2,Aen,en-1]. Finally, the exercise relation matrix is applied as the input of the Position-Relation-Forgetting attention mechanism.

Position-Relation-Forgetting attention mechanism

The Position-Relation-Forgetting attention mechanism includes the relative position attention layer, the relation attention layer, and the forgetting layer. The basic idea of the relative position attention aims to replace the positional encoding of input embedding with relative distance embedding. Specifically, the maximum absolute value of k is used to clip the distance of words in the input embedding. The relative position attention accepts the relative distance between input elements. Ii and Ij serve as the model inputs to track the student state. And edge vectors between Ii and Ij are presented as ai,jv,ai,jK to extract relative position representations. The edge vectors are clipped to a maximum absolute value of k: clip(x,k) = max(-k, min(k,x)). And corresponding relative position representations are WK=(W-kk.WkK) and WV=(W-kV.WkV) respectively. The outputs of the relative position attention mechanism are new sequences Z. The process can refer to the following equations:

ai,jK=Wclip(j-i,k)kai,jV=Wclip(j-i,k)V (11)
ai,j=exp(mi,j)i=1nexp(mi,k)mi,j=IiWQ(IjWk)T+IiWQ(ai,jK)Tdz (12)
Zi=j=1naij(IjWV) (13)

where WQ, WK, and WV are the query, key, and value matrices respectively and dz is the dimension of the new sequence of Z. Then, the relation attention predicts student performance on the next interaction by combining the outputs of the relative position attention mechanism.

The relation attention layer incorporates the output of the formula (13) with the exercise relation matrix to predict student performance on the next interaction. The basic idea of the relation attention layer is to incorporate the relation matrix into the attention mechanism to consider the relational information in data.

hi=EenWQ(ZjWK)Tdαi=exp(hi)k=0n-1exp(hk) (14)
γi=δαi+(1-δ)RiEH=i=1n-1γiZiWv (15)

where WQ, WK, and WV represent the query, key, and value matrices of the attention mechanism and Een means the embedding of exercises.

Then applying the output of the relation attention layer is treated as the input of the forgetting layer based on learning theory in the educational field. The basic idea of part is to utilize the time interval between past time and current time to estimate the student forgetting curve. The relative time intervals between past and next interactions are compared as △i = tnti. The final outputs of the Position-Relation-Forgetting attention mechanism incorporating forgetting behavior, RF, is computed as follows:

RF=[ξ1e-ξ21,ξ1e-ξ12ξ1e-ξ2n]O=δFH+(1-δF)RF (16)

where ξ1 and ξ2 are hyper-parameters.

Student performance prediction layer

The student performance prediction layer contains the pointwise Feed-Forward (FFN) and probability prediction layer. The FFN can be computed as follows referring to(17). Wl and Ws, bl and bs are weighted matrices and bias vectors respectively.

F=ReLU(OWl+bl)Ws+bs (17)

The probability prediction layer predicts the probability of the student’s performance by accepting function: σ() on the basis of the FFN. P denotes as the probability that the students provide correct answers in the next interaction. The W and b are trainable parameters.

p=σ(FW+b) (18)

Experiments

Implementation details

Framework setting

The model dimension of attention, the max sequence length, and the training batch size are 200. The dropout rate of the NGFKT model is 1e-2. And the hyper-parameters, including the λ, Θ, are 1 and 0.65 respectively. The parameters in the exercise relation modeling: μ1, μ2, and μ3 are 0.1, 0.2, and 0.7 respectively. The other parameters that are not specified involved in the process of the training model are normally initialized as 0. The details can refer to the following Table 4.

Table 4. The framework setting for the Neural Graph Forgetting Knowledge Tracing model(NGFKT).
Parameters Setting
Max sequence length 200
Training batch size 200
Dropout rate 0.01
λ 1
Θ 0.65
μ 1 0.1
μ 2 0.2
μ 3 0.7

Evaluation metrics

There exist three metrics to measure the performance of our model including the Area Under Curve(AUC), Accuracy(ACC), and Performance Stability(PS). The prediction task is evaluated in a binary classification scenario, i.e., whether or not an exercise is performed correctly. As a result, the AUC and ACC are accepted to measure the prediction performance of students. AUC or ACC values of 0.5 usually indicate that the result was determined at random. The greater the knowledge tracing performance, the higher the value of AUC or ACC. The cross-entropy is accepted as the loss function of the NGFKT model.

The Performance Stability metric(PS) is used to specifically compare the performance of the baseline models with the NGFKT model in the testing phase. The performance of the model: M is stable when the M can consistently outperform other models in most testing batches. Based on this idea, the PS is designed based on the performance rank. For instance, if the NGFKT model outperforms the DKT model and DKT+ model in 96 testing batches. However, the performance of the NGFKT model is worse than the DKT+ model in 4 testing batches. The performance rank of the NGFKT model is 1 in 94 testing batches and 2 in 4 testing batches. Then the PS of the NGFKT model is 97.32% referring to the following formulation. The NBatch and Nmodel are the number of the testing batches and models in this paper.

PS(M)=i=0NBatchNmodel-rank(M,i)+1Nmodel (19)

Datasets

In order to demonstrate the performance of the NGFKT model in the small dataset and the large dataset, two types of public educational datasets were accepted to verify the performance of the NGFKT model in terms of AUC, ACC, and PS.

The statistical information of the datasets is provided in Table 5. There are two public educational datasets to validate the performance of the NGFKT model.

Table 5. The overview of ASSIST2012 and Eedi based on different intelligent tutoring systems.

Statistic ASSIST2012 Eedi
Number of records 4193631 233767
Number of students 39364 2064
Number of questions 59761 948
Avg exercise record/student 107 113

ASSISTments2012(ASSIST2012)

The first dataset is the large dataset: ASSISTments2012(ASSIST2012) (available at: https://sites.google.com/site/assistmentsdata/datasets/2012-13-school-data-with-affect), which was gathered by the Assistemt Online Tutoring platforms. The ASSIST2012 contains 4193K records of 39K students. Each student answers an average of 107 questions.

There are several datasets that are collected by Assistemt Online Tutoring platforms. The reason for choosing ASSIST2012 is that the NGFKT model needs the timestamp feature in the dataset to model the forgetting behavior of students. However, other datasets Assistemt Online Tutoring platforms do not gather the timestamp feature to make predictions. Therefore, ASSIST2012 is accepted as the training dataset to validate the performance of the NGFKT model.

Eedi

The second dataset is the small dataset: the Eedi dataset (available at: https://eedi.com/projects/neurips-education-challenge) collected by the NeuralIPS platform, which includes 233K records and 2064 students. And an average of 113 exercises are provided for each student. In this paper, tasks 3 and 4 in the NeuralIPS system are applied to track the knowledge state of students.

Results and discussion

Student performance prediction

The experimental results are presented in Table 6. The AUC, ACC, and PS are computed to compare the performance of different models. In order to verify the prediction performance of student abilities, the training set and testing set are divided into 80% records of the dataset and 20% records of the dataset respectively.

Table 6. Comparison of results of baseline models with the Neural Graph Forgetting Knowledge Tracing model(NGFKT).

The NGFKT outperforms all baseline models in terms of AUC, ACC, and PS.

ASSIST2012 Eedi
AUC ACC PS AUC ACC PS
DKT 0.712 0.679 0.142 0.489 0.489 0.179
DKT+ 0.722 0.685 0.232 0.584 0.566 0.501
DKVMN 0.701 0.686 0.392 0.701 0.640 0.829
SAKT 0.736 0.692 0.732 0.495 0.495 0.216
GKT 0.702 0.687 0.491 0.566 0.542 0.354
NGFKT 0.776 0.704 0.960 0.710 0.673 0.937

In two educational datasets, the NGFKT model is compared with five baseline models, including the DKT model, the DKT+ model, the DKVMN model, the SAKT model, and the GKT model. As Table 6 illustrated, the DKT+ model outperforms the DKT model on two datasets due to the fact that two regularization terms are accepted by the DKT+ model to solve the reconstruction problem and the consistency problem in the DKT model. However, knowledge concept modeling is ignored in the DKT model and DKT+ model. Therefore, the DKVMN model applies the nonlinear transformations and student master level on each KC to make predictions and performs better than the DKT+ model and DKT model on the Eedi dataset. However, the DKVMN model performs worse than the DKT model and DKT+ model on the ASSIST2012 because the number of questions is so large on the ASSIST dataset that the Knowledge concepts modeling is excessively complex. Compared with the DKVMN model, the SAKT model further applies the attention mechanism to process the data and discovers the relation between KCs to obtain better performance than the DKVMN model on the ASSIST2012 dataset. However, the SAKT model possesses worse performance than the DKVMN model on the Eedi dataset. The reason is that the number of KCs on the Eedi is less than the ASSIST so the relation modeling of KCs in the SAKT model possesses less effect on the prediction results. In addition, the GKT model does not perform as well as we thought because considering the nodes of the graph established based on two datasets is relatively sparsity. Therefore, the performance of the GKT model is worse than most baseline models. These baseline models do not consider the relation modeling of the exercises and forgetting behavior of students. Therefore, the NGFKT model is proposed to solve these drawbacks. The NGFKT model incorporates the exercise relation modeling with the novel attention mechanism: Position-Relation-Forgetting attention mechanism to make predictions. The NGFKT model performs consistently better than all baseline models in terms of three evaluation metrics.

Ablation experiments

This part aims to find out the key components of our model and validate the performance of these key components. There exist two variants models of the NGFKT model: the NGFKT-PE and the NGFKT-RM, which means that the NGFKT model removes the relative position encoding(PE) or relation modeling(RM)respectively. Specifically, the positional encoding and relation modeling are implemented by the position attention layer and the relation attention layer with the forgetting attention layer. the According to Table 7, some conclusions can be drawn.

Table 7. The ablation experiments of the Neural Graph Forgetting Knowledge Tracing model(NGFKT).
ASSIST2012 Eedi
AUC ACC PS AUC ACC PS
NGFKT-RM 0.740 0.678 0.678 0.684 0.637 0.804
NGFKT-PE 0.753 0.689 0.860 0.691 0.641 0.679
NGFKT 0.776 0.704 0.960 0.710 0.673 0.937

Firstly, the individual components of the position attention layer and the relation attention layer do not generate satisfactory outcomes when used alone. The performance of the model is gradually better when more components are involved in the model. Secondly, removing the RM component leads to a significant drop, decreasing to 0.740 and 0.684 respectively compared with removing the PE components. Therefore, the RM is a more crucial component for the NGFKT model when the model makes predictions.

Cold start problem

The cold start issue also impacts knowledge tracing when the model predicts the knowledge state of students. The results are examined in two cold start scenarios, i.e., training with data from a small number of students and new students with abbreviated exercise sequences [47].

  1. The knowledge tracing model is trained using data from a limited number of students, and it is then applied to completely new, untested samples.

  2. When new students enroll in an online learning system, they often have short exercise sequences since there aren’t enough exercise recordings to provide the knowledge tracing model with enough information about them.

In scenario 1, student populations ranging from 10% to 20% of the training dataset are tested to evaluate the prediction performance of NGFKT, DKT, and DKT+ referencing Fig 4. Specifically, the NFGKT model outperforms the DKT model and DKT+ model on the Eedi dataset because the NFGKT model considers the exercise relation modeling and applies the attention mechanism to process the data. And the accuracy of the NGFKT model remains at around 68%. In addition, the performance of the DKT model fluctuates relatively. Compared with the DKT model, the DKT+ model solves the reconstruction problem and the consistency problem to further improve the performance to predict the student knowledge state. The DKT+ model’s performance is still at about 53%.

Fig 4. AUC on the Eedi dataset is compared when the number of the students is 10%, 12%, 14%, 16%, 18%, and 20% respectively.

Fig 4

The Neural Graph Forgetting Knowledge Tracing Model outperforms the other two models.

In scenario 2, which contains new students who have short exercise sequences, the training data are separated into six groups. Each group has a distinct range of exercise sequences, such as (50, 75], (75, 100], (100, 125], (125, 150], (150, 175], and (175, 200]. The lengths of the exercises from the original exercise sequence are sampled to generate each exercise sequence in the training data. Given that students in the first group have the fewest exercise answering records, it is clear that this situation is the most challenging for the student. In Fig 5, the effectiveness of the various techniques is compared. On the Eedi datasets, the DKT+ model performs better than the DKT model by accepting two regularization items to enhance the model performance. However, the relation modeling part of the DKT+ model is ignored when the DKT+ model predicts the performance of students. Therefore, the NGFKT model solves this problem and shows better performance in this scenario than DKT and DKT+ by considering the relation modeling of the exercises and forgetting behavior of students.

Fig 5. AUC on the Eedi dataset is compared when students are provided with 50, 75, 100, 125, 150, 175 and 200 answer records respectively.

Fig 5

The Neural Graph Forgetting Knowledge Tracing Model achieves better results than the other two models.

Knowledge state prediction visualization

Knowledge state prediction visualization is regarded as an important application of knowledge tracing models for online educational systems. We will show that our proposed model: the NGFKT model can capture the student knowledge state correctly compared with two standard knowledge tracing models: the DKT model and the DKT+ model. Specifically, Fig 6 indicates the knowledge state traced by the NGFKT model of the same student. The general knowledge evolving process of the knowledge state is consistent with the student learning process. When the student first attempts the exercise, the knowledge state reaches the minimum level. The student continues to learn skills: “32”, “49”, and “71” and continuously deepens his proficiency in knowledge points. Finally, the student knowledge state achieves the maximum, which is shown by the increased areas of the radar diagram. During the latest attempt of the student, the knowledge proficiency of the student presents some reduction considering the student forgetting behavior. However, knowledge proficiency is still improved by continuously practicing the skills compared with the first interaction with the student.

Fig 6. The radar diagram.

Fig 6

The NGFKT model outperforms the DKT model and the DKT+ model in tracking the student knowledge state. The “32”, “49”, and “71” are three skill ids and are presented with three different colors. The average prediction accuracy of the NGFKT model is around 70.5%.

Referring to Fig 6, the NGFKT model also outperforms the DKT model and the DKT+ model because the NGFKT model further incorporates the relation modeling that is generated by the GCN model. The DKT+ model achieves better results than the DKT model, which indicates that adding two regularization terms can further improve the performance of tracing the knowledge proficiency of the student.

Conclusions and future work

In this paper, we introduced a novel knowledge tracing model, NGFKT, designed to accurately track students’ knowledge states by integrating relation modeling, skill relation matrices, Q-matrices, and relative distance representations. The NGFKT model effectively predicts students’ knowledge levels even with limited interaction data. Our approach employed the KRIRC method to calibrate skill relation matrices and Q-matrices, serving as inputs to a Graph Convolutional Network (GCN) for generating exercise and skill embeddings. By combining skill-exercise embeddings, item difficulty, and the contingency table, we derived the final exercise relation matrix. Employing the Position-Relation-Forgetting Attention mechanism yielded accurate predictions. Our experiments on two public datasets demonstrated the NGFKT model’s efficiency in tracking students’ knowledge states. The NGFKT model’s blend of explanatory and predictive power holds promise for enhancing the design of Online Educational Systems.

In this paper, we properly model the student, exercises, and skills to design an effective knowledge tracing model to estimate the knowledge state of students when considering the heterogeneous graph between them and the exercise difficulty. However, this paper still has two limitations of the knowledge tracing model. Firstly, the skill relation model should take exercise content into consideration to make the prediction more accurate. Secondly, other student behaviors such as guessing behavior or slipping behavior should be further discussed when the student factor is modeled.

In the future, we envision incorporating exercise texts into the knowledge tracing model’s design and integrating more nuanced student behaviors—such as the guessing and slipping factors—to enhance performance prediction. Additionally, we plan to expand the knowledge model’s scope to develop an innovative online intelligence system for recommending exercises to students. This system aims to create a seamless and tailored learning experience that adapts to individual student’s needs and preferences.

Acknowledgments

The authors thank Prof. Chunyan Zeng from Hubei University of Technology for checking the first draft.

Data Availability

All relevant data are within the paper.

Funding Statement

This study was funded by the National Natural Science Foundation of China [62177022, 61901165, 61501199], the AI and Faculty Empowerment Pilot Project [CCNUAI&FE2022-03-01], the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by MOE and Hubei Province [xtzd2021-005], and the Natural Science Foundation of Hubei Province [2022CFA007] in the form of grants to ZW.

References

  • 1. Sun Z, Anbarasan M, Praveen Kumar D. Design of online intelligent English teaching platform based on artificial intelligence techniques. Computational Intelligence. 2021;37(3):1166–1180. doi: 10.1111/coin.12351 [DOI] [Google Scholar]
  • 2. Cui Y, Ma Z, Wang L, Yang A, Liu Q, Kong S, et al. A survey on big data-enabled innovative online education systems during the COVID-19 pandemic. Journal of Innovation & Knowledge. 2023;8(1):100295. doi: 10.1016/j.jik.2022.100295 [DOI] [Google Scholar]
  • 3.Ma H, Huang Z, Tang W, Zhang X. Exercise recommendation based on cognitive diagnosis and neutrosophic set. In: 2022 IEEE 25th International Conference on Computer Supported Cooperative Work in Design (CSCWD). IEEE; 2022. p. 1467–1472.
  • 4.Kang W, Zhang L, Li B, Chen J, Sun X, Feng J. Personalized exercise recommendation via implicit skills. In: Proceedings of the ACM Turing Celebration Conference-China; 2019. p. 1–6.
  • 5. Hamid SNM, Lee TT, Taha H, Rahim NA, Sharif AM. E-content module for Chemistry Massive Open Online Course (MOOC): Development and students’ perceptions. JOTSE: Journal of Technology and Science Education. 2021;11(1):67–92. doi: 10.3926/jotse.1074 [DOI] [Google Scholar]
  • 6. Idris N, Yusof N, Saad P, et al. Adaptive course sequencing for personalization of learning path using neural network. Int J Advance Soft Comput Appl. 2009;1(1):49–61. [Google Scholar]
  • 7.N Bezus S, A Abduzhalilov K, K Raitskaya L. Distance Learning Nowadays: the Usage of Didactic Potential of MOOCs (on platforms Coursera, edX, Universarium) in Higher Education. In: 2020 The 4th International Conference on Education and Multimedia Technology; 2020. p. 14–19.
  • 8. Xia J, Li G, Cao Z. Personalized exercise recommendation algorithm combining learning objective and assignment feedback. Journal of Intelligent & Fuzzy Systems. 2018;35(3):2965–2973. doi: 10.3233/JIFS-169652 [DOI] [Google Scholar]
  • 9. de Jong PG, Pickering JD, Hendriks RA, Swinnerton BJ, Goshtasbpour F, Reinders ME. Twelve tips for integrating massive open online course content into classroom teaching. Medical Teacher. 2020;42(4):393–397. doi: 10.1080/0142159X.2019.1571569 [DOI] [PubMed] [Google Scholar]
  • 10. Ai F, Chen Y, Guo Y, Zhao Y, Wang Z, Fu G, et al. Concept-Aware Deep Knowledge Tracing and Exercise Recommendation in an Online Learning System. International Educational Data Mining Society. 2019;. [Google Scholar]
  • 11. Chen Y, Li X, Liu J, Ying Z. Recommendation system for adaptive learning. Applied psychological measurement. 2018;42(1):24–41. doi: 10.1177/0146621617697959 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Tomasevic N, Gvozdenovic N, Vranes S. An overview and comparison of supervised data mining techniques for student exam performance prediction. Computers & education. 2020;143:103676. doi: 10.1016/j.compedu.2019.103676 [DOI] [Google Scholar]
  • 13.Yang Y, Shen J, Qu Y, Liu Y, Wang K, Zhu Y, et al. GIKT: a graph-based interaction model for knowledge tracing. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part I. Springer; 2021. p. 299–315.
  • 14. Liu S, Zou R, Sun J, Zhang K, Yang J. A Hierarchical Memory Network for Knowledge Tracing. Expert Systems with Applications. 2021;177(4):114935. doi: 10.1016/j.eswa.2021.114935 [DOI] [Google Scholar]
  • 15.Jia S, Liu X, Zhao P, Liu C, Sun L, Peng T. Representation of job-skill in artificial intelligence with knowledge graph analysis. In: 2018 IEEE symposium on product compliance engineering-asia (ISPCE-CN). IEEE; 2018. p. 1–6.
  • 16.Piech C, Bassen J, Huang J, Ganguli S, Sahami M, Guibas LJ, et al. Deep knowledge tracing. Advances in neural information processing systems. 2015;28.
  • 17.Zhang J, Shi X, King I, Yeung DY. Dynamic key-value memory networks for knowledge tracing. In: Proceedings of the 26th international conference on World Wide Web; 2017. p. 765–774.
  • 18.Nakagawa H, Iwasawa Y, Matsuo Y. Graph-based knowledge tracing: modeling student proficiency using graph neural network. In: IEEE/WIC/ACM International Conference on Web Intelligence; 2019. p. 156–163.
  • 19.Pandey S, Karypis G. A self-attentive model for knowledge tracing. arXiv preprint arXiv:190706837. 2019;.
  • 20. Wang Z, Yang Y, Zeng C, Kong S, Feng S, Zhao N. Shallow and Deep Feature Fusion for Digital Audio Tampering Detection. EURASIP Journal on Advances in Signal Processing. 2022;2022(69):1–20. doi: 10.1186/s13634-022-00900-4 [DOI] [Google Scholar]
  • 21. Zeng C, Zhu D, Wang Z, Wu M, Xiong W, Zhao N. Spatial and Temporal Learning Representation for End-to-End Recording Device Identification. EURASIP Journal on Advances in Signal Processing. 2021;2021(1):41. doi: 10.1186/s13634-021-00763-1 [DOI] [Google Scholar]
  • 22. Mehrish A, Majumder N, Bharadwaj R, Mihalcea R, Poria S. A review of deep learning techniques for speech processing. Information Fusion. 2023; p. 101869. doi: 10.1016/j.inffus.2023.101869 [DOI] [Google Scholar]
  • 23.Latif S, Zaidi A, Cuayahuitl H, Shamshad F, Shoukat M, Qadir J. Transformers in speech processing: A survey. arXiv preprint arXiv:230311607. 2023;.
  • 24. Wang Z, Wang Z, Zeng C, Yu Y, Wan X. High-Quality Image Compressed Sensing and Reconstruction with Multi-Scale Dilated Convolutional Neural Network. Circuits, Systems, and Signal Processing. 2022; p. 1–24. doi: 10.1186/10.1007/s00034-022-02181-6 [DOI] [Google Scholar]
  • 25.Zeng C, Yan K, Wang Z, Yu Y, Xia S, Zhao N. Abs-CAM: A Gradient Optimization Interpretable Approach for Explanation of Convolutional Neural Networks. Signal, Image and Video Processing. 2022; p. 1–8. doi: 10.1007/s11760-022-02313-0 [DOI] [Google Scholar]
  • 26. Li L, Wang Z, Zhang T. GBH-YOLOv5: Ghost Convolution with BottleneckCSP and Tiny Target Prediction Head Incorporating YOLOv5 for PV Panel Defect Detection. Electronics. 2023;12(3):1–15. doi: 10.3390/electronics12030561 [DOI] [Google Scholar]
  • 27. Zeng C, Ye J, Wang Z, Zhao N, Wu M. Cascade Neural Network-Based Joint Sampling and Reconstruction for Image Compressed Sensing. Signal, Image and Video Processing. 2022;16(1):47–54. doi: 10.1007/s11760-021-01955-w [DOI] [Google Scholar]
  • 28.Xiong X, Zhao S, Van Inwegen EG, Beck JE. Going deeper with deep knowledge tracing. International Educational Data Mining Society. 2016;.
  • 29. Liu Q, Huang Z, Yin Y, Chen E, Xiong H, Su Y, et al. Ekt: Exercise-aware knowledge tracing for student performance prediction. IEEE Transactions on Knowledge and Data Engineering. 2019;33(1):100–115. doi: 10.1109/TKDE.2019.2924374 [DOI] [Google Scholar]
  • 30.Santoro A, Bartunov S, Botvinick M, Wierstra D, Lillicrap T. Meta-learning with memory-augmented neural networks. In: International conference on machine learning. PMLR; 2016. p. 1842–1850.
  • 31.Battaglia PW, Hamrick JB, Bapst V, Sanchez-Gonzalez A, Zambaldi V, Malinowski M, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:180601261. 2018;.
  • 32.Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:160902907. 2016;.
  • 33.Chanaa A, Faddouli E, et al. Predicting learners need for recommendation using dynamic graph-based knowledge tracing. In: International conference on artificial intelligence in education. Springer; 2020. p. 49–53.
  • 34. Song X, Li J, Sun S, Yin H, Dawson P, Doss RRM. SEPN: a sequential engagement based academic performance prediction model. IEEE Intelligent Systems. 2020;36(1):46–53. doi: 10.1109/MIS.2020.3006961 [DOI] [Google Scholar]
  • 35. Wu Z, Huang L, Huang Q, Huang C, Tang Y. SGKT: Session graph-based knowledge tracing for student performance prediction. Expert Systems with Applications. 2022;206:117681. doi: 10.1016/j.eswa.2022.117681 [DOI] [Google Scholar]
  • 36.Nagatani K, Zhang Q, Sato M, Chen YY, Chen F, Ohkuma T. Augmenting knowledge tracing by considering forgetting behavior. In: The world wide web conference; 2019. p. 3101–3107.
  • 37. Gan W, Sun Y, Sun Y. Knowledge structure enhanced graph representation learning model for attentive knowledge tracing. International Journal of Intelligent Systems. 2022;37(3):2012–2045. doi: 10.1002/int.22763 [DOI] [Google Scholar]
  • 38.Pandey S, Srivastava J. RKT: relation-aware self-attention for knowledge tracing. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management; 2020. p. 1205–1214.
  • 39.Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in neural information processing systems. 2017;30.
  • 40.Nayakanti N, Al-Rfou R, Zhou A, Goel K, Refaat KS, Sapp B. Wayformer: Motion forecasting via simple & efficient attention networks. In: 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE; 2023. p. 2980–2987.
  • 41. Joshi A, Fidalgo E, Alegre E, Fernández-Robles L. DeepSumm: Exploiting topic models and sequence to sequence networks for extractive text summarization. Expert Systems with Applications. 2023;211:118442. doi: 10.1016/j.eswa.2022.118442 [DOI] [Google Scholar]
  • 42.Vardasbi A, Pires TP, Schmidt RM, Peitz S. State Spaces Aren’t Enough: Machine Translation Needs Attention. arXiv preprint arXiv:230412776. 2023;.
  • 43. Kang L, He S, Wang M, Long F, Su J. Bilingual attention based neural machine translation. Applied Intelligence. 2023;53(4):4302–4315. doi: 10.1007/s10489-022-03563-8 [DOI] [Google Scholar]
  • 44.Wu M, Foster G, Qu L, Haffari G. Document flattening: Beyond concatenating context for document-level neural machine translation. arXiv preprint arXiv:230208079. 2023;.
  • 45. Ranathunga S, Lee ESA, Prifti Skenduli M, Shekhar R, Alam M, Kaur R. Neural machine translation for low-resource languages: A survey. ACM Computing Surveys. 2023;55(11):1–37. doi: 10.1145/3567592 [DOI] [Google Scholar]
  • 46.Rei R, Guerreiro NM, Treviso M, Coheur L, Lavie A, Martins AF. The inside story: Towards better understanding of machine translation neural evaluation metrics. arXiv preprint arXiv:230511806. 2023;.
  • 47.Zhao J, Bhatt S, Thille C, Gattani N, Zimmaro D. Cold Start Knowledge Tracing with Attentive Neural Turing Machine. In: Learning at Scale; 2020.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

All relevant data are within the paper.


Articles from PLOS ONE are provided here courtesy of PLOS

RESOURCES