A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications

2023 Aug 29;25(9):1271. doi: 10.3390/e25091271

Algorithm 2: QGA for Plagiarism Detection

Input: source dataset X_src; suspicious document X_susp; QGA parameters; WordNet

n ← 0
while n < size of documents do
    S ← Sentence Segmentation (X_src[n])
    y ← 0
    while y < size of S do
        T ← Tokenization (S[y])
        z ← 0
        while z < size of T do
            M ← POS Tagging (T[z])
            N[z] ← Lemmatization (M)
            z++
        end
        tf-isf (N)
        y++
    end
    n++
end
t ← 0
while termination condition not satisfied do
    t ← t + 1
    Call Algorithm 1 // QGA procedure
    Best_Pop ← New_Pop // store the best solution among P(t)
end
sim₁ ← sum of words in X_susp // number of common word-level concepts (nouns and verbs) in X_susp
sim₂ ← sum of words in X_src // number of common word-level concepts in X_src
if sim₁ − sim₂ > ε then
    Doc. Status ← Plagiarized
end
for each suspicious-source word pair (w_q, w_k) do // compute the semantic similarity
    use WordNet to derive the synset lists W_q_syn of w_q and W_k_syn of w_k,
    retrieving only synsets in the same POS class as the word
end
Count ← number of common words between the compared suspicious-source sentence pair (S_susp, S_src_sel)
// S_src_sel is the best set of selected source sentences returned by the QGA procedure
if Count > τ then
    Doc. Status ← Plagiarized
else
    Doc. Status ← not plagiarized
end
Output ← Doc. Status
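The preprocessing loop of Algorithm 2 (sentence segmentation, tokenization, POS tagging, lemmatization, and tf-isf weighting) can be sketched in Python. This is a minimal sketch under stated assumptions: `split_sentences` and `tokenize` are hypothetical regex stand-ins, POS tagging and lemmatization are omitted (a fuller pipeline might use NLTK's `pos_tag` and `WordNetLemmatizer`), and tf-isf is taken as the raw count of a term in a sentence times log(N / sf(t)), where N is the number of sentences and sf(t) is the number of sentences containing t. The paper's exact weighting may differ.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Hypothetical naive segmenter: split on ., ! and ? and drop empty pieces."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Hypothetical lowercase word tokenizer (no POS tagging or lemmatization)."""
    return re.findall(r"[a-z]+", sentence.lower())


def tf_isf(document):
    """Return one {term: weight} map per sentence, weight = tf(t, s) * log(N / sf(t))."""
    sentences = [tokenize(s) for s in split_sentences(document)]
    n_sentences = len(sentences)
    # sentence frequency: number of sentences in which each term occurs
    sf = Counter(term for tokens in sentences for term in set(tokens))
    weights = []
    for tokens in sentences:
        tf = Counter(tokens)  # raw term counts within this sentence
        weights.append({t: c * math.log(n_sentences / sf[t]) for t, c in tf.items()})
    return weights
```

For example, in `tf_isf("The cat sat. The dog ran.")` the word "the" gets weight 0 in both sentences because it occurs in every sentence, while "cat" gets log 2 because it occurs in only one.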
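Algorithm 1 is not reproduced in this section, so the QGA loop of Algorithm 2 can only be illustrated generically. The sketch below is a standard quantum-inspired genetic algorithm, not necessarily the authors' procedure: each qubit is stored as an angle θ and observed as 1 with probability sin²θ, and a simplified rotation gate nudges every angle toward the best solution found so far. A chromosome is a bit mask over candidate source sentences. The per-sentence `scores`, the `fitness` function with its over-selection penalty, the fixed generation budget used as the termination condition, and all parameter values are hypothetical.

```python
import math
import random


def observe(thetas, rng):
    """Collapse each qubit: bit i is 1 with probability sin^2(theta_i)."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]


def fitness(bits, scores, max_selected):
    """Hypothetical fitness: summed score of selected sentences, penalised for over-selection."""
    total = sum(s for b, s in zip(bits, scores) if b)
    excess = max(0, sum(bits) - max_selected)
    return total - 10.0 * excess


def qga_select(scores, max_selected=2, pop_size=10, generations=50,
               delta=0.05 * math.pi, seed=0):
    """Return the best bit mask over source sentences and its fitness."""
    rng = random.Random(seed)
    n = len(scores)
    # every qubit starts in equal superposition: theta = pi/4 gives P(1) = 0.5
    population = [[math.pi / 4] * n for _ in range(pop_size)]
    best_bits, best_fit = None, -math.inf
    for _ in range(generations):  # termination condition: fixed generation budget
        for thetas in population:
            bits = observe(thetas, rng)
            f = fitness(bits, scores, max_selected)
            if f > best_fit:  # keep the best solution seen so far
                best_bits, best_fit = bits, f
        # simplified rotation gate: move every angle one step toward the best solution,
        # clamped to [delta, pi/2 - delta] so no bit becomes fully deterministic
        for thetas in population:
            for i, b in enumerate(best_bits):
                step = delta if b else -delta
                thetas[i] = min(max(thetas[i] + step, delta), math.pi / 2 - delta)
    return best_bits, best_fit
```

For example, `qga_select([0.9, 0.1, 0.8, 0.2], max_selected=2)` returns a 4-bit mask and its fitness; the 1-bits would mark the source sentences taken as S_src_sel. Tracking `best_bits` mirrors the `Best_Pop ← New_Pop` step, and the clamp on θ is one simple way to keep some exploration, similar in spirit to mutation in a classical GA.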
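The final decision of Algorithm 2 can be sketched as a synonym-aware count of common words followed by the τ test. To keep the sketch self-contained, `TOY_SYNSETS` is a hypothetical hand-made table standing in for WordNet; a real implementation could instead query NLTK's `wordnet.synsets(word, pos=...)`. Words are given as (lemma, POS) pairs, and two words count as common when their POS-restricted synset lists intersect, following the rule that only synsets in the word's own POS class are retrieved. Counting each suspicious word at most once is an assumption.

```python
# Hypothetical stand-in for WordNet: (lemma, POS) -> set of synset identifiers.
TOY_SYNSETS = {
    ("car", "n"): {"car.n.01"},
    ("automobile", "n"): {"car.n.01"},
    ("buy", "v"): {"buy.v.01"},
    ("purchase", "v"): {"buy.v.01"},
    ("house", "n"): {"house.n.01"},
}


def synsets(word, pos):
    """Synsets of `word` in its own POS class; unknown words map to a singleton of themselves."""
    return TOY_SYNSETS.get((word, pos), {f"{word}.{pos}"})


def common_words(susp, src_selected):
    """Count suspicious (word, POS) pairs sharing a synset with any selected source word."""
    src_sets = [synsets(w, p) for w, p in src_selected]
    return sum(1 for w, p in susp if any(synsets(w, p) & s for s in src_sets))


def classify(susp, src_selected, tau):
    """Apply the threshold test: plagiarized when the common-word count exceeds tau."""
    return "Plagiarized" if common_words(susp, src_selected) > tau else "not plagiarized"
```

For example, with `susp = [("buy", "v"), ("car", "n"), ("red", "a")]` and `src = [("purchase", "v"), ("automobile", "n"), ("blue", "a")]`, two words match through shared synsets, so `classify(susp, src, tau=1)` returns "Plagiarized" while `tau=2` gives "not plagiarized". The noun "car" and a verb "car" never match because their synset lists come from different POS classes.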