2023 Nov 9;9:e1567. doi: 10.7717/peerj-cs.1567

Table 1. Summary of the related works.

| Research focus | Reference | How did the authors model the problem? | Dataset | Reported performance (precision, recall, or accuracy) | Comments |
| --- | --- | --- | --- | --- | --- |
| Bug reports | Runeson, Alexandersson & Nyholm (2007) | Ranking problem | Defect reports from development projects | 24–42% | The approach uses NLP techniques. |
| Bug reports | Alipour, Hindle & Stroulia (2013) | Classification task | Android ecosystem bug repository | 91% | The approach uses ML algorithms. |
| Bug reports | Kukkar et al. (2020) | Classification task | Six datasets from (Lazar, Ritchey & Sharif, 2014; Lerch & Mezini, 2013) | 79–94% R@20; 85–99% | The approach is a CNN-based strategy. |
| Bug reports | Cooper et al. (2021) | Ranking problem | RICO dataset (Deka et al., 2017) | 83% top-2 | The approach uses computer vision, optical character recognition, and text retrieval techniques. |
| Q&A forums | Zhang et al. (2015) | Ranking problem | Pre-labeled Stack Overflow database | 64% R@20 | The approach combines the similarity scores of four features. |
| Q&A forums | Ahasanuzzaman et al. (2016) | Classification task | Pre-labeled Stack Overflow database | 66% R@20 | The approach is a supervised classification strategy. |
| Q&A forums | Zhang et al. (2017) | Classification task | Pre-labeled Stack Overflow database | 87% | The approach is based on ML algorithms. |
| Q&A forums | Mizobuchi & Takayama (2017) | Ranking problem | Pre-labeled Stack Overflow database | 43% R@20 | The approach is based on Word2vec models. |
| Q&A forums | Zhang et al. (2018) | Ranking-classification task | Pre-labeled Stack Overflow database | 75–86%; 66–86% | The approach uses ranking strategies, deep learning, and IR techniques. |
| Q&A forums | Wang, Zhang & Jiang (2020) | Classification task | Pre-labeled Stack Overflow database | 76–79% R@5 | The approach is based on CNNs, RNNs, and LSTMs. |
| Q&A forums | Mohomed Jabbar et al. (2021) | Classification task | Pre-labeled datasets from the Stack Exchange sub-communities | 75–78% | The approach is based on deep learning and transfer learning techniques. |
| Q&A forums | Pei et al. (2021) | Classification task | Pre-labeled Stack Overflow database | 82%; 82% | The approach is based on an Attention-based Sentence pair Interaction Model (ASIM). |
| Q&A forums | Gao, Wu & Xu (2022) | Classification task | Pre-labeled Stack Overflow database | 68–79% | The approach is based on word embeddings and CNNs. |
| GitHub activities | Wang et al. (2019) | Classification task | DupPR (Yu et al., 2018) | 73% P@1; 65% R@1 | The approach is based on the AdaBoost algorithm. |
| GitHub activities | Li et al. (2017) | Ranking problem | The authors constructed a dataset of duplicate PRs | 54–83% R@20 | The approach is based on IR and NLP techniques. |
| GitHub activities | Ren et al. (2019) | Classification task | DupPR (Yu et al., 2018) | 83%; 11% | The approach is based on IR and NLP techniques. |
| GitHub activities | Zhang et al. (2020) | Recommendation task | The authors constructed a dataset of duplicates (https://github.com/yangzhangs/iLinker) | 45–61% R@10 | The approach is based on IR and deep learning techniques. |
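
Several entries in Table 1 report rank-based scores such as R@20, R@5, or P@1. As a reading aid, the sketch below shows one common way such scores are computed for duplicate detection: recall-rate@k counts a query as a hit when at least one of its known duplicates appears among the top k ranked candidates, and precision@k averages the share of true duplicates in the top k. This is an illustrative assumption, not the exact definition used by each cited work, which may differ in detail. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def recall_rate_at_k(
    rankings: Dict[str, List[str]],
    duplicates: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one true duplicate in the top-k candidates."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for query, ranked in rankings.items()
        if duplicates.get(query, set()) & set(ranked[:k])
    )
    return hits / len(rankings)


def precision_at_k(
    rankings: Dict[str, List[str]],
    duplicates: Dict[str, Set[str]],
    k: int,
) -> float:
    """Mean fraction of the top-k candidates that are true duplicates."""
    if not rankings:
        return 0.0
    total = 0.0
    for query, ranked in rankings.items():
        relevant = duplicates.get(query, set())
        total += sum(1 for candidate in ranked[:k] if candidate in relevant) / k
    return total / len(rankings)


# Hypothetical toy data: each query maps to its ranked candidate list,
# and to the set of items labeled as its true duplicates.
rankings = {
    "q1": ["a", "b", "c"],
    "q2": ["d", "e", "f"],
    "q3": ["g", "h", "i"],
    "q4": ["j", "k", "l"],
}
duplicates = {"q1": {"a"}, "q2": {"f"}, "q3": {"x"}, "q4": {"k"}}

print(recall_rate_at_k(rankings, duplicates, 1))  # 0.25
print(recall_rate_at_k(rankings, duplicates, 3))  # 0.75
print(precision_at_k(rankings, duplicates, 2))    # 0.25
```

Because the cutoff k changes what counts as a hit, scores reported at different cutoffs (for example R@5 versus R@20) are not directly comparable across rows.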