2023 Nov 9;9:e1567. doi: 10.7717/peerj-cs.1567

Table 1. Summary of related work.

Research focus	Reference	How did the authors model the problem?	Dataset	Precision (P)	Recall (R)	Accuracy	Comments
Bug reports	Runeson, Alexandersson & Nyholm (2007)	Ranking problem	Defect reports from development projects.		24–42%		The approach uses NLP techniques.
	Alipour, Hindle & Stroulia (2013)	Classification task	Android ecosystem bug repository.			91%	The approach uses ML algorithms.
	Kukkar et al. (2020)	Classification task	Six datasets from (Lazar, Ritchey & Sharif, 2014; Lerch & Mezini, 2013)		79–94% R@20	85–99%	The approach is a CNN-based strategy.
	Cooper et al. (2021)	Ranking problem	RICO dataset (Deka et al., 2017)			83% top-2	The approach uses computer vision, optical recognition, and text retrieval techniques.
Q&A forums	Zhang et al. (2015)	Ranking problem	Pre-labeled Stack Overflow database		64% R@20		The approach combines the similarity scores of four features.
	Ahasanuzzaman et al. (2016)	Classification task	Pre-labeled Stack Overflow database		66% R@20		The approach is a supervised classification strategy.
	Zhang et al. (2017)	Classification task	Pre-labeled Stack Overflow database		87%		The approach is based on ML algorithms.
	Mizobuchi & Takayama (2017)	Ranking problem	Pre-labeled Stack Overflow database		43% R@20		The approach is based on Word2vec models.
	Zhang et al. (2018)	Ranking-classification task	Pre-labeled Stack Overflow database	75–86%	66–86%		The approach uses rank strategies, deep learning, and IR techniques.
	Wang, Zhang & Jiang (2020)	Classification task	Pre-labeled Stack Overflow database		76–79% R@5		The approach is based on CNNs, RNNs, and LSTMs.
	Mohomed Jabbar et al. (2021)	Classification task	Pre-labeled datasets from the Stack Exchange sub-communities			75–78%	The approach is based on deep learning and transfer learning techniques.
	Pei et al. (2021)	Classification task	Pre-labeled Stack Overflow database	82%	82%		The approach is based on an Attention-based Sentence pair Interaction Model (ASIM).
	Gao, Wu & Xu (2022)	Classification task	Pre-labeled Stack Overflow database		68–79%		The approach is based on word embedding and CNNs.
GitHub activities	Wang et al. (2019)	Classification task	DupPR (Yu et al., 2018)	73% P@1	65% R@1		The approach is based on the AdaBoost algorithm.
	Li et al. (2017)	Ranking problem	The authors constructed a dataset of duplicate PRs		54–83% R@20		The approach is based on IR and NLP techniques.
	Ren et al. (2019)	Classification task	DupPR (Yu et al., 2018)	83%	11%		The approach is based on IR and NLP techniques.
	Zhang et al. (2020)	Recommendation task	The authors constructed a dataset of duplicates (https://github.com/yangzhangs/iLinker)		45–61% R@10		The approach is based on IR and deep learning techniques.
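
Several entries in Table 1 report P@k or R@k values (for example, R@20 or P@1) without defining them. The sketch below illustrates one common reading of these metrics for ranking-style duplicate detection: R@k as the share of queries with at least one true duplicate in the top k results, and P@k as the average share of true duplicates among the top k. The function names, the data layout, and the toy item identifiers are assumptions made for illustration; the cited works may define or compute these metrics differently.

```python
from typing import Dict, List, Set


def recall_rate_at_k(rankings: Dict[str, List[str]],
                     relevant: Dict[str, Set[str]],
                     k: int) -> float:
    """Share of queries with at least one true duplicate in the top k."""
    if not rankings:
        raise ValueError("rankings must not be empty")
    hits = 0
    for query, ranked in rankings.items():
        # A query counts as a hit if any top-k candidate is a known duplicate.
        if set(ranked[:k]) & relevant.get(query, set()):
            hits += 1
    return hits / len(rankings)


def precision_at_k(rankings: Dict[str, List[str]],
                   relevant: Dict[str, Set[str]],
                   k: int) -> float:
    """Average share of true duplicates among the top k candidates."""
    if not rankings:
        raise ValueError("rankings must not be empty")
    total = 0.0
    for query, ranked in rankings.items():
        truth = relevant.get(query, set())
        # Divide by k (not by the list length) so short rankings are penalized.
        total += sum(1 for item in ranked[:k] if item in truth) / k
    return total / len(rankings)


# Hypothetical toy data: two queries, each with a ranked candidate list.
rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
relevant = {"q1": {"b"}, "q2": {"x"}}

r_at_2 = recall_rate_at_k(rankings, relevant, k=2)
p_at_2 = precision_at_k(rankings, relevant, k=2)
```

With this toy data, only q1 has a true duplicate in its top 2, so the hit-based R@2 is 0.5, while P@2 averages 1/2 and 0 to give 0.25. Because definitions of this kind differ between studies, figures in the table are not necessarily comparable across rows.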