2016 Aug 4;11(8):e0159644. doi: 10.1371/journal.pone.0159644

Table 1. Representative recent supervised learning methods to detect duplicates in general domains.

Method	Domain	Expert curated set (DU + DI)	Technique(s)
[15]	Geospatial	1,927 + 1,927	DT and SVM
[26]	Product matching	1,000 + 1,000	SVM
[14]	Document retrieval	2,500 + 2,500	SVM
[27]	Bug report	534 + 534	NB, DT and SVM
[28]	Spam check	1,750 + 2,000	SVM
[29]	Web visitor	250,000 + 250,000	LR, RF and SVM

DU: duplicate pairs; DI: distinct pairs; NB: Naïve Bayes; DT: Decision Tree; SVM: Support Vector Machine; LR: Logistic Regression; RF: Random Forest. The datasets listed here are those used for supervised learning; some of the cited studies may also use other datasets.