Table 2.
Data collection approaches (N=54).
| Data collection approaches (data set category) and data source | Included studies, n (%) | References |
| --- | --- | --- |
| Studies using original data sets (n=24) | | |
| Reddit API^a | 12 (50) | [42,44,46,52,60,62,63,68,71,72,75,77] |
| Pushshift | 10 (42) | [47,49,50,53-55,58,59,66,76] |
| Reddit search function | 1 (4) | [61] |
| Google search | 1 (4) | [56] |
| Studies using a premade data set (n=21) | | |
| CLEF^b eRisk 2017 data set | 8 (38) | [24-26,29,34-37] |
| CLEF eRisk 2018 data set | 5 (24) | [27,31-33,39] |
| CLEF eRisk 2019 data set | 1 (5) | [38] |
| CLEF eRisk 2020 data set | 1 (5) | [30] |
| Multiple CLEF eRisk data sets | 1 (5) | [28] |
| Data set from Yates et al [76] | 3 (14) | [45,69,74] |
| Data set from Gkotsis et al [50] | 1 (5) | [48] |
| Data set from Pirina and Çöltekin [63] | 1 (5) | [70] |
| Studies with multiple data collection approaches (n=4) | | |
| Reddit API and Pushshift.io | 1 (25) | [57] |
| Reddit search function and Reddit API | 1 (25) | [67] |
| Reddit API plus data set from Pavalanathan and De Choudhury [62] | 1 (25) | [41] |
| CLEF eRisk 2017 data set and data set from Yates et al [76] | 1 (25) | [64] |
| Studies with unclear data collection approach (n=5) | | |
| Data collection not clearly described | 5 (100) | [40,43,51,65,73] |
^a API: application programming interface.
^b CLEF: Conference and Labs of the Evaluation Forum.