2021 Nov 25;8(11):e29487. doi: 10.2196/29487

Table 2.

Data collection approaches (N=54).

| Data collection approaches (data set category) and data source | Included studies, n (%) | References |
|---|---|---|
| Studies using original data sets (n=24) | | |
| Reddit API^a | 12 (50) | [42,44,46,52,60,62,63,68,71,72,75,77] |
| Pushshift | 10 (42) | [47,49,50,53-55,58,59,66,76] |
| Reddit search function | 1 (4) | [61] |
| Google search | 1 (4) | [56] |
| Studies using a premade data set (n=21) | | |
| CLEF^b eRisk 2017 data set | 8 (38) | [24-26,29,34-37] |
| CLEF eRisk 2018 data set | 5 (24) | [27,31-33,39] |
| CLEF eRisk 2019 data set | 1 (5) | [38] |
| CLEF eRisk 2020 data set | 1 (5) | [30] |
| Multiple CLEF eRisk data sets | 1 (5) | [28] |
| Data set from Yates et al [76] | 3 (14) | [45,69,74] |
| Data set from Gkotsis et al [50] | 1 (5) | [48] |
| Data set from Pirina and Çöltekin [63] | 1 (5) | [70] |
| Studies with multiple data collection approaches (n=4) | | |
| Reddit API and Pushshift.io | 1 (25) | [57] |
| Reddit search function and Reddit API | 1 (25) | [67] |
| Reddit API plus data set from Pavalanathan and De Choudhury [62] | 1 (25) | [41] |
| CLEF eRisk 2017 data set and data set from Yates et al [76] | 1 (25) | [64] |
| Studies with unclear data collection approach (n=5) | | |
| Data collection not clearly described | 5 (100) | [40,43,51,65,73] |

^a API: application programming interface.

^b CLEF: Conference and Labs of the Evaluation Forum.