. 2021 Nov 25;8(11):e29487. doi: 10.2196/29487

Table 2.

Data collection approaches (N=54).

Data collection approaches (data set category) and data source			Included studies, n (%)		References
Studies using original data sets (n=24)
	Reddit API^a	12 (50)		[42,44,46,52,60,62,63,68,71,72,75,77]
	Pushshift	10 (42)		[47,49,50,53-55,58,59,66,76]
	Reddit search function	1 (4)		[61]
	Google search	1 (4)		[56]
Studies using a premade data set (n=21)
	CLEF^b eRisk 2017 data set	8 (38)		[24-26,29,34-37]
	CLEF eRisk 2018 data set	5 (24)		[27,31-33,39]
	CLEF eRisk 2019 data set	1 (5)		[38]
	CLEF eRisk 2020 data set	1 (5)		[30]
	Multiple CLEF eRisk data sets	1 (5)		[28]
	Data set from Yates et al [76]	3 (14)		[45,69,74]
	Data set from Gkotsis et al [50]	1 (5)		[48]
	Data set from Pirina and Çöltekin [63]	1 (5)		[70]
Studies with multiple data collection approaches (n=4)
	Reddit API and Pushshift	1 (25)		[57]
	Reddit search function and Reddit API	1 (25)		[67]
	Reddit API plus data set from Pavalanathan and De Choudhury [62]	1 (25)		[41]
	CLEF eRisk 2017 data set and data set from Yates et al [76]	1 (25)		[64]
Studies with unclear data collection approach (n=5)
	Data collection not clearly described	5 (100)		[40,43,51,65,73]

^aAPI: application programming interface.

^bCLEF: Conference and Labs of the Evaluation Forum.
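Each percentage in the table is relative to its category subtotal, not to the overall N=54. For example, 10 of the 24 studies using original data sets used Pushshift, which is 42%. The sketch below recomputes the percentages from the counts transcribed from the table. It assumes that shares are rounded to whole percentages with Python's built-in `round`. The dictionary layout and the helper name `within_category_percentages` are illustrative only and are not part of the review's methods.

```python
# Counts transcribed from Table 2. Percentages are computed within each
# data set category (category subtotal as denominator), not over N=54.
categories = {
    "Studies using original data sets": {
        "Reddit API": 12,
        "Pushshift": 10,
        "Reddit search function": 1,
        "Google search": 1,
    },
    "Studies using a premade data set": {
        "CLEF eRisk 2017 data set": 8,
        "CLEF eRisk 2018 data set": 5,
        "CLEF eRisk 2019 data set": 1,
        "CLEF eRisk 2020 data set": 1,
        "Multiple CLEF eRisk data sets": 1,
        "Data set from Yates et al": 3,
        "Data set from Gkotsis et al": 1,
        "Data set from Pirina and Çöltekin": 1,
    },
    "Studies with multiple data collection approaches": {
        "Reddit API and Pushshift": 1,
        "Reddit search function and Reddit API": 1,
        "Reddit API plus data set from Pavalanathan and De Choudhury": 1,
        "CLEF eRisk 2017 data set and data set from Yates et al": 1,
    },
    "Studies with unclear data collection approach": {
        "Data collection not clearly described": 5,
    },
}


def within_category_percentages(cats):
    """Hypothetical helper: return {category: (subtotal, {source: rounded %})}."""
    result = {}
    for name, sources in cats.items():
        subtotal = sum(sources.values())
        # Each share is rounded independently, so a category may not sum to 100.
        pcts = {src: round(100 * n / subtotal) for src, n in sources.items()}
        result[name] = (subtotal, pcts)
    return result


if __name__ == "__main__":
    table = within_category_percentages(categories)
    for name, (subtotal, pcts) in table.items():
        print(f"{name} (n={subtotal}); rounded shares sum to {sum(pcts.values())}%")
        for src, pct in pcts.items():
            print(f"  {src}: {categories[name][src]} ({pct})")
    # The four category subtotals together give the overall N.
    print(f"N={sum(subtotal for subtotal, _ in table.values())}")
```

The script confirms that the four subtotals (24, 21, 4, and 5) add up to N=54. It also shows why the percentages for premade data sets sum to 101% rather than 100%: each share is rounded on its own.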