2017 Aug 2;19(8):e272. doi: 10.2196/jmir.7660

Table 2.

Comparison of different data sources for prediction in OHSNs

Data source	Survey data (high-level data)	User-generated data (mid-level data)	User log data (low-level data)
Effort required to collect data
	Design questionnaires	Perform text mining	Extract data from server database
	Conduct surveys	Apply natural language processing on text
Data generation rate
	Slow	Fast	Instantaneous
	Need to conduct a new survey to get recent data	Hundreds of posts written by users every day	New data generated with every user action (eg, access time, search history)
Interpretability
	Very easy to understand	Relatively easy to understand	Difficult to derive meaning from raw data
	Questions directly suited to user’s intentions	Requires data processing to extract features from long texts	Requires insight into which features to obtain from the given data
Data types
	Numerical data (eg, scale of 1 to 10)	Text data (eg, title, user posts, comments)	Periodical data (eg, access time)
	Demographic information (eg, age, sex, region)		Demographic information (eg, user profile information)
	Text data for open-ended questions		Hypertext data (eg, accessed links)
			Text data (eg, keywords typed in for search)
Obtainable characteristics
	A user’s (dis)agreement toward a particular characteristic	Words that represent a user’s main interests or concerns	Visiting frequency
	Open-ended answers to a question	Response to a particular article	Reading preference
			Search preference