J Med Internet Res. 2017 Aug 2;19(8):e272. doi: 10.2196/jmir.7660

Table 2.

Comparison of different data sources for prediction in OHSNs

| Data source | Survey data (high-level data) | User-generated data (mid-level data) | User log data (low-level data) |
| --- | --- | --- | --- |
| Effort required to collect data | Design questionnaires; conduct surveys | Perform text mining; apply natural language processing on text | Extract data from server database |
| Data generation rate | Slow: need to conduct a new survey to get recent data | Fast: hundreds of posts written by users every day | Instantaneous: new data generated with every user action (eg, access time, search history) |
| Interpretability | Very easy to understand: questions directly suited to user's intentions | Relatively easy to understand: requires data processing to extract features from long texts | Difficult to derive meaning from raw data: requires insight on what features to obtain from given data |
| Data types | Numerical data (eg, scale of 1 to 10); demographic information (eg, age, sex, region); text data for open-ended questions | Text data (eg, title, user posts, comments) | Periodical data (eg, access time); demographic information (eg, user profile information); hypertext data (eg, accessed links); text data (eg, keywords typed in for search) |
| Obtainable characteristics | A user's (dis)agreement toward a particular characteristic; open-ended answers to a question | Words that represent a user's main interests or concerns; response to a particular article | Visiting frequency; reading preference; search preference |
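
To illustrate why user log data is described as difficult to interpret without deciding which features to derive, the following is a minimal sketch of turning raw log records into the three obtainable characteristics listed for log data (visiting frequency, reading preference, search preference). The record layout, the sample rows, the URL scheme, and the function name `summarize_user` are all assumptions made for illustration; they do not come from the article or from any particular OHSN.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log records: (user_id, ISO timestamp, action, detail).
# The layout and values are illustrative assumptions, not data from the article.
LOG = [
    ("u1", "2017-03-01T09:15:00", "view", "/forum/diabetes/42"),
    ("u1", "2017-03-01T21:40:00", "search", "insulin dosage"),
    ("u1", "2017-03-03T08:05:00", "view", "/forum/diabetes/57"),
    ("u2", "2017-03-02T13:30:00", "view", "/forum/sleep/8"),
]


def summarize_user(log, user_id):
    """Derive simple per-user features from raw log rows."""
    rows = [r for r in log if r[0] == user_id]

    # Visiting frequency: number of distinct calendar days with any activity.
    days = {datetime.fromisoformat(ts).date() for _, ts, _, _ in rows}

    # Reading preference: most-viewed section, taken from the assumed
    # "/forum/<section>/<id>" URL scheme.
    sections = Counter(d.split("/")[2] for _, _, a, d in rows if a == "view")

    # Search preference: most frequent search query.
    searches = Counter(d for _, _, a, d in rows if a == "search")

    return {
        "active_days": len(days),
        "top_section": sections.most_common(1)[0][0] if sections else None,
        "top_search": searches.most_common(1)[0][0] if searches else None,
    }


if __name__ == "__main__":
    print(summarize_user(LOG, "u1"))
```

Each feature here reflects a choice (counting days rather than sessions, grouping pages by section), which is the kind of insight the table notes is required before raw log data becomes usable for prediction.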