2016 Jun 13;18(6):e148. doi: 10.2196/jmir.5327

Table 1.

Data sources used, with the number of posts in each and the demographic attributes available.

| Dataset | No. of posts | Gender^a | Age^a | Ethnicity^a | Location^a | Writing level |
|---|---|---|---|---|---|---|
| TwitterHealth [23] | 11,637,888 | Gender classifier | NO | Ethnicity classifier | YES | Writing level classifier |
| Google+Health [24] | 186,666 | YES | YES | Ethnicity classifier | YES | Writing level classifier |
| Drugs.com [25] | 74,461 | Gender classifier | NO | NO | NO | Writing level classifier |
| DailyStrength/Treatments [26] | 1,055,603 | YES | YES | NO | YES | Writing level classifier |
| WebMD/Drugs [27] | 122,040 | YES | YES | NO | NO | Writing level classifier |
| Drugs.com/Answers [28] | 320,118 | Gender classifier | NO | NO | NO | Writing level classifier |
| DailyStrength/Forums [29] | 5,948,877 | YES | YES | NO | YES | Writing level classifier |
| WebMD [30] | 1,128,629 | Gender classifier | NO | NO | NO | Writing level classifier |

^a NO indicates that the demographic attribute is not provided by the source and no classifier is used due to low accuracy. YES indicates that the demographic attribute is provided by the source. More details on the demographic classifiers are available in the paper by Sadah et al [21].