Table 3. Coding methods.
Article | Coding method | No. of coders | No. of tweets coded | Coded retweets | No. of Twitter accounts | Followed URLs | Coding agreement |
[8] | Hand-coded by researchers | 1: all tweets; 2: subsample of 300 tweets | 2248: relevance; 2192: content | Yes | NR^a | No | 91%: sentiment; 72%: theme
[25] | Hand-coded by researchers | 6: for a subset of 250 tweets; NR for total | 17,098: relevance; 10,128: content | Yes, if additional context | NR | Yes | κ=.64 to .70
[26] | Machine learning with initial hand-coding; Python Scikit-Learn | NR | 1,669,123 | Yes | NR | Yes | NR |
[27] | Machine learning and hand-coding; naïve Bayes, k-nearest neighbors, and support vector machines | 2: pilot of 1000; 2: random subset of 150; 2: all 7362 | 7362: relevance; 4215: content | Retweeted posts were only included once | NR | NR | κ>.70 for the random subset of 150
[28] | Hand-coded by researchers | 1: all tweets; 2: for 10% subsample | 300: complete sample; 300: industry-free sample; 481 of 600: content (duplicates between samples removed) | Yes | 148: complete sample; 215: industry-free sample | Yes | κ=.74
[29] | Hand-coded by researchers | 2 | NR | Yes | Approximately 3400 | NR | NR |
[30] | Crowdsourcing with initial hand-coding | 3 | 5000: relevance; 4978: content | NR | 3804 | NR | κ=.66 to .85 among a subset coded by researchers
[31] | Topic modeling with machine learning; MALLET, a command-line implementation of latent Dirichlet allocation (LDA) | NR | 319,315: total; 95,738: hookah; 22,513: cigar; 201,064: cigarette | NR | NR | NR | NR
[32] | Topic modeling (LDA) with machine learning | NR | 4962 | NR | NR | NR | NR
[33] | Machine learning and hand-coding; DiscoverText | 2: for a subset of 500 for relevance, 4500 for commercial versus organic, 7500 for cessation | 73,672 | Yes | 23,700 | Yes, hand-coded tweets with URLs | κ=.87 to .93
[34] | Hand-coded by researchers | 1: all; 2: for subsets of 100 tweets | 5000: relevance; 2847: content | NR | NR | Yes | κ=.64 to 1.00
[35] | Hand-coded by researchers | 1: all tweets; 3: subsample | 133 | No | NR | NR | α=.61 to 1.00
[36] | Hand-coded by researchers | 3 | 3935: relevance, foreign language, retweets; 2656 sampled for 288 original tweets for coding | No | 346 | Yes | κ=.64 to .91
[37] | Hand-coded by researchers; wordcloud R package | NR | 171: relevance; 84: content | NR | 84 | NR | NR
[38] | Hand-coded by researchers | 1: all tweets; 2: for 20% of tweets | 143,287: identified; 4753: coded for clinical practice guidelines for treating tobacco dependence | NR | 153 | Yes | >90%
[39] | Hand-coded by researchers | 2 | 684 | Yes | 306 | Yes | NR |
[40] | Machine learning and hand-coding; naïve Bayes, LIBLINEAR, Bayesian logistic regression, random forests; keyword comparisons | 1: all tweets; 2: subsample of 2000 | 13,146 | NR | 2147 | No, removed URLs | κ=.87 for subsample
[41] | Machine learning and hand-coding; human detection algorithm; Hedonometrics; key phrasal pattern matching | 2: for all tweets from 500 automated accounts and 500 organic accounts as classified by the algorithm; 2: for 4 groups of 500 randomly sampled tweets to gauge accuracy of subcategorical tweet topics | 850,000 | Yes | 131,622: automated accounts; 134,717: organic accounts; 188,182: not classified accounts (ie, accounts with <25 tweets) | No, but the algorithm used the count of URLs to distinguish automated from organic accounts and keywords in the URLs to determine subcategories of automated accounts | 94.6% true-positive rate and 12.9% false-positive rate for machine classification of tweets from the 1000 accounts that were also hand-coded
[42] | Machine learning with initial hand-coding; Python Scikit-Learn; topic modeling with MALLET | 2: for a subset of 1000 profiles | 224,000 in 2013 sample; 349,401 in 2015 sample | Yes | 34,000 in 2013 sample; 100,000 in 2015 sample | No; metadata on the presence of URL links | κ=.88
[43] | Hand-coded by researchers and MySQL pattern matcher | NR | 1180 | Yes | 2: Blu and V2; 537: users retweeting Blu and V2 | NR | NR
[44] | Hand-coded by researchers | 1: all tweets; 2: for 20% of tweets (n=358) | 2191: relevance; 1790: content | Yes | NR (>21) | NR | κ=.95 for 20% subsample
[45] | Machine learning with initial hand-coding; naïve Bayes classifier, k-nearest neighbors, support vector machines | 6: for a subset of 250 tweets; NR for total | 17,098: relevance; 10,128: content | Yes, if additional context | NR | NR | κ=.64 to .70
[46] | Hand-coded by researchers | 3 | 1776 | No | 16 | Yes | For 5% of data, 95.7%; κ=.72 |
[47] | Machine learning with initial hand-coding; naïve Bayes classifier | 2: subset of 450 tweets for relevance; 2: subset of 350 tweets for content | 245,319: relevance; 193,491: content | NR | 166,857 | NR; metadata on the presence of URL links | κ=.93
[48] | Hand-coded by researchers | 1: all tweets; 2: for 1% of tweets | 8645: relevance; 6257: content | Yes | NR | Yes | 90% for a 1% sample of tweets
[49] | Hand-coded by researchers | 2 | 900, with 50 tweets per account | Yes | 18 | NR | 84% |
[50] | Hand-coded by researchers | 2 | 1519 | No | 1321 | Yes | κ=.84 |
^aNR: not reported.
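Several of the studies in Table 3 share the same general workflow: hand-code a subset of tweets, check inter-coder agreement (typically Cohen's κ), and then train a supervised classifier such as naïve Bayes to label the remaining tweets. The sketch below illustrates that workflow with scikit-learn; the tweet texts, coder labels, and feature settings are placeholders for illustration only and are not drawn from any of the cited studies.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import cohen_kappa_score

# Hypothetical hand-coded subset: tweet text plus a binary relevance label.
tweets = [
    "Trying to quit smoking this week, wish me luck",
    "Buy 2 get 1 free on e-liquid, today only!",
    "Great game last night",
    "Hookah lounge with friends tonight",
]
labels = [1, 1, 0, 1]  # 1 = tobacco-relevant, 0 = not relevant

# Inter-coder agreement on the hand-coded subset (two hypothetical coders).
coder_a = [1, 1, 0, 1]
coder_b = [1, 0, 0, 1]
print("Cohen's kappa:", cohen_kappa_score(coder_a, coder_b))

# Train a naive Bayes classifier on the hand-coded tweets...
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
X = vectorizer.fit_transform(tweets)
clf = MultinomialNB().fit(X, labels)

# ...and apply it to unlabeled tweets.
unlabeled = ["New vape flavors just dropped at the shop"]
print("Predicted relevance:", clf.predict(vectorizer.transform(unlabeled)))
```

In practice, the hand-coded subsets reported in the table are far larger than this toy example, and agreement is usually computed per coding category before the classifier is trained.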