Table 3. Coding methods.
Article | Coding method | No. of coders | No. of tweets coded | Coded retweets | No. of Twitter accounts | Followed URLs | Coding agreement |
[8] | Hand-coded by researchers | 1: all tweets; 2: subsample of 300 tweets | 2248: relevance; 2192: content | Yes | NR^a | No | 91%: sentiment; 72%: theme
[25] | Hand-coded by researchers | 6: for a subset of 250 tweets; NR for total | 17,098: relevance; 10,128: content | Yes, if additional context | NR | Yes | κ=.64 to .70
[26] | Machine learning with initial hand-coding; Python Scikit-Learn | NR | 1,669,123 | Yes | NR | Yes | NR |
[27] | Machine learning and hand-coding; naïve Bayes, k-nearest neighbors, and support vector machines | 2: pilot of 1000; 2: random subset of 150; 2: all 7362 | 7362: relevance; 4215: content | Retweeted posts were only included once | NR | NR | κ>.70 for the random subset of 150
[28] | Hand-coded by researchers | 1: all tweets; 2: for 10% subsample | 300: complete sample; 300: industry-free sample; 481 of 600: content (duplicates between samples removed) | Yes | 148: complete sample; 215: industry-free sample | Yes | κ=.74
[29] | Hand-coded by researchers | 2 | NR | Yes | Approximately 3400 | NR | NR |
[30] | Crowdsourcing with initial hand-coding | 3 | 5000: relevance; 4978: content | NR | 3804 | NR | κ=.66 to .85 among a subset coded by researchers
[31] | Topic modeling with machine learning; MALLET, a command-line implementation of latent Dirichlet allocation (LDA) | NR | 319,315: total; 95,738: hookah; 22,513: cigar; 201,064: cigarette | NR | NR | NR | NR
[32] | Topic modeling (LDA) with machine learning | NR | 4962 | NR | NR | NR | NR
[33] | Machine learning and hand-coding; DiscoverText | 2: for a subset of 500 for relevance, 4500 for commercial versus organic, 7500 for cessation | 73,672 | Yes | 23,700 | Yes, hand-coded tweets with URLs | κ=.87 to .93
[34] | Hand-coded by researchers | 1: all; 2: for subsets of 100 tweets | 5000: relevance; 2847: content | NR | NR | Yes | κ=.64 to 1.00
[35] | Hand-coded by researchers | 1: all tweets; 3: subsample | 133 | No | NR | NR | α=.61 to 1.00
[36] | Hand-coded by researchers | 3 | 3935: relevance, foreign language, retweets; 2656 sampled for 288 original tweets for coding | No | 346 | Yes | κ=.64 to .91
[37] | Hand-coded by researchers; wordcloud R package | NR | 171: relevance; 84: content | NR | 84 | NR | NR
[38] | Hand-coded by researchers | 1: all tweets; 2: for 20% of tweets | 143,287: identified; 4753: coded for clinical practice guidelines for treating tobacco dependence | NR | 153 | Yes | >90%
[39] | Hand-coded by researchers | 2 | 684 | Yes | 306 | Yes | NR |
[40] | Machine learning and hand-coding; naïve Bayes, LIBLINEAR, Bayesian logistic regression, random forests; keyword comparisons | 1: all tweets; 2: subsample of 2000 | 13,146 | NR | 2147 | No, removed URLs | κ=.87 for subsample
[41] | Machine learning and hand-coding; human detection algorithm; Hedonometrics; key phrasal pattern matching | 2: for all tweets from 500 automated accounts and 500 organic accounts as classified by the algorithm; 2: for 4 groups of 500 randomly sampled tweets to gauge accuracy of subcategorical tweet topics | 850,000 | Yes | 131,622: automated accounts; 134,717: organic accounts; 188,182: not classified accounts (ie, accounts with <25 tweets) | No, but the algorithm used the count of URLs to distinguish automated from organic accounts and keywords in the URLs to determine subcategories of automated accounts | 94.6% true-positive rate and 12.9% false-positive rate for machine classification of tweets from the 1000 accounts that were also hand-coded
[42] | Machine learning with initial hand-coding; Python Scikit-Learn; topic modeling with MALLET | 2: for a subset of 1000 profiles | 224,000 in 2013 sample; 349,401 in 2015 sample | Yes | 34,000 in 2013 sample; 100,000 in 2015 sample | No; metadata on the presence of URL links | κ=.88
[43] | Hand-coded by researchers and MySQL pattern matcher | NR | 1180 | Yes | 2: Blu and V2; 537: users retweeting Blu and V2 | NR | NR
[44] | Hand-coded by researchers | 1: all tweets; 2: for 20% of tweets (n=358) | 2191: relevance; 1790: content | Yes | NR (>21) | NR | κ=.95 for 20% subsample
[45] | Machine learning with initial hand-coding; naïve Bayes classifier, k-nearest neighbors, support vector machines | 6: for a subset of 250 tweets; NR for total | 17,098: relevance; 10,128: content | Yes, if additional context | NR | NR | κ=.64 to .70
[46] | Hand-coded by researchers | 3 | 1776 | No | 16 | Yes | For 5% of data, 95.7%; κ=.72 |
[47] | Machine learning with initial hand-coding; naïve Bayes classifier | 2: subset of 450 tweets for relevance; 2: subset of 350 tweets for content | 245,319: relevance; 193,491: content | NR | 166,857 | NR; metadata on the presence of URL links | κ=.93
[48] | Hand-coded by researchers | 1: all tweets; 2: for 1% of tweets | 8645: relevance; 6257: content | Yes | NR | Yes | 90% for a 1% sample of tweets
[49] | Hand-coded by researchers | 2 | 900, with 50 tweets per account | Yes | 18 | NR | 84% |
[50] | Hand-coded by researchers | 2 | 1519 | No | 1321 | Yes | κ=.84 |
^aNR: not reported.
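Several of the studies in Table 3 share the same general workflow: hand-code a subset of tweets, check inter-coder agreement (typically Cohen's κ), and then train a supervised classifier such as naïve Bayes to label the remaining tweets. The sketch below illustrates that workflow with scikit-learn; the tweet texts, coder labels, and feature settings are placeholders for illustration only and are not drawn from any of the cited studies.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import cohen_kappa_score

# Hypothetical hand-coded subset: tweet text plus a binary relevance label.
tweets = [
    "Trying to quit smoking this week, wish me luck",
    "Buy 2 get 1 free on e-liquid, today only!",
    "Great game last night",
    "Hookah lounge with friends tonight",
]
labels = [1, 1, 0, 1]  # 1 = tobacco-relevant, 0 = not relevant

# Inter-coder agreement on the hand-coded subset (two hypothetical coders).
coder_a = [1, 1, 0, 1]
coder_b = [1, 0, 0, 1]
print("Cohen's kappa:", cohen_kappa_score(coder_a, coder_b))

# Train a naive Bayes classifier on the hand-coded tweets...
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
X = vectorizer.fit_transform(tweets)
clf = MultinomialNB().fit(X, labels)

# ...and apply it to unlabeled tweets.
unlabeled = ["New vape flavors just dropped at the shop"]
print("Predicted relevance:", clf.predict(vectorizer.transform(unlabeled)))
```

In practice, the hand-coded subsets reported in the table are far larger than this toy example, and agreement is usually computed per coding category before the classifier is trained.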