2017 Mar 31;19(3):e91. doi: 10.2196/jmir.7022

Table 3.

Coding methods.

| Article | Coding method | No. of coders | No. of tweets coded | Coded retweets | No. of Twitter accounts | Followed URLs | Coding agreement |
|---|---|---|---|---|---|---|---|
| [8] | Hand-coded by researchers | 1: all tweets; 2: subsample 300 tweets | 2248: relevance; 2192: content | Yes | NR^a | No | 91%: sentiment; 72%: theme |
| [25] | Hand-coded by researchers | 6: for a subset of 250 tweets; NR for total | 17,098: relevance; 10,128: content | Yes, if additional context | NR | Yes | κ=.64 to .70 |
| [26] | Machine learning with initial hand-coding; Python Scikit-Learn | NR | 1,669,123 | Yes | NR | Yes | NR |
| [27] | Machine learning and hand-coding; naïve Bayes, k-nearest neighbors, and support vector machines | 2: pilot of 1000; 2: random subset of 150; 2: all 7362 | 7362: relevance; 4215: content | Retweeted posts were only included once | NR | NR | κ>.70 for the random subset of 150 |
| [28] | Hand-coded by researchers | 1: all tweets; 2: for 10% subsample | 300: complete sample; 300: industry-free sample; 481 of 600: content (duplicates between samples removed) | Yes | 148: complete sample; 215: industry-free sample | Yes | κ=.74 |
| [29] | Hand-coded by researchers | 2 | NR | Yes | Approximately 3400 | NR | NR |
| [30] | Crowdsourcing with initial hand-coding | 3 | 5000: relevance; 4978: content | NR | 3804 | NR | κ=.66 to .85 among a subset coded by researchers |
| [31] | Topic modeling with machine learning; MALLET, a command-line implementation of latent Dirichlet allocation (LDA) | NR | 319,315: total; 95,738: hookah; 22,513: cigar; 201,064: cigarette | NR | NR | NR | NR |
| [32] | Topic modeling (LDA) with machine learning | NR | 4962 | NR | NR | NR | NR |
| [33] | Machine learning and hand-coding; DiscoverText | 2: for a subset of 500 for relevance, 4500 for commercial versus organic, 7500 for cessation | 73,672 | Yes | 23,700 | Yes, hand-coded tweets with URLs | κ=.87 to .93 |
| [34] | Hand-coded by researchers | 1: all; 2: for subsets of 100 tweets | 5000: relevance; 2847: content | NR | NR | Yes | κ=.64 to 1.00 |
| [35] | Hand-coded by researchers | 1: all tweets; 3: subsample | 133 | No | NR | NR | alpha = .61 to 1.00 |
| [36] | Hand-coded by researchers | 3 | 3935: relevance, foreign language, retweets; 2656 sampled for 288 original tweets for coding | No | 346 | Yes | κ=.64 to .91 |
| [37] | Hand-coded by researchers; wordcloud R package | NR | 171: relevance; 84: content | NR | 84 | NR | NR |
| [38] | Hand-coded by researchers | 1: all tweets; 2: for 20% of tweets | 143,287: identified; 4753: coded for clinical practice guidelines for treating tobacco dependence | NR | 153 | Yes | >90% |
| [39] | Hand-coded by researchers | 2 | 684 | Yes | 306 | Yes | NR |
| [40] | Machine learning and hand-coding; naïve Bayes, LIBLINEAR, Bayesian logistic regression, random forests; keyword comparisons | 1: all tweets; 2: subsample of 2000 | 13,146 | NR | 2147 | No, removed URLs | κ=.87 for subsample |
| [41] | Machine learning and hand-coding; human detection algorithm; Hedonometrics; key phrasal pattern matching | 2: for all tweets from 500 automated accounts and 500 organic accounts as classified by the algorithm; 2: for 4 groups of 500 randomly sampled tweets to gauge accuracy of subcategorical tweet topics | 850,000 | Yes | 131,622: automated accounts; 134,717: organic accounts; 188,182: not classified accounts (ie, accounts with <25 tweets) | No, but the algorithm used the count of URLs to distinguish automated accounts from organic accounts; also used keywords in the URLs for the algorithm to determine subcategories of automated accounts | 94.6% true-positive rate, 12.9% false-positive rate for the machines on the tweets from the 1000 accounts also coded with human-coding |
| [42] | Machine learning with initial hand-coding; Python Scikit-Learn; topic modeling with MALLET | 2: for a subset of 1000 profiles | 224,000 in 2013 sample; 349,401 in 2015 sample | Yes | 34,000 in 2013 sample; 100,000 in 2015 sample | No; metadata on the presence of URL links | κ=.88 |
| [43] | Hand-coded by researchers and MySQL pattern matcher | NR | 1180 | Yes | 2: Blu and V2; 537: users retweeting Blu and V2 | NR | NR |
| [44] | Hand-coded by researchers | 1: all tweets; 2: for 20% of tweets (n=358) | 2191: relevance; 1790: content | Yes | NR (>21) | NR | κ=.95 for 20% subsample |
| [45] | Machine learning with initial hand-coding; naïve Bayes classifier, k-nearest neighbors, support vector machines | 6: for a subset of 250 tweets; NR for total | 17,098: relevance; 10,128: content | Yes, if additional context | NR | NR | κ=.64 to .70 |
| [46] | Hand-coded by researchers | 3 | 1776 | No | 16 | Yes | For 5% of data, 95.7%; κ=.72 |
| [47] | Machine learning with initial hand-coding; naïve Bayes classifier | 2: subset of 450 tweets for relevance; 2: subset of 350 tweets for content | 245,319: relevance; 193,491: content | NR | 166,857 | NR; metadata on the presence of URL links | κ=.93 |
| [48] | Hand-coded by researchers | 1: all tweets; 2: for 1% of tweets | 8645: relevance; 6257: content | Yes | NR | Yes | 90% for a 1% sample of tweets |
| [49] | Hand-coded by researchers | 2 | 900, with 50 tweets per account | Yes | 18 | NR | 84% |
| [50] | Hand-coded by researchers | 2 | 1519 | No | 1321 | Yes | κ=.84 |

^a NR: not reported.
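The Coding agreement column mixes raw percent agreement (eg, 91%, 84%) with chance-corrected statistics such as Cohen κ, and the two are not directly comparable. The following is a minimal sketch of how each is computed for two coders, written in plain Python. It is not the procedure of any study in the table; the function names and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items on which two coders assigned the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: probability that both coders pick the same label
    # if each labels independently according to their own marginal rates.
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("Kappa is undefined when both coders use a single identical label.")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Invented sentiment labels for 10 tweets (illustrative only).
    coder_a = ["pos", "pos", "neg", "neg", "pos", "neu", "neg", "pos", "neu", "neg"]
    coder_b = ["pos", "neg", "neg", "neg", "pos", "neu", "neg", "pos", "pos", "neg"]
    print(f"Percent agreement: {percent_agreement(coder_a, coder_b):.2f}")
    print(f"Cohen's kappa:     {cohens_kappa(coder_a, coder_b):.2f}")
```

In this invented example the coders agree on 8 of 10 tweets, yet κ is lower than the raw agreement because part of that agreement is expected by chance. This is why a reported 90% agreement and a reported κ=.90 in the table should not be read as equivalent.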