2017 Aug 29;12(8):e0183537. doi: 10.1371/journal.pone.0183537

Table 4. Top predictive features for each age group in tweet language use and Twitter handle metadata models.

| Predictive Features | Youth (13–17): Cohen’s d | Youth (13–17): Direction | Young Adults (18–24): Cohen’s d | Young Adults (18–24): Direction | Adults (25+): Cohen’s d | Adults (25+): Direction |
|---|---|---|---|---|---|---|
| *Metadata Features* | | | | | | |
| Age of Twitter Account | 0.336 | − | | | 0.193 | + |
| *Linguistic Features* | | | | | | |
| Count of the term “school” | 0.210 | + | | | 0.194 | − |
| Count of WWBP words positively correlated with 23–29 age category, in tweet | 0.222 | − | | | | |
| Count of the stems of “ili” (e.g. “I like”) | 0.186 | − | | | | |
| Count of the term “college” | 0.236 | − | 0.232 | + | | |
| Percent of WWBP words negatively correlated with 19–22 age category, in tweet^a | 0.171 | + | 0.331 | − | | |
| Count of stems of 18^b | | | 0.210 | + | | |
| Count of stems of 21 | | | 0.209 | + | | |
| Count of the term “drunkard” | | | 0.194 | + | | |
| Count of the term “semester” | | | 0.179 | + | | |
| Count of kissyheart emoji | | | 0.162 | + | | |
| Count of smiley emoji | | | | | 0.170 | − |
| Count of stems of “via” | | | | | 0.172 | + |
| Mean absolute deviation of count of URLs in tweet^a | | | | | 0.174 | + |

^a To capture the distributional properties of a user’s tweeting behavior, we created tweet-level features and then calculated descriptive statistics of those features across a user’s tweets. For example, for the “Average Percent Characters in Tweet that are Emoji” feature, we calculated the percentage of characters that are emoji for each tweet and then took the average across all the user’s collected tweets.
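The tweet-level-then-user-level aggregation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the emoji test (a few Unicode code-point ranges), the URL regex, and the helper names (`percent_emoji_chars`, `url_count`, `mean_absolute_deviation`, `user_features`) are all hypothetical, and the mean absolute deviation is taken around the mean, which the table does not specify.

```python
import re
import statistics

# Assumption: a rough emoji test that treats code points in a few common
# emoji blocks as emoji. Real emoji detection needs the full Unicode data.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]
# Assumption: a URL is any "http://" or "https://" token up to whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def is_emoji(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def percent_emoji_chars(tweet: str) -> float:
    """Tweet-level feature: percentage of characters that are emoji."""
    if not tweet:
        return 0.0
    return 100.0 * sum(is_emoji(ch) for ch in tweet) / len(tweet)


def url_count(tweet: str) -> int:
    """Tweet-level feature: number of URLs in the tweet."""
    return len(URL_PATTERN.findall(tweet))


def mean_absolute_deviation(values: list[float]) -> float:
    """Average absolute distance of each value from the mean of the values."""
    center = statistics.mean(values)
    return statistics.mean(abs(v - center) for v in values)


def user_features(tweets: list[str]) -> dict[str, float]:
    """Compute tweet-level features, then summarize them across the user's tweets."""
    emoji_pcts = [percent_emoji_chars(t) for t in tweets]
    urls = [url_count(t) for t in tweets]
    return {
        "avg_pct_emoji_chars": statistics.mean(emoji_pcts),
        "mad_url_count": mean_absolute_deviation(urls),
    }


if __name__ == "__main__":
    sample = ["hi \U0001F600", "read https://example.com", "no links",
              "two https://a.co https://b.co"]
    print(user_features(sample))
    # prints {'avg_pct_emoji_chars': 6.25, 'mad_url_count': 0.75}
```

Only the first sample tweet contains an emoji (1 of 4 characters, so 25%), giving a user-level average of 6.25%; the per-tweet URL counts 0, 1, 0, 2 have mean 0.75 and mean absolute deviation 0.75. Any other descriptive statistic (median, standard deviation, maximum) could be swapped into `user_features` the same way.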

^b To group common categories of words together, terms were stemmed, that is, reduced to their base form. For example, a stemming algorithm would reduce the words “hunting,” “hunter,” “hunts,” and “hunters” to the stem “hunt.”
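To make the stemming idea concrete, here is a deliberately simplified suffix-stripping stemmer in Python. It is a toy written only to reproduce the footnote's example, not the stemmer used in the study (which the table does not name); the suffix list, the `MIN_STEM_LEN` guard, and the `toy_stem` name are hypothetical. Off-the-shelf stemmers such as the Porter stemmer apply stricter rules and may not merge every form in the example.

```python
from collections import Counter

# Hypothetical suffix list, ordered longest-first so "hunters" loses "ers"
# rather than just "s".
SUFFIXES = ("ers", "ing", "er", "s")
# Hypothetical guard so short words such as "is" are left intact.
MIN_STEM_LEN = 3


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, if enough of the word remains."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LEN:
            return w[: -len(suffix)]
    return w


def stem_counts(words: list[str]) -> Counter:
    """Count terms after stemming, so inflected forms share one feature."""
    return Counter(toy_stem(w) for w in words)


if __name__ == "__main__":
    print(stem_counts(["hunting", "hunter", "hunts", "hunters"]))
    # prints Counter({'hunt': 4})
```

Counting stems instead of raw tokens is what lets a single feature such as “Count of stems of ‘via’” absorb several surface forms of a word.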