Author manuscript; available in PMC: 2014 May 2.
Published in final edited form as: CSCW Conf Comput Support Coop Work. 2014:1523–1536. doi: 10.1145/2531602.2531607

Table 5. Content analysis of n-grams in the location and text fields. For each category, we show the fraction of total weight in all location estimates from n-grams of that category; e.g., 49% of all estimate weight in the good estimates was from n-grams with category city (weights do not add up to 100% because time zone and language fields are not included). Weights that are significantly greater in good estimates than bad (or vice versa) are indicated with a significance code (o = 0.1, * = 0.05, ** = 0.01, *** = 0.001) determined using a Mann-Whitney U test with Bonferroni correction, the null hypothesis being that the mean weight assigned to a category over all n-grams in the good set is equal to the mean weight for the same category in the bad set. Categories with less than 1.5% weight in both classes are rolled up into other. We also show the top-weighted examples in each category.

Category          Good        Bad         Examples
location          0.83 ***    0.19
  city            0.49 ***    0.09        edinburgh, roma, leicester, houston tx
  country         0.10 **     0.03        singapore, the netherlands, nederland, janeiro brasil
  generic         0.01        0.02        de mar, puerta de, beach, rd singapore
  state           0.14 ***    0.02        maryland, houston tx, puebla, connecticut
  other lo        0.09 ***    0.02        essex, south yorkshire, yorkshire, gloucestershire
not-location      0.07        0.57 ***
  dutch word      0.02 ***    0.00        zien, bij de, uur, vrij
  english word    0.01        0.37 ***    st new, i, pages, check my
  letter          0.01        0.04        μ, w, α, s
  slang           0.00        0.08 ***    bitch, lad, ass, cuz
  spanish word    0.00        0.07 ***    mucha, niña, los, suerte
  swedish word    0.00        0.02        rätt, jävla, på, kul
  turkish word    0.02        0.00        kar, restoran, biraz, daha
  untranslated    0.02        0.00        cewe, gading, ung, suria
technical         0.03 **     0.02
  foursquare      0.03 ***    0.00        paulo http, istanbul http, miami http, brasília http
  url             0.00        0.02        co, http, http t, co h
other             0.03        0.04
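
The quantities described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and any data passed in are hypothetical. The Mann-Whitney U test here uses the normal approximation with a tie correction, which may differ from the implementation used for the table. The sketch also normalizes over only the n-grams supplied, whereas the table's weights additionally share a total with the time zone and language fields.

```python
import math
from collections import defaultdict


def weight_fractions(weighted_ngrams):
    """Fraction of total weight per category, from (category, weight) pairs."""
    totals = defaultdict(float)
    for category, weight in weighted_ngrams:
        totals[category] += weight
    grand_total = sum(totals.values())
    return {c: w / grand_total for c, w in totals.items()}


def roll_up(good, bad, threshold=0.015):
    """Merge categories below the threshold in BOTH classes into 'other'."""
    out_good, out_bad = defaultdict(float), defaultdict(float)
    for category in set(good) | set(bad):
        g, b = good.get(category, 0.0), bad.get(category, 0.0)
        name = "other" if g < threshold and b < threshold else category
        out_good[name] += g
        out_bad[name] += b
    return dict(out_good), dict(out_bad)


def mann_whitney_p(xs, ys):
    """Two-sided p-value via the normal approximation (assumed, see lead-in)."""
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at sorted positions i..j-1 share the average of ranks i+1..j.
        avg_rank = (i + 1 + j) / 2
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


def significance_code(p):
    """Map a p-value to the codes used in the caption (o, *, **, ***)."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, "o")):
        if p < cutoff:
            return code
    return ""
```

To use it, collect the per-n-gram weights for one category in the good set and in the bad set, pass the two lists to `mann_whitney_p`, adjust the result with `bonferroni` using the number of categories tested, and convert it with `significance_code`.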