Author manuscript; available in PMC: 2022 Jun 4.
Published in final edited form as: Complex Netw XI (2020). 2020 Feb 22;2020:197–211. doi: 10.1007/978-3-030-40943-2_17

Twitter Watch: Leveraging Social Media to Monitor and Predict Collective-Efficacy of Neighborhoods

Moniba Keymanesh 1, Saket Gurukar 1, Bethany Boettner 1, Christopher Browning 1, Catherine Calder 2, Srinivasan Parthasarathy 1
PMCID: PMC9164274  NIHMSID: NIHMS1583429  PMID: 35662896

Abstract

Sociologists associate the spatial variation of crime within an urban setting with the concept of collective efficacy. The collective efficacy of a neighborhood is defined as social cohesion among neighbors combined with their willingness to intervene on behalf of the common good. Sociologists measure collective efficacy by conducting survey studies designed to measure individuals’ perception of their community. In this work, we employ curated data from a survey study (ground truth) and examine whether costly survey questionnaires can be replaced by proxies derived from social media. We enrich a corpus of tweets mentioning local venues with several linguistic and topological features. We then propose a pairwise learning to rank model that aims to produce a ranking of neighborhoods similar to the ranking obtained from the ground truth collective efficacy values. In our experiments, we find that our generated ranking of neighborhoods achieves 0.77 Kendall tau-x ranking agreement with the ground truth ranking. Overall, our results are up to 37% better than traditional baselines.

1. INTRODUCTION

Understanding the occurrence of crime and disorder in cities is important for public health, policy, and governance. However, criminal violence is unevenly distributed across neighborhoods (Chainey and Ratcliffe 2013; Weisburd, Bruinsma, and Bernasco ). Sociologists and policy-makers associate the spatial variation of disorder with the organizational characteristics of neighborhoods (Morenoff, Sampson, and Raudenbush 2001; Sampson, Raudenbush, and Earls 1997; Kornhauser 1978; Sampson and Groves 1989; Browning, Cagney, and Boettner 2016). An important such characteristic is collective efficacy (Sampson, Raudenbush, and Earls 1997), defined as “social cohesion and trust among neighbors combined with the joint willingness to intervene on behalf of the common good” (Sampson, Raudenbush, and Earls 1997). Local governments increasingly use collective efficacy to prioritize resources for monitoring and reducing disorder through targeted policies and neighborhood gentrification strategies, and to measure the impact of these policies and strategies over time (Hipp 2016; Bandura 1997).

Computing neighborhoods’¹ collective efficacy traditionally requires expensive surveys, usually with funding on the order of hundreds of thousands of dollars (Couper 2017). Tracking changes in collective efficacy over time (Hipp 2016), for example after policy shifts such as neighborhood gentrification efforts, requires additional surveys, further increasing this cost.

Sociologists and government agencies typically use collective efficacy to “order” neighborhoods with respect to perceived safety, social cohesion among residents, and residents’ willingness to intervene on behalf of the common good. Neighborhoods with high collective efficacy tend to be safer, while lower collective efficacy values correspond to relatively less safe neighborhoods (Sampson, Raudenbush, and Earls 1997). This naturally suggests modeling the task as a ranking problem. Concretely, the key question we seek to answer in this paper is: “Given social media data about neighborhoods, can we rank the neighborhoods so that the ranked list is close to the list of neighborhoods ordered by collective efficacy, thereby saving the cost of expensive surveys?”

Our approach, the first study of its kind at city scale, seeks to characterize neighborhood collective efficacy by leveraging spatially conditioned linguistic features extracted from social media. These features relate to the type of urban activity, language use, visible signs of crime and anti-social behavior reported on social media, residents’ familiarity with one another, and the public mood of the neighborhood. We leverage additional sociological and spatial features and develop a simple pairwise learning to rank model based on these features. We empirically show the effectiveness of our model on a real-world, city-scale dataset with ground truth collective efficacy values computed from a traditional survey-based study (details in Section 3). Additionally, we conduct a comprehensive analysis of the predictive power of specific features in the learning to rank task to better understand the relative importance of individual features. In terms of broader impact, such ideas can serve as a cost-effective early warning mechanism to monitor neighborhood transformations and prioritize resources.

2. Background and Related Work

Researchers have used Twitter to make sense of human behavior. In particular, user behavior on this platform has been used to help predict criminal violence (Wang, Gerber, and Brown 2012; Gerber 2014; Wang, Brown, and Gerber 2012; Wang and Gerber 2015; Aghababaei and Makrehchi 2016; Williams, Burnap, and Sloan 2017; Bendler et al. 2014). Compton et al. (2013) studied the link between users’ online activity on Twitter and the prediction of social unrest. Twitter has also been employed to study the online behavior of gang members (Patton 2015), to estimate the population at risk of violent crime (Malleson and Andresen 2015), and to study trust relations among online users (Vedula, Parthasarathy, and Shalin 2017). Researchers have also leveraged Twitter data to study social disorganization by evaluating the entropy of individuals’ opinions about soccer teams (Pacheco, Oliveira, and Menezes 2017). Although trust, crime, and social disorganization are related to collective efficacy, to the best of our knowledge, estimating individuals’ perception of their social climate and their expectation of intervention from social media data has not been addressed before.

3. Data Collection

AHDC study

The Adolescent Health and Development in Context (AHDC) study is a longitudinal data collection effort in a representative and diverse urban setting that focuses on how social and spatial environments contribute to the health and developmental outcomes of urban youth. The study area is a contiguous region of Columbus, Ohio. In the first wave of the study, 1,403 Columbus residents participated. Participants were asked a series of questions about their neighborhood and routine activity locations. The questions focused specifically on informal social control items measuring the participant’s perception of the social climate at and around each location and in the neighborhood. Participants reported their agreement with the following statements: (1) people on the streets can be trusted, (2) people are watching what is happening on the street, and (3) people would come to the defense of others being threatened. Responses ranged from 1 (“strongly disagree”) to 5 (“strongly agree”). This step resulted in roughly 9,000 location reports (4,031 unique locations) nested within 567 block groups. To obtain the collective efficacy value of each neighborhood, we first aggregated individual responses to the three social control items at the report level. Then, we aggregated the report-level results for each block group. Finally, we normalized the scores to the range 0 to 1 and used them as the ground truth for our study. This methodology is aligned with the traditional approach to measuring collective efficacy at the neighborhood level (Bandura and Wessels 1997; Paskevich et al. 1999; Sampson, Morenoff, and Earls 1999). Note that while we adopt a similar ground truth model (Sampson, Morenoff, and Earls 1999), we leverage survey reports from individuals who either reside within or frequently visit a particular neighborhood (the original model focused on residents only). Correspondingly, the social media postings in our study include postings from individuals who reside within or frequently visit a particular neighborhood. To increase the reliability of the aggregation, we only include neighborhoods with at least 5 reports. Figure 1 shows a collective efficacy map of Columbus (ground truth).
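The aggregation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the text does not name the aggregator, so the arithmetic mean is assumed at both levels; reports are given as hypothetical (block_group, trust, watching, defend) tuples; and the at-least-5-reports filter is assumed to be applied before min-max normalization.

```python
from collections import defaultdict
from statistics import mean


def collective_efficacy(reports, min_reports=5):
    """Aggregate survey location reports into normalized block-group scores.

    reports: iterable of (block_group, trust, watching, defend) tuples,
        each item answered on the 1-5 agreement scale (hypothetical format).
    Assumption: the mean is used at both aggregation levels.
    """
    # Step 1: collapse the three social-control items into one report-level score.
    by_group = defaultdict(list)
    for group, trust, watching, defend in reports:
        by_group[group].append(mean((trust, watching, defend)))

    # Step 2: keep block groups with enough reports and average their reports.
    raw = {g: mean(s) for g, s in by_group.items() if len(s) >= min_reports}
    if not raw:
        return {}

    # Step 3: min-max normalize to [0, 1].
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    return {g: (v - lo) / span if span else 0.0 for g, v in raw.items()}
```

For example, a block group with only two reports is dropped, and the remaining groups are rescaled so the lowest scores 0 and the highest scores 1.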

Figure 1:

Collective efficacy map of Columbus, OH

Twitter Data

Matching the offline survey instrument of the AHDC study, we collected a large corpus of Twitter feeds from the Columbus area and its suburbs. Our goal was to capture the informal language of local citizens about local venues and localities. We chose Twitter because of its national appeal² and the easy availability of its data through the API. In total, more than 50 million publicly available tweets were collected from the accounts of 54k Twitter users who identified their location as Columbus. These users were identified through snowball sampling (Goodman 1961). Details of our data collection process can be found in Appendix A.1.
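The actual collection procedure is described in Appendix A.1 and is not reproduced here, so the following is only a generic sketch of snowball sampling under assumptions: a hypothetical in-memory graph `neighbors` and profile dictionary `locations` stand in for Twitter API calls, a user is kept and expanded only if the self-reported profile location contains "columbus", and expansion is breadth-first up to a user cap.

```python
from collections import deque


def snowball_sample(seeds, neighbors, locations, keyword="columbus", max_users=54_000):
    """Breadth-first snowball sampling over a social graph.

    seeds: initial user ids.
    neighbors: dict user -> list of connected users (hypothetical stand-in
        for follower/followee lookups through an API).
    locations: dict user -> self-reported profile location string.
    Only users whose location contains `keyword` are kept and expanded.
    """
    def is_local(user):
        return keyword in locations.get(user, "").lower()

    sampled, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_users:
        user = queue.popleft()
        if not is_local(user):
            continue  # non-local users are neither kept nor expanded
        sampled.append(user)
        for nxt in neighbors.get(user, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sampled
```

Filtering before expansion keeps the crawl focused on the local user base; whether the original crawl expanded through non-local users is not stated.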

3.1. Associating Tweets to Neighborhoods

Following data collection, we excluded tweets that did not mention a location within our study area. To do so, we used LNEx (Al-Olimat et al. 2018), a state-of-the-art, publicly available location name extractor. Given a gazetteer and region information, LNEx extracts location entities from tweets, handles abbreviations, and tackles the disambiguation problems posed by appellation formation and metonymy. We used the OpenStreetMap gazetteer and set the region to Columbus. However, LNEx reports ambiguous locations in some cases, and we exclude tweets containing ambiguous location entities. For details on the ambiguous cases and pruning steps, see Appendix A.1. This pruning step resulted in 4,846 unique locations, spotted in 545k tweets and mapped to 424 neighborhoods.³
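The pruning and mapping step can be sketched as below. This does not call LNEx or reproduce its output format; it assumes location mentions have already been extracted per tweet, and that a hypothetical `gazetteer` dictionary maps each location name to the set of neighborhoods it could refer to. A tweet is treated as ambiguous (and dropped) if any mention resolves to anything other than exactly one neighborhood, which is an illustrative rule rather than the paper's exact criterion.

```python
from collections import defaultdict


def assign_tweets(tweet_mentions, gazetteer):
    """Group tweets by neighborhood, discarding tweets with ambiguous mentions.

    tweet_mentions: dict tweet_id -> list of location names extracted from the
        tweet (e.g., by a tool such as LNEx; the extraction is not shown).
    gazetteer: dict location name -> set of neighborhood ids the name can
        refer to (hypothetical structure).
    """
    tweets_by_hood = defaultdict(set)
    for tweet_id, mentions in tweet_mentions.items():
        candidates = [gazetteer.get(name, set()) for name in mentions]
        # Drop tweets with no mention, or any mention that does not resolve
        # to exactly one neighborhood (assumed ambiguity rule).
        if not candidates or any(len(c) != 1 for c in candidates):
            continue
        for c in candidates:
            tweets_by_hood[next(iter(c))].add(tweet_id)
    return dict(tweets_by_hood)
```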

4. Methodology

In this section, we formalize our prediction task and the proposed ranking model. As mentioned earlier, the goal of our study is to rank neighborhoods based on features extracted from tweets such that the ranked list is similar to the list of neighborhoods ordered by collective efficacy. Hence, we formulate our problem as a pairwise learning to rank task (Liu and others 2009).

4.1. Definitions and Problem Formulation

First, we define the terms related to ordinal ranking. Tied objects are two or more objects that are interchangeable in a ranking with respect to the quality under consideration (Kendall 1945). A ranking in which ties are allowed is called a weak ordering. In our study, neighborhoods whose ground truth collective efficacy values differ by only a small amount are considered tied and are therefore interchangeable in the ranking. Ties are defined by a threshold on the difference between collective efficacy values. Note that ties defined in this way are intransitive: ties between neighborhoods n_a and n_b and between n_b and n_c do not imply that n_a and n_c are tied. For example, with a threshold of 0.1, values of 0.50, 0.58, and 0.66 make the first two and the last two neighborhoods tied, but not the first and the last. This property is reflected in the way we define the score matrix, discussed in Section 4.5.

Next, we formalize our ranking task and our proposed approach. Our data consist of {t_1, t_2, …, t_m}, where t_i is the set of tweets associated with neighborhood n_i. We denote the collective efficacy of neighborhood n_i by C(n_i). The goal of our framework is to automatically generate a permutation of neighborhoods (f(n_1), f(n_2), …, f(n_m)), where f is an ordering function that maps each neighborhood to a position such that the mapped position is close to the neighborhood’s true position based on collective efficacy values. Formally, the ranking task is defined as {f(n_i) < f(n_j) | ∀ n_i, n_j if C(n_i) ≤ C(n_j)}.

To generate a ranking of the neighborhoods, we first predict the local rank of every pair of neighborhoods n_i and n_j. There are three possible cases for any pair: n_i comes before n_j, n_i comes after n_j, or n_i and n_j are interchangeable in the ranking. Thus, we formulate local ranking as a 3-class classification task. We then use the local rankings to generate the global ranking, as detailed in Section 4.4. Next, we explain the features used in this study to characterize the neighborhoods.
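The 3-class local-rank labels can be sketched as follows. The sketch assumes a pair is tied when the absolute difference in collective efficacy is at most the threshold (whether the boundary is inclusive is not stated), and that +1 means n_i ranks ahead of n_j because it has higher collective efficacy. Both orderings of every pair are generated, so label(n_j, n_i) = −label(n_i, n_j).

```python
from itertools import permutations


def pairwise_labels(efficacy, delta):
    """3-class local-rank labels for every ordered pair of neighborhoods.

    efficacy: dict neighborhood -> ground-truth collective efficacy in [0, 1].
    delta: tie threshold; pairs differing by at most delta are tied (label 0).
    Returns dict (n_i, n_j) -> +1 if n_i ranks ahead (higher efficacy),
    -1 if it ranks behind, 0 if tied.
    """
    labels = {}
    for a, b in permutations(efficacy, 2):
        diff = efficacy[a] - efficacy[b]
        if abs(diff) <= delta:
            labels[(a, b)] = 0
        else:
            labels[(a, b)] = 1 if diff > 0 else -1
    return labels
```

With values 0.50, 0.58, and 0.66 and a threshold of 0.1, the first two and the last two pairs receive label 0 while the outer pair does not, which is the intransitive tie behavior described in Section 4.1.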

4.2. Features

We characterize each neighborhood with features extracted from the tweets associated with it. We compute two types of features: (I) features computed for each neighborhood, and (II) features computed for a pair of neighborhoods. To generate the feature vector of a pair of neighborhoods, we first concatenate the feature vectors of the two neighborhoods and then append the pairwise features. These features are explained in detail in the following subsections.
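The pair representation described above amounts to a concatenation, sketched below with plain lists. The dictionary layouts are hypothetical, and the pairwise features are assumed to be symmetric (distance and common-user ratio do not depend on order), so they are stored by unordered pair.

```python
def pair_features(node_feats, pair_feats, a, b):
    """Feature vector for the ordered pair (a, b).

    node_feats: dict neighborhood -> list of per-neighborhood features
        (e.g., doc2vec, topic and sentiment distributions, already concatenated).
    pair_feats: dict frozenset({a, b}) -> list of symmetric pairwise features
        (e.g., normalized distance, common-user ratio).
    """
    # Order matters: the first neighborhood's block comes first, so the
    # classifier can distinguish <a, b> from <b, a>.
    return node_feats[a] + node_feats[b] + pair_feats[frozenset((a, b))]
```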

TF-IDF of crime related words

“Broken Windows” (Wilson and Kelling 1982) is a well-known theory in criminology. Its basic formulation is that visible signs of crime create an urban environment that encourages further crime and disorder (Skogan 2015; Welsh, Braga, and Bruinsma 2015). Under the broken windows theory, a disordered environment, with signs of broken windows, graffiti, prostitutes, and excessive litter, signals that the area is not monitored and that criminal behavior carries little risk of detection. Such a signal can potentially draw offenders from outside the neighborhood. Based on this theory, we used a lexicon of crime⁴ as a proxy for visible signs of crime and disorder in neighborhoods. This lexicon contains words that people often use when talking about crime and disorder. Since TF-IDF captures the importance of a term in a document, we used the TF-IDF weights of the lexicon terms to capture the content surrounding the location entity in a tweet. For more details on preprocessing, see Appendix A.2.
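A lexicon-restricted TF-IDF can be sketched in plain Python as below. Several details are assumptions rather than the paper's preprocessing (Appendix A.2): each neighborhood's tweets are concatenated into one document, tokens are lowercase alphabetic runs, tf is count divided by document length, and idf is log(N / df), one of several standard variants. The lexicon passed in is illustrative; the actual lexicon is the one referenced in footnote 4.

```python
import math
import re
from collections import Counter


def crime_tfidf(docs, lexicon):
    """TF-IDF weights restricted to a fixed crime lexicon.

    docs: dict neighborhood -> concatenated text of its tweets (assumed layout).
    lexicon: list of crime-related terms; defines the vocabulary and feature order.
    Uses tf = count / document length and idf = log(N / df).
    """
    tokens = {n: re.findall(r"[a-z']+", text.lower()) for n, text in docs.items()}
    counts = {n: Counter(toks) for n, toks in tokens.items()}
    n_docs = len(docs)
    # Document frequency of each lexicon term across neighborhoods.
    df = {w: sum(1 for c in counts.values() if c[w] > 0) for w in lexicon}

    vectors = {}
    for n, c in counts.items():
        length = len(tokens[n]) or 1
        vectors[n] = [
            (c[w] / length) * math.log(n_docs / df[w]) if df[w] else 0.0
            for w in lexicon
        ]
    return vectors
```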

Distribution of spatio-temporal urban activities using topic modeling

Casual, superficial interaction and the resulting public familiarity engender place-based trust among residents and, ultimately, the expectation of response to deviant behavior (Jacobs 1961). Identifying the activities that individuals conduct in a city is an important step toward understanding the ecological dynamics of a neighborhood, such as the potential for street activity and public contact. Following the methodology of Fu et al. (2018), we applied Latent Dirichlet Allocation (LDA) (Blei, Ng, and Jordan 2003) to the tweets associated with each neighborhood to identify its main activity types. The number of topics is an important prior parameter of the LDA model. Perplexity (Blei, Ng, and Jordan 2003) is conventionally used in language modeling to evaluate a topic model and determine the optimal number of topics. Perplexity is defined as:

$$\mathrm{Perplexity}(T) = \exp\left\{-\frac{\sum_{t=1}^{M} \log p(\mathbf{w}_t)}{\sum_{t=1}^{M} N_t}\right\} \qquad (1)$$

where T is the set of test tweets held out from the tweets used to build the LDA model, M is the size of T, N_t is the number of words in tweet t, and p(w_t) is the likelihood of the words of tweet t under the model. Zhao et al. (2015) highlight several issues with using perplexity to find the appropriate number of topics and propose an additional metric, the rate of perplexity change (RPC), for this purpose. Formally, RPC is defined as:

$$\mathrm{RPC}(i) = \left|\frac{P_i - P_{i-1}}{t_i - t_{i-1}}\right| \qquad (2)$$

where t_i is the i-th number of topics in an increasing sequence of candidate values and P_i is the corresponding perplexity. We varied the number of topics from 10 to 150 and observed that RPC is maximized at 70 topics. Thus, we trained the LDA model with 70 topics on a subset of 5M tweets collected from user profiles. For more details, see Appendix A.3.
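Equations (1) and (2) and the selection rule described above can be sketched as follows. Obtaining log p(w_t) from a particular LDA implementation (for example, a variational bound) is not shown; the functions take per-tweet log-likelihoods and word counts as given, and the perplexity values in the usage example are made up. Selecting the candidate with the largest RPC follows the text; other selection rules on the RPC curve are possible.

```python
import math


def perplexity(log_likelihoods, word_counts):
    """Held-out perplexity, Eq. (1).

    log_likelihoods: natural-log likelihood log p(w_t) of each held-out tweet.
    word_counts: number of words N_t in each held-out tweet.
    """
    if len(log_likelihoods) != len(word_counts):
        raise ValueError("need one log-likelihood and one word count per tweet")
    return math.exp(-sum(log_likelihoods) / sum(word_counts))


def rpc_curve(topic_counts, perplexities):
    """(t_i, RPC(i)) for i = 1..k-1, Eq. (2); topic_counts must be increasing."""
    return [
        (topic_counts[i],
         abs((perplexities[i] - perplexities[i - 1])
             / (topic_counts[i] - topic_counts[i - 1])))
        for i in range(1, len(topic_counts))
    ]


def select_num_topics(topic_counts, perplexities):
    """Candidate number of topics with the largest RPC."""
    return max(rpc_curve(topic_counts, perplexities), key=lambda point: point[1])[0]
```

As a sanity check, a model that assigns probability 0.1 to every word has perplexity 10, the size of its effective vocabulary.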

Document embeddings

To represent the variable-length tweets of each neighborhood with a fixed-length feature vector, we used Doc2vec (Le and Mikolov 2014), an unsupervised framework that learns continuous distributed vector representations of pieces of text. Details on training the Doc2vec model can be found in Appendix A.4.

Sentiment Distribution

Sentiment analysis has been used to quantify public mood from unstructured short messages in online social networks (Bertrand et al. 2013). We also characterize the neighborhoods in our study by the mood of the tweets mentioning a venue located inside each neighborhood. As reported by Ribeiro et al. (2016), existing sentiment analysis methods vary widely in their agreement: depending on the choice of tool, the same content can be interpreted very differently. We therefore combine several methods to make our framework more robust to the limitations of any single one. We used five of the best-performing methods identified by Ribeiro et al. (2016): Vader (Gilbert 2014), Umigon (Levallois 2013), SentiStrength (Thelwall et al. 2010), Opinion Lexicon (Hu and Liu 2004), and Sentiment140 (Go, Bhayani, and Huang 2009). We applied each method to each tweet and normalized the values. Next, we grouped the observed sentiment values into 4 bins and computed the distribution of tweet sentiment for each neighborhood. For more details, see Appendix A.5.
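A possible binning step is sketched below. The sentiment tools themselves are not called, and several details are assumptions because the text and Appendix A.5 are not reproduced here: each tool's scores are taken as already normalized to [−1, 1], the tools are combined by averaging per tweet, and the 4 bins are equal-width over [−1, 1] (edges at −0.5, 0, and 0.5).

```python
def sentiment_distribution(scores_by_tool, n_bins=4):
    """Share of a neighborhood's tweets in each of n_bins equal-width sentiment bins.

    scores_by_tool: dict tool name -> list of scores already normalized to
        [-1, 1], one score per tweet, in the same tweet order for every tool.
    Assumed combination rule: the tools' scores are averaged per tweet.
    """
    rows = list(zip(*scores_by_tool.values()))  # one tuple of tool scores per tweet
    if not rows:
        return [0.0] * n_bins
    counts = [0] * n_bins
    for row in rows:
        combined = sum(row) / len(row)
        # Map [-1, 1] onto bin indices 0..n_bins-1; a score of exactly 1
        # is placed in the last bin.
        index = min(int((combined + 1) / 2 * n_bins), n_bins - 1)
        counts[index] += 1
    return [c / len(rows) for c in counts]
```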

Spatial Distance

To represent the spatial relationship between neighborhoods, we computed the geodesic distance between the center points of each pair of neighborhoods. We then normalized the distances using min-max normalization.
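A sketch of the distance feature follows. It approximates the geodesic distance with the haversine great-circle formula on a spherical Earth of radius 6371 km; an ellipsoidal geodesic (as computed by libraries such as GeographicLib) would differ slightly. Center points are assumed to be given as (latitude, longitude) pairs in degrees.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def normalized_distances(centroids):
    """Min-max normalized center-point distance for every unordered pair."""
    raw = {frozenset((a, b)): haversine_km(centroids[a], centroids[b])
           for a, b in combinations(centroids, 2)}
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all distances are equal
    return {pair: (d - lo) / span for pair, d in raw.items()}
```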

Common Users

Frequent interaction and the resulting public familiarity engender place-based trust among residents and, ultimately, the expectation of response to deviant behavior (Jacobs 1961). For a pair of neighborhoods, we assume that the more users tweeted about both neighborhoods, the higher the public familiarity of the residents and the more similar the neighborhoods are in terms of collective efficacy. Thus, for each pair of neighborhoods, we computed the number of users who tweeted about both neighborhoods and divided it by the total number of users who tweeted about at least one of the two, i.e., the Jaccard similarity of the two neighborhoods’ user sets.
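The common-user feature reduces to a set ratio, sketched below. Returning 0.0 when neither neighborhood has any users is an assumed convention for a case the text does not address.

```python
def common_user_ratio(users_a, users_b):
    """Users who tweeted about both neighborhoods / users who tweeted about either.

    users_a, users_b: sets of user ids that mentioned a venue in each neighborhood.
    This is the Jaccard similarity of the two sets; an empty union gives 0.0
    (assumed convention).
    """
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0
```

For instance, user sets {1, 2, 3} and {2, 3, 4, 5} share 2 of 5 distinct users, giving 0.4.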

4.3. Model

In this section, we discuss our ranking task and model architecture. We use a pairwise approach to automatically generate a ranking of neighborhoods with respect to their collective efficacy. Ranking objects with a function is equivalent to projecting the objects onto a vector and sorting them according to their projections. The goal here is to use the extracted features to generate a permutation that is close to the ranking of neighborhoods sorted by collective efficacy. In the pairwise approach, the ranking task is transformed into a pairwise classification problem. In our case, given the representations of a pair of neighborhoods ⟨n_a, n_b⟩, the goal is to predict whether n_a should be ranked higher or lower than n_b. In the first case, the label to be predicted is +1; in the latter case, the true label is −1. Tied neighborhoods receive the label 0, since we do not want to move one of them higher or lower in the list relative to the other.

We then use this local ordering to generate a global ordering of the neighborhoods. We employed different classifiers for the local ordering task including a neural ranker which is a feed-forward neural network. Extensive experiments were conducted to evaluate the effect of the model architecture as well as the predictive power of the features. Our experimental setup and the results are provided in Sections 5 and 6.

4.4. Ordering

We train the local ranker on each pair of neighborhoods ⟨n_i, n_j⟩ and its corresponding local rank label r_ij, which can take the value −1, +1, or 0. For each pair, we also include the reversed instance ⟨n_j, n_i⟩ with −r_ij as the ground truth label. Given the sets of tweets associated with the neighborhoods in our study, we rank the neighborhoods as follows. For every pair ⟨n_i, n_j⟩, we first extract features from the tweets of neighborhoods n_i and n_j, and then compute the pairwise features, including the spatial distance and the normalized common user count. We concatenate all the features described in Section 4.2 for every pair. The model then predicts the local ranking of each pair of neighborhoods from the pair’s feature representation. Let R(n_i, n_j) be the local rank value of neighborhoods n_i and n_j predicted by our model. To obtain the global ranking of the neighborhoods, we compute a final score C(n_i) for every neighborhood as:

$$C(n_i) = \sum_{n_j \in N,\; n_j \neq n_i} R(n_i, n_j)$$

We then rank the neighborhoods in decreasing order of these scores: the lower the score, the lower the collective efficacy of the neighborhood. A similar ranking setup has been used for substitution ranking (Glavaš and Štajner 2015; Paetzold and Specia 2017; Maddela and Xu 2018).
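The global ordering step can be sketched as below. The trained classifier is abstracted as a hypothetical callable `local_rank(n_i, n_j)` returning −1, 0, or +1. Summing over all n_j ≠ n_i and sorting by decreasing score follows the equation above; keeping the input order for equal scores is an assumption, since the text does not say how score ties are broken.

```python
from itertools import permutations


def global_ranking(neighborhoods, local_rank):
    """Order neighborhoods by summed local-rank predictions, highest score first.

    local_rank: callable (n_i, n_j) -> predicted label in {-1, 0, +1}
        (hypothetical wrapper around a trained pairwise classifier).
    """
    score = {n: 0 for n in neighborhoods}
    for a, b in permutations(neighborhoods, 2):
        score[a] += local_rank(a, b)
    # sorted() is stable, also with reverse=True, so equal scores keep input order.
    return sorted(neighborhoods, key=lambda n: score[n], reverse=True)
```

With a perfect local ranker, the neighborhood with the highest collective efficacy wins all of its comparisons and is placed first.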

4.5. Evaluation

We evaluate the accuracy of our model by measuring the agreement between the generated ranking and the ground truth ranking. As mentioned in Section 4.1, the ranked list of neighborhoods has intransitive ties. The quality of a predicted ranking in this setting can be measured with the τx rank correlation coefficient (Emond and Mason 2002). Kendall’s τb is another metric for measuring ranking consensus; however, Emond and Mason (2002) uncover fundamental issues with using τb in the presence of ties.

The τx rank correlation coefficient

Emond and Mason (2002) represent a weak ordering A of n objects by an n × n score matrix whose elements a_ij are defined as follows:

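$$a_{ij} = \begin{cases} 1 & \text{if object } i \text{ is ranked ahead of or tied with object } j \\ -1 & \text{if object } i \text{ is ranked behind object } j \\ 0 & \text{if } i = j \end{cases}$$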

The τx rank correlation coefficient between two weak orderings A and B is the dot product of their score matrices (the sum of elementwise products), normalized by n(n − 1):

$$\tau_x(A, B) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ij}}{n(n-1)}$$
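The τx computation can be sketched directly from these definitions. How a list of values becomes a weak ordering is an assumption here: higher values rank ahead, and two items are tied when their values differ by at most `delta`, which reproduces the threshold-based, possibly intransitive ties of Section 4.1. A predicted ranking can be passed with `delta=0`.

```python
def score_matrix(values, delta=0.0):
    """Emond-Mason score matrix of the weak ordering induced by `values`.

    Higher value = ranked ahead; items whose values differ by at most delta
    are tied (ties need not be transitive). a[i][j] = 1 if i is ahead of or
    tied with j, -1 if i is behind j, and 0 on the diagonal.
    """
    n = len(values)
    return [[0 if i == j else (1 if values[i] - values[j] >= -delta else -1)
             for j in range(n)] for i in range(n)]


def tau_x(a, b):
    """Kendall tau-x between two weak orderings given as score matrices."""
    n = len(a)
    total = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return total / (n * (n - 1))
```

Identical orderings give τx = 1 and fully reversed strict orderings give −1; an ordering in which everything is tied scores 0 against a strict ordering of three items.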

We therefore evaluate our framework by computing the τx rank correlation between the generated ranking and the ground truth ranking. Note that cumulated gain-based metrics (Järvelin and Kekäläinen 2002), such as Discounted Cumulated Gain (DCG) and its normalized version (NDCG), which are widely used in the information retrieval literature to examine retrieval results, are not appropriate for evaluating our framework. The main reason is that these measures penalize ranking mistakes at the top of the ranking more heavily while discounting items retrieved later. This objective does not suit our context: mistakes at the top of the ranking should be penalized the same as mistakes in the middle or at the end of the list. Thus, a measure of ranking agreement is a more appropriate way to evaluate our model.

5. EXPERIMENTS

In this section, we empirically evaluate our hypothesis that “one can leverage social media data to quantify collective efficacy of neighborhoods”.

5.1. Dataset and empirical setup

We leverage the survey data obtained from the AHDC study. Details about the AHDC study and the computation of collective efficacy for each neighborhood are given in Section 3. We sorted the neighborhoods by the number of tweets collected for each of them and, unless stated otherwise, used the top 40% of neighborhoods in this list in our experiments. This set contains 157 neighborhoods, mentioned in 3,047 tweets on average. Table 1 reports the number of block groups in each top k% set of this list, as well as the minimum, maximum, mean, and median number of tweets collected for each set. For more information on the distribution of collective efficacy in each group of neighborhoods, see Appendix A.6. We train our proposed learning to rank model on a 90/10 train/test split of the collected tweets. The split preserves temporal order: the training split is treated as current tweets and the test split as future tweets.
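The subset selection and the temporal split can be sketched as below. Truncating the top-k% cut-off toward zero is an assumption, though it reproduces the set sizes in Table 1 (393 neighborhoods give 78, 157, 235, and 314 for 20%, 40%, 60%, and 80%). The split is shown over a single list of hypothetical (timestamp, text) pairs; whether it was applied globally or per neighborhood is not stated.

```python
def top_fraction(tweet_counts, fraction=0.4):
    """Neighborhoods in the top `fraction` of the list sorted by tweet count (descending)."""
    ranked = sorted(tweet_counts, key=tweet_counts.get, reverse=True)
    return ranked[:int(fraction * len(ranked))]  # truncate toward zero (assumed)


def temporal_split(tweets, train_fraction=0.9):
    """Split (timestamp, text) pairs: earliest 90% for training, latest 10% for testing."""
    ordered = sorted(tweets, key=lambda tweet: tweet[0])
    cut = int(train_fraction * len(ordered))
    return ordered[:cut], ordered[cut:]
```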

Table 1:

The neighborhood count; the minimum, median, and mean number of tweets per neighborhood; and the standard deviation of collective efficacy for each set of top k% neighborhoods, where block groups are sorted by tweet count. The maximum number of tweets in each set of top k% neighborhoods is 98,951.

| % of top tweeted neighborhoods | Neighborhood count | Min. tweets | Median tweets | Mean tweets | Std. dev. of collective efficacy |
|---|---|---|---|---|---|
| 20 | 78 | 490 | 1394.5 | 5903.5 | 0.1826 |
| 40 | 157 | 110 | 473 | 3047.7 | 0.1821 |
| 60 | 235 | 39 | 197 | 2058.8 | 0.1962 |
| 80 | 314 | 14 | 110 | 1546.9 | 0.2038 |
| 100 | 393 | 1 | 63 | 1237.2 | 0.2063 |

5.2. Baselines for the ranking task

Following are the baselines for the ranking task:

  • Venue count: Neighborhoods were sorted by the number of venues located in them that were mentioned in the tweets.

  • Population: Neighborhoods were sorted by their total population, with values taken from the 2013 report of the United States Census Bureau⁵.

  • Tweet count: We sort the neighborhoods by number of tweets that mentioned a venue located in them.

  • User count: We sort the neighborhoods by number of users that tweeted about a venue located in them.

  • Random: We generate 100 random permutations of the neighborhoods and report the average τx.

5.3. The classifier for the local ranking task

Since we rely on pairwise learning to rank, we experiment with the following classifiers for the local ranking task. Hyperparameters are tuned by grid search with 5-fold cross-validation and the scoring function set to ‘f1’.

  • Logistic Regression (LR): The estimator penalty is set to ‘L1’ and the inverse of regularization strength is set to 0.1.

  • Support Vector Machine (SVM): The kernel is set to ‘rbf’, the penalty parameter C is set to 1, and the gamma kernel coefficient for rbf is set to 0.1.

  • Random Forest (RF): The number of estimators is set to 200. The minimum number of samples required to be at a leaf node is set to 5, and the function to measure the quality of a split is set to ‘gini’.

  • Multi-layer Perceptron (MLP): We use a feed-forward neural network with 3 hidden layers of 100 units each and a task-specific output layer. We use the cross-entropy loss and the Adam algorithm (Kingma and Ba 2015) for optimization.
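The classifier settings listed above map naturally onto scikit-learn estimators, so the sketch below assumes scikit-learn; the text does not name the library. Two choices are assumptions: the solver for the L1-penalized logistic regression (scikit-learn requires 'liblinear' or 'saga' for an L1 penalty) and the grid-search scoring. Because scikit-learn's plain "f1" scorer is intended for binary targets and the local-rank labels have three classes, the sketch uses macro-averaged F1; the averaging actually used in the study is not stated.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Final hyperparameters as reported above; solver choices are assumptions.
CLASSIFIERS = {
    "LR": LogisticRegression(penalty="l1", C=0.1, solver="saga", max_iter=5000),
    "SVM": SVC(kernel="rbf", C=1.0, gamma=0.1),
    "RF": RandomForestClassifier(n_estimators=200, min_samples_leaf=5, criterion="gini"),
    # Three hidden layers of 100 units; MLPClassifier minimizes cross-entropy (log-loss).
    "MLP": MLPClassifier(hidden_layer_sizes=(100, 100, 100), solver="adam"),
}


def tune(estimator, param_grid, X, y):
    """5-fold grid search over param_grid, scored with macro-averaged F1
    because the labels are 3-class (-1, 0, +1)."""
    search = GridSearchCV(estimator, param_grid, cv=5, scoring="f1_macro")
    search.fit(X, y)
    return search.best_estimator_
```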

As discussed in Section 4.1, we define tied neighborhoods as those whose collective efficacy values differ by only a small amount; tied neighborhoods are considered interchangeable in the ranking. We define ties by a threshold on the collective efficacy difference: we compute the standard deviation of the collective efficacy values of the neighborhoods in our study and set the threshold to a coefficient times this standard deviation. We vary the coefficient from 0 to 1 in increments of 0.2 and evaluate the ranking consensus using the rank correlation metric discussed in Section 4.5. More details on tied neighborhoods are given in Appendix A.7. The results are discussed in Section 6.
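The threshold sweep can be sketched in a few lines. Which standard deviation estimator was used is not stated, so the population standard deviation is assumed here.

```python
from statistics import pstdev


def tie_thresholds(efficacy_values, coefficients=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Map each tie coefficient to its threshold: coefficient x SD of collective efficacy."""
    sd = pstdev(efficacy_values)  # population SD (assumed estimator)
    return {c: c * sd for c in coefficients}
```

For the 157 block groups used in Figure 2, whose standard deviation is 0.18, the thresholds would range from 0 (no ties beyond exact equality) to about 0.18.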

6. Results

6.1. Ranking performance

In this section, we present the ranking agreement of the permutations generated by our framework using 4 different classifiers with the most informative combination of the features discussed in Section 4.2. Specifically, we used doc2vec, the distribution of topics, the distribution of sentiment, the normalized common user count, and spatial distance to characterize each pair of neighborhoods in our study. Details on the classifiers’ parameter settings and on the feature analysis are presented in Sections 5.3 and 6.2. As shown in Figure 2, our framework outperforms the baselines by at least 20% even with a linear classifier such as logistic regression. Random forest, closely followed by the multi-layer perceptron, consistently yields higher ranking correlation than the other classifiers.

Figure 2:

Ranking performance of our proposed model and the baselines. We used 4 classifiers for local ranking module. The x axis indicates the coefficient that is multiplied by standard deviation to make the tie threshold. The standard deviation of the ground truth collective efficacy for the 157 block groups included in this experiment is 0.18.

6.2. Model drill down

In this section, we discuss experiments on the effect of each feature discussed in Section 4.2 on ranking. To determine the best content feature, we experimented with the features in this group, namely the TF-IDF of the crime lexicon, the topic distribution of urban activities, and doc2vec, on the top 40% most tweeted neighborhoods. We performed experiments with all combinations of these 3 content features. Each content feature is enabled in 3 combinations and disabled in 3 corresponding paired combinations. Each factorial experiment was conducted using 3 classifiers for the local ranking module (random forest, multi-layer perceptron, and logistic regression), and we repeated this process for 6 tie coefficients varying from 0 to 1 in intervals of 0.2. This resulted in 54 experiments in which a content feature is enabled and 54 in which it is disabled. We observed that adding doc2vec improves ranking performance and that this improvement is statistically significant (Wilcoxon signed-rank test, p-value < 0.001) (Wilcoxon, Katti, and Wilcox ). This was not the case for the other two content features. Box plots of these experiments are shown in Appendix A.8. Next, we examine the impact of additional features used alongside doc2vec on ranking performance. In each experiment, we computed the rank correlation between the generated ranking and the ground truth ranking for tie coefficients ranging from 0 to 1 in intervals of 0.2. The results are presented in Table 2. As the table shows, our model consistently outperforms the baselines regardless of the feature combination or tie coefficient. To summarize the rank correlation results, we rely on AUC-ERC, the area under the curve obtained by plotting τx against the tie coefficient. From Table 2, we see that models 7 to 11 achieve higher AUC-ERC scores than the baselines, and model 10 achieves the highest AUC-ERC score. Model 11, which includes all the features, is not the best performing model. We conjecture that this is due to overfitting on the training set. The TF-IDF feature has a dimensionality of 100, and the classifier might learn to predict the local rank of a pair of neighborhoods from a few crime lexicon words (e.g., gun, shooting) in the training set. However, the test set might not contain the words the classifier relied on, resulting in wrong predictions. To summarize, using model 10 as our proposed model, our generated ranking of neighborhoods achieves 0.77 Kendall tau-x ranking agreement with the ground truth ranking. Our results are 20% to 37% better than the baselines, depending on the choice of tie threshold.
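One reading of the factorial design that reproduces the counts above is sketched below: for a given content feature, each "enabled" combination is paired with the same combination minus that feature, and the pair in which the feature alone is compared against no content feature at all is excluded. This interpretation, and the identifier names, are illustrative assumptions.

```python
from itertools import combinations

CONTENT_FEATURES = ("tfidf", "topics", "doc2vec")
LOCAL_RANKERS = ("random_forest", "mlp", "logistic_regression")
TIE_COEFFICIENTS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def paired_configs(feature, features=CONTENT_FEATURES):
    """(with, without) feature sets that differ only in `feature`.

    The 'without' set must be non-empty, so with 3 content features
    each feature takes part in 3 pairs.
    """
    others = [f for f in features if f != feature]
    pairs = []
    for k in range(1, len(others) + 1):
        for subset in combinations(others, k):
            without = frozenset(subset)
            pairs.append((without | {feature}, without))
    return pairs


def paired_experiments(feature):
    """One enabled/disabled comparison per feature pair, local ranker, and tie coefficient."""
    return [
        (with_f, without_f, ranker, coef)
        for with_f, without_f in paired_configs(feature)
        for ranker in LOCAL_RANKERS
        for coef in TIE_COEFFICIENTS
    ]
```

The resulting 3 × 3 × 6 = 54 matched τx values with and without a feature are the kind of paired samples a Wilcoxon signed-rank test compares (available, for example, as `scipy.stats.wilcoxon`).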

Table 2:

The drilldown of our proposed model: Kendall τx and area under the collective efficacy ranking curve (AUC-ERC) of the baselines and our proposed framework. The top 40% most tweeted block groups were included in this experiment. Random forest was used for the local ordering task.

ID | Model | τx (tie coef. 0) | τx (0.2) | τx (0.4) | τx (0.6) | τx (0.8) | τx (1) | AUC-ERC
1 | Random | 0 | 0.09 | 0.20 | 0.30 | 0.40 | 0.49 | 0.25
2 | Coordinates | −0.04 | 0.05 | 0.17 | 0.27 | 0.37 | 0.47 | 0.22
3 | User count | 0.03 | 0.13 | 0.25 | 0.35 | 0.45 | 0.53 | 0.292
4 | Tweet count | 0.04 | 0.14 | 0.25 | 0.35 | 0.45 | 0.53 | 0.295
5 | Population | 0.05 | 0.15 | 0.26 | 0.36 | 0.46 | 0.55 | 0.306
6 | Venue count | 0.09 | 0.19 | 0.30 | 0.40 | 0.49 | 0.58 | 0.342
7 | Doc2vec + Sentiment | 0.3539 | 0.4504 | 0.5256 | 0.6347 | 0.713 | 0.7635 | 0.5764
8 | Doc2vec + Sentiment + Common Users | 0.3698 | 0.461 | 0.5497 | 0.6043 | 0.7033 | 0.7642 | 0.5770
9 | Doc2vec + Sentiment + Common Users + Topics | 0.368 | 0.4367 | 0.5425 | 0.6412 | 0.6988 | 0.7647 | 0.5771
10 | Doc2vec + Sentiment + Common Users + Topics + Distance | 0.3748 | 0.4565 | 0.5388 | 0.6322 | 0.7207 | 0.7735 | 0.5844
11 | Doc2vec + Sentiment + Common Users + Topics + Distance + TF-IDF | 0.3686 | 0.4597 | 0.5207 | 0.6294 | 0.6957 | 0.7666 | 0.5746
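The AUC-ERC column can be checked against the τx columns. The sketch below assumes the area is computed with the trapezoidal rule over the six tie coefficients. That is an assumption, not the authors' code, but it reproduces the reported values to within the rounding of the table. The function and variable names are illustrative.

```python
def auc_erc(tie_coefficients, tau_values):
    """Area under the curve of tau-x plotted against tie coefficient (trapezoidal rule)."""
    if len(tie_coefficients) != len(tau_values) or len(tau_values) < 2:
        raise ValueError("need matching lists with at least two points")
    points = list(zip(tie_coefficients, tau_values))
    # Sum the area of each trapezoid between consecutive tie coefficients.
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


TIE_COEFFICIENTS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
# tau-x rows copied from Table 2.
RANDOM = [0, 0.09, 0.20, 0.30, 0.40, 0.49]
MODEL_10 = [0.3748, 0.4565, 0.5388, 0.6322, 0.7207, 0.7735]

print(round(auc_erc(TIE_COEFFICIENTS, RANDOM), 3))    # 0.247 (Table 2: 0.25)
print(round(auc_erc(TIE_COEFFICIENTS, MODEL_10), 4))  # 0.5845 (Table 2: 0.5844)
```

Because the tie coefficients are evenly spaced, this reduces to 0.2 × (half the first and last τx values plus the four interior values).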

6.3. Effect of data availability

In this section, we explore to what extent the ranking consensus depends on the amount of data available for each neighborhood. To this end, we solved the ranking task for the different sets of block groups introduced in Section 5.1, using our ranking framework with an MLP as the classifier. As Figure 3 shows, the more data we have for the neighborhoods in a set, the higher the consensus between the generated ranking and the ground truth ranking.
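The sets compared here are defined by the number of tweets collected per block group, such as the top 40% most-tweeted block groups. The helper below is hypothetical and shows one way to form such a set. Its input format and its rounding rule are assumptions, not the paper's code.

```python
def top_fraction_by_tweets(tweet_counts, fraction):
    """Return the ids of the block groups in the top `fraction` by tweet count.

    tweet_counts: dict mapping a block-group id to its number of associated
    tweets (hypothetical input format). The set size is rounded to the nearest
    integer, keeping at least one block group (assumed rounding rule).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Most-tweeted block groups first.
    ranked = sorted(tweet_counts, key=tweet_counts.get, reverse=True)
    size = max(1, round(fraction * len(ranked)))
    return ranked[:size]


counts = {"bg_a": 10, "bg_b": 5, "bg_c": 30, "bg_d": 1, "bg_e": 7}
print(top_fraction_by_tweets(counts, 0.4))  # ['bg_c', 'bg_a']
```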

Figure 3:

Ranking performance of our framework and the baselines on different sets of neighborhoods, defined by the number of collected tweets. The x-axis shows the tie coefficient; the tie threshold is the tie coefficient multiplied by the standard deviation of collective efficacy. The standard deviation of each set is reported in Table 1.

7. CONCLUSION

In this paper, we addressed the high cost of measuring the collective efficacy of neighborhoods. Through extensive experiments, we showed that this problem can be addressed by leveraging social media data. Our proposed framework allows frequent and less costly access to the collective efficacy values of neighborhoods. In the future, we plan to leverage data from other sources (e.g., additional social forums and census data) to improve our model. We also plan to explore the ego networks of social media users and give more weight to tweets from users who are more familiar with a particular neighborhood. Our framework could act as an early warning system that captures changes in the composition of neighborhoods, potentially helping regulators and policymakers prioritize resources and monitor neighborhood safety and upkeep. We intend to release the tweet IDs as well as the ground truth collective efficacy values of the neighborhoods once this work is published.

8. Acknowledgements

This material is based upon work supported by the National Institutes of Health (NIH) under Grant No. NIH-1R01 HD088545-01A1. Any opinions, findings, and conclusions in this material are those of the author(s) and may not reflect the views of the funding agency.

A Supplemental Material

A.1. Data Collection

We collected a large amount of data from Twitter, using snowball sampling to identify the Twitter accounts of local residents. Crawling publicly available tweets from user profiles yields substantially more data than collecting real-time streaming tweets from the Columbus area. For our study, 63 Twitter accounts that mostly posted news and information about the city of Columbus were identified and used as seed users. Many local residents follow such accounts to stay informed about local events (Kwak et al. 2010). The seed set included the Twitter accounts of several organizations, including major universities, recreational centers, medical centers, newspapers, local bloggers, local reporters, police, libraries, and restaurants, as well as local sports teams. Using Twitter’s streaming API, we collected the followers of the seed accounts. We then examined these users’ profiles and identified 54K public profiles whose location was marked as Columbus or one of the suburban areas included in the AHDC study, whose study area covered several populous suburbs. In total, 50 million publicly available tweets were collected from these accounts. In a second wave of data collection, we collected publicly available geotagged tweets from May to August 2018, which yielded an additional 2.8 million tweets. Next, we used LNEx to extract location names from the tweets and to associate tweets with neighborhoods. In some cases LNEx reported ambiguous locations; we excluded tweets containing such ambiguous location entities. Location ambiguities arose in the following cases:

  • A location entity may have several matches in the gazetteer, for example, Holiday Inn and GameStop.

  • A location entity with a single gazetteer entry may cover a very large area. For example, the gazetteer entry for the Trans-Siberian Highway in Russia spans from St. Petersburg to Vladivostok. Such entities cannot be mapped to a single neighborhood.

  • A location entity extracted by LNEx may have a gazetteer entry but not refer to a location in context, for example, American Girl and Modern Male.

Such mentions were identified manually and excluded from the study. After this pruning step, 4846 unique locations in the area remained; they were mentioned in 545K tweets and mapped to 424 neighborhoods.
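Two of the filtering steps above can be sketched in a few lines. The helper names and data formats are hypothetical. Profile locations are free-text strings, and the gazetteer is modeled as a dictionary from a lower-cased place name to its list of entries. The paper identified ambiguous mentions manually; the second function only illustrates how the first ambiguity case (several gazetteer matches) could be screened automatically.

```python
def is_local_profile(profile_location, study_area_names):
    """True if a free-text profile location names a place in the study area.

    Assumption: the place name is the first comma-separated part of the
    field, so "Columbus, OH" is read as "columbus".
    """
    place = profile_location.split(",")[0].strip().lower()
    return place in {name.lower() for name in study_area_names}


def single_match_mentions(mentions, gazetteer):
    """Keep only location mentions with exactly one gazetteer entry.

    gazetteer: dict mapping a lower-cased place name to a list of entries
    (hypothetical format). Mentions with no entry, or with several entries
    (e.g. a chain store with many branches), are dropped.
    """
    return [m for m in mentions if len(gazetteer.get(m.lower(), [])) == 1]
```

Because this profile check ignores the state, "Columbus, GA" would also pass. A real filter would need a stricter match.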

A.2. TF-IDF of crime related words

We tokenized each tweet in our training set, keeping hashtags, handles, and emojis as separate tokens. We then removed stop words, lemmatized the tokens, and added the bigrams of each tweet to its token set. The 100 crime-related terms with the highest frequency across the tweets were chosen as our vocabulary. For the test set, we concatenated all the tweets in each neighborhood into a single document per neighborhood and transformed each document into its term-document vector.
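The sketch below shows this feature using only the standard library. Several details are assumptions, not the paper's code. Whitespace tokenization stands in for the tweet tokenizer, stop-word removal, and lemmatization. The crime lexicon is passed in as a set of terms. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, computed over the neighborhood documents, and each vector is L2-normalized.

```python
import math
from collections import Counter


def tokens_with_bigrams(tweet):
    """Unigrams plus adjacent-word bigrams of one tweet.

    Assumption: lower-cased whitespace splitting stands in for the tweet
    tokenizer, stop-word removal, and lemmatization described in the text.
    """
    words = tweet.lower().split()
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def build_vocabulary(train_tweets, crime_lexicon, k=100):
    """The k crime-lexicon terms that occur most often in the training tweets."""
    counts = Counter(
        term
        for tweet in train_tweets
        for term in tokens_with_bigrams(tweet)
        if term in crime_lexicon
    )
    return [term for term, _ in counts.most_common(k)]


def neighborhood_tfidf(neighborhood_tweets, vocabulary):
    """One L2-normalized TF-IDF vector per neighborhood over a fixed vocabulary.

    neighborhood_tweets: dict mapping a neighborhood id to its list of tweets.
    Each neighborhood's tweets are pooled into one document; bigrams are
    formed within tweets, never across tweet boundaries.
    """
    docs = {
        nid: Counter(t for tweet in tweets for t in tokens_with_bigrams(tweet))
        for nid, tweets in neighborhood_tweets.items()
    }
    n = len(docs)
    # Assumed smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {
        term: math.log((1 + n) / (1 + sum(1 for c in docs.values() if c[term] > 0))) + 1
        for term in vocabulary
    }
    vectors = {}
    for nid, counts in docs.items():
        raw = [counts[term] * idf[term] for term in vocabulary]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        vectors[nid] = [v / norm for v in raw]
    return vectors
```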

Figure 4:

Distribution of the collective efficacy values in each set of neighborhoods. The collective efficacy values are computed from the survey study and are normalized in the 0 to 1 range. Refer to Section 3 for more details.

Figure 5:

The rate of perplexity change (RPC) was used to determine the appropriate number of topics. RPC is maximized at 70 topics.

A.3. Distribution of spatio-temporal urban activities

Before feeding the corpus to the LDA module, we tokenized the tweets using a tokenizer adapted for tweets (footnote 6), removed stop words, lemmatized the tokens, and added to our token set the bigrams that appeared in more than 20 tweets. Next, we removed words that appeared in fewer than 20 tweets (rare words) or in more than 50% of the tweets. Using RPC, we varied the number of topics from 10 to 150 in increments of 10 and trained an LDA model on a corpus of 5M tweets collected from users’ profiles. As Figure 5 shows, RPC is maximized at 70 topics, so we used 70 as the number of topics in our model.
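The topic-count selection can be sketched as follows. It uses the rate of perplexity change in the form RPC(i) = |(P_i − P_{i−1}) / (t_i − t_{i−1})|, where t_i is the i-th candidate number of topics and P_i is the perplexity of the LDA model trained with it. Training the LDA models is not shown; the perplexities are inputs. The selection rule (largest RPC) follows the description above, and the function names are illustrative.

```python
def rate_of_perplexity_change(topic_counts, perplexities):
    """RPC for each candidate topic count after the first.

    RPC(i) = |(P_i - P_{i-1}) / (t_i - t_{i-1})|, where t_i is the i-th
    candidate number of topics and P_i the perplexity of its LDA model.
    """
    if len(topic_counts) != len(perplexities) or len(topic_counts) < 2:
        raise ValueError("need matching lists with at least two candidates")
    return {
        topic_counts[i]: abs(
            (perplexities[i] - perplexities[i - 1])
            / (topic_counts[i] - topic_counts[i - 1])
        )
        for i in range(1, len(topic_counts))
    }


def select_num_topics(topic_counts, perplexities):
    """Candidate topic count with the largest RPC (the rule stated in the text)."""
    rpc = rate_of_perplexity_change(topic_counts, perplexities)
    return max(rpc, key=rpc.get)


CANDIDATES = list(range(10, 151, 10))  # 10, 20, ..., 150 as in the text

# Toy perplexities (made-up values), only to show the call.
print(select_num_topics([10, 20, 30, 40], [100.0, 80.0, 50.0, 45.0]))  # 30
```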

A.4. Document Embedding

We tokenized and lemmatized 5M tweets collected from user profiles and removed their stop words. We then fit a Doc2vec model with a vector size of 50 on this corpus.

For each neighborhood we concatenate all of the associated tweets and generate the embedding using the trained model.

Figure 6:

Box plots of τx values for the TF-IDF, topic-distribution, and doc2vec content features on the top 40% most-tweeted neighborhoods, with tie coefficients from 0 to 1 in steps of 0.2. Three classifiers (random forest, multilayer perceptron, and logistic regression) were used, for a total of 54 experiments per content feature.

A.5. Sentiment Distribution

We applied the five sentiment analysis tools to each tweet and normalized their outputs to the range −1 to 1. Most of these tools predict sentiment from a predefined lexicon, so they cannot score a tweet accurately when it contains no lexicon words. To account for this, for each tweet we consider only the non-zero outputs and average them. We then place the tweets associated with a neighborhood into four bins (highly negative, negative, positive, and highly positive) and normalize each bin by dividing its count by the total number of tweets for that neighborhood. The result, for each neighborhood, is the sentiment distribution of all tweets mentioning a venue located inside its boundaries.
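The averaging and binning steps can be sketched as follows. The tool outputs are taken as given, already scaled to [−1, 1]. Two details are assumptions, since the text does not specify them: the bin edges (−0.5, 0, and 0.5 here) and the handling of tweets for which every tool returns 0. This sketch places such tweets in no bin but still counts them in the total. The function names are illustrative.

```python
SENTIMENT_BINS = ("highly negative", "negative", "positive", "highly positive")


def tweet_sentiment(tool_scores):
    """Average of the non-zero tool outputs for one tweet (each assumed in [-1, 1]).

    Returns 0.0 when every tool outputs 0, i.e. no tool found lexicon words.
    """
    nonzero = [s for s in tool_scores if s != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


def sentiment_distribution(tweets_tool_scores, strong=0.5):
    """Fraction of a neighborhood's tweets falling in each of the four bins.

    tweets_tool_scores: one list of tool outputs per tweet.
    Assumptions: bin edges at -strong, 0 and +strong; tweets whose averaged
    score is exactly 0 fall in no bin but still count toward the total.
    """
    counts = [0, 0, 0, 0]
    for scores in tweets_tool_scores:
        s = tweet_sentiment(scores)
        if s == 0:
            continue
        if s <= -strong:
            counts[0] += 1
        elif s < 0:
            counts[1] += 1
        elif s < strong:
            counts[2] += 1
        else:
            counts[3] += 1
    total = len(tweets_tool_scores)
    return [c / total for c in counts] if total else [0.0] * len(counts)


example = [[0.8, 0, 0.6], [-0.2, 0, 0], [0, 0, 0], [0.1, 0.3]]
print(dict(zip(SENTIMENT_BINS, sentiment_distribution(example))))
# {'highly negative': 0.0, 'negative': 0.25, 'positive': 0.25, 'highly positive': 0.25}
```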

A.6. Distribution of Collective Efficacy

The distribution of collective efficacy is presented in Figure 4. As the plot shows, in every neighborhood set the distribution of ground truth collective efficacy values is approximately similar to that of the set of all neighborhoods. Most block groups have a collective efficacy value between 0.4 and 0.6.

A.7. Tied Neighborhoods

As discussed in Section 4.1, we define tied neighborhoods as those whose collective efficacy values differ by a very small amount. Tied neighborhoods are considered interchangeable in the ranking. We define ties using a threshold on the difference in collective efficacy, which we refer to as the “Tie Threshold”. We compute the standard deviation of the collective efficacy values of the neighborhoods in our study and set the threshold as a multiple of this standard deviation; we refer to the multiplier as the “Tie Coefficient”. We vary the coefficient from 0 to 1 in increments of 0.2 and evaluate ranking consensus using the ranking correlation metric discussed in Section 4.5. The number of tied neighborhood pairs at each tie threshold is shown in Figure 7; as the plot indicates, the number of tied pairs increases with the tie threshold.
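The tie threshold and the tied-pair counts in Figure 7 can be sketched as follows. Two choices here are assumptions, since the text does not specify them: the population standard deviation, and a strict "less than" comparison, so that a tie coefficient of 0 produces no ties. The function names are illustrative.

```python
import statistics


def tie_threshold(ce_values, tie_coefficient):
    """Tie threshold = tie coefficient x standard deviation of collective efficacy.

    Assumption: the population standard deviation is used.
    """
    return tie_coefficient * statistics.pstdev(ce_values)


def tied_pairs(ce_values, threshold):
    """Ordered pairs (i, j), i != j, treated as tied.

    Assumption: a pair is tied when its collective-efficacy difference is
    strictly below the threshold, so a threshold of 0 produces no ties.
    """
    n = len(ce_values)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and abs(ce_values[i] - ce_values[j]) < threshold
    ]


TIE_COEFFICIENTS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

Counting `len(tied_pairs(ce, tie_threshold(ce, c)))` for each coefficient `c` gives one bar of a Figure 7 style plot. With an unbounded threshold, all N × (N − 1) ordered pairs are tied.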

Figure 7:

Number of tied neighborhood pairs at each tie coefficient. Tied neighborhoods are not ranked against each other. For a set of N neighborhoods, the number of ordered pairs is N × (N − 1). The total number of pairs is shown with the label “infinity”. As the tie threshold increases, the number of tied neighborhoods increases.

A.8. Results

To find the best content feature, we experimented with the three features in this group (TF-IDF of the crime lexicon, topic distribution of urban activities, and doc2vec) on the top 40% most-tweeted neighborhoods. We ran experiments with all combinations of our three content features: each content feature is enabled in 3 combinations and disabled in 3 corresponding paired combinations. We conducted each factorial experiment with 3 classifiers for the local ranking module (random forest, multilayer perceptron, and logistic regression) and repeated this process for 6 tie coefficients ranging from 0 to 1 in steps of 0.2. The number of tied neighborhoods at each tie coefficient is shown in Figure 7. The cross product of these settings yields 3 × 3 × 6 = 54 experiments in which a given content feature is enabled. The box plot of the observed ranking consensus over these 54 experiments for each content feature is presented in Figure 6. As the figure shows, characterizing a neighborhood’s tweets with doc2vec consistently yields better rankings than either TF-IDF of the crime lexicon or the topic distribution of urban activities.
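Enumerating the experimental design is a quick way to check the 54-experiment count. This sketch interprets "enabled in 3 combinations and disabled in 3 paired combinations" as follows: each non-empty feature set that lacks the feature is paired with the same set plus the feature, so the empty set is never used. That interpretation and the identifier names are assumptions.

```python
from itertools import combinations

CONTENT_FEATURES = ("tfidf", "topics", "doc2vec")
CLASSIFIERS = ("random_forest", "mlp", "logistic_regression")
TIE_COEFFICIENTS = tuple(round(0.2 * k, 1) for k in range(6))  # 0.0, 0.2, ..., 1.0


def paired_feature_sets(feature):
    """(with, without) feature-set pairs that differ only in `feature`.

    The 'without' side ranges over the non-empty subsets of the other
    features, so each feature gets 2**2 - 1 = 3 pairs.
    """
    others = [f for f in CONTENT_FEATURES if f != feature]
    return [
        (frozenset(subset) | {feature}, frozenset(subset))
        for size in range(1, len(others) + 1)
        for subset in combinations(others, size)
    ]


def paired_runs(feature):
    """Every (enabled run, disabled run) pair: feature-set pair x classifier x tie coefficient."""
    return [
        ((with_f, clf, tc), (without_f, clf, tc))
        for with_f, without_f in paired_feature_sets(feature)
        for clf in CLASSIFIERS
        for tc in TIE_COEFFICIENTS
    ]


print(len(paired_runs("doc2vec")))  # 3 x 3 x 6 = 54
```

The τx values from the enabled and disabled side of each pair would then be compared with a paired test. For example, `scipy.stats.wilcoxon(enabled_scores, disabled_scores)` performs the Wilcoxon signed-rank test.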

Footnotes

1

In this paper we use the terms “neighborhood” and “block group” interchangeably. A block group is a census block group, the smallest geographical unit for which the United States Census Bureau publishes sample data.

3

It is our intent to open source the tweet IDs as well as the ground truth collective efficacy values of neighborhoods in Columbus once this work is published.

4

This lexicon has been acquired from an open source repository https://github.com/sefabey/fear of crime paper

6

We used the open source tokenizer presented in https://github.com/erikavaris/tokenizer

References

  1. Aghababaei S, and Makrehchi M 2016. Mining social media content for crime prediction. In WI.
  2. Al-Olimat HS; Thirunarayan K; Shalin V; and Sheth A 2018. Location name extraction from targeted text streams using gazetteer-based statistical language models. COLING.
  3. Bandura A, and Wessels S 1997. Self-efficacy.
  4. Bandura A 1997. Editorial. American Journal of Health Promotion.
  5. Bendler J; Brandt T; Wagner S; and Neumann D 2014. Investigating crime-to-twitter relationships in urban environments-facilitating a virtual neighborhood watch.
  6. Bertrand KZ; Bialik M; Virdee K; Gros A; and Bar-Yam Y 2013. Sentiment in New York City: A high resolution spatial and temporal view. arXiv preprint arXiv:1308.5010.
  7. Blei DM; Ng AY; and Jordan MI 2003. Latent Dirichlet allocation. JMLR.
  8. Browning CR; Cagney KA; and Boettner B 2016. Neighborhood, place, and the life course. In Handbook of the life course.
  9. Chainey S, and Ratcliffe J 2013. GIS and crime mapping.
  10. Compton R; Lee C; Lu T-C; De Silva L; and Macy M 2013. Detecting future social unrest in unprocessed twitter data: emerging phenomena and big data. In ISI.
  11. Couper MP 2017. New developments in survey data collection. Annual Review of Sociology.
  12. Emond EJ, and Mason DW 2002. A new rank correlation coefficient with application to the consensus ranking problem. Journal of Multi-Criteria Decision Analysis.
  13. Fu C; McKenzie G; Frias-Martinez V; and Stewart K 2018. Identifying spatiotemporal urban activities through linguistic signatures. Computers, Environment and Urban Systems.
  14. Gerber MS 2014. Predicting crime using twitter and kernel density estimation. DSS.
  15. Gilbert CHE 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In ICWSM.
  16. Glavaš G, and Štajner S 2015. Simplifying lexical simplification: do we need simplified corpora? In ACL.
  17. Go A; Bhayani R; and Huang L 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford.
  18. Goodman LA 1961. Snowball sampling. The Annals of Mathematical Statistics.
  19. Hipp JR 2016. Collective efficacy: How is it conceptualized, how is it measured, and does it really matter for understanding perceived neighborhood crime and disorder? JCJ.
  20. Hu M, and Liu B 2004. Mining and summarizing customer reviews. In SIGKDD.
  21. Jacobs J 1961. The death and life of great American cities. New York, NY: Vintage.
  22. Järvelin K, and Kekäläinen J 2002. Cumulated gain-based evaluation of IR techniques. TOIS.
  23. Kendall MG 1945. The treatment of ties in ranking problems. Biometrika.
  24. Kingma DP, and Ba J 2015. Adam: A method for stochastic optimization. ICLR.
  25. Kornhauser RR 1978. Social sources of delinquency: An appraisal of analytic models.
  26. Kwak H; Lee C; Park H; and Moon S 2010. What is twitter, a social network or a news media? In WWW.
  27. Le Q, and Mikolov T 2014. Distributed representations of sentences and documents. In ICML.
  28. Levallois C 2013. Umigon: sentiment analysis based on terms lists and heuristics. In SemEval.
  29. Liu T-Y, et al. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval.
  30. Maddela M, and Xu W 2018. A word-complexity lexicon and a neural readability ranking model for lexical simplification. In ACL.
  31. Malleson N, and Andresen MA 2015. The impact of using social media data in crime rate calculations: shifting hot spots and changing spatial patterns. CaGIS.
  32. Morenoff JD; Sampson RJ; and Raudenbush SW 2001. Neighborhood inequality, collective efficacy, and the spatial dynamics of urban violence. Criminology.
  33. Pacheco DF; Oliveira M; and Menezes R 2017. Using social media to assess neighborhood social disorganization.
  34. Paetzold G, and Specia L 2017. Lexical simplification with neural ranking. In EACL.
  35. Paskevich DM; Brawley LR; Dorsch KD; and Widmeyer WN 1999. Relationship between collective efficacy and team cohesion: Conceptual and measurement issues. Group Dynamics: Theory, Research, and Practice.
  36. Patton D 2015. Gang violence, crime, and substance use on twitter: A snapshot of gang communications in Detroit. In SSWR.
  37. Ribeiro FN; Araujo M; Gonçalves P; Gonçalves MA; and Benevenuto F 2016. SentiBench: a benchmark comparison of sentiment analysis methods. EPJ Data Science.
  38. Sampson RJ, and Groves WB 1989. Community structure and crime: Testing social-disorganization theory. AJS.
  39. Sampson RJ; Morenoff JD; and Earls F 1999. Beyond social capital: Spatial dynamics of collective efficacy for children. American Sociological Review.
  40. Sampson RJ; Raudenbush SW; and Earls F 1997. Neighborhoods and violent crime: A multilevel study of collective efficacy. Science.
  41. Skogan W 2015. Disorder and decline: The state of research. JRCD.
  42. Thelwall M; Buckley K; Paltoglou G; Cai D; and Kappas A 2010. Sentiment strength detection in short informal text. JASIST.
  43. Vedula N; Parthasarathy S; and Shalin VL 2017. Predicting trust relations within a social network: A case study on emergency response. In WebSci.
  44. Wang M, and Gerber MS 2015. Using twitter for next-place prediction, with an application to crime prediction. In Computational Intelligence, IEEE Symposium Series on.
  45. Wang X; Brown DE; and Gerber MS 2012. Spatio-temporal modeling of criminal incidents using geographic, demographic, and twitter-derived information. In ISI.
  46. Wang X; Gerber MS; and Brown DE 2012. Automatic crime prediction using events extracted from twitter posts. In SBP-BRiMS.
  47. Weisburd D; Bruinsma GJ; and Bernasco W. Units of analysis in geographic criminology: historical development, critical issues, and open questions. In Putting crime in its place.
  48. Welsh BC; Braga AA; and Bruinsma GJ 2015. Reimagining broken windows: from theory to policy. Journal of Research in Crime and Delinquency.
  49. Wilcoxon F; Katti S; and Wilcox RA. Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. Selected tables in mathematical statistics.
  50. Williams ML; Burnap P; and Sloan L 2017. Crime sensing with big data: The affordances and limitations of using open-source communications to estimate crime patterns. BJC.
  51. Wilson JQ, and Kelling GL 1982. Broken windows. Atlantic Monthly.
  52. Zhao W; Chen JJ; Perkins R; Liu Z; Ge W; Ding Y; and Zou W 2015. A heuristic approach to determine an appropriate number of topics in topic modeling. In BMC Bioinformatics.
