PLOS ONE. 2020 Jul 9;15(7):e0235227. doi: 10.1371/journal.pone.0235227

Using 311 data to develop an algorithm to identify urban blight for public health improvement

Jessica Athens 1,*, Setu Mehta 2, Sophie Wheelock 1, Nupur Chaudhury 1, Mark Zezza 1
Editor: Changshan Wu
PMCID: PMC7347128  PMID: 32645013

Abstract

The growth of administrative data made available publicly, often in near-real time, offers new opportunities for monitoring conditions that impact community health. Urban blight—manifestations of adverse social processes in the urban environment, including physical disorder, decay, and loss of anchor institutions—comprises many conditions considered to negatively affect the health of communities. However, measurement strategies for urban blight have been complicated by a lack of uniform data, often requiring expensive street audits or the use of proxy measures that cannot represent the multifaceted nature of blight. This paper evaluates how publicly available data from New York City’s 311-call system can be used in a natural language processing approach to represent urban blight across the city with greater geographic and temporal precision. We found that our urban blight algorithm, which includes counts of keywords (‘tokens’), resulted in sensitivity of ~90% and specificity between 55% and 76%, depending on other covariates in the model. The percent of 311 calls that were ‘blight related’ at the census tract level was correlated with the most common proxy measure for blight: short-, medium-, and long-term vacancy rates for commercial and residential buildings. We found the strongest association with long-term (>1 year) commercial vacancies (Pearson’s correlation coefficient = 0.16, p < 0.001). Our findings indicate the need for further validation, as well as testing algorithms that disambiguate the different facets of urban blight. These facets include physical disorder (e.g., litter, overgrown lawns, or graffiti) and decay (e.g., vacant or abandoned lots or sidewalks in disrepair) that are manifestations of social processes such as (loss of) neighborhood cohesion, social control, collective efficacy, and anchor institutions. More refined measures of urban blight would allow for better-targeted remediation efforts and improved community health.

Introduction

Public health concerns have contributed to key urban planning strategies since the late 19th century, when reforms in sanitation, the introduction of zoning, and land use regulation were a means of acknowledging the health risks of exposure to contaminated air and water [1,2]. Such planning tools were also intended to address social ills, as residential overcrowding and limited access to green space were considered to be risks to psychological and ‘moral’ well-being [2,3,4]. The changes sought were specifically targeted to the built environment, best summarized as ‘… all of the physical parts of where we live and work (e.g., homes, buildings, streets, open spaces, and infrastructure)’ [3].

The prominence of the built environment as a determinant of population health rose again in the late twentieth and early twenty-first centuries, with significant research outlining its impacts on chronic disease, mental health, and injuries [4]. Starting in the mid-twentieth century, the concept of ‘urban blight’ emerged to reflect the physical, economic, and social decline of neighborhoods [5]. For the purpose of this research, urban blight is considered as physical disorder (e.g., litter, overgrown lawns, or graffiti) and decay (e.g., abandoned lots or sidewalks in disrepair) that are manifestations of social processes such as (loss of) neighborhood cohesion, social control, collective efficacy, and anchor institutions. Whereas many aspects of urban planning and the built environment (such as street grids, capital investments in utilities and other infrastructure, and development of housing stock) are either fixed or can only be changed through long-term interventions, the physical decay and disorder that comprise urban blight can be addressed over relatively short time periods.

Standard approaches to measuring blight with secondary data sets include measures of physical decay, such as vacant commercial property, vacant housing units, and vacant lots within a neighborhood [6,7]. Other researchers have looked at measures of physical disorder, otherwise considered neighborhood quality measures, such as mown lawns, litter and debris, delinquent vehicles, and presence/absence of graffiti [6,8]. These quality-related indicators are more difficult to capture, often requiring street audits in person or virtually, using Google Street View [9,10]. Finally, given that blight is considered a physical manifestation of poor social cohesion, a third dimension of blight includes social and economic investment, which can be evaluated through measures such as perceptions of safety, presence of anchor institutions, and community organizing efforts [11].

The body of literature linking urban blight to community health is constrained due to differing definitions of blight and the availability of secondary data sources for operationalizing that definition. As outlined in Maghelal et al., measures such as vacancy rates or tax delinquency rates assess only one facet of blight, whereas other measures—including income, single-parent households, or racial/ethnic composition—are used to proxy for urban blight [12,5]. These proxies are problematic because they rely on correlations between social disadvantage and physical environment characteristics but overlook the systemic causes of both.

The association between built environment features and health behaviors, biomarkers indicative of chronic disease, and health outcomes has been well documented [12–16]. In particular, qualities of the built environment—including tree cover and green space, park amenities, sidewalk coverage and maintenance, and presence/absence of environmental toxins (air pollution, lead and other heavy metals in the soil)—are associated with community health. Though the literature specific to urban blight and public health outcomes is less developed, there is strong evidence that urban blight is specifically associated with higher violent crime and gun crime [5], poor mental health [17,18], and even adverse pregnancy outcomes [19]. Experimental research on the causal relationships between urban blight/blight remediation (specifically urban greening efforts) and (1) biomarkers and (2) mental health demonstrates significant improvements in heart rate and depression measures [18,20]. These improvements were evident in both high- and low-poverty census tracts.

The municipal 311 data systems, first introduced in Baltimore, MD (1997) and adopted by New York City in 2003, could serve as a source of information that addresses the paucity of secondary data for a uniform evaluation of urban blight across a municipality [21]. The 311 data system is designed to log non-urgent calls to municipal agencies regarding anything from requests for information on municipal services and benefits enrollment, to reporting a housing violation or making a noise complaint. These data are updated daily, and include geo-location (latitude and longitude), responsible agency, category of complaint, and a free-text description of the call. Given the volume of data (~3 million calls annually in New York City), and the noise inherent in the data, 311 data analysis requires a ‘big data’ approach. The free-text nature of call descriptions suggests applying natural language processing to identify key words or strings predictive of urban blight.

Objectives

This research effort aims to evaluate how regularly updated administrative data, namely the 311-call system data, can be used to represent urban blight at fine-grained geographies and, to a more modest extent, time periods. We apply a natural language processing approach to develop an algorithm to identify urban blight-related calls in the 311 data system and explore their distribution across New York City. These results are compared to American Community Survey residential vacancy data and HUD USPS vacancy rates for commercial and residential buildings. These indicators are commonly used proxies for assessing urban blight [5, 17]. An advantage of a 311-based measure of blight is that it provides finer-grained geographic and temporal data. Although not fully developed in this analysis, the 311 data has the potential to disambiguate different types of blight (i.e., decay, disorder, and social/economic investment). Having more refined measures of blight could also allow for better targeted blight remediation efforts. Future work will explore the relationship between a finalized measure of urban blight drawn from 311 calls and community health conditions.

Data and methods

Data

New York City 311 Service Requests were the primary data source for algorithm development. Six months of data (January 1, 2018–June 30, 2018), comprising 1.3 million records, were pulled from New York City’s Open Data Portal directly into RStudio (v. 3.5.1) using the ‘RSocrata’ library [22–24]. Supplemental data on census tract-level population, residential vacancies, and median home value were drawn from the American Community Survey (ACS) 5-Year Estimates, 2013–2017, table DP04 [25]. As with the 311 call data, these data were loaded into RStudio with the ‘acs’ library [26]. Data on short- and long-term residential and commercial vacancies originated from the Residential and Commercial Vacancy data set from the US Postal Service and Department of Housing and Urban Development for the first quarter of 2018 [27].

Methods

Following standard practice for Natural Language Processing (NLP), we followed a four-step process: (1) data training, (2) cleaning and tokenization, (3) classification, and (4) validation [28].

Training the data requires the manual designation of a random sample of calls to an urban blight versus a non-urban blight category. We established seven domains of urban blight based on extant literature to guide our data training: social conditions, abandoned property, air quality, street/sidewalk maintenance, noise, sanitary conditions, and building safety. Coding any call into one of these categories signifies the call is urban blight-related. Domains were assigned based on complaint types and free-text call descriptions in the 311 data. Complaint type is a variable designated by the City of New York and included as part of the 311 system data, which comprises 236 complaint types. We focused on ‘high frequency’ complaints (≥ 1,000 records) to simplify the training process. The high frequency list included 93 complaint types.

We selected half of the call records (~650,000) as the training data set using simple random sampling. Two raters reviewed a 10% sample of data to assign complaint types (N = 93) to one of the urban blight domains (N = 7) or to a non-urban blight-related category. Consistency in coding was evaluated using Cohen’s Kappa statistic, which adjusts for the probability that raters will agree by chance.
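Cohen’s kappa compares observed agreement to the agreement expected by chance from each rater’s marginal label frequencies. The computation can be sketched as follows, using hypothetical rater labels (the study’s own analysis was conducted in R):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels."""
    n = len(rater_a)
    # Observed agreement: proportion of items on which the raters match
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal proportions, summed over categories
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical domain assignments for six complaint types
rater_1 = ["sanitary", "noise", "not_blight", "noise", "abandoned", "sanitary"]
rater_2 = ["sanitary", "noise", "not_blight", "not_blight", "abandoned", "sanitary"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # → 0.78
```

A kappa of 1.0 indicates perfect agreement; values above ~0.8, as reported in the Results, are conventionally read as high agreement.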

After data training, we cleaned the call description text field to address misspelling and to standardize text formatting. String variables were then separated into ‘tokens’ or unique words or strings that comprise text data. Common tokens such as articles or prepositions were omitted from analysis. Of the remaining tokens, we calculated the percent that appeared exclusively in ‘blight’ calls and those that were unique to ‘non-blight’ calls. Ultimately, only tokens that appeared in urban blight-related calls were used in the following stage of analysis.
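The tokenization and exclusive-token screening described above can be sketched as follows, using hypothetical call descriptions and a toy stop-word list (the study processed the full training corpus in R):

```python
import re

# Toy stop-word list standing in for the omitted articles and prepositions
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "at", "is", "and", "from", "about"}

def tokenize(text):
    """Lowercase, keep alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

# Hypothetical call descriptions labeled during data training
blight_calls = ["Abandoned vehicle on the sidewalk", "Litter and debris in vacant lot"]
non_blight_calls = ["Noise from a party next door", "Question about a parking permit"]

blight_tokens = set().union(*(tokenize(c) for c in blight_calls))
other_tokens = set().union(*(tokenize(c) for c in non_blight_calls))

exclusive_blight = blight_tokens - other_tokens  # tokens found only in blight calls
shared = blight_tokens & other_tokens            # tokens found in both categories
```

Set difference and intersection directly yield the exclusive and shared token groups whose proportions are reported in the Results.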

For data classification, we used a logistic regression model on a 50% sample of the trained data (~325,000 records) to determine how effective the blight-related tokens are at identifying a blight-related call. A token or series of tokens can be used as predictors in the regression model. In order to have parsimonious models, we calculated the total of blight-related tokens that appeared in each record and used this count as the primary predictor of urban blight (0/1) in our logistic regressions.
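A minimal sketch of this classification step, counting blight-related tokens per record and fitting a single-predictor logistic regression on toy data (the study fit its models on ~325,000 records in R; the gradient-descent fit here is only an illustration):

```python
import math

def token_count(description, blight_tokens):
    """Number of unique blight-related tokens in a call description."""
    return len(set(description.lower().split()) & blight_tokens)

def fit_logistic(xs, ys, lr=0.5, epochs=3000):
    """Single-predictor logistic regression fit by gradient descent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))  # predicted probability
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Toy records: blight-token counts and trained blight labels (hypothetical)
counts = [0, 0, 1, 2, 3, 5, 6, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(counts, labels)
# A positive slope b1 means more blight tokens raise the predicted probability
```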

Our first model was a basic model predicting urban blight, with the unique blight-related token count as the only independent variable (Eq 1). In the second model, we included a categorical variable for borough as an independent variable (Eq 2). Finally, our third model included the variable for borough as well as a variable for the agency assigned responsibility for the 311 call. As with borough, agency was coded as a categorical variable with 15 levels (Eq 3). Responsible agencies included departments of sanitation, police, finance, health and mental hygiene, and consumer affairs, among others (See Table 3 for a full list of agencies).

Table 3. Logistic regression model results.

Pred(Urban blight = 1) Model 1 Model 2 Model 3
Coefficient (SE) Coefficient (SE) Coefficient (SE)
Intercept -1.27 (0.009)** -0.74 (0.014)** -17.75 (86.15)
Unique token count 0.37 (0.002)** 0.37 (0.002)** 0.38 (0.002)**
Borough (reference = Bronx)
Brooklyn -0.58 (0.014)** -0.37 (0.017)**
Manhattan -0.38 (0.016)** -0.08 (0.020)**
Queens -0.76 (0.014)** -0.58 (0.018)**
Staten Island -0.21 (0.022)** 0.13 (0.028)**
Agency (reference = not specified)
Environmental Protection 18.46 (86.15)
Department for the Aging -1.85 (168.36)
Buildings 17.03 (86.15)
Education 32.77 (210.93)
Finance -0.40 (95.46)
Health and Mental Hygiene 17.03 (86.15)
Transportation 21.01 (86.15)
Parks and Recreation 19.45 (86.15)
Sanitation 15.81 (86.15)
Housing Preservation and Development 18.24 (86.15)
Human Resources Administration 4.45 (130.11)
Police 15.53 (86.15)
Taxi and Livery Commission 0.35 (107.37)

* p < 0.05.

** p <0.001.

Model 1: Intercept and unique token count.

Model 2: Intercept, unique token count, and borough.

Model 3: Intercept, unique token count, borough, and agency.

logit(y_i) = β_0 + β_1(unique token count_i) + ε_i    (Eq 1)
logit(y_i) = β_0 + β_1(unique token count_i) + β_2(borough_i) + ε_i    (Eq 2)
logit(y_i) = β_0 + β_1(unique token count_i) + β_2(borough_i) + β_3(agency_i) + ε_i    (Eq 3)

Urban blight (y_i) is a binary variable indicating whether a call is considered blight-related. “Unique token count” represents the number of unique keywords related to blight that appear in any text field within the 311 data. These keywords were those identified as urban blight-related during the data training step. “Borough” is a 5-level factor variable representing the Bronx, Brooklyn, Manhattan, Queens, and Staten Island. The Bronx serves as the reference category. “Agency” is the agency assigned responsibility for addressing the 311 call. The reference category is unassigned.
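Given fitted coefficients, the predicted probability for a call is the inverse logit of the linear predictor. As an illustration using Model 1’s published coefficients from Table 3 (intercept -1.27, unique token count slope 0.37):

```python
import math

def predict_blight_probability(unique_token_count, intercept=-1.27, slope=0.37):
    """Inverse logit of Model 1's linear predictor (coefficients from Table 3)."""
    z = intercept + slope * unique_token_count
    return 1 / (1 + math.exp(-z))

# A call with no blight tokens vs. one near the mean token count of ~7
print(round(predict_blight_probability(0), 2))  # → 0.22
print(round(predict_blight_probability(7), 2))  # → 0.79
```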

The coefficients from each of these three models were used to predict the probability that a call was urban blight-related for the balance of the training data (i.e., categorized data not used in regression models). We used confusion matrices to evaluate how well each model predicted whether a call represented urban blight, calculating sensitivity, specificity, and accuracy (ACC) (Table 1).

Table 1. Sensitivity, specificity, and accuracy of urban blight algorithm.

Model 1
Urban blight = 1 Urban blight = 0 Sensitivity 91%
Pred(Urban blight = 1) 210,413 43,701 Specificity 55%
Pred(Urban blight = 0) 20,718 54,211 Accuracy (ACC) 80%
Model 2
Urban blight = 1 Urban blight = 0 Sensitivity 90%
Pred(Urban blight = 1) 208,947 42,172 Specificity 57%
Pred(Urban blight = 0) 22,184 55,740 Accuracy (ACC) 80%
Model 3
Urban blight = 1 Urban blight = 0 Sensitivity 90%
Pred(Urban blight = 1) 208,213 23,884 Specificity 76%
Pred(Urban blight = 0) 22,918 74,028 Accuracy (ACC) 86%
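The rates reported in Table 1 can be recovered directly from the confusion-matrix counts; for example, using Model 1 (rows as predictions, columns as observed labels):

```python
def classification_rates(tp, fp, fn, tn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # observed-blight calls correctly flagged
    specificity = tn / (tn + fp)            # non-blight calls correctly passed over
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# Model 1 counts from Table 1
sens, spec, acc = classification_rates(tp=210_413, fp=43_701, fn=20_718, tn=54_211)
print(f"{sens:.0%} {spec:.0%} {acc:.0%}")  # → 91% 55% 80%
```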

As a first effort to validate our results, we calculated correlations between census tract-level measures of housing vacancies from the American Community Survey and blight-related calls, both as count variables and expressed as percentages. Similarly, we calculated correlations between blight-related calls (%) with percent of residential and commercial addresses considered long (>12 months), medium (6–12 months), and short-term (< 6 months) vacancies from the USPS/HUD data set.
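A sketch of this correlation step, using a hand-rolled Pearson coefficient and hypothetical tract-level percentages (the study computed its correlations across all New York City census tracts):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical tract-level values: % blight-related calls vs. % long-term vacancies
blight_pct = [12.0, 8.5, 20.1, 5.2, 15.3]
vacancy_pct = [3.1, 2.0, 4.4, 1.8, 2.9]
r = pearson_r(blight_pct, vacancy_pct)
```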

Results

Of the 1.3 million call records across 236 complaint types, we identified 93 ‘high frequency’ types (≥ 1,000 records) that comprised 98% of all calls over the 6-month time frame (Table 2). After parallel coding of a sample of the data, in which raters assigned a call to one of the seven urban blight domains (social conditions, abandoned property, air quality, street/sidewalk maintenance, noise, sanitary conditions, building safety) or ‘not urban blight related,’ we calculated Cohen’s Kappa statistic for categorical outcomes [29]. The resulting value, κ = 0.81, indicated a high level of agreement between raters.

Table 2. Summary of 311 call data.

Total calls, January 1-June 30, 2018 1,344,402
Total complaint types 236
Complaint types with ≥ 1,000 complaints (‘high frequency’) 93
Percent of complaints in ‘high frequency’ category 98.0%
Total complaint types considered ‘urban blight’ 55

The next stage, text cleaning and tokenization, resulted in 1,113 unique words (tokens) represented in the call description field. Of these, 46% (516) appeared exclusively in ‘blight’ calls and 37% (415) appeared only in ‘non-blight’ calls. The remaining 17% (182) were found in both blight and non-blight records. Of the 182 tokens that appeared in both ‘blight’ and ‘non-blight’ calls, many appeared in roughly the same proportion in both categories, so we chose to restrict our analysis to the 46% of tokens found exclusively in blight-related calls. Using this list of 516 tokens, we calculated the count of unique blight-related terms in each record (mean 7.08, SD 4.4; min/max 0–22) to use as a predictor variable for the probability that a call was related to urban blight. The count of unique tokens was a significant predictor of blight-related calls when used as a single predictor (β = 0.37 [SE 0.002], p < 0.0001), when combined with borough (β = 0.37 [SE 0.002], p < 0.0001), and when combined with borough and responsible agency (β = 0.38 [SE 0.002], p < 0.0001) (see Table 3 for all coefficient statistics in each model).

The coefficients for each of the three models were used to predict the probability that a call was blight-related in the portion of the data not used in modeling. The predicted values from each model were used to calculate sensitivity, specificity, and accuracy. All models displayed similar sensitivity (90%–91%); however, model 3 resulted in the best specificity (76%) and accuracy (86%) (Table 1).

Model 3 coefficients were then applied to the full data set to calculate and map the predicted percentage of calls that were blight-related by census tract (Fig 1). We found that blight-related calls were concentrated in upper Manhattan, specifically Harlem and the Upper West Side, and the Bronx, which is just north of Manhattan (Example 1A), with some areas of concentration in central Brooklyn—Bedford-Stuyvesant, Crown Heights, Flatbush, and Brownsville (Example 1B). These areas of the city are historically among the most economically distressed, but are also experiencing rapid gentrification. The lowest proportion of calls was observed in the Bay Ridge and Bensonhurst neighborhoods in southwest Brooklyn (Example 2A), and in Ridgewood, Middle Village, and Forest Hills in central Queens (Example 2B). These areas of Brooklyn and Queens are highly residential communities with a more pronounced suburban character.

Fig 1. Map of blight-related 311 calls by census tract.

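Aggregating model predictions to the tract level, as mapped in Fig 1, amounts to computing the share of predicted blight-related calls per tract. A minimal sketch with hypothetical tract identifiers and predictions:

```python
from collections import defaultdict

def blight_rate_by_tract(calls):
    """Percent of calls predicted blight-related per census tract.

    `calls` is a list of (tract_id, predicted_blight) pairs, where
    predicted_blight is 1 when the model classifies a call as blight-related.
    """
    totals = defaultdict(int)
    blight = defaultdict(int)
    for tract, flag in calls:
        totals[tract] += 1
        blight[tract] += flag
    return {t: 100 * blight[t] / totals[t] for t in totals}

# Hypothetical calls tagged with (made-up) tract GEOIDs and model predictions
calls = [("36061019500", 1), ("36061019500", 0), ("36047028500", 1),
         ("36047028500", 1), ("36061019500", 1)]
rates = blight_rate_by_tract(calls)
```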

Although the confusion matrices tested the internal validity of our models, we next sought to evaluate the construct’s external validity by comparing the results to census tract-level housing vacancies from the American Community Survey and more current housing and commercial vacancy data from USPS/HUD. The correlation between housing vacancies from the American Community Survey and blight-related calls at the tract level was 0.32 (p < 0.001) for count variables. When expressed as percentages, the correlation was -0.1 (p < 0.001). It seems likely that the positive association between count variables reflects the size of the census tracts. When normalized by total calls and total housing units per census tract, there is a slight, negative relationship between vacancy rates and blight-related calls.

Census tract vacancy data from USPS/HUD presented as percent of residential, commercial, or total vacant addresses showed null or positive associations with percent of blight-related calls. Long-term commercial vacancies had the strongest association with our blight metric, with a correlation of 0.16 (p < 0.0001). Long-term residential vacancies were also associated with blight-related calls, though the correlation was not as strong (0.10, p < 0.0001). Short-term commercial vacancies were also mildly correlated with urban blight-related calls (0.05, p < 0.0001), but none of the remaining short- or medium-term vacancies were statistically significantly correlated with the blight metric (Table 4).

Table 4. Correlations between urban blight-related calls and short-, medium-, and long-term vacancies (residential and commercial).

Vacancy type Vacancy Duration Correlation Coefficient
Residential Short-term—<6 mo (%) 0.026
Medium-term—6–12 mo (%) 0.019
Long-term—>1 yr (%) 0.098**
Commercial Short-term—<6 mo (%) 0.049*
Medium-term—6–12 mo (%) 0.035
Long-term—>1 yr (%) 0.16**
Total Short-term—<6 mo (%) 0.032
Medium-term—6–12 mo (%) -0.03
Long-term—>1 yr (%) -0.006

* p < 0.05.

** p <0.001.

Discussion

The identified key words were an effective predictor of blight-related calls, but the small, inverse relationship between percent of blight-related calls and vacancy rates based on American Community Survey data was unexpected. The American Community Survey data is based on survey responses over a five-year time frame, so even though it was the most current available, these data preceded the 311 data by 3 years on average. When using HUD/USPS vacancy data for the same time period as the 311 data, disambiguating between residential and commercial vacancies, and specifying duration of vacancies, we found that commercial vacancies—at least in the New York City context—were positively associated with the urban blight metric. The strongest correlation was between long-term (> 12 months) commercial vacancies and percent of calls identified as blight related, which suggests that longer-term vacancies better reflect the ‘physical disorder and decay’ that we are considering to be urban blight, whereas shorter term vacancies may reflect a certain degree of turnover or ‘churning’ in the real estate market. Even though the correlation between long-term commercial vacancy and percent of 311 calls related to blight is the strongest association we observed, it is still relatively small. This finding is not necessarily problematic, as we would anticipate each measure to reflect different (unobserved) characteristics of a neighborhood.

Our approach to coding calls as urban blight versus non-urban blight-related relies upon our identification of seven ‘domains’ of blight which guided our call assignment. The natural extension of this approach is to further develop our algorithm to identify which tokens are predictive of each blight domain. Continuing our algorithm development to predict blight domains will be useful in identifying variations in neighborhoods based on specific components of urban blight. Together with qualitative data collection, these steps can help us determine if areas of blight concentration align with residents’ perception of disinvestment in their communities.

These findings are subject to a series of limitations. Most notably, our analysis does not control for a neighborhood’s propensity to call 311. As Weaver and Bagchi-Sen note, urban blight can be considered to represent a threshold of ‘non-acceptance,’ or the point at which community residents find that neighborhood quality has fallen below community-specific norms [8]. These norms are highly variable across neighborhoods, so sidewalk damage on the Upper East Side of Manhattan, which is one of the more affluent communities in New York City, may elicit many more 311 calls relative to a similar condition in Brownsville, Brooklyn, which is a predominantly low-income neighborhood with a high density of public housing. Second, a recent trend in community activation and engagement has emerged in which residents flood 311 dispatch with complaints to motivate city government to repair long-ignored problems in their neighborhoods [11]. Such engagement, while laudable, could make 311-based estimates of urban blight less reliable. If researchers use historical trends to identify propensities for calling 311, a sudden spike in 311 engagement may erroneously indicate rapid deterioration of the neighborhood’s built environment. Finally, a most salient critique of this analysis is how the results may be misused. The findings are not intended as a referendum on residents’ interest or willingness to invest in their communities. Placing blame or responsibility on residents without acknowledging that municipal government and the private sector often resist investing in less-affluent, majority-minority neighborhoods only reinforces a cycle of continued disinvestment.

Conclusion

There is a strong utility for this research amongst urban planners, public health practitioners, and government officials. For urban planners, geographic and temporal patterns in urban blight-related 311 calls (i.e., variation in residents’ acceptance of blighted conditions) will help prioritize community needs and desires when determining new planning projects. Such information is essential to develop neighborhoods and amenities that address the most pressing issues communities face.

Moreover, given the connection between blight, biomarkers, and mental health measures, a clearer view of how urban blight is distributed geographically will help public health practitioners identify areas of concentrated poor health, or areas at risk of negative health outcomes across the city. Data on blight-related 311 calls will help public health officials understand where best to concentrate health interventions. For government officials more generally, understanding the key links between urban blight, public health and community investment will help identify where and how cities can maximize the benefits of neighborhood interventions.

The utility of the 311 algorithm will be expanded as it is refined for predicting different domains of urban blight. Although not fully explored here, the complaint types in the 311 data are a useful lever for distinguishing domains of urban blight. Positive associations with temporally aligned HUD/USPS vacancy data—a commonly used proxy for urban blight—represent a positive step in external validation. As noted, validation will be a continuing process, incorporating domains of urban blight and insights from focus groups drawn from various neighborhoods across New York City. A second step is to assess how predictive our 311 measure of urban blight and its domains are of health-related measures, including but not limited to prevalence of chronic disease, injuries (accidents, interpersonal violence), and mental health conditions. Such associations would provide policy guidance for focusing public health interventions.

Acknowledgments

We would like to thank Susan Kum, Ph.D., New York City Department of Health and Mental Hygiene, for her support with background research.

Data Availability

Data for New York City 311 Service Requests from 2010 to Present are available for download at https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9. Data were loaded into RStudio (version 3.5.1) via the RSocrata library using the following code:

# Load the RSocrata library
library(RSocrata)
# register_google() comes from the ggmap package and is only needed for geocoding steps
register_google(key="[include user-specific key]")
# Call in NYC data for 01-01-2018 through 06-30-2018
nyc <- as.data.frame(read.socrata("https://data.cityofnewyork.us/resource/fhrw-4uyv.json?$where=created_date between '2018-01-01T12:00:00' and '2018-06-30T23:59:59'"))

Funding Statement

The authors received no specific funding for this work.

References

  • 1.Perdue WC, Gostin LO, Stone LA. Public health and the built environment: historical, empirical, and theoretical foundations for an expanded role. J Law Med Ethics. 2003;31(4):557–566. 10.1111/j.1748-720x.2003.tb00123.x [DOI] [PubMed] [Google Scholar]
  • 2.Corburn J. Reconnecting with our roots: American urban planning and public health in the twenty-first century. Urban Aff Rev. 2007;42(5):688–713. [Google Scholar]
  • 3.Centers for Disease Control and Prevention [Internet]. Impact of the built environment on health; 2011 [cited 2019 Dec 30]. National Center for Environmental Health. Available from: https://www.cdc.gov/nceh/publications/factsheets/impactofthebuiltenvironmentonhealth.pdf
  • 4.Diez Roux AV, Mair C. Neighborhoods and health. Ann NY Acad Sci. 2010;1186:125–45. 10.1111/j.1749-6632.2009.05333.x [DOI] [PubMed] [Google Scholar]
  • 5.Charting the multiple meanings of blight: a national literature review on addressing the community impacts of blighted properties [Internet]. Keep America Beautiful; 2015 [cited 2019 Dec 30]. Available from: https://kab.org/wp-content/uploads/2019/08/ChartingtheMultipleMeaningsofBlight_FinalReport.pdf
  • 6.Maghelal P, Andrew S, Arlikatti S, Jang HS. Assessing blight and its economic impacts: a case study of Dallas, TX. WIT Transactions on Ecology and the Environment. 2014;181:187–197. [Google Scholar]
  • 7.Kondo MC, Morrison C, Jacoby SF, Elliott L, Poche A, Theall KP, et al. Blight abatement of vacant land and crime in New Orleans. Pub Health Rep. 2018;133(6):650–657. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Weaver RC, Bagchi-Sen S. Spatial analysis of urban decline: the geography of blight. Appl Geogr. 2013;40:61–70. [Google Scholar]
  • 9.Rundle AG, Bader MDM, Richards CA, Neckerman KM, Teitler JO. Using Google Street View to audit neighborhood environments. Am J Prev Med. 2011;40(1):94–100. 10.1016/j.amepre.2010.09.034 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Mooney SJ, Bader MDM, Lovasi GS, Neckerman KM, Teitler JO, Rundle AG. Validity of an ecometric neighborhood physical disorder measure constructed by virtual street audit. Am J Epidemiol. 2014;180(6):626–635. 10.1093/aje/kwu180 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Teixeira S, Kolke D. Using local data to address abandoned property: lessons learned from a community health partnership. Prog Community Health Partnersh. 2017;11(2):175–182. 10.1353/cpr.2017.0022 [DOI] [PubMed] [Google Scholar]
  • 12.Rappaport SM. Discovering environmental causes of disease. Journal of Epidemiology and Community Health. 2012;66(2):99–102. 10.1136/jech-2011-200726 [DOI] [PubMed] [Google Scholar]
  • 13.Sears ME, Genuis SJ. Environmental Determinants of Chronic Disease and Medical Approaches: Recognition, Avoidance, Supportive Therapy, and Detoxification [Review Article]. Journal of Environmental and Public Health. 2012; 2012:e356798. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Remoundou K, Koundouri P. Environmental Effects on Public Health: An Economic Perspective. International Journal of Environmental Research and Public Health. 2009;6(8):2160–2178. 10.3390/ijerph6082160 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Prüss-Üstün A, Bonjour S, Corvalán C. The impact of the environment on health by country: A meta-synthesis. Environmental Health. 2008;7(1):7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Sly PD, Carpenter DO, Van den Berg M, Stein RT, Landrigan PJ, Brune-Drisse MN, et al. Health Consequences of Environmental Exposures: Causal Thinking in Global Environmental Epidemiology. Annals of Global Health. 2016;82(1):3–9. 10.1016/j.aogh.2016.01.004 [DOI] [PubMed] [Google Scholar]
  • 17.de Leon E, Schilling J. Urban blight and public health: addressing the impact of substandard housing, abandoned buildings, and vacant lots. Washington, DC: Urban Institute; 2017. [Google Scholar]
  • 18.South EC, Hohl BC, Kondo MC, MacDonald JM, Branas CC. Effect of greening vacant land on mental health of community-dwelling adults: a cluster randomized trial. JAMA Netw Open. 2018;1(3):e180298 10.1001/jamanetworkopen.2018.0298 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Mayne SL, Pellissier BF, Kershaw KN. Neighborhood physical disorder and adverse pregnancy outcomes among women in Chicago: a cross-sectional analysis of electronic health record data. J Urban Health. 2019;96:823–834. 10.1007/s11524-019-00401-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.South EC, Kondo MC, Cheney RA, Branas CC. Neighborhood blight, stress, and health: a walking trial of urban greening and ambulatory heart rate. Am J Pub Health. 2015;105(5):909–913. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Goodyear S. 3-1-1: a city services revolution. [cited 2019 Dec 30]. In: CityLab. City Makers: Connections [Internet]. Available from: https://www.citylab.com/city-makers-connections/311/
  • 22.311 Service Requests from 2010 to Present; 2019 [cited 2019 Dec 30]. Database: NYC Open Data [Internet]. Available from: https://data.cityofnewyork.us/resource/fhrw-4uyv.json?$where=created_date between '2018-01-01T12:00:00' and '2018-06-30T23:59:59'.
  • 23.RStudio Team. RStudio: Integrated Development Environment for R, version 1.1.463 [software]. RStudio, Inc. 2016 [cited 2019 Dec 30]. Available from: http://www.rstudio.com
  • 24.RSocrata [R package on Internet]. [cited 2019 Dec 30]. Available from: https://CRAN.R-project.org/package=RSocrata.
  • 25.American Community Survey 5-Year Estimates, 2013–2017, table DP04 [dataset on Internet]. [cited 2019 Dec 30]. Available from: https://factfinder.census.gov/bkmk/table/1.0/en/ACS/17_5YR/DP04.
  • 26.Glenn EH. Download, Manipulate, and Present American Community Survey and Decennial Data from the US Census. 2019 February 19. [cited 2019 Dec 30]. Available from: https://cran.r-project.org/web/packages/acs/acs.pdf.
  • 27.HUD Aggregated USPS Administrative Data on Address Vacancies [dataset on Internet]. U.S. Department of Housing and Urban Development's Office of Policy Development and Research. [cited 2019 Aug 1]. Available from: https://www.huduser.gov/portal/datasets/usps.html.
  • 28.Silge J, Robinson D. Text mining with R: a tidy approach. Sebastopol: O’Reilly Media, Inc.; 2017. [cited 2019 Dec 30]. Available from: https://www.tidytextmining.com/. [Google Scholar]
  • 29.StatsDirect [Internet]. Agreement of categorical measurements. [cited 2019 Dec 17]. Available from: https://www.statsdirect.com/help/agreement/kappa.htm.

Decision Letter 0

Changshan Wu

20 Apr 2020

PONE-D-20-05006

Using 311 data to develop an algorithm to identify urban blight for public health improvement

PLOS ONE

Dear Dr Athens,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Jun 04 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Changshan Wu

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements:

1.    Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

3. We note that Figure 2 in your submission contains map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

1.    You may seek permission from the original copyright holder of Figure 2 to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

2.    If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors used natural language processing to categorize 311 data from NYC into different blight-related categories, and constructed models to identify blight-related calls using keywords/tokens. In my opinion, the manuscript presents an interesting and promising use of municipal open data to better understand how residents perceive and report blight in a city. However, the manuscript needs to be improved in several areas before it should be considered for publication.

First, the ACS vacancy data were analyzed as both raw counts and as percentages, and the authors found that the percentage data are likely more appropriate because the count data largely reflect the size of the census tract (p13). Given this strong rationale for analyzing data as percentages, why was the USPS/HUD data analyzed as counts but not as a percentage? The USPS/HUD data set provides total residential/business addresses in each tract, so this analysis is feasible and should be added unless there is a compelling reason not to do so.

Writing:

There is room to improve the writing and organization throughout the manuscript, but particularly in the Introduction. This will help the reader better understand the motivations and implications of the study. Some primary suggestions are as follows:

* Better transitions within and between sentences and paragraphs would be helpful. For example, on p3 para1, the colon in “…social ills: Residential overcrowding…” is not appropriate and the R should not be capitalized. On p4 para2, the sentence would read smoother as “…are highly problematic, because they rely…” On p4 para3, the colon after numerous does not work there, and I suggest placing a period after numerous to end the sentence.

* More strategic organization of text. For example, on p3 para2 the first sentence is out of order chronologically with the second sentence, and it actually seems to me like the first sentence should be dropped or moved because it doesn’t fit very well with the rest of the paragraph.

* Be more specific where possible. On p3 para 2, both ‘many aspects’ and ‘the facets of’ are vague and could be described more specifically. Other examples from the Discussion are noted below.

* Per journal instructions, please include line numbers. This makes it much easier to suggest edits to the manuscript.

Line item comments:

Abstract:

* p<0.001 instead of >

* ‘different facets’: Which facets? Can you be more specific here, or offer a couple examples?

Introduction:

* p3 para1: How can a direct quote have two sources? Cite the original source only.

* p3 para3: (physical decay) could be incorporated more smoothly into the sentence to avoid parentheses. Same with (physical) disorder on the next page.

* p4 para3: ‘affects’ not ‘affect.’

* p4 para3: References are needed after ‘well documented’, and in the following sentence. In the sentence after that, please place each reference after the specific outcome it addresses rather than grouping the references at the end.

* p5 para3: Maybe ‘validated’ is the best word here, but I wonder if ‘compared to’ isn’t a more appropriate verb. I understand model validation to be the process of making sure the model performs as expected; here, the authors use the ACS & USPS data sets to draw inferences about the relationships between blight-related calls and vacancy rates rather than using ACS & USPS data to characterize the quality of the algorithm itself. To me, this seems less like model validation and more like a statistical comparison in which the algorithm results are assumed to be valid.

Methods:

* This section is much clearer than the Introduction.

Results:

* ‘46% of token’ should be tokens

* p12 para1: missing ‘the’ in first sentence

* p12 para1: Table 1, not Table 4

* p12 para 2: Noting specific locations within NYC is only useful to those familiar with NYC because these locations are not shown on the map figure

* p13 and elsewhere: Inconsistent use of ACS, American Community Survey, and Census. Please define the abbreviation at first use and then use it consistently thereafter.

* p13 para1: correlation, not correlations

* p13 para1: why is count italicized?

Discussion:

* p14: The paragraph beginning ‘Nevertheless’ is not a complete paragraph.

* p14: change to ‘further develop our algorithm’

* p15 para1: Why would calls differ between these two places? I am not familiar with these places, so please explain why we should expect differences.

* p15 para1: Please add an explanation for why flooding the 311 lines would make estimates of blight less reliable.

* p15 para1: Please explain how blight on vacant commercial properties (which I take to be an important finding based on earlier discussion material) shows disinvestment by municipal government. And why is municipal government italicized here?

Table 1: Please provide a more descriptive caption so the table can stand alone

Fig 1: Please explain the axis ranges here. How can a word frequency go below 0%?

Fig 2: This is not a density map, so the caption should be reworded to more accurately describe the contents. In addition, the map really ought to show & label the borough outlines because boroughs are discussed in the text. As someone who is largely unfamiliar with NYC, labeling places within boroughs that are discussed in the text would be helpful for context (JFK, Brownsville, etc.).

Reviewer #2: This manuscript attempted to identify urban blights using 311 data. Particularly, several major types of "blights" were identified, and regression analysis was performed to examine the effectiveness of the model. This is a well-written and straightforward paper with contributions to the literature. My major comments are as follows.

1) Methodology: this paper does not provide a detailed algorithm for extracting the "blight" information from natural language. As the authors claim that one major contribution is to develop a new algorithm, they should specify the algorithm and highlight their contributions.

2) Conclusions: this paper does not have a conclusion section. The authors need to summarize the major results of the paper.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jul 9;15(7):e0235227. doi: 10.1371/journal.pone.0235227.r002

Author response to Decision Letter 0


4 Jun 2020

Response to Reviewers

Comments from Editor

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. We confirmed that formatting met PLOS ONE’s style requirements.

2. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. We included the ORCID ID for the corresponding author.

3. We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission. We modified the figures so that we are not using any copyrighted materials.

4. The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. R script for calling the underlying 311 data is provided as an appendix to the manuscript.

Global Comments from Reviewer 1

• Writing: Better transitions within and between sentences and paragraphs would be helpful. For example, on p3 para1, the colon in “…social ills: Residential overcrowding…” is not appropriate and the R should not be capitalized. On p4 para2, the sentence would read smoother as “…are highly problematic, because they rely…” On p4 para3, the colon after numerous does not work there, and I suggest placing a period after numerous to end the sentence. Thank you to Reviewer 1 for your careful review and edits. We attempted to address all of your comments, particularly those related to manuscript organization, transitions between topics, and specificity in examples. We believe the manuscript is in much better shape as a result of these changes.

Specific edits are outlined below.

• More strategic organization of text. For example, on p3 para2 the first sentence is out of order chronologically with the second sentence, and it actually seems to me like the first sentence should be dropped or moved because it doesn’t fit very well with the rest of the paragraph. Specific edits are outlined below.

Comments on Introduction

• Be more specific where possible. On p3 para 2, both ‘many aspects’ and ‘the facets of’ are vague and could be described more specifically. Other examples from the Discussion are noted below. We added further details on page 2, lines 28-32.

• Per journal instructions, please include line numbers. This makes it much easier to suggest edits to the manuscript. Line numbers have been added to the manuscript.

• p<0.001 instead of > The > sign has been corrected.

• ‘different facets’: Which facets? Can you be more specific here, or offer a couple examples? ‘Different facets’ have been further explicated in lines 28-32: ‘These facets include physical disorder (e.g., litter, overgrown lawns, or graffiti) and decay (e.g., vacant or abandoned lots or sidewalks in disrepair) that are manifestations of social processes such as (loss of) neighborhood cohesion, social control, collective efficacy, and anchor institutions.’

• p3 para1: How can a direct quote have two sources? Cite the original source only. The citations were corrected so that the quote has a single, correct citation.

• p3 para3: (physical decay) could be incorporated more smoothly into the sentence to avoid parentheses. Same with (physical) disorder on the next page. These paragraphs were edited to remove the use of the parenthetical.

“Standard approaches to measuring blight with secondary data sets include measures of physical decay, such as vacant commercial property, vacant housing units, and vacant lots within a neighborhood [7,8]. Other researchers have looked at measures of physical disorder, otherwise considered neighborhood quality measures, such as mown lawns, litter and debris, delinquent vehicles, and presence/absence of graffiti [7,9]. These quality-related indicators are more difficult to capture, often requiring street audits in person or virtually, using Google Street View [10,11]. Finally, given that blight is considered a physical manifestation of poor social cohesion, a third dimension of blight includes social and economic investment, which can be evaluated through measures such as perceptions of safety, presence of anchor institutions, and community organizing efforts [12].”

• p4 para3: ‘affects’ not ‘affect.’ Made this edit.

• p4 para3: References are needed after ‘well documented’, and in the following sentence. In the sentence after that, please place each reference after the specific outcome it addresses rather than grouping the references at the end. All of these citations are review papers that address all or most of the aspects referred to in the sentence. However, citations are broken out later in the paragraph.

“Though the literature specific to urban blight and public health outcomes is less developed, there is strong evidence that urban blight is specifically associated with higher violent crime and gun crime [6], poor mental health [18-19], and even adverse pregnancy outcomes [20].”

• p5 para3: Maybe ‘validated’ is the best word here, but I wonder if ‘compared to’ isn’t a more appropriate verb. I understand model validation to be the process of making sure the model performs as expected; here, the authors use the ACS & USPS data sets to draw inferences about the relationships between blight-related calls and vacancy rates rather than using ACS & USPS data to characterize the quality of the algorithm itself. To me, this seems less like model validation and more like a statistical comparison in which the algorithm results are assumed to be valid. Thank you for this point – we made this edit.

Comments on Results:

• ‘46% of token’ should be ‘tokens’. Made this correction.

• p12 para1: missing ‘the’ in first sentence. Made this correction.

• p12 para1: Table 1, not Table 4. Made this correction.

• p12 para 2: Noting specific locations within NYC is only useful to those familiar with NYC because these locations are not shown on the map figure. Specific communities are now identified on the map and in the text.

• p13 and elsewhere: Inconsistent use of ACS, American Community Survey, and Census. Please define the abbreviation at first use and then use it consistently thereafter. These corrections have been made.

• p13 para1: correlation, not correlations. This correction has been made.

• p13 para1: why is count italicized? This has been set to normal font.

Comments on Discussion:

• p14: The paragraph beginning ‘Nevertheless’ is not a complete paragraph. This sentence was combined with the previous paragraph.

• p14: change to ‘further develop our algorithm’ This change was made.

• p15 para1: Why would calls differ between these two places? I am not familiar with these places, so please explain why we should expect differences. Additional details were included to explain why these communities may differ in their propensity to call 311.

“These findings are subject to a series of limitations. Most notably, our analysis does not control for a neighborhood’s propensity to call 311. As Weaver and Bagchi-Sen note, urban blight can be considered to represent a threshold of ‘non-acceptance,’ or the point at which community residents find that neighborhood quality has fallen below community-specific norms [9]. These norms are highly variable across neighborhoods, so sidewalk damage on the Upper East Side of Manhattan, which is one of the more affluent communities in New York City, may elicit many more 311 calls relative to a similar condition in Brownsville, Brooklyn, which is a predominantly low-income neighborhood with a high density of public housing.”

• p15 para1: Please add an explanation for why flooding the 311 lines would make estimates of blight less reliable. The text has been revised to better address this question.

“Such engagement, while laudable, could make 311-based estimates of urban blight less reliable. If researchers use historical trends to identify propensities for calling 311, a sudden spike in 311 engagement may erroneously indicate rapid deterioration of the neighborhood’s built environment.”

• p15 para1: Please explain how blight on vacant commercial properties (which I take to be an important finding based on earlier discussion material) shows disinvestment by municipal government. And why is municipal government italicized here? This text has been updated to better represent the pattern of disinvestment. Italics have been removed from ‘municipal.’

“Placing blame or responsibility on residents without acknowledging that municipal government and the private sector often resist investing in less-affluent, majority-minority neighborhoods only reinforces a cycle of continued disinvestment.”

Comments on Tables and Figures

• Table 1: Please provide a more descriptive caption so the table can stand alone. The caption has been changed to “Table 1. Sensitivity, specificity, and accuracy of urban blight algorithm.”

• Fig 1: Please explain the axis ranges here. How can a word frequency go below 0%? This figure has been removed because it has limited added value to the paper and its findings. The text has been modified to the following:

“The remaining 17% (182) were found in both blight and non-blight records. Of the 182 tokens that appeared in both ‘blight’ and ‘non-blight’ calls, many appeared in roughly the same proportion in both categories, so we chose to restrict our analysis to the 46% of tokens found exclusively in blight-related calls.”
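The token-selection step described above — keeping only tokens that appear exclusively in blight-labeled calls and discarding those shared with non-blight calls — can be sketched as follows. The study itself was conducted in R with the tidytext package; this Python sketch uses a handful of hypothetical call descriptions purely to illustrate the filtering logic:

```python
from collections import Counter

# Hypothetical labeled 311 call descriptions (True = coded as blight-related).
# The actual training data were manually coded NYC 311 records.
calls = [
    ("graffiti on abandoned storefront", True),
    ("overgrown weeds and litter on vacant lot", True),
    ("loud music from neighbor apartment", False),
    ("litter and debris on damaged sidewalk", True),
    ("parking meter broken", False),
]

def tokens_by_label(records):
    """Count token occurrences separately for blight and non-blight calls."""
    blight, non_blight = Counter(), Counter()
    for text, is_blight in records:
        (blight if is_blight else non_blight).update(text.lower().split())
    return blight, non_blight

blight_tokens, non_blight_tokens = tokens_by_label(calls)

# Keep only tokens found exclusively in blight-related calls,
# dropping any token that also appears in non-blight calls.
exclusive_blight_tokens = set(blight_tokens) - set(non_blight_tokens)
```

In the paper's data, roughly 46% of tokens survived this exclusivity filter; the 17% appearing in both categories were set aside because their class proportions carried little signal.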

• Fig 2: This is not a density map, so the caption should be reworded to more accurately describe the contents. In addition, the map really ought to show & label the borough outlines because boroughs are discussed in the text. As someone who is largely unfamiliar with NYC, labeling places within boroughs that are discussed in the text would be helpful for context (JFK, Brownsville, etc.). The map has been relabeled as a choropleth map, and areas of the city that are discussed in the text are indicated on the map as well.

Global Comments from Reviewer 2

• Methodology: this paper does not provide a detailed algorithm for extracting the "blight" information from natural language. As the authors claim that one major contribution is to develop a new algorithm, they should specify the algorithm and highlight their contributions. We described our strategy for identifying blight information from 311 text in the following text:

“Training the data requires the manual designation of a random sample of calls to an urban blight versus a non-urban blight category. We established seven domains of urban blight based on extant literature to guide our data training: social conditions, abandoned property, air quality, street/sidewalk maintenance, noise, sanitary conditions, and building safety. Coding any call into one of these categories signifies the call is urban blight-related. Domains were assigned based on complaint types and free-text call descriptions in the 311 data. Complaint type is a variable designated by the City of New York and included as part of the 311 system data, which comprises 236 complaint types. We focused on ‘high frequency’ complaints (≥ 1,000 records) to simplify the training process. The high frequency list included 93 complaint types.”
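The training-data construction described above — assigning high-frequency complaint types to one of seven urban blight domains, with any assigned call counted as blight-related — could be sketched roughly as below. The domain assignments shown are illustrative stand-ins, not the authors' actual coding of the 93 high-frequency complaint types:

```python
# Illustrative mapping of 311 complaint types to the seven urban blight
# domains named in the paper. The real study coded 93 high-frequency
# complaint types manually; these example keys are hypothetical.
DOMAIN_BY_COMPLAINT = {
    "Graffiti": "social conditions",
    "Vacant Lot": "abandoned property",
    "Air Quality": "air quality",
    "Sidewalk Condition": "street/sidewalk maintenance",
    "Noise - Residential": "noise",
    "Dirty Conditions": "sanitary conditions",
    "Building Safety": "building safety",
}

def label_call(complaint_type):
    """Return (domain, is_blight): a call coded into any domain is blight-related."""
    domain = DOMAIN_BY_COMPLAINT.get(complaint_type)
    return domain, domain is not None
```

For example, `label_call("Vacant Lot")` yields `("abandoned property", True)`, while an unmapped type such as a taxi complaint yields `(None, False)` and is treated as non-blight.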

• Conclusions: this paper does not have a conclusion section. The authors need to summarize the major results of the paper. The Discussion section was expanded and used to create a “Conclusions” section, which starts at line 308.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Changshan Wu

11 Jun 2020

Using 311 data to develop an algorithm to identify urban blight for public health improvement

PONE-D-20-05006R1

Dear Dr. Athens,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Changshan Wu

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: One very minor remaining issue: update figure numbers in the text, because figure 1 was deleted.

Also, consider changing the color ramp on the map. It seems strange to have low percentages of blight shown as red, as red is typically used to connote warning and draw the eye.

Reviewer #2: The authors have successfully addressed my comments. I am happy to see this manuscript to be accepted for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Changshan Wu

25 Jun 2020

PONE-D-20-05006R1

Using 311 data to develop an algorithm to identify urban blight for public health improvement

Dear Dr. Athens:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Changshan Wu

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    Data for New York City 311 Service Requests from 2010 to Present are available for download at https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9. Data were loaded into RStudio (version 3.5.1) via the RSocrata library using the following code:

    # Load the RSocrata library
    library(RSocrata)
    # register_google() is provided by the ggmap package
    library(ggmap)
    register_google(key = "[include user-specific key]")
    # Call in NYC data for 01-01-2018 through 06-30-2018
    nyc <- as.data.frame(read.socrata(
      "https://data.cityofnewyork.us/resource/fhrw-4uyv.json?$where=created_date between '2018-01-01T12:00:00' and '2018-06-30T23:59:59'"
    ))
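The same date-bounded query can also be issued from other languages against the Socrata Open Data (SODA) endpoint given above. A minimal Python sketch follows; `soda_url` is an illustrative helper (not part of the authors' code), and only the endpoint URL and `$where` filter are taken from the statement above.

```python
from urllib.parse import quote

# SODA endpoint for NYC 311 Service Requests (from the statement above)
BASE = "https://data.cityofnewyork.us/resource/fhrw-4uyv.json"

def soda_url(start, end):
    """Build a SODA query URL filtering 311 calls by created_date range."""
    where = f"created_date between '{start}' and '{end}'"
    return f"{BASE}?$where={quote(where)}"

url = soda_url("2018-01-01T12:00:00", "2018-06-30T23:59:59")
# The JSON payload could then be fetched with, e.g., requests.get(url).json()
```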


    Articles from PLoS ONE are provided here courtesy of PLOS
