Author manuscript; available in PMC: 2021 Mar 26.
Published in final edited form as: IEEE Access. 2019 Dec 16;8:6407–6416. doi: 10.1109/access.2019.2960010

Analyzing Associations Between Chronic Disease Prevalence and Neighborhood Quality Through Google Street View Images

MEHRAN JAVANMARDI 1, DINA HUANG 2, PALLAVI DWIVEDI 2, SAHIL KHANNA 3, KIM BRUNISHOLZ 4, ROSS WHITAKER 1, QUYNH NGUYEN 2, TOLGA TASDIZEN 1
PMCID: PMC7996469  NIHMSID: NIHMS1549873  PMID: 33777591

Abstract

Deep learning and, specifically, convolutional neural networks (CNNs) represent a class of powerful models that facilitate the understanding of many problems in computer vision. When combined with a reasonable amount of data, CNNs can outperform traditional models for many tasks, including image classification. In this work, we utilize these powerful tools with imagery data collected through Google Street View images to perform virtual audits of neighborhood characteristics. We further investigate different architectures for chronic disease prevalence regression through networks that are applied to sets of images rather than single images. We show quantitative results and demonstrate that our proposed architectures outperform the traditional regression approaches.

Keywords: Set Regression, Multi-Task Learning, Permutation Invariant Network, Chronic Disease Prevalence, Google Street View Images

I. INTRODUCTION

Deep convolutional neural networks have been shown to be powerful tools to model sensory data such as speech, images, videos, etc. CNNs have been extensively used in the field of computer vision for different tasks, including image classification, image segmentation, and object detection. However, the application of these networks is not limited to a specific field, as they have been used in many fields of science to automate or facilitate multiple tasks that traditionally used to be performed manually and at a significant cost. One instance of the application of CNNs is in neighborhood research, where they facilitate virtual neighborhood audits. In this work, we address how virtual neighborhood audits can be accomplished through Google Street View images and CNNs. Moreover, we leverage the data from the neighborhood virtual audits to examine associations between neighborhood environmental features and chronic disease prevalence.

Neighborhood research has become a fast-growing field in the scientific community because we have realized that the environment people live in has a direct influence on their health. Previous research has found associations between neighborhood quality and mortality risk [24], [51], [68], [70], [75], life expectancy [16], mental health [69], self-rated health, obesity [7], [33], [52], and diabetes [29], [44], even after adjusting for individual characteristics of the subjects. Neighborhoods can impact health through multiple pathways. First, disadvantaged neighborhoods may have fewer resources that support physical activity and healthy diets. Poor access to healthy food [15], [50], [73], the presence of fast food chains [9], and the lack of recreational facilities [10], [61] all correlate with higher obesity, diabetes, and blood pressure rates.

Second, neighborhoods may promote poor health through psycho-social pathways. Living in neighborhoods that are unclean, noisy, and violent can be psychologically harmful, through over-activation of the stress response [45]. Negative emotions over time can damage biological systems and lead to obesity, heart disease, diabetes, stroke, and declines in cognitive function [65]. Chronic anxiety and stress can disrupt cardiac function by altering the heart’s electrical stability, promoting atherosclerosis, and increasing inflammation [8].

Indicators of physical disorder (litter, graffiti, unclean parks, streets and sidewalks) have been linked with poorer self-rated health, higher risk of mental health issues (distress [34], depression [26], anxiety [12]), substance abuse, and mortality [25]. Research has also connected physical disorder and physical health outcomes, including obesity [46]. Physical disorder may increase physiological distress, which can contribute to poor diet, inadequate sleep, and irregular exercise, thereby leading to worse physical health outcomes [11].

Examining neighborhoods in Seattle, San Diego, and Baltimore, Thorton and colleagues found that neighborhoods with lower socioeconomic status and higher proportions of racial and ethnic groups had poorer aesthetics (e.g., unmaintained buildings, graffiti, broken windows, and litter). Conversely, they found that some neighborhoods with higher socioeconomic status had better pedestrian amenities in terms of sidewalks, crosswalks, and intersection-control features. However, because the study was conducted in only a few cities, the authors caution that generalizability to other locations may be a concern [67]. In other research, Duncan and colleagues demonstrated that neighborhood walkability increased walking among adults in Paris [23], and Rundle and colleagues found that neighborhood walkability was associated with residents’ weekly physical activity and obesity-related health conditions in New York City [63], [64].

As research has corroborated, the quality of the neighborhood where people live is highly associated with different chronic diseases, such as obesity, diabetes, and high blood pressure. In order to have a prospering and healthier society, it is imperative to study what characteristics of the neighborhoods can influence the prevalence of these chronic diseases. The first step in this study is to determine neighborhoods where these chronic diseases are more prevalent than in others. The focus of this work without loss of generality is on obesity as one of the most prevalent and detrimental chronic diseases. We will show how Google Street View (GSV) images can be utilized as an important resource to red-flag neighborhoods with the potential for a high risk of obesity.

Over 35% (78 million) of U.S. adults are obese [57]. Obesity is linked with mortality, morbidity, and reduced life expectancy. Comorbidities include type II diabetes, cancers, cardiovascular disease, sleep disorders, and hypertension. Chronic conditions such as obesity are the main drivers of mortality in the U.S., and they endanger the nation’s health, economic strength, and national security (by reducing the number of physically fit people who qualify for military service) [14]. Health care expenditures due to obesity, hypertension, and diabetes have been estimated at $70 billion [18], $110 billion, and $54 billion [22], respectively. These costs are compounded by lost productivity and absenteeism. Additionally, these major chronic conditions are concentrated among poor families and in poor neighborhoods, contributing to health disparities [41]. Numerous investigations have examined individual characteristics and behaviors, but researchers have only begun to establish the contextual or structural factors that inhibit or encourage chronic disease. Genetic variation cannot explain the epidemic rise in obesity and related chronic diseases in the past 20 years [39]. There is a pressing need to investigate societal and cultural processes [11].

II. RELATED WORK

Several studies have used Google Street View images as a resource for virtual audits of neighborhoods. In this section, we review some of the most prominent work in this area and discuss the drawbacks that come with current methods. We will also review related works regarding the invariance of deep networks with respect to group actions.

A. Utilizing Google Street View Images

Google Street View (GSV) images represent a massive data source that can be utilized in characterizing neighborhood built environment. As a virtual neighborhood audit tool, GSV has been validated and proved to be more cost-effective than traditional in-person audits [4], [13], [30], [36], [47], [48], [56], [63], [74]. GSV has also been used in neighborhood research to validate existing neighborhood measurement tools [1], [37], [53]. In addition, GSV cars have been used as a platform for spatial mapping in air pollution studies [2], [3]. Using GSV cars provides possibilities to map street-level traffic-related air pollution within neighborhoods at greater precision [2].

Bader and colleagues developed a Computer Assisted Neighborhood Visual Assessment System (CANVAS) as an innovative platform to perform virtual neighborhood audits [5]. CANVAS is an online application that can interact with the Google Application Programming Interface (API) to collect GSV images. Neighborhood auditing toolkits, including the Irvine-Minnesota Inventory, the Pedestrian Environment Data Scan (PEDS), and the Maryland Inventory of Urban Design Qualities (MIUDQ), have been built within CANVAS to help auditors improve the reliability of GSV image labeling. However, CANVAS still relies on human labeling, making it difficult to perform large-scale neighborhood characterization. Mooney et al. utilized CANVAS for a virtual neighborhood audit of 532 intersections within New York City [49]. They found that the presence of crosswalks, pedestrian signals, nearby billboards, and bus stops was associated with increased pedestrian injuries at street intersections. In addition to CANVAS, the Forty Area Study Street View (FASTVIEW) and SPOTLIGHT-Virtual Audit Tool (S-VAT) are both GSV-based virtual audit tools [6], [28]. A prior study found that neighborhood characteristics derived from S-VAT were not associated with total sedentary time for 219 Dutch and 128 Belgian adults who lived in 24 neighborhoods in the “Sustainable prevention of obesity through integrated strategies” (SPOTLIGHT) study [19]. We are not aware of any studies that have used neighborhood characteristics derived from FASTVIEW to predict health outcomes or health behaviors.

Lu recently published a study assessing the association between urban greenness derived from GSV and walking behavior in Hong Kong [43]. In this study, deep learning of GSV images was used and greenness was assessed by the Pyramid Scene Parsing Network (PSPNet) [80], which has a high pixel-wise accuracy. Urban greenness was associated with increased walking time in both 400m and 800m buffers. This study takes advantage of publicly available GSV data and the innovative scene segmentation techniques to characterize urban greenness for each participant.

Li and colleagues derived a vegetation index from GSV images through an object-based image analysis [42]. Each image was segmented into homogeneous polygons, and each polygon was then assigned to different feature classes based on the spectral and geometrical properties [76]. The Otsu algorithm was used to optimize the threshold used for differentiating greenness versus non-greenness [58]. Li’s research was more focused on urban landscape planning than on health outcomes research.

Villeneuve suggested GSV measured greenness (GVI) was associated with hours of recreational activity in the summer. Individuals living in neighborhoods with the highest quartile of GVI had, on average, 18.1 hours of recreational activity time every week compared to 12.7 hours for those living in neighborhoods with the lowest quartile of GVI. GVI was not associated with the physical component summary (PCS) score or mental component summary (MCS) score.

Yin and colleagues detected pedestrians using the aggregated channel feature (ACF) algorithm [77]. They collected training data using a camera in a traveling car. The study found good consistency between automatically labeled GSV and human labeled GSV; however, the automatically characterized neighborhood feature was limited to pedestrians and was not linked to health outcomes. Yin et al. also characterized neighborhood walkability using artificial neural networks and support vector machines [78].

B. Gaps in Existing GSV Studies

Most studies using Google Street View images have focused on assessing the agreement between virtual audits using GSV images and an in-person field audit. Very few studies have linked neighborhood characteristics captured from GSV to health outcomes. Mooney and colleagues found that the presence of crosswalks, pedestrian signals, nearby billboards, and bus stops was associated with increased pedestrian injuries at street intersections. However, the study only covered 532 intersection points in New York City and thus yielded limited generalizability. Virtual audits are cost-effective compared to traditional field visits. However, human labeling in virtual audits is still time-consuming when the study requires greater geographical coverage. Existing studies using innovative techniques to construct neighborhood characteristics have mainly focused on neighborhood greenness. Methods to construct neighborhood walkability, neighborhood safety, and neighborhood pleasurability from GSV images are under-developed. In addition, associations between neighborhood characteristics derived from GSV and chronic outcomes are under-studied.

Our research group conducted a pilot study using computer vision models to automatically label neighborhood characteristics, including neighborhood greenness, crosswalks, and commercial buildings, in Salt Lake City, Chicago, and Charleston [55]. We sampled images from all road intersections and along street segments at points that were 50m apart. We accessed GSV images from these points using Google’s Street View Image API. Deep convolutional neural networks (CNN) were trained to label unseen images [38]; more details of this approach are described in section III-A. Neighborhood characteristics were aggregated at the zip code level and were then merged with individual-level health data. We found that individuals living in neighborhoods (zip codes) with the highest levels of greenness, crosswalks, and commercial buildings had 25%–28% lower prevalence of obesity and 12%–18% lower prevalence of diabetes compared to individuals living in neighborhoods with the lowest levels of these neighborhood characteristics.

We subsequently collected street intersections of all the roads across the United States and retrieved GSV images at these intersections using Google’s Street View API. We randomly sampled two-thirds of the counties and used Google’s Vision API, Out-Of-Box, to perform neighborhood characterization. We obtained 10 neighborhood characteristics using Google’s Vision API and found that greater presence of highways was associated with lower prevalence of chronic conditions and premature mortality at the county level [54]. Individuals living in rural areas (identified from GSV) had a higher prevalence of chronic outcomes, premature mortality, physical distress, physical inactivity, and teen birth rates but a lower prevalence of excessive drinking [54].

C. Equivariance and Invariance in Deep Networks

In this paper, we discuss methods we utilized to further process Google Street View images for automated neighborhood characterization and chronic disease prevalence regression. The networks we use to regress the prevalence rates are required to be permutation invariant regarding their inputs. In the literature, we can define group actions on input/output data of a neural layer as a family of transformations on the input/output. Some instances of these group actions are rotation, translation, permutation, and so on. If this family of transformations on the input data does not change the output of the layer, this neural layer is called ‘Invariant’ to this action. However, if this action on the input transforms the output of the layer in a predictable way, this neural layer is called ‘Equivariant’ to this action.
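The distinction between invariance and equivariance can be illustrated with a small NumPy sketch. The two toy layers below (`elementwise_layer` and `sum_pool`) are hypothetical stand-ins chosen for illustration and are not layers from the networks in this paper; the group action considered is a permutation of the set elements.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))   # 5 set elements, 3 features each
perm = rng.permutation(5)            # a random reordering of the set


def elementwise_layer(x):
    """Applies the same map to every element: permutation equivariant."""
    return np.tanh(x)


def sum_pool(x):
    """Sums over the set dimension: permutation invariant."""
    return x.sum(axis=0)


# Equivariance: permuting the input permutes the output in the same way.
equivariant_ok = np.allclose(elementwise_layer(features[perm]),
                             elementwise_layer(features)[perm])

# Invariance: permuting the input leaves the output unchanged.
invariant_ok = np.allclose(sum_pool(features[perm]), sum_pool(features))

print(equivariant_ok, invariant_ok)
```

Both checks hold for any permutation, since the element-wise map treats each row independently and summation is commutative.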

Gens et al. [27] address the problem of symmetry groups for object detection by introducing a deep symmetry network where they generalize convnets to form feature maps over arbitrary symmetry groups. Cohen et al. [17] introduce group equivariant convolutional neural networks (G-CNN). G-CNN increase the expressive capacity of the network by exploiting the symmetries without increasing the number of parameters in the network. Ravanbakhsh et al. [60] address the equivariance of a network through its parameters. The proposed approach relates the equivariance properties of the neural layer to the symmetries of its parameter matrix. Guttenberg et al. [31] introduce a permutation invariant network to predict trajectories of sets of interacting objects. Vinyals et al. [72] address the problem of equivariance by introducing a ‘good’ ordering of the inputs. In section III-B, we will analyze two popular approaches to make our architecture permutation invariant, which is a requirement for set regression. We will further combine our permutation invariant network with a regular (single image input) network in section III-C.

III. APPROACH

In this section, we will discuss the networks and architectures that make analysis of relations between neighborhood quality and chronic disease prevalence possible. In section III-A, we will discuss how computer vision helps automate the built environmental feature classification process where millions of images are classified according to the characteristics of the neighborhood they visually represent. In section III-B, we will investigate a regression model designed to work on sets of images instead of single input images to predict chronic disease prevalence. In this setting, the network will predict a prevalence index for a given set of images corresponding to a single tract. Finally, in section III-C, we will combine models from sections III-A and III-B in a multi-tasking scenario to further increase the precision with which we can predict prevalence rates.

A. Built Environmental Feature Classification

Public health scientists are interested in specific indicators in neighborhoods, such as whether the neighborhood has plants, trees, and green areas, or whether the community is composed of single family houses or more apartment-based housing. These indicators are called built environmental features. Such indicators can be recognized by looking at the Google Street View images of a specific neighborhood, but since these images number on the order of millions, manual annotation of these indicators is not feasible and automation is required. In order to implement this idea, we use powerful CNN-based [40] classifiers that have shown superior performance compared to more traditional classifiers like SVMs [20].

The network used for built environmental feature classification is shown in Figure 1. The built environmental features we are interested in include i) whether more than 30 percent of the image is composed of green space and street landscaping (a binary label of 1 or 0 is assigned for the presence or absence of each feature), ii) whether the neighborhood is a single family house community, and finally iii) whether crosswalks are present in the neighborhood.

Fig. 1.


Model described in section III-A. Each sample in this setting is a single image accompanied by three labels corresponding to i) greenness, ii) presence of crosswalks, and iii) type of housing. The feature extractor is VGG19 and is pretrained with ImageNet data. Each feature classifier is a single fully connected layer and the losses are cross-entropy loss. The final loss for optimization is the summation of all three losses.

As can be observed in Figure 1, the network is composed of two main parts: a feature extractor network that extracts visual features from GSV images and a feature classifier network that assigns a binary label of 0 or 1 to each single image for a specific indicator. This predicted label represents whether the corresponding indicator is present in the image. Note that for each indicator mentioned above, we will use a separate classifier network, but the feature extractor part of the network is shared among all the feature classifiers. We notice that sharing the feature extractor among these three results in a slight performance gain of the network as well as a reduction in training time. We use the VGG19 [66] network as our feature extractor network and a single fully connected layer as our feature classifier network. We have also experimented with ResNet [32] as our feature extractor network; however, we did not observe any significant differences between VGG19 and ResNet. For each single image corresponding to a specific neighborhood, we will predict three labels corresponding to the three introduced indicators. For a single given tract, all images corresponding to that tract will be assigned a label for these three indicators. Furthermore, these predictions will be aggregated for each indicator and reported as a percentage, i.e., what percentage of the images in a tract contain a specific indicator. These percentages will be further used in a linear regression model to associate built environmental features with individual chronic disease prevalence rates.
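The per-tract aggregation step can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `aggregate_by_tract`, the tract identifiers, and the toy predictions are all hypothetical, and the column order (greenness, crosswalk, single family housing) is an assumption made for the example.

```python
import numpy as np


def aggregate_by_tract(tract_ids, labels):
    """Return {tract_id: percentage of images predicted positive, per indicator}.

    tract_ids: length-N sequence giving the tract of each image.
    labels:    (N, K) array of 0/1 predictions for K indicators.
    """
    tract_ids = np.asarray(tract_ids)
    labels = np.asarray(labels, dtype=float)
    return {
        tract: 100.0 * labels[tract_ids == tract].mean(axis=0)
        for tract in np.unique(tract_ids)
    }


# Hypothetical predictions for five images in two tracts; columns are
# (greenness, crosswalk, single family housing).
tract_ids = ["A", "A", "A", "A", "B"]
labels = [[1, 0, 1],
          [1, 0, 0],
          [0, 0, 1],
          [1, 1, 1],
          [0, 1, 0]]

percentages = aggregate_by_tract(tract_ids, labels)
for tract, values in percentages.items():
    print(tract, values)
```

Each tract is thereby summarized by one number per indicator, regardless of how many images it contains.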

In order to predict the chronic disease prevalence rates for a new given tract, Google Street View images corresponding to that tract are collected. All the image indicators are predicted and aggregated, and the prevalence rate is predicted through a linear regression model. This approach has a significant drawback, namely learning the built environmental feature classifier. This network is rather large, containing millions of parameters to learn. In order to learn this network, a large number of manual annotations is needed, which is a costly and time-consuming process. To obviate the need for annotations, we propose a regression network that directly predicts the chronic disease prevalence rates from the GSV images.

B. Chronic Disease Prevalence Regression

In order to be able to directly predict prevalence rates from GSV images at the tract level, we need to modify our current network. The prospective network needs to take as input a set of images rather than single images. Note that this network needs to be permutation invariant regarding its inputs. In other words, changing the ordering of the input set should not change the prediction result for that specific set. To make the network permutation invariant, we consider two popular approaches in the literature.

1). Permutation Invariance by Ordering:

Having a set as an input to a network is not a new idea in computer vision. This concept has been the focus of study for many different tasks such as video classification [35] and point cloud segmentation [59]. These works handle the problem of permutation invariance by utilizing a fixed ordering of the inputs based on a measure characterized by the inputs. For instance, for video classification, the visual features are extracted from each frame and concatenated in chronological order before being fed to the classifier. Inspired by this work, we propose to extract visual features from all the images in the set, sort these features based on some fixed measure, concatenate all the feature vectors accordingly, and generate the final tract representative feature vector. Further, this representative feature can be used in the regressor network to predict prevalence rates for a given tract. The network can be seen in Figure 2.

Fig. 2.


The proposed model in section III-B1. Each sample in this setting is a tract, i.e., the input is the set of corresponding images to that tract and the prevalence rate for that specific tract. The feature extractor is a pretrained VGG19 on ImageNet. The resize block is a single fully connected layer to decrease the dimensionality of feature vectors corresponding to single images. The downsized feature vectors are sorted and concatenated based on the intensity of their corresponding image. Finally, the tract representative feature is fed to a regressor network to predict prevalence rate. Final loss is the mean squared error.

This approach comes with two disadvantages. First, in order to sort the input set, we need to map the high-dimensional images onto the 1-D real line. Because this mapping reduces the dimension of the input so drastically, it is very unstable: small perturbations or noise in the images can change the resulting ordering, and no choice of sorting measure avoids this. The second problem with this approach is that the concatenation of all the feature vectors corresponding to a set results in a representative feature vector whose size is large and depends on the number of images in the set. Therefore, different input sets could produce inputs of inconsistent size for the regressor. This large and variable-size feature vector is not very suitable for training, as it can result in very long training times.
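A minimal sketch of the ordering-based construction follows, under stated assumptions: toy grayscale arrays stand in for GSV images, random vectors stand in for the downsized VGG19 features, and the sorting measure is taken to be the mean pixel intensity (the text only says the vectors are sorted by image intensity). The function name `tract_feature_by_ordering` is hypothetical.

```python
import numpy as np


def tract_feature_by_ordering(images, features):
    """Sort per-image feature vectors by mean image intensity, then concatenate.

    images:   (N, H, W) array of grayscale images (toy stand-ins for GSV images).
    features: (N, D) array of per-image feature vectors.
    """
    intensity = images.reshape(len(images), -1).mean(axis=1)
    order = np.argsort(intensity, kind="stable")
    return features[order].reshape(-1)


rng = np.random.default_rng(0)
images = rng.random((4, 8, 8))
features = rng.normal(size=(4, 3))

# Shuffling the set does not change the result, because the order is
# recomputed from the images themselves.
perm = rng.permutation(4)
same = np.allclose(tract_feature_by_ordering(images, features),
                   tract_feature_by_ordering(images[perm], features[perm]))

# The representative vector grows with the set: N * D entries.
len_small = tract_feature_by_ordering(images[:2], features[:2]).size
len_large = tract_feature_by_ordering(images, features).size
print(same, len_small, len_large)
```

The sketch makes both drawbacks visible: the output length scales with the number of images in the set, and two images with nearly equal intensity could swap positions under a small perturbation, changing the concatenated vector.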

2). Permutation Invariance by Aggregation:

Deep Sets was introduced by Zaheer et al. [79] in order to manage networks that take a set of inputs rather than individual inputs. Zaheer et al. theorize that a function f(X) operating on the set X, where X is countable, is a valid permutation invariant set function if and only if it can be decomposed as

$$f(X) = \rho\left(\sum_{x \in X} \phi(x)\right)$$

for suitable transformation functions ρ and ϕ. Motivated by the decomposition of a permutation invariant set function, we design our network accordingly. The first step is to find the mappings of all images in the input set to the feature space, which corresponds to ϕ in the decomposition function. This transformation is handled by the feature extractor in Figure 3. The next step, corresponding to Σ in the decomposition function, is to aggregate the mappings in the feature space; this aggregation could be any commutative operation, such as a simple summation. This aggregated feature vector is the representative mapping for the entirety of the input set. Finally, we need to predict the output of the network by feeding the input set representative feature vector into the regressor network. The regressor network in Figure 3 is the counterpart of the ρ transformation in the decomposition function.

Fig. 3.


Model described in section III-B2. Each sample in this setting is a tract, i.e., the input is the set of corresponding images from that tract and the prevalence rate for that specific tract. The feature extractor is a pretrained VGG19 on ImageNet. The feature vectors of the images in the input set are averaged to generate a tract representative feature, which is fed to the regressor network to predict the prevalence rate. The final loss is the mean squared error.

The choice of the aggregation function for the network is not critical since the whole network is trained end to end, i.e., the feature extractor will adapt accordingly to any choice of commutative operation as the aggregation function.
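The aggregation-based design can be sketched in a few lines of NumPy. This is only an illustration of the decomposition into ϕ, a commutative aggregation, and ρ: the single random linear layers below (`W_phi`, `w_rho`) are hypothetical stand-ins for the VGG19 feature extractor and the regressor network, and the 6-D vectors stand in for images.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT = 6, 4
W_phi = rng.normal(size=(D_IN, D_FEAT))   # toy parameters of phi
w_rho = rng.normal(size=D_FEAT)           # toy parameters of rho


def phi(x):
    """Per-element feature map (stand-in for the feature extractor)."""
    return np.maximum(x @ W_phi, 0.0)


def rho(z):
    """Maps the aggregated feature to a scalar (stand-in for the regressor)."""
    return float(z @ w_rho)


def predict(image_set, aggregate=np.mean):
    """f(X) = rho(aggregate over x in X of phi(x))."""
    return rho(aggregate(phi(image_set), axis=0))


tract = rng.normal(size=(7, D_IN))        # 7 "images", each a 6-D toy vector
shuffled = tract[rng.permutation(7)]

# Reordering the set leaves the prediction unchanged.
invariant = np.isclose(predict(tract), predict(shuffled))

# A set of a different size yields an output of the same form.
other = predict(rng.normal(size=(3, D_IN)))
print(invariant, other)
```

Passing `aggregate=np.sum` instead of the default mean gives the summation form of the decomposition; averaging, as in the Figure 3 caption, keeps the scale of the tract feature independent of the number of images.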

C. Hybrid Model

The network proposed in the former section obviates the need for manual annotations to learn the network; however, this elimination also discards the ability of the model to directly associate built environmental features with chronic disease prevalence. Public health scientists are interested in finding associations between built environmental features and chronic disease prevalence in different neighborhoods. In order to introduce this interpretability to the model, in this section we modify the network in a multi-tasking paradigm. We combine the models in sections III-A and III-B2. We choose the model in section III-B2 over the one in III-B1 because of the technical difficulties associated with the latter, although the latter shows slightly better performance. As mentioned before, the model in section III-B1 is massive because of the rather large tract representative feature vector, and therefore, combining it with the model in section III-A would result in a super-large model that we cannot accommodate on our commercial GPUs. In this framework, as can be seen in Figure 4, the feature extractor is influenced by two losses. The first one optimizes the feature extractor to generate features that are more suitable for built environmental feature classification, and the second loss promotes features that are better suited to the task of prevalence regression. As we will observe in the experiment section, optimizing these two losses at the same time not only enables more interpretability in the network, but also improves the performance of the network overall. For a more comprehensive study on multi-tasking in deep learning, refer to [62].

Fig. 4.


Model described in section III-C. Each sample in this setting is a tract, i.e., the input is the set of corresponding images from that tract, and the prevalence rate for that specific tract. For all images in a tract we have three labels corresponding to the introduced built environmental features. Note that only a small portion of the images are annotated for this purpose, and therefore the classification loss is not backpropagated for unlabeled images. The feature extractor is a pretrained VGG19 on ImageNet. Each feature classifier is a single fully connected layer, and the losses 1 to 3 are cross-entropy loss. The regression loss is the mean squared error.
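The way the classification loss is withheld for unlabeled images can be sketched as a masked multi-task loss. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, the classification and regression terms are added with equal weight (the relative weighting is not specified here), and the classification loss is averaged over the labeled images only.

```python
import numpy as np


def binary_cross_entropy(p, y):
    """Element-wise cross-entropy for predicted probabilities p and 0/1 labels y."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def hybrid_loss(probs, labels, labeled_mask, pred_rate, true_rate):
    """Sum of three masked classification losses plus the regression loss.

    probs, labels: (N, 3) per-image indicator probabilities and 0/1 labels.
    labeled_mask:  (N,) boolean, True where an image has manual annotations.
    pred_rate, true_rate: scalar tract-level prevalence prediction and target.
    """
    bce = binary_cross_entropy(probs, labels)            # shape (N, 3)
    if labeled_mask.sum() > 0:
        # Average over labeled images only, then sum the three indicator losses.
        cls_loss = bce[labeled_mask].mean(axis=0).sum()
    else:
        cls_loss = 0.0
    reg_loss = (pred_rate - true_rate) ** 2              # squared error for one tract
    return cls_loss + reg_loss


# Toy tract with three images, of which the second is unlabeled.
probs = np.array([[0.9, 0.2, 0.7],
                  [0.5, 0.5, 0.5],
                  [0.1, 0.8, 0.3]])
labels = np.array([[1, 0, 1],
                   [0, 0, 0],      # placeholder values; ignored by the mask
                   [0, 1, 0]])
labeled_mask = np.array([True, False, True])

loss = hybrid_loss(probs, labels, labeled_mask, pred_rate=0.30, true_rate=0.25)
print(loss)
```

Because the unlabeled rows never enter the classification term, they contribute no classification gradient, while every image in the tract still influences the regression term through the aggregated feature.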

1). Joint Hybrid Model:

The hybrid model can be considered a combination of the models in sections III-A and III-B2. In model III-A, the regressor (GLM) sees the aggregated indicators for a tract to predict the prevalence rates, whereas in model III-B2 the aggregated features from the feature extractor are fed directly to the regressor. Although the inputs to the regressor parts of the two models are different, both seem to be informative. Joining the aggregated indicators from the feature classifiers with the aggregated features from the feature extractor before feeding them to the regressor seems a reasonable next step for this model. However, we notice that this approach does not seem to improve the accuracy of the network. This can be due to the fact that the features immediately after the feature extractor can be considered low-level features for this task, whereas the aggregated indicators from the feature classifiers can be considered high-level. Concatenating these two vectors and feeding them to the regression part of the model is therefore not reasonable. However, if we break down the regressor network in model III-C and concatenate the aggregated indicators with the features from the second-to-last layer of the regressor, which contain high-level information about the tract, we can increase the accuracy of the model. This model is depicted in Figure 5.

Fig. 5.


Model described in section III-C1. This hybrid model benefits from both the aggregated indicators from the built environmental feature classifiers and the aggregated features directly from the feature extractor. The three losses for feature classification are not shown in this graph for simplicity reasons. Note that the yellow blocks shown in the regressor represent feature maps and not fully connected layers.
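The late concatenation used in the joint hybrid model can be sketched as a toy forward pass. All sizes and weights below are hypothetical, a single hidden layer stands in for the earlier part of the regressor, and the indicator probabilities would in practice come from the three feature classifiers.

```python
import numpy as np

rng = np.random.default_rng(0)
D_FEAT, D_HIDDEN, N_IND = 8, 4, 3
W1 = rng.normal(size=(D_FEAT, D_HIDDEN))      # earlier regressor layer (toy)
w_out = rng.normal(size=D_HIDDEN + N_IND)     # last layer sees hidden + indicators


def joint_regressor(image_features, indicator_probs):
    """Concatenate aggregated indicators with the regressor's penultimate features."""
    tract_feature = image_features.mean(axis=0)        # aggregated low-level feature
    hidden = np.maximum(tract_feature @ W1, 0.0)       # high-level tract feature
    indicators = indicator_probs.mean(axis=0)          # aggregated indicators
    joint = np.concatenate([hidden, indicators])
    return float(joint @ w_out), joint


image_features = rng.normal(size=(5, D_FEAT))          # 5 images in a toy tract
indicator_probs = rng.random((5, N_IND))
prediction, joint = joint_regressor(image_features, indicator_probs)
print(prediction, joint.shape)
```

The indicators are joined only after the image features have passed through part of the regressor, so both halves of the concatenated vector carry tract-level, high-level information.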

IV. DATA COLLECTION

We obtained roadway data for all road types across the United States using the 2017 Census Topologically Integrated Geographic Encoding and Referencing (TIGER) data. Street centerlines and street intersections were identified using the PostGIS plugin built within PostgreSQL (an open-source object-relational database system). PostGIS is a spatial extension that allows location queries to be performed in SQL.

We obtained GSV images at all street intersections across the US using Google Street View’s Application Programming Interface (API). We collected images with a resolution of 6400×440 pixels and collected images from all four directions (the direction the camera is facing: 0 = north, 90 = east, 180 = south and 270 = west) of each intersection, allowing us to fully capture the neighborhood features at each street intersection point.
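Requesting the four views at one intersection can be sketched as follows. The sketch only builds request URLs and performs no network access. The endpoint and the `size`, `location`, `heading`, and `key` parameters follow our understanding of Google's Street View Static API and should be checked against the current documentation; the coordinates, the size string, and the API key are placeholders, not values from this study.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Street View Static API.
STREET_VIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = (0, 90, 180, 270)  # north, east, south, west


def street_view_urls(lat, lng, size, api_key):
    """Build one request URL per camera heading for a single intersection."""
    return [
        STREET_VIEW_ENDPOINT + "?" + urlencode({
            "size": size,                  # "WIDTHxHEIGHT" in pixels
            "location": f"{lat},{lng}",
            "heading": heading,
            "key": api_key,
        })
        for heading in HEADINGS
    ]


# Placeholder intersection, image size, and key for illustration only.
urls = street_view_urls(40.7608, -111.8910, "640x640", "YOUR_API_KEY")
for url in urls:
    print(url)
```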

We collected over 31 million GSV images from December 15, 2017 to May 14, 2018 using Google API. These 31 million images correspond to 53,921 tracts (neighborhoods) in the United States. On average, more than 500 images are collected for each tract.

V. EXPERIMENTATION

In this section, we discuss quantitative results produced by the different architectures discussed in section III, as well as technical details corresponding to each model. Since this problem is naturally a regression problem, we use the coefficient of determination, $R^2$, for quantitative comparison. Considering a dataset with $n$ values $\{(x_i, y_i)\}_{i=1:n}$ and predictions $\{(x_i, \hat{y}_i)\}_{i=1:n}$, the coefficient of determination is evaluated as

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

where the residual sum of squares is $SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, the total sum of squares is $SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$, and $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$. $R^2$ values typically range between 0 and 1 (the value can be negative when a model predicts worse than the mean), where a higher score demonstrates that the model better represents the variation of the data compared to using a simple average as the prediction. $R^2$ is interpreted as the proportion of variance in the dependent variable that is predicted by the independent variable.
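The coefficient of determination can be computed directly from its standard definition, one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a minimal version. The function name `r_squared` and the toy prevalence values are made up for illustration and are not data from this study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    # Note: ss_tot is zero when all targets are identical; this sketch does not handle that case.
    return 1.0 - ss_res / ss_tot


observed = [0.20, 0.25, 0.30, 0.35]           # toy prevalence rates
perfect = r_squared(observed, observed)       # predictions equal the targets
baseline = r_squared(observed, [0.275] * 4)   # always predicting the mean
```

A perfect predictor gives a score of 1, while always predicting the mean of the targets gives a score of approximately 0, which is the "simple averaging" reference point mentioned above.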

A. Built Environmental Feature Classification

1). Model Training:

The model introduced in section III-A and demonstrated in Figure 1 is trained in two steps. First, the feature extractor and feature classifiers in Figure 1 are trained. Second, a generalized linear model (GLM) is trained to predict the prevalence rates for each tract from the aggregated predictions of the feature classifiers. After the first step, labels are assigned to all the images, the images are grouped according to the tract they belong to, and aggregated values for each indicator are calculated. These aggregated indicator values are then used to train a linear regression model to predict prevalence rates.
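The grouping and aggregation step between the two training stages can be sketched as follows. The text does not specify the aggregation function for the indicators, so taking the per-tract mean of binary classifier outputs is an assumption here, as are the function name `aggregate_by_tract`, the tract identifiers, and the toy predictions.

```python
from collections import defaultdict


def aggregate_by_tract(image_predictions):
    """Average per-image indicator predictions within each tract.

    `image_predictions` is an iterable of (tract_id, indicators) pairs,
    where `indicators` is a sequence of 0/1 classifier outputs.
    """
    sums = {}
    counts = defaultdict(int)
    for tract_id, indicators in image_predictions:
        if tract_id not in sums:
            sums[tract_id] = [0.0] * len(indicators)
        for k, value in enumerate(indicators):
            sums[tract_id][k] += value
        counts[tract_id] += 1
    # Mean of each indicator over all images of the tract (assumed aggregation).
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


# Toy predictions for three indicators per image (made-up values).
predictions = [
    ("tract_A", (1, 0, 1)),
    ("tract_A", (1, 1, 0)),
    ("tract_B", (0, 0, 1)),
]
tract_indicators = aggregate_by_tract(predictions)
```

Each tract is then represented by one short vector (here three values), which is the only input the second-stage regression model sees.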

In order to train the feature extractor and feature classifiers, we need annotated data for each indicator introduced in section III-A. Our collaborators manually annotated images from two cities: Chicago, Illinois and Charleston, South Carolina. About 15,000 images were manually annotated for three indicators to learn the network weights in the first step. However, the network used for this purpose is rather large, and the amount of annotated data is not sufficient to train it accurately from scratch. We therefore initialize the feature extractor from a VGG19 network pre-trained on the ImageNet classification dataset [21]. We alter the last layer of the VGG19 network such that it generates a 1-D feature vector of size 4096 to be fed into the feature classifiers. Each feature classifier consists of one fully connected layer that is learned from scratch.

2). Results:

We use the 500 cities dataset [71] released by the Centers for Disease Control and Prevention (CDC). This dataset reports prevalence rates for 27 chronic disease measures, including health outcomes (e.g., asthma, diabetes, cancer), public health prevention metrics (e.g., doctor and dentist visits and cholesterol screenings), and health behaviors (e.g., smoking and physical activity), at the city and census tract level for 500 cities using small-area estimation methods.

The cities selected include the 497 largest cities in the U.S., plus Burlington, VT, Charleston, WV, and Cheyenne, WY, so that all 50 states are represented. The project’s website features reports and interactive maps on the included health measures, documentation on methodology, and downloadable datasets from 2015 to 2018. The data provide estimates on the 27 measures for the 500 cities, as well as for approximately 28,000 census tracts. The sources used to generate the estimates include the Behavioral Risk Factor Surveillance System (BRFSS), Census Bureau 2010 census population data, and American Community Survey (ACS) five-year estimates. In our analysis, we examine obesity rates for the 19,562 census tracts that have obesity rates and for which we have Google Street View images. These 19,562 tracts correspond to 7,246,783 images collected from Google Street View. We use 15,000 tracts and their corresponding images for training and the remaining 4,562 tracts for validation. The model in section III-A results in an $R^2$ of 0.06. The bottleneck for this model is that the regression model only sees aggregated indicator values. If we are able to use more informative features from the feature extractor, we could improve the accuracy of our predictions. The model in section III-B is designed to use such features for prevalence regression.

B. Chronic Disease Prevalence Regression

The model introduced in section III-B benefits from the fact that it does not need any manual annotation for built environment indicators. This is important not only because annotating data is time-consuming and costly, but also because it allows for automated prevalence regression with minimal human interaction. Technical details and quantitative results for both approaches introduced in section III-B are described below.

1). Permutation Invariance by Ordering:

The model in section III-B1 uses an internal ordering module that makes the network permutation invariant. This ordering is based on the average intensity of the input images. A set of images is fed to the feature extractor, and the resulting 4096-dimensional features are reduced to 128 dimensions through a fully connected layer. These feature vectors are then sorted according to the input image intensities and concatenated to generate the tract representative feature vector, which is fed to a regressor to predict prevalence rates. Reducing the 4096-dimensional feature vectors is critical, since concatenating feature vectors of that size would result in a very large network that cannot be accommodated on commercial GPUs. Although the reduction shortens training time and alleviates the memory burden on GPUs, training remains a bottleneck for this network: the tract representative feature vector is still large, and the internal ordering of the inputs needs to be performed for each tract, which is cumbersome. Another issue with this model is its inability to handle input sets of arbitrary size. As can be observed from Figure 2, different input sizes for each tract result in tract representative feature vectors of different sizes, which is problematic. To overcome this problem, we fix the size of the input set to 64. If a tract has fewer than 64 images, its images are repeated to construct the input set. For tracts with more than 64 images, we randomly sub-sample the images at each iteration. To account for randomness in our experiments, we train five different models and report the average and standard deviation of the results. We achieve an $R^2$ of 0.6404 with a standard deviation of 0.0024 with this architecture.
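The ordering-based set representation can be illustrated with a small sketch. It is not the authors' code. Images are stood in for by flat lists of pixel intensities, the feature vectors are toy values rather than network outputs, and the helper names are hypothetical. The sketch sorts before padding or sub-sampling so that, for a fixed random seed and distinct mean intensities, the output does not depend on the order in which images are supplied. That ordering of steps is a design choice of this example and is not stated in the text.

```python
import random

SET_SIZE = 64  # fixed input-set size used in the text


def _mean_intensity(pair):
    image, _ = pair
    return sum(image) / len(image)


def fix_set_size(items, size, rng):
    """Sub-sample large sets; cycle through small ones until `size` is reached."""
    if len(items) >= size:
        return rng.sample(items, size)
    return [items[i % len(items)] for i in range(size)]


def tract_vector_by_ordering(images, features, size=SET_SIZE, rng=None):
    """Sort per-image features by mean image intensity, then concatenate them."""
    rng = rng or random.Random(0)
    pairs = sorted(zip(images, features), key=_mean_intensity)
    pairs = fix_set_size(pairs, size, rng)
    pairs.sort(key=_mean_intensity)  # restore intensity order after padding or sampling
    return [value for _, feature in pairs for value in feature]


# Toy tract with three "images" (two pixels each) and 2-D features.
images = [[0.9, 0.8], [0.1, 0.2], [0.5, 0.4]]
features = [[3.0, 3.5], [1.0, 1.5], [2.0, 2.5]]
vec = tract_vector_by_ordering(images, features, size=4)
shuffled = tract_vector_by_ordering(images[::-1], features[::-1], size=4)
```

The output length is always `size` times the per-image feature dimension, which is why the set size has to be fixed and why the concatenated vector grows quickly with larger features.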

2). Permutation Invariance by Aggregation:

The model proposed in section III-B2 achieves the required permutation invariance through aggregation, unlike model III-B1, which sorts and concatenates the features from each image. Aggregation results in a smaller tract representative feature vector, which enables faster training. We choose simple averaging as the aggregation function.

Another technical benefit of this architecture is its ability to handle input sets of arbitrary size. The number of GSV images we have collected ranges from a minimum of 1 image per tract to a maximum of 5,496 images per tract. Setting aside the limited memory of commercial GPUs, this model can handle input sets of different sizes. In practice, however, using all images of a large tract is not feasible because they do not fit into GPU memory. To work around this problem, we randomly sub-sample images from tracts with more than 64 images. This approach results in an $R^2$ of 0.6306 with a standard deviation of 0.0261 (Table I). As can be observed, there is a significant improvement in accuracy compared to the model in section III-A.
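Averaging as the aggregation function can be sketched as below. The feature vectors are toy values, and the function name `tract_vector_by_mean` and the fixed random seed are assumptions for illustration. The cap of 64 images follows the text.

```python
import random

MAX_IMAGES = 64  # sub-sampling threshold used in the text


def tract_vector_by_mean(features, max_images=MAX_IMAGES, rng=None):
    """Average per-image feature vectors into one tract-level vector."""
    rng = rng or random.Random(0)
    if len(features) > max_images:
        features = rng.sample(features, max_images)  # random sub-sample of large tracts
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


small = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mean_small = tract_vector_by_mean(small)
mean_reversed = tract_vector_by_mean(small[::-1])    # same set, different order
single = tract_vector_by_mean([[7.0, 8.0]])          # a tract with one image
large = tract_vector_by_mean([[float(i), 0.0] for i in range(100)])
```

Unlike the concatenation used for ordering, the output dimension here equals the per-image feature dimension regardless of how many images the tract contains, so tracts with 1 image and tracts with thousands of images produce vectors of the same size.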

C. Hybrid Model

The hybrid model introduced in section III-C is designed in a multi-task paradigm. In this architecture, the main loss is the mean squared error of the regression, but by introducing a classification loss for the built environmental feature classifiers, we force the feature extractor to generate features that are informative for both tasks. As a result, the feature extractor derives representations that are not only informative enough for prevalence regression but also valid for built environmental feature classification. Built environmental features are indicators that are deliberately chosen by public health scientists and are thought to be strongly associated with chronic diseases. Therefore, by applying the classification loss, we expect the generated representations to be more explanatory than those generated in model III-B2, which should result in better performance of the hybrid model.

Multi-task learning can increase the performance of the model for several reasons. One can consider the auxiliary task (in this case, the built environmental feature classification) as a regularizer that places an informative prior on the model. Using this auxiliary task can help with faster and better convergence because it reduces the representation manifold to the intersection of the manifolds generated by the two tasks. Intuitively, if a representation is useful for multiple tasks, we can say it is an informative representation. By adding this auxiliary task, we improve the coefficient of determination significantly, from 0.6306 in model III-B2 to 0.6906. The joint model in section III-C1 achieves the highest $R^2$ of 0.7040. A full comparison of the quantitative results for all the proposed architectures is given in Table I.
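The multi-task objective can be sketched as a regression loss plus auxiliary classification losses. The text specifies mean squared error for the regression and one classification loss per built environmental indicator. The use of binary cross-entropy for those losses, the weighting factor `aux_weight`, the function names, and the toy numbers are assumptions made for this example.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error of the prevalence regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Average binary cross-entropy for one indicator (assumed classification loss)."""
    total = 0.0
    for label, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(labels)


def multitask_loss(prev_true, prev_pred, indicator_labels, indicator_probs, aux_weight=1.0):
    """Main regression loss plus the weighted sum of per-indicator losses."""
    regression = mse(prev_true, prev_pred)
    auxiliary = sum(
        binary_cross_entropy(labels, probs)
        for labels, probs in zip(indicator_labels, indicator_probs)
    )
    return regression + aux_weight * auxiliary


# Toy batch: one tract-level prevalence target and three indicators on two images.
prev_true, prev_pred = [0.30], [0.25]
indicator_labels = [[1, 0], [0, 0], [1, 1]]
indicator_probs = [[0.8, 0.3], [0.2, 0.1], [0.6, 0.9]]

loss = multitask_loss(prev_true, prev_pred, indicator_labels, indicator_probs)
loss_reg_only = multitask_loss(
    prev_true, prev_pred, indicator_labels, indicator_probs, aux_weight=0.0
)
```

Setting `aux_weight` to zero recovers a regression-only objective, which makes the regularizing role of the auxiliary task explicit: the extra terms only change which representations the shared feature extractor is pushed toward.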

TABLE I.

QUANTITATIVE RESULTS CORRESPONDING TO THE APPROACHES PROPOSED IN SECTION III. EACH EXPERIMENT IS REPEATED FIVE TIMES, AND THE AVERAGE AND STANDARD DEVIATION ARE REPORTED. WE USE THE COEFFICIENT OF DETERMINATION AS OUR EVALUATION METRIC.

Approach          Coefficient of Determination ($R^2$)
Section III-A     0.0600 ± 0.0001
Section III-B1    0.6404 ± 0.0024
Section III-B2    0.6306 ± 0.0261
Section III-C     0.6906 ± 0.0034
Section III-C1    0.7040 ± 0.0023

VI. CONCLUSION

In summary, this research makes significant, relevant contributions to the field of neighborhood research because i) neighborhood environments are increasingly linked to an array of important health outcomes, and ii) this project addresses the limits to research resulting from the lack of neighborhood data by providing new, cost-efficient data resources and methods for characterizing neighborhoods. We contribute to the field by creating national data resources for large-scale examination of neighborhood effects on health. Analyzing the findings of this study may identify community design and public policy as possible levers of change for improving population health. We proposed and evaluated regression models that take sets of images as input instead of single images. Further, we combined set regression models with single-image classifiers in our hybrid models to increase the overall performance of our regression model.

VII. ACKNOWLEDGEMENT

This study was supported by the National Library of Medicine of the National Institutes of Health under Award Number R01LM012849.

REFERENCES

  • [1]. Joel Adu-Brimpong Nathan Coffey, Ayers Colby, Berrigan David, Yingling Leah, Thomas Samantha, Mitchell Valerie, Ahuja Chaarushi, Rivers Joshua, Hartz Jacob, et al. Optimizing scoring and sampling methods for assessing built neighborhood environment quality in residential areas. International journal of environmental research and public health, 14(3):273, 2017.
  • [2]. Stacey E Alexeeff Ananya Roy, Shan Jun, Liu Xi, Messier Kyle, Joshua S Apte Christopher Portier, Sidney Stephen, and Stephen K Van Den Eeden. High-resolution mapping of traffic related air pollution with google street view cars and incidence of cardiovascular events within neighborhoods in oakland, ca. Environmental Health, 17(1):38, 2018.
  • [3]. Apte Joshua S, Messier Kyle P, Gani Shahzad, Brauer Michael, Kirchstetter Thomas W, Lunden Melissa M, Marshall Julian D, Portier Christopher J, Vermeulen Roel CH, and Hamburg Steven P. High-resolution air pollution mapping with google street view cars: exploiting big data. Environmental Science & Technology, 51(12):6999–7008, 2017.
  • [4]. Bader Michael DM, Mooney Stephen J, Bennett Blake, and Rundle Andrew G. The promise, practicalities, and perils of virtually auditing neighborhoods using google street view. The ANNALS of the American Academy of Political and Social Science, 669(1):18–40, 2017.
  • [5]. Bader Michael DM, Mooney Stephen J, Lee Yeon Jin, Sheehan Daniel, Neckerman Kathryn M, Rundle Andrew G, and Teitler Julien O. Development and deployment of the computer assisted neighborhood visual assessment system (canvas) to measure health-related neighborhood conditions. Health & place, 31:163–172, 2015.
  • [6]. Bethlehem John R, Mackenbach Joreintje D, Ben-Rebah Maher, Compernolle Sofie, Glonti Ketevan, Bardos Helga, Rutter Harry R, Charreire Hélène, Oppert Jean-Michel, Brug Johannes, et al. The spotlight virtual audit tool: a valid and reliable tool to assess obesogenic characteristics of the built environment. International journal of health geographics, 13(1):52, 2014.
  • [7]. Black Jennifer L, Macinko James, Dixon L Beth, and Fryer George E Jr. Neighborhoods and obesity in new york city. Health & place, 16(3):489–499, 2010.
  • [8]. Bleil Maria E, Gianaros Peter J, Jennings J Richard, Flory Janine D, and Manuck Stephen B. Trait negative affect: toward an integrated model of understanding psychological risk for impairment in cardiac autonomic function. Psychosomatic Medicine, 70(3):328–337, 2008.
  • [9]. Block Jason P, Scribner Richard A, and DeSalvo Karen B. Fast food, race/ethnicity, and income: a geographic analysis. American journal of preventive medicine, 27(3):211–217, 2004.
  • [10]. Brownson Ross C, Hoehner Christine M, Day Kristen, Forsyth Ann, and Sallis James F. Measuring the built environment for physical activity: state of the science. American journal of preventive medicine, 36(4):S99–S123, 2009.
  • [11]. Burdette Amy M and Hill Terrence D. An examination of processes linking perceived neighborhood disorder and obesity. Social science & medicine, 67(1):38–46, 2008.
  • [12]. Casciano Rebecca and Massey Douglas S. Neighborhood disorder and anxiety symptoms: new evidence from a quasi-experimental study. Health & place, 18(2):180–190, 2012.
  • [13]. Charreire Hélène, Mackenbach Joreintje D, Ouasti M, Lakerveld Jeroen, Compernolle Sofie, Ben-Rebah M, McKee Martin, Brug Johannes, Rutter Harry, and Oppert J-M. Using remote sensing to define environmental characteristics related to physical activity and dietary behaviours: a systematic review (the spotlight project). Health & place, 25:1–9, 2014.
  • [14]. Christeson William, Taggart Amy Dawson, and Messner-Zidell Soren. Too fat to fight: retired military leaders want junk food out of America’s schools: a report by Mission: Readiness. Mission, 2010.
  • [15]. Karina MH Christiansen Farah Qureshi, Schaible Alex, Park Sohyun, and Gittelsohn Joel. Environmental factors that impact the eating behaviors of low-income african american adolescents in baltimore city. Journal of nutrition education and behavior, 45(6):652–660, 2013.
  • [16]. Christina A Clarke Tim Miller, Ellen T Chang Daixin Yin, Cockburn Myles, and Gomez Scarlett L. Racial and social class gradients in life expectancy in contemporary california. Social science & medicine, 70(9):1373–1380, 2010.
  • [17]. Cohen Taco and Welling Max. Group equivariant convolutional networks. In International conference on machine learning, pages 2990–2999, 2016.
  • [18]. Colditz Graham A. Economic costs of obesity and inactivity. Medicine and science in sports and exercise, 31(11 Suppl):S663–7, 1999.
  • [19]. Compernolle Sofie, De Cocker Katrien, Mackenbach Joreintje D, Van Nassau Femke, Lakerveld Jeroen, Cardon Greet, and De Bourdeaudhuij Ilse. Objectively measured physical environmental neighbourhood factors are not associated with accelerometer-determined total sedentary time in adults. International Journal of Behavioral Nutrition and Physical Activity, 14(1):94, 2017.
  • [20]. Cortes Corinna and Vapnik Vladimir. Support-vector networks. Machine learning, 20(3):273–297, 1995.
  • [21]. Deng Jia, Dong Wei, Socher Richard, Li Li-Jia, Li Kai, and Fei-Fei Li. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.
  • [22]. Druss Benjamin G, Marcus Steven C, Olfson Mark, Tanielian Terri, Elinson Lynn, and Pincus Harold Alan. Comparing the national economic burden of five chronic conditions. Health Affairs, 20(6):233–241, 2001.
  • [23]. Duncan Dustin, Meline Julie, Kestens Yan, Day Kristen, Elbel Brian, Trasande Leonardo, and Chaix Basile. Walk score, transportation mode choice, and walking among french adults: a gps, accelerometer, and mobility survey study. International journal of environmental research and public health, 13(6):611, 2016.
  • [24]. Eames Margaret, Ben-Shlomo Yoav, and Marmot Michael G. Social deprivation and premature mortality: regional comparison across england. Bmj, 307(6912):1097–1102, 1993.
  • [25]. Furr-Holden C Debra M, Lee Myong Hwa, Milam Adam J, Johnson Renee M, Lee Kwang-Sig, and Ialongo Nicholas S. The growth of neighborhood disorder and marijuana use among urban adolescents: a case for policy and environmental interventions. Journal of studies on alcohol and drugs, 72(3):371–379, 2011.
  • [26]. Galea Sandro, Ahern Jennifer, Rudenstine Sasha, Wallace Zachary, and Vlahov David. Urban built environment and depression: a multilevel analysis. Journal of Epidemiology & Community Health, 59(10):822–827, 2005.
  • [27]. Gens Robert and Domingos Pedro M. Deep symmetry networks. In Advances in neural information processing systems, pages 2537–2545, 2014.
  • [28]. Griew Pippa, Hillsdon Melvyn, Foster Charlie, Coombes Emma, Jones Andy, and Wilkinson Paul. Developing and testing a street audit tool using google street view to measure environmental supportiveness for physical activity. International Journal of Behavioral Nutrition and Physical Activity, 10(1):103, 2013.
  • [29]. Diana S Grigsby-Toussaint Rebecca Lipton, Chavez Noel, Handler Arden, Johnson Timothy P, and Kubo Jessica. Neighborhood socioeconomic change and diabetes risk: findings from the chicago childhood diabetes registry. Diabetes Care, 33(5):1065–1068, 2010.
  • [30]. Gullón Pedro, Badland Hannah M, Alfayate Silvia, Bilal Usama, Escobar Francisco, Cebrecos Alba, Diez Julia, and Franco Manuel. Assessing walking and cycling environments in the streets of madrid: comparing on-field and virtual audits. Journal of urban health, 92(5):923–939, 2015.
  • [31]. Guttenberg Nicholas, Virgo Nathaniel, Witkowski Olaf, Aoki Hidetoshi, and Kanai Ryota. Permutation-equivariant neural networks applied to dynamics prediction. arXiv preprint arXiv:1612.04530, 2016.
  • [32]. He Kaiming, Zhang Xiangyu, Ren Shaoqing, and Sun Jian. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [33]. Heinrich Katie M, Lee Rebecca E, Regan Gail R, ReeseSmith Jacqueline Y, Howard Hugh H, Haddock C Keith, Poston Walker S Carlos, and Ahluwalia Jasjit S. How does the built environment relate to body mass index and obesity prevalence among public housing residents? American Journal of Health Promotion, 22(3):187–194, 2008.
  • [34]. Hill Terrence D, Burdette Amy M, and Hale Lauren. Neighborhood disorder, sleep quality, and psychological distress: testing a model of structural amplification. Health & place, 15(4):1006–1013, 2009.
  • [35]. Karpathy Andrej, Toderici George, Shetty Sanketh, Leung Thomas, Sukthankar Rahul, and Fei-Fei Li. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 1725–1732, 2014.
  • [36]. Kelly Cheryl M, Wilson Jeffrey S, Baker Elizabeth A, Miller Douglas K, and Schootman Mario. Using google street view to audit the built environment: inter-rater reliability results. Annals of Behavioral Medicine, 45(suppl 1):S108–S112, 2012.
  • [37]. Kepper Maura M, Sothern Melinda S, Theall Katherine P, Griffiths Lauren A, Scribner Richard A, Tseng Tung-Sung, Schaettle Paul, Cwik Jessica M, Felker-Kantor Erica, and Broyles Stephanie T. A reliable, feasible method to observe neighborhoods at high spatial resolution. American journal of preventive medicine, 52(1):S20–S30, 2017.
  • [38]. Krizhevsky Alex, Sutskever Ilya, and Hinton Geoffrey E. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • [39]. Marjaana Lahti-Koski Pirjo Pietinen, Heliovaara Markku, and Vartiainen Erkkï. Associations of body mass index and obesity with physical activity, food choices, alcohol intake, and smoking in the 1982–1997 finrisk studies. The American journal of clinical nutrition, 75(5):809–817, 2002.
  • [40]. Yann LéCun Leon Bottou, Bengio Yoshua, Haffner Patrick, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [41]. James A Levine. Poverty and obesity in the us, 2011.
  • [42]. Li Xiaojiang, Zhang Chuanrong, Li Weidong, Ricard Robert, Meng Qingyan, and Zhang Weixing. Assessing street-level urban greenery using google street view and a modified green view index. Urban Forestry & Urban Greening, 14(3):675–685, 2015.
  • [43]. Lu Yi. The association of urban greenness and walking behavior: Using google street view and deep learning techniques to estimate residents exposure to urban greenness. International journal of environmental research and public health, 15(8):1576, 2018.
  • [44]. Lysy Zoe, Booth Gillian L, Shah Baiju R, Austin Peter C, Luo Jin, and Lipscombe Lorraine L. The impact of income on the incidence of diabetes: a population-based study. Diabetes research and clinical practice, 99(3):372–379, 2013.
  • [45]. McEwen Bruce S. Stress, adaptation, and disease: Allostasis and allostatic load. Annals of the New York academy of sciences, 840(1):33–44, 1998.
  • [46]. Molnar Beth E, Gortmaker Steven L, Bull Fiona C, and Buka Stephen L. Unsafe to play? neighborhood disorder and lack of safety predict reduced physical activity among urban children and adolescents. American journal of health promotion, 18(5):378–386, 2004.
  • [47]. Mooney Stephen J, Bader Michael DM, Lovasi Gina S, Neckerman Kathryn M, Teitler Julien O, and Rundle Andrew G. Validity of an ecometric neighborhood physical disorder measure constructed by virtual street audit. American journal of epidemiology, 180(6):626–635, 2014.
  • [48]. Mooney Stephen J, Bader Michael DM, Lovasi Gina S, Teitler Julien O, Koenen Karestan C, Aiello Allison E, Galea Sandro, Goldmann Emily, Sheehan Daniel M, and Rundle Andrew G. Street audits to measure neighborhood disorder: virtual or in-person? American journal of epidemiology, 186(3):265–273, 2017.
  • [49]. Mooney Stephen J, DiMaggio Charles J, Lovasi Gina S, Neckerman Kathryn M, Bader Michael DM, Teitler Julien O, Sheehan Daniel M, Jack Darby W, and Rundle Andrew G. Use of google street view to assess environmental contributions to pedestrian injury. American journal of public health, 106(3):462–469, 2016.
  • [50]. Morland Kimberly, Wing Steve, Roux Ana Diez, and Poole Charles. Neighborhood characteristics associated with the location of food stores and food service places. American journal of preventive medicine, 22(1):23–29, 2002.
  • [51]. Morris JN, Blane DB, and White IR. Levels of mortality, education, and social conditions in the 107 local education authority areas of england. Journal of Epidemiology & Community Health, 50(1):15–17, 1996.
  • [52]. Mujahid Mahasin S, Roux Ana V Diez, Shen Mingwu, Gowda Deepthiman, Sanchez i, Shea Steven, Jacobs David R Jr, and Jackson Sharon A. Relation between neighborhood environments and obesity in the multi-ethnic study of atherosclerosis. American journal of epidemiology, 167(11):1349–1357, 2008.
  • [53]. Nesoff Elizabeth D, Milam Adam J, Pollack Keshia M, Curriero Frank C, Bowie Janice V, Gielen Andrea C, and Furr-Holden Debra M. Novel methods for environmental assessment of pedestrian injury: creation and validation of the inventory for pedestrian safety infrastructure. Journal of urban health, 95(2):208–221, 2018.
  • [54]. Quynh C Nguyen Sahil Khanna, Dwivedi Pallavi, Huang Dina, Huang Yuru, Tasdizen Tolga, Kimberly D Brunisholz Feifei Li, Gorman Wyatt, Nguyen Thu T, et al. Using google street view to examine associations between built environment characteristics and us health outcomes. Preventive medicine reports, 14:100859, 2019.
  • [55]. Quynh C Nguyen Mehdi Sajjadi, Matt McCullough Minh Pham, Thu T Nguyen Weijun Yu, Meng Hsien-Wen, Wen Ming, Li Feifei, Ken R Smith, et al. Neighbourhood looking glass: 360° automated characterisation of the built environment for neighbourhood effects research. J Epidemiol Community Health, 72(3):260–266, 2018.
  • [56]. Odgers Candice L, Caspi Avshalom, Bates Christopher J, Sampson Robert J, and Moffitt Terrie E. Systematic social observation of childrens neighborhoods using google street view: a reliable and cost-effective method. Journal of Child Psychology and Psychiatry, 53(10):1009–1017, 2012.
  • [57]. Ogden Cynthia L, Carroll Margaret D, Fryar Cheryl D, and Flegal Katherine M. Prevalence of obesity among adults and youth: United states, 2011–2014. 2015.
  • [58]. Otsu Nobuyuki. A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics, 9(1):62–66, 1979.
  • [59]. Charles R Qi Hao Su, Mo Kaichun, and Guibas Leonidas J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.
  • [60]. Ravanbakhsh Siamak, Schneider Jeff, and Poczos Barnabas. Equivariance through parameter-sharing. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2892–2901. JMLR.org, 2017.
  • [61]. Roemmich James N, Epstein Leonard H, Raja Samina, Yin Li, Robinson Jodie, and Winiewicz Dana. Association of access to parks and recreational facilities with the physical activity of young children. Preventive medicine, 43(6):437–441, 2006.
  • [62]. Ruder Sebastian. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098, 2017.
  • [63]. Andrew G Rundle Michael DM Bader, Catherine A Richards, Kathryn M Neckerman, and Julien O Teitler. Using google street view to audit neighborhood environments. American journal of preventive medicine, 40(1):94–100, 2011.
  • [64]. Rundle Andrew G and Heymsfield Steven B. Can walkable urban design play a role in reducing the incidence of obesity-related conditions? Jama, 315(20):2175–2177, 2016.
  • [65]. Seeman Teresa E, Singer Burton H, Rowe John W, Horwitz Ralph I, and McEwen Bruce S. Price of adaptation - allostatic load and its health consequences: Macarthur studies of successful aging. Archives of internal medicine, 157(19):2259–2268, 1997.
  • [66]. Simonyan Karen and Zisserman Andrew. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [67]. Thornton Christina M, Conway Terry L, Cain Kelli L, Gavand Kavita A, Saelens Brian E, Frank Lawrence D, Geremia Carrie M, Glanz Karen, King Abby C, and Sallis James F. Disparities in pedestrian streetscape environments by income and race/ethnicity. SSM-population health, 2:206–216, 2016.
  • [68]. Townsend Peter, Phillimore Peter, and Beattie Alastair. Health and deprivation: inequality and the North. Routledge, 1988.
  • [69]. Truong Khoa D and Ma Sai. A systematic review of relations between neighborhoods and mental health. The journal of mental health policy and economics, 9(3):137–154, 2006.
  • [70]. Tyroler HA, Wing S, and Knowles MG. Increasing inequality in coronary heart disease mortality in relation to educational achievement: profile of places of residence, united states, 1962 to 1987. Ann Epidemiol, 3(Suppl):S51–S54, 1993.
  • [71]. https://www.cdc.gov/500cities/index.html
  • [72]. Vinyals Oriol, Bengio Samy, and Kudlur Manjunath. Order matters: Sequence to sequence for sets. arXiv preprint arXiv:1511.06391, 2015.
  • [73]. May C Wang Soowon Kim, Gonzalez Alma A, MacLeod Kara E, and Winkleby Marilyn A. Socioeconomic and food-related physical characteristics of the neighbourhood environment are associated with body mass index. Journal of Epidemiology & Community Health, 61(6):491–498, 2007.
  • [74]. Wilson Jeffrey S, Kelly Cheryl M, Schootman Mario, Baker Elizabeth A, Banerjee Aniruddha, Clennin Morgan, and Miller Douglas K. Assessing the built environment using omnidirectional imagery. American journal of preventive medicine, 42(2):193–199, 2012.
  • [75]. Wing Steve, Barnett Elizabeth, Casper Michele, and Tyroler HA. Geographic and socioeconomic variation in the onset of decline of coronary heart disease mortality in white women. American Journal of Public Health, 82(2):204–209, 1992.
  • [76]. Woebbecke DM, Meyer GE, Von Bargen K, and Mortensen DA. Shape features for identifying young weeds using image analysis. Transactions of the ASAE, 38(1):271–281, 1995.
  • [77]. Yin Li, Cheng Qimin, Wang Zhenxin, and Shao Zhenfeng. "Big data" for pedestrian volume: Exploring the use of google street view images for pedestrian counts. Applied Geography, 63:337–345, 2015.
  • [78]. Yin Li and Wang Zhenxin. Measuring visual enclosure for street walkability: Using machine learning algorithms and google street view imagery. Applied geography, 76:147–153, 2016.
  • [79]. Zaheer Manzil, Kottur Satwik, Ravanbakhsh Siamak, Poczos Barnabas, Salakhutdinov Ruslan R, and Smola Alexander J. Deep sets. In Advances in neural information processing systems, pages 3391–3401, 2017.
  • [80]. Zhao Hengshuang, Shi Jianping, Qi Xiaojuan, Wang Xiaogang, and Jia Jiaya. Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2881–2890, 2017.
