Abstract
Increasing population levels of cycling has the potential to improve public health by increasing physical activity. As cyclists have begun using smartphone apps to record trips, researchers have begun using data generated from these apps to monitor cycling levels and evaluate cycling-related interventions.
The goal of this research is to assess the extent to which app-using cyclists represent the broader cycling population to inform whether use of app-generated data in bike-infrastructure intervention research may bias effect estimates.
Using an intercept survey, we asked 95 cyclists throughout Atlanta, Georgia, USA about their use of GPS-based smartphone apps to record bike rides. We asked respondents to draw their common bike routes, from which we assessed the proportion of ridership captured by app-generated data sources overall and on types of bicycle infrastructure. We measured socio-demographics and bike-riding habits, including cyclist type, ride frequency, and most common ride purpose.
Cyclists who used smartphone apps to record their bike rides differed from those who did not across some but not all socio-demographic characteristics and differed in several bike-riding attributes. App users rode more frequently, self-classified as stronger riders, and rode proportionately more for leisure. Although groups had similar infrastructure preferences at the person level, differences appeared at the level of the estimated ride, where, for example, the proportion of ridership captured by an app on protected bike lanes was lower than the overall proportion of ridership captured. A sample calculation illustrated how such differences may induce selection bias in smartphone-data-based research on infrastructure and motor-vehicle-cyclist crash risk. We illustrate in the sample scenario how the bias can be corrected, assuming inverse-probability-of-selection weights can be accurately specified. The presented bias-adjustment method may be useful for future bike-infrastructure research using smartphone-generated data.
Keywords: bias, selection bias, cycling, representativeness, smartphone apps
1. Introduction
Bicycling has many benefits to society, including less automobile-traffic congestion and air pollution, and potentially better public health through increased physical activity,1 although safety and air-pollution exposure remain public-health concerns.2 Given its societal benefits, extensive research has examined the patterning and determinants of cycling. Traditional methods for measuring cycling, such as stationary counters, have limited spatial detail. To supplement these methods, planners and researchers have begun to use crowdsourced data generated from GPS-enabled smartphone apps, such as Strava®.3–6 Although these app-generated data are detailed across space and time, their use comes with challenges.3
First, they may not represent the general cycling population. Blanc and colleagues compared the socio-demographic distribution of users of several smartphone apps with active-travel surveys and found that the smartphone data tended to underrepresent women, older adults, and lower-income populations.7 The study included several smartphone apps developed for transportation-planning purposes across many North American cities, but did not address characteristics of users of popular fitness-oriented apps, such as Strava® or MapMyRide®. The spatial patterning and the age and sex distributions of users of fitness-oriented apps have been examined,4,8–11 but to our knowledge, race, socio-economic status, and person-level bike-riding habits of users of fitness-oriented apps have not been assessed, as they are not typically asked during app set-up. Understanding representativeness is important so researchers know to whom results apply, a question of external validity.
The second concern is a potential threat to internal validity, specifically selection bias in etiologic questions. Suppose a researcher interested in the effect of bike infrastructure on bike safety wishes to compare the rate of bike-automobile collisions on protected versus conventional bike lanes. The investigator uses app-generated data to estimate ridership on bike infrastructure and agency data to count collisions on the infrastructure. To account for the fact that the agency’s collision counts (ideally) arise from the risk set of all cyclists, the researcher might weight the app-generated ridership data by the inverse of the overall proportion expected to be captured by the app. Such an approach may lead to bias, however, because app users—who may be more confident or aggressive riders—may use infrastructure differently from non-users.5,6 If they, for example, are less likely to ride on protected bike lanes, the rate of collisions on that infrastructure would be comparatively overestimated. This bias could be addressed with estimates of the ridership proportion for each type of infrastructure. While research has assessed correlations of smartphone-generated ridership with stationary counts on some types of infrastructure,6,8 to our knowledge, infrastructure-specific estimates of the selection probability have not been estimated.
The overarching goal of this research is thus to assess the extent to which cyclists contributing to smartphone-generated data sources represent the cycling population, to inform whether reliance on this data source may bias effect estimates in bike-infrastructure research. First, we compare socio-demographic characteristics and bike-riding habits of cyclists who use smartphone apps to record bike rides with those of cyclists who do not. Second, we estimate the proportion of ridership that would be captured by each type of smartphone app, and by three specific apps, overall and by ride purpose and cycling-infrastructure type. Within this second aim, we also illustrate how estimated selection probabilities can be used to quantify and adjust for selection bias in a hypothetical study on the effect of cycling infrastructure on risk of a bike crash.
2. Methods
2.1. Venue-based sampling
We conducted a venue-based survey in Atlanta, Georgia, USA, between June 2016 and April 2017. The target population was any adult who at least sometimes rode a bike for any purpose in the Atlanta urban core (defined loosely as the area inside I-285). We conceived the sampling frame as a geographically diverse list of venues (n=67) around the city where we expected to find utilitarian and recreational cyclists. The initial list included large employers, including hospitals, corporate headquarters, and government offices; bike-related businesses, including service, retail, and advocacy; colleges and universities; transit stations; public parks; restaurants; grocery stores and large retail centers; and open-streets events, called Atlanta Streets Alive (Supplement 1, Figure S1). During Atlanta Streets Alive events (henceforth, Streets Alive), selected streets are temporarily closed to motor vehicles for the afternoon to encourage the space to be used for recreation and socialization, allowing car-free access to walk, ride a bike, or participate in other non-motorized activities.12 The event, like many others, is based on the Ciclovía model.13,14 An estimated 45,000 people attend each Streets Alive event, the seventh highest attendance of similar events in 47 U.S. cities reporting data in a recent review.15,16 (Most of the cities with higher attendance also have a higher population.) Survey results from 2010 and 2012 estimated that 40% of attendees were non-white,12 a percentage lower than that of the city (about 60% in the 2010 Census) but nonetheless indicative of diverse attendance.
We surveyed at 29 sites. Excluding Streets Alive, we stayed at each site for an average of 1.0 hours if at least two bikes were locked nearby. At Streets Alive events, we surveyed for an average of 3.1 hours. In total, we surveyed for 36 hours on 18 days throughout the summer and fall of 2016 and 2 days in the spring of 2017. Most surveying occurred in the afternoon and evening to intercept commutes, errands, and recreation during this time.
2.2. Survey administration
The strategy for surveying participants differed at Streets Alive compared with the other locations due to the high volume of bike traffic at Streets Alive. When not at Streets Alive, we intercepted people by standing near bike racks and approaching people as they exited the venue to unlock their bike. At Streets Alive, with the permission of the event organizer, we intercepted cyclists as they rode or walked their bicycle near our table. At all locations, we administered a face-to-face questionnaire to respondents, which took between 5 and 10 minutes. Eligible participants were 18 years or older and at least sometimes rode a bicycle in the Atlanta urban core. Participant consent was considered implied upon agreement to take the survey. The study protocol was approved by the Georgia Institute of Technology Review Board (H12330).
2.3. Measures
2.3.1. Bicycle infrastructure
Geospatial data for bicycle infrastructure in the Atlanta area were obtained from the Atlanta Regional Commission17 and the City of Atlanta Chief Bicycle Officer (personal correspondence). We merged the two data sources and resolved discrepancies using the sf package in R18 and classified the infrastructure as the following types: hard-surface multi-use path (92 km1), protected bike lane (4 km), buffered bike lane (5 km), conventional bike lane (101 km), and shared-travel lane also known as “sharrows” (73 km) (R script available here: https://osf.io/awms9/; map: http://rpubs.com/michaeldgarber/coa_arc). Data were restricted to infrastructure built before December 31, 2016.
2.3.2. Bike-riding habits and routes of common rides
The survey (Supplement 2) asked about bike-riding habits, including reasons for biking, general biking frequency, cyclist self-classification,19 and tendency to go out of the way and why. We also asked respondents to draw up to two of the routes of their most common rides with a dry-erase marker on a laminated map, and we digitally photographed the drawings. For each shared route, we asked its purpose and frequency. Ride purpose was classified as utilitarian or leisure (including for exercise), and person-level patterns were considered in three ways to capture potential nuances: 1) ever riding for each purpose; 2) most frequent purpose, in terms of number of rides; and 3) the share of weekly distance for each purpose (additional detail in Supplement 1, Section 1). For reference, study results were compared to a local agency-sponsored travel survey20 (ARC Survey; Supplement 1, Section 2).
2.3.3. Length of bike rides and distance ridden overall and on infrastructure
We digitized the route of each bike ride using the trace-segment tool in ArcMap 10.3 (© ESRI) and measured its overall length and its length on each type of infrastructure with the sf package.18 Next, we estimated the weekly distance ridden on each ride in total and on each type of infrastructure by multiplying the corresponding length by a numerical estimate of the ride’s reported daily frequency (Supplement 1, Table S1). We then summed each person’s ride-level values.
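The distance calculation described above can be sketched as follows. This is a minimal illustration in Python (the analysis itself used the sf package in R); the field names, the example rides, and the conversion of a daily frequency to a weekly multiplier (daily frequency × 7) are assumptions made for illustration and do not reproduce the values in Supplement 1, Table S1.

```python
from collections import defaultdict

DAYS_PER_WEEK = 7  # assumption: weekly distance = length x daily frequency x 7


def weekly_distance_by_person(rides):
    """Sum weekly distance per person, overall and by infrastructure type."""
    totals = defaultdict(lambda: defaultdict(float))
    for ride in rides:
        rides_per_week = ride["rides_per_day"] * DAYS_PER_WEEK
        person = totals[ride["person_id"]]
        # Overall weekly distance contributed by this ride
        person["total"] += ride["length_km"] * rides_per_week
        # Weekly distance on each infrastructure type along the route
        for infra_type, km in ride["infra_km"].items():
            person[infra_type] += km * rides_per_week
    return {pid: dict(values) for pid, values in totals.items()}


# Hypothetical respondents: A drew two typical rides, B drew one
rides = [
    {"person_id": "A", "length_km": 8.0, "rides_per_day": 1.0,
     "infra_km": {"conventional bike lane": 2.0}},
    {"person_id": "A", "length_km": 20.0, "rides_per_day": 0.5,
     "infra_km": {"hard-surface multi-use path": 12.0}},
    {"person_id": "B", "length_km": 5.0, "rides_per_day": 0.25,
     "infra_km": {"protected bike lane": 1.0}},
]

weekly = weekly_distance_by_person(rides)
for person_id, values in weekly.items():
    print(person_id, values)
```

Summing ride-level values within each person, as in the last step of the paragraph above, gives one weekly total and one weekly infrastructure-specific distance per respondent.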
2.3.4. Use of smartphone app to record bike rides
We asked participants if they ever used a smartphone app to record their bike rides and if so, which app(s), and how frequently each was used. In analyses, apps were classified as fitness-oriented (Strava, MapMyRide, Garmin Connect, Moves, Google Fit, Samsung S Health, Jawbone Up, Nike+, Runkeeper, Trails) or transportation-planning-oriented (Ride Report, Cycle Atlanta). Participants were classified into four app-use groups (Table 1).
Table 1.
Group | Criteria |
---|---|
Non-user | Less than occasionally uses any app to record rides |
Any-app user | Uses any app to record rides at least on occasion |
Fitness-app user1 | Uses a fitness-oriented app to record rides at least on occasion |
Planning-app user1 | Uses a transportation-planning-oriented app to record rides at least on occasion |
Because participants could use multiple apps, these groups are not mutually exclusive.
2.3.5. Socio-demographic characteristics and access to infrastructure
To assess socio-demographic characteristics, we asked respondents about age, gender, access to a private automobile, household income, occupation, race/ethnicity, and, as applicable, zip code of home, work, and school. To measure area-level socio-economic status (SES) and access to infrastructure, we geocoded each person’s home location21 by estimating a plausible street address from the respondent’s bike routes, where possible, and cross-checked it with the reported zip code. Area-level SES was defined as the median home value of the home census tract using the 2012-2016 American Community Survey 5-year estimates. To assess access to bike infrastructure, we measured the distance from home location to nearest infrastructure by infrastructure type.
2.4. Person-level analyses
Each of the three app-user groups was compared to non-users across socio-demographic and bike-riding characteristics. Continuous measures were compared with a two-sided Wilcoxon Test, and categorical measures were compared with a two-sided Fisher’s exact test. In combination with considerations of sample size and the estimate’s magnitude, p-values were used in their continuous form to inform conclusions.22,23
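As an illustration of the categorical comparison, the sketch below implements a two-sided Fisher’s exact test for a 2×2 table using only the Python standard library. It is not the authors’ code; the function name is hypothetical, and the example counts (male vs. not male among non-users and any-app users) are taken from Table 3. The two-sided rule used here, summing the probabilities of all tables no more likely than the observed one, is a common convention and is assumed rather than confirmed by the text. The Wilcoxon test is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more likely than the observed table
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Rows: non-users (42 male, 16 not male); any-app users (26 male, 11 not male)
p_gender = fisher_exact_two_sided(42, 16, 26, 11)
print(f"two-sided p = {p_gender:.3f}")
```

A large p-value here is consistent with the similar gender distribution of app users and non-users; the exact value may differ slightly from the one in Table 3 depending on the software’s two-sided convention.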
2.5. Estimating ridership captured by smartphone data
In addition to patterns at the individual level, we sought to understand app-use patterns at the ride level, as this format is typically how app-generated data are delivered to researchers. Importantly, app use may not only vary between people, but also within people between rides. Cyclists may, for example, record only exercise rides and not commutes. Using responses from intercept surveys, we simulated datasets of an expected one year of rides by weighting each of the rides described as ‘typical’ by an estimate of how often it would occur over one year (Supplement 1, Table S1). We then assigned a probability for whether each of these rides in the simulated one-year datasets was recorded in an app based on the respondent’s app-use pattern and the ride’s purpose (Supplement 1, Table S2). To estimate the proportion of ridership captured (i.e., the selection probability) by each app-type category (fitness or planning) and by each of the three most used apps, we estimated the number of rides and distance ridden recorded by the app category or app and divided by the overall estimated number of rides and overall distance ridden. We did the same for each ride purpose and type of infrastructure.
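The ride-level simulation can be sketched as below. This is a simplified illustration under stated assumptions, not the authors’ implementation: the recording probabilities stand in for Supplement 1, Table S2, the annual frequencies stand in for Table S1, and the five typical rides are hypothetical. The sketch expands each typical ride into a year of rides, draws whether each ride was recorded, and computes the selection probability by number of rides and by distance over ten simulations.

```python
import random
from statistics import mean, stdev

# Hypothetical probability that a ride is recorded, by app-use pattern and purpose
RECORD_PROB = {
    ("every ride", "utilitarian"): 1.0,
    ("every ride", "leisure"): 1.0,
    ("leisure only", "utilitarian"): 0.0,
    ("leisure only", "leisure"): 1.0,
    ("on occasion", "utilitarian"): 0.25,
    ("on occasion", "leisure"): 0.25,
    ("non-user", "utilitarian"): 0.0,
    ("non-user", "leisure"): 0.0,
}

# Hypothetical typical rides: (app-use pattern, purpose, length in km, rides per year)
TYPICAL_RIDES = [
    ("every ride", "utilitarian", 6.0, 200),
    ("leisure only", "utilitarian", 10.0, 150),
    ("leisure only", "leisure", 30.0, 50),
    ("on occasion", "leisure", 15.0, 40),
    ("non-user", "utilitarian", 4.0, 250),
]


def simulate_once(rng):
    """Simulate one year of rides; return (SP by rides, SP by distance)."""
    n_all = n_recorded = 0
    km_all = km_recorded = 0.0
    for pattern, purpose, length_km, rides_per_year in TYPICAL_RIDES:
        p_record = RECORD_PROB[(pattern, purpose)]
        for _ in range(rides_per_year):
            n_all += 1
            km_all += length_km
            if rng.random() < p_record:  # Bernoulli draw for this ride
                n_recorded += 1
                km_recorded += length_km
    return n_recorded / n_all, km_recorded / km_all


rng = random.Random(2016)  # fixed seed so the sketch is repeatable
runs = [simulate_once(rng) for _ in range(10)]
sp_rides = mean(r[0] for r in runs)
sp_km = mean(r[1] for r in runs)
sd_rides = stdev(r[0] for r in runs)
sd_km = stdev(r[1] for r in runs)
print(f"SP by rides:    {sp_rides:.3f} (SD {sd_rides:.3f})")
print(f"SP by distance: {sp_km:.3f} (SD {sd_km:.3f})")
```

In this toy dataset the distance-based selection probability exceeds the ride-based one because the longer leisure rides are the ones most likely to be recorded. Restricting the numerator and denominator to rides of one purpose, or to distance on one infrastructure type, would give the stratum-specific selection probabilities.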
2.6. Illustration of use of selection probabilities to recover from selection bias
We finally illustrate how the estimated selection probabilities might be used to correct for potential selection bias using inverse-probability-of-selection weighting (as described elsewhere24,25) in a hypothetical study investigating the effect of cycling infrastructure on risk of a bike crash which relies on bike-ridership estimates from an app.
3. Results
3.1. Study population
We approached 143 people, of whom 130 were eligible (18 years or older and sometimes rode a bike in the Atlanta urban core) and 99 took the survey (Supplement 1, Figure S1). Among these 99 participants, app use was not missing for 95 (69% of eligible), 92 drew at least one typical bike ride (45 reported one; 47 reported two; n, rides = 139), and both app use and a ride were available for 90 participants. The distribution of attempts and response rate varied by location type, geographic quadrant of the city, sex, age, and race (Supplement 1, Table S3). Most responses occurred at Streets Alive events (58%) while most eligible attempts occurred elsewhere (53%).
3.2. App use of participants
A total of 43 participants (45%) responded ‘yes’ to ever using a smartphone app to record bike rides (Table 2). Of the 95 respondents, 39% used any app at least on occasion, 33% used a fitness app at least on occasion, and 9% used a planning app at least on occasion. The three most commonly reported apps were Map My Ride (13%), Strava (12%), and Ride Report (9%). The reported use pattern of each app is available in Supplement 1, Table S4.
Table 2.
Characteristic | n | % |
---|---|---|
Ever uses an app to record bike rides | 43 | 45% |
Frequency of use1 | ||
Every ride | 20 | 21% |
On most rides, including most commutes and most recreational rides | 4 | 4% |
Only for recreational or exercise rides | 9 | 9% |
Only for commutes | 0 | 0% |
On occasion, but no pattern | 4 | 4% |
On phone, but never or very rarely | 4 | 4% |
Missing | 2 | 2% |
App-use group | ||
Non-user | 58 | 61% |
Any-app user | 37 | 39% |
Fitness-app user2 | 31 | 33% |
Planning-app user2 | 9 | 9% |
Reported use of app at least on occasion, by app3 | ||
Fitness-oriented | ||
Map My Ride | 12 | 13% |
Strava | 11 | 12% |
Garmin Connect | 5 | 5% |
Moves | 4 | 4% |
Google Fit | 2 | 2% |
Samsung S Health | 2 | 2% |
Jawbone Up | 1 | 1% |
Nike+ | 1 | 1% |
Runkeeper | 1 | 1% |
Trails | 1 | 1% |
Transportation-planning-oriented | ||
Ride Report | 9 | 9% |
Cycle Atlanta | 1 | 1% |
If respondents report using more than one app, this value represents their use pattern for their most frequently used app.
Three participants used both app types at least on occasion.
More than one app possible per respondent.
3.3. Socio-demographics
Socio-demographic characteristics of the app-use groups are presented in Table 3. Compared with non-users, app users were similar with respect to gender (70% vs. 72% male) and race (e.g., 72% vs. 67% White), but were moderately older (mean age: 36.8 vs. 33.9 years) and had moderately higher income (mean: $96,034 vs. $81,798). Among app users, users of fitness apps and those of planning apps differed in some respects. Compared with planning-app users, fitness-app users were more likely to be male (77% vs. 44%) and were more racially diverse (e.g., 68% vs. 88% White). Household income was higher among fitness-app users (mean: $97,200 vs. $79,583), but area-level SES was lower among fitness-app users (mean: $263,921 vs. $354,314).
Table 3.
Characteristics | Total study population, N (%) | Never uses an app, N (%) | Any-app user, N (%) | Fitness-app user,1 N (%) | Planning-app user,1 N (%) | ARC Survey2,3 % ± SE% | p-value3: any app vs. none | p-value3: fitness app vs. else | p-value3: planning app vs. else | p-value3: ARC vs. study |
---|---|---|---|---|---|---|---|---|---|---|
Total | 95 | 58 (61%) | 37 (39%) | 31 (33%) | 9 (9%) | N = 18 ± 5 | ||||
Male gender | 68 (72%) | 42 (72%) | 26 (70%) | 24 (77%) | 4 (44%) | 79% ± 16% | 0.820 | 0.470 | 0.112 | 0.528 |
Age (years)4 | ||||||||||
Mean ± SD | 35.1 ± 11.5 | 33.9 ± 12.2 | 36.8 ± 10.3 | 36.3 ± 10.3 | 38.1 ± 12.1 | 38.2 ± 3.0 | 0.089 | 0.227 | 0.400 | 0.099 |
18-24 | 18 (20%) | 12 (22%) | 6 (16%) | 6 (19%) | 1 (11%) | 11% ± 10% | 0.479 | 0.904 | 0.743 | 0.402 |
25-34 | 35 (38%) | 23 (42%) | 12 (32%) | 10 (32%) | 3 (33%) | 33% ± 17% | ||||
35-44 | 21 (23%) | 11 (20%) | 10 (27%) | 8 (26%) | 2 (22%) | 24% ± 10% | ||||
45-54 | 11 (12%) | 4 (7%) | 7 (19%) | 5 (16%) | 2 (22%) | 26% ± 10% | ||||
55-64 | 6 (7%) | 4 (7%) | 2 (5%) | 2 (7%) | 1 (11%) | 6% ± 4% | ||||
65+ | 1 (1%) | 1 (2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0% ± 0% | ||||
Race | ||||||||||
White | 65 (69%) | 39 (67%) | 26 (72%) | 21 (68%) | 7 (88%) | 79% ± 12% | 0.421 | 0.405 | 0.674 | 0.128 |
Black | 14 (15%) | 7 (12%) | 7 (19%) | 7 (23%) | 0 (0%) | 21% ± 19% | ||||
Hispanic or Latino | 4 (4%) | 4 (7%) | 0 (0%) | 0 (0%) | 0 (0%) | 0% ± 0% | ||||
Asian | 10 (11%) | 7 (12%) | 3 (8%) | 3 (10%) | 1 (13%) | 0% ± 0% | ||||
Other | 1 (1%) | 1 (2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0% ± 0% | ||||
Annual household income ($)4 | ||||||||||
Mean ± SD | 87,230 ± 47,445 | 81,798 ± 49,293 | 96,034 ± 43,679 | 97,200 ± 42,477 | 79,583 ± 46,809 | 84,214 ± 11,182 | 0.187 | 0.156 | 0.598 | 0.919 |
<19,999 | 8 (11%) | 7 (15%) | 1 (3%) | 1 (4%) | 0 (0%) | 0% ± 0% | 0.531 | 0.383 | 0.721 | 0.037 |
20,000-39,999 | 4 (5%) | 3 (6%) | 1 (3%) | 0 (0%) | 1 (17%) | 16% ± 13% | ||||
40,000-59,999 | 13 (17%) | 8 (17%) | 5 (17%) | 4 (16%) | 2 (33%) | 25% ± 16% | ||||
60,000-74,999 | 13 (17%) | 6 (13%) | 7 (24%) | 7 (28%) | 1 (17%) | 16% ± 16% | ||||
75,000-99,999 | 5 (7%) | 4 (9%) | 1 (3%) | 1 (4%) | 0 (0%) | 13% ± 7% | ||||
100,000-149,999 | 19 (25%) | 12 (26%) | 7 (24%) | 6 (24%) | 1 (17%) | 23% ± 11% | ||||
≥150,000 | 14 (18%) | 7 (15%) | 7 (24%) | 6 (24%) | 1 (17%) | 6% ± 5% | ||||
Regular access to a private automobile | 63 (79%) | 33 (73%) | 30 (86%) | 25 (86%) | 8 (89%) | 100% ± 0% | 0.271 | 0.266 | 0.273 | 0.001 |
Own a smartphone or GPS-capable device | 94 (99%) | 57 (98%) | 37 (100%) | 31 (100%) | 9 (100%) | N/A | 1.000 | 1.000 | 1.000 | N/A |
Area-level median home value ($), Mean (SD) | 259,672 (125,061) | 249,544 (107,493) | 275,386 (148,990) | 263,921 (130,418) | 354,314 (227,897) | N/A | 0.682 | 0.890 | 0.385 | N/A |
Distance from home to nearest infrastructure (km), Mean (SD) | ||||||||||
Hard-surface multi-use trail | 0.97 (1.04) | 1.00 (0.98) | 0.92 (1.14) | 0.99 (1.23) | 0.71 (0.64) | N/A | 0.283 | 0.570 | 0.392 | N/A |
Protected bike lane | 3.24 (3.64) | 3.19 (3.46) | 3.33 (3.96) | 3.18 (4.04) | 4.00 (3.23) | N/A | 0.930 | 0.676 | 0.184 | N/A |
Buffered bike lane | 3.10 (3.70) | 3.05 (3.52) | 3.16 (4.03) | 3.05 (4.12) | 3.69 (3.23) | N/A | 0.898 | 0.676 | 0.286 | N/A |
Conventional bike lane | 0.92 (1.53) | 0.89 (1.53) | 0.96 (1.56) | 0.88 (1.56) | 1.03 (1.48) | N/A | 0.332 | 0.684 | 0.740 | N/A |
Shared travel lane | 0.98 (1.66) | 0.96 (1.71) | 1.00 (1.61) | 1.14 (1.73) | 0.47 (0.42) | N/A | 0.312 | 0.067 | 0.662 | N/A |
SD, standard deviation; SE, standard error.
Three participants used both app types at least on occasion.
ARC Survey, 2011 Atlanta Regional Commission Regional Travel Survey. Additional detail can be found in Supplement 1, Section 2.
P-values are from two-sided Fisher’s Exact Test for categorical measures or Wilcoxon test for continuous measures.
Age was surveyed as an integer; some respondents responded in ten-year categories. Income was surveyed in categories. For both, where a continuous value was not available, continuous analyses used the category midpoint, except for the bottom and top categories, which were set at their inner bound.
3.4. Bike-riding habits of respondents
3.4.1. Bike-riding frequency, cyclist type, and ride purposes
App-users vs. non-users
In general, app users reported riding more frequently than non-users (e.g., at least several times per week: 94% vs. 76%), were more likely to self-classify as a strong and fearless cyclist (41% vs. 32%; Table 4) and rode a greater estimated weekly distance (median: 43 vs. 16 km; Figure 1; Supplement 1 Table S5). As mentioned, ride-purpose patterns were considered in three ways: ever riding for each purpose, most common purpose, and share of estimated weekly distance for each purpose. For each, app users and non-users were similar, although the distribution of estimated ridership for leisure was slightly higher for app users (Figure 1; Supplement 1 Table S5).
Table 4.
Characteristics | Total study population, N (%) | Never uses an app, N (%) | Any-app user, N (%) | Fitness-app user,1 N (%) | Planning-app user,1 N (%) | ARC Survey2,3 % ± SE% | p-value3: any app vs. none | p-value3: fitness app vs. else | p-value3: planning app vs. else | p-value3: ARC vs. study |
---|---|---|---|---|---|---|---|---|---|---|
Total | 95 | 58 (61%) | 37 (39%) | 31 (33%) | 9 (9%) | N = 18 ± 5 | ||||
Bike-riding frequency | ||||||||||
5 or more times per week | 44 (49%) | 25 (46%) | 19 (53%) | 17 (55%) | 4 (50%) | 59% ± 17% | 0.090 | 0.196 | 0.338 | 0.001 |
Several times per week | 31 (34%) | 16 (30%) | 15 (42%) | 13 (42%) | 3 (38%) | 18% ± 11% | ||||
Several times per month | 11 (12%) | 10 (19%) | 1 (3%) | 1 (3%) | 0 (0%) | 3% ± 3% | ||||
< once per month, at least once every six months | 2 (2%) | 1 (2%) | 1 (3%) | 0 (0%) | 1 (3%) | 0% ± 0% | ||||
Less than once per year | 2 (2%) | 2 (4%) | 0 (0%) | 0 (0%) | 0 (0%) | 20% ± 11% | ||||
Reason(s) for bicycling, ever4 | ||||||||||
Utilitarian | 79 (83%) | 50 (86%) | 29 (78%) | 25 (81%) | 6 (67%) | N/A | 0.583 | 0.810 | 0.174 | N/A |
Leisure | 86 (91%) | 51 (88%) | 35 (95%) | 29 (94%) | 9 (100%) | N/A | 0.475 | 0.713 | 0.594 | N/A |
Utilitarian is most frequent ride purpose | 59 (62%) | 39 (67%) | 20 (54%) | 16 (52%) | 6 (67%) | N/A | 0.577 | 0.177 | 1.000 | N/A |
Bicyclist self-classification | ||||||||||
Strong and fearless | 31 (35%) | 17 (32%) | 14 (41%) | 13 (43%) | 2 (29%) | N/A | 0.025 | 0.022 | 0.920 | N/A |
Enthused and confident | 34 (39%) | 17 (32%) | 17 (50%) | 15 (50%) | 3 (43%) | N/A | ||| N/A |
Comfortable but cautious | 19 (22%) | 16 (30%) | 3 (9%) | 2 (7%) | 2 (29%) | N/A | ||| N/A |
Interested but concerned | 4 (5%) | 4 (7%) | 0 (0%) | 0 (0%) | 0 (0%) | N/A | ||| N/A |
Ever do not take the most direct route | 68 (83%) | 39 (80%) | 29 (88%) | 25 (86%) | 7 (100%) | N/A | 0.384 | 0.761 | 0.597 | N/A |
Why4 | ||||||||||
Safety / avoid car traffic | 57 (84%) | 32 (82%) | 25 (86%) | 22 (88%) | 6 (86%) | N/A | 1.000 | 0.735 | 1.000 | N/A |
To avoid hills | 15 (22%) | 8 (21%) | 7 (24%) | 6 (24%) | 2 (29%) | N/A | 0.769 | 0.765 | 0.837 | N/A |
More exercise | 8 (12%) | 4 (10%) | 4 (14%) | 4 (16%) | 0 (0%) | N/A | 0.708 | 0.444 | 1.000 | N/A |
More attractive scenery | 17 (25%) | 7 (18%) | 10 (35%) | 9 (36%) | 1 (14%) | N/A | 0.284 | 0.266 | 0.665 | N/A |
Relaxation or fun | 19 (28%) | 11 (28%) | 8 (28%) | 7 (28%) | 1 (14%) | N/A | 1.000 | 1.000 | 0.664 | N/A |
Better surface | 2 (3%) | 1 (3%) | 1 (3%) | 1 (4%) | 0 (0%) | N/A | 1.000 | 0.549 | 1.000 | N/A |
Ever ride on the sidewalk | 66 (77%) | 47 (90%) | 19 (56%) | 17 (57%) | 3 (43%) | N/A | 0.001 | 0.003 | 0.048 | N/A |
Why4 | ||||||||||
Safety / avoid car traffic | 51 (84%) | 36 (86%) | 15 (79%) | 13 (77%) | 3 (100%) | N/A | 0.710 | 0.444 | 1.000 | N/A |
Better surface | 9 (15%) | 6 (14%) | 3 (16%) | 3 (18%) | 0 (0%) | N/A | 1.000 | 0.700 | 1.000 | N/A |
Travelling uphill | 3 (5%) | 1 (2%) | 2 (11%) | 2 (12%) | 1 (33%) | N/A | 0.558 | 0.248 | 0.261 | N/A |
Travelling wrong way on one-way streets | 4 (6%) | 2 (4%) | 2 (11%) | 2 (12%) | 0 (0%) | N/A | 0.641 | 0.594 | 1.000 | N/A |
Other | 2 (3%) | 2 (4%) | 0 (0%) | 0 (0%) | 0 (0%) | N/A | 0.519 | 1.000 | 1.000 | N/A |
Three respondents used both app types at least on occasion, which is why these groups add to 40 and not 37.
ARC Survey, 2011 Atlanta Regional Commission Regional Travel Survey. Additional detail can be found in Supplement 1, Section 2
P-values are from two-sided Fisher’s Exact Test for categorical measures or Wilcoxon test for continuous measures.
More than one response possible. Percents are proportions.
N/A, not available
Fitness-app users vs. planning-app users
Fitness-app users and planning-app users reported riding at a similarly high frequency (5 or more times per week: 55% vs. 50%; Table 4), though fitness-app users were more likely to self-classify as a strong and fearless cyclist (43% vs. 29%) and rode a lesser share of their estimated weekly ridership for utilitarian purposes. Fitness-app users split their estimated weekly distance about equally between utilitarian and leisure purposes (median, utilitarian = 58%), while planning-app users rode a larger share for utilitarian purposes (median = 82%; Figure 1; Supplement 1 Table S5).
3.4.2. Person-level patterns on infrastructure
Generally, the percent of estimated weekly distance per cyclist on infrastructure was similar between app-user groups for most infrastructure types (Figure 2; Supplement 1 Table S5).
3.5. Estimated ridership selection probabilities by app type, by ride purpose, and by infrastructure type
Whereas Figures 1 and 2 illustrate patterns of reported ridership at the person level, Table 5 shows expected ridership at the ride level. The ride-level data, estimated by extrapolating rider-described typical rides to a one-year period, allow app use to vary both between and within individual users. Table 5 shows the estimated selection probabilities (the likelihood that a given ride for a given respondent was recorded by an app), based on the participant’s response to questions about frequency of app use and the ride’s purpose. A total of 34.6% of the rides were estimated to have been captured by any app, 29.0% by fitness apps, 8.2% by planning apps, 10.4% by Map My Ride, 9.5% by Strava, and 8.1% by Ride Report. In comparison, at the person level, 39% of cyclists reported use of any app, 33% of a fitness app, and 9% of a planning app (Table 4). Of all the estimated distance traveled by cyclists in the ride-level data, 47.2% was estimated to have been recorded by any app, 42.0% by a fitness app, 7.9% by a planning app, 18.1% by Map My Ride, 16.8% by Strava, and 7.7% by Ride Report.
Table 5.
Measure | All rides | Recorded with any app, value (SP)1 | Recorded with a fitness app, value (SP)1 | Recorded with a planning app, value (SP)1 | Recorded with Map My Ride, value (SP)1 | Recorded with Strava, value (SP)1 | Recorded with Ride Report, value (SP)1 |
---|---|---|---|---|---|---|---|
Number of rides2 | 34,121 | 5,932 ± 11 (34.6% ± 0.1%) | 4,966 ± 25 (29.0% ± 0.1%) | 1,409 ± 12 (8.2% ± 0.1%) | 1,785 ± 25 (10.4% ± 0.1%) | 1,627 ± 27 (9.5% ± 0.2%) | 1,382 ± 12 (8.1% ± 0.1%) |
Distance ridden, overall (km)2 | 456,473 | 108,262 ± 333 (47.2% ± 0.1%) | 96,212 ± 378 (42.0% ± 0.2%) | 18,081 ± 178 (7.9% ± 0.1%) | 41,534 ± 508 (18.1% ± 0.2%) | 38,506 ± 512 (16.8% ± 0.2%) | 17,723 ± 197 (7.7% ± 0.1%) |
Number of rides by purpose | |||||||
Utilitarian | 25,766 | 4,123 ± 16 (31.9% ± 0.1%) | 3,247 ± 17 (25.1% ± 0.1%) | 1,211 ± 10 (9.4% ± 0.1%) | 1,085 ± 19 (8.4% ± 0.1%) | 617 ± 13 (4.8% ± 0.1%) | 1,190 ± 4 (9.2% ± 0.0%) |
Leisure | 8,355 | 1,808 ± 14 (43.0% ± 0.3%) | 1,719 ± 20 (40.9% ± 0.5%) | 198 ± 6 (4.7% ± 0.1%) | 700 ± 13 (16.6% ± 0.3%) | 1,010 ± 21 (24.0% ± 0.5%) | 192 ± 11 (4.6% ± 0.3%) |
Distance by ride purpose (km) | |||||||
Utilitarian | 282,409 | 64,537 ± 313 (43.0% ± 0.2%) | 53,692 ± 300 (35.8% ± 0.2%) | 15,133 ± 121 (10.1% ± 0.1%) | 19,451 ± 375 (13.0% ± 0.3%) | 11,791 ± 215 (7.9% ± 0.1%) | 14,829 ± 55 (9.9% ± 0.0%) |
Leisure | 157,451 | 43,725 ± 342 (55.2% ± 0.4%) | 42,521 ± 478 (53.7% ± 0.6%) | 2,948 ± 106 (3.7% ± 0.1%) | 22,083 ± 385 (27.9% ± 0.5%) | 26,714 ± 423 (33.7% ± 0.5%) | 2,894 ± 172 (3.7% ± 0.2%) |
Distance by infrastructure type (km) | |||||||
Hard-surface multi-use trail | 103,557 | 22,510 ± 113 (43.3% ± 0.2%) | 20,510 ± 187 (39.4% ± 0.4%) | 3,501 ± 51 (6.7% ± 0.1%) | 8,963 ± 127 (17.2% ± 0.2%) | 8,422 ± 156 (16.2% ± 0.3%) | 3,412 ± 83 (6.6% ± 0.2%) |
Protected bike lane | 13,089 | 1,812 ± 21 (27.6% ± 0.3%) | 1,368 ± 24 (20.8% ± 0.4%) | 483 ± 4 (7.3% ± 0.1%) | 325 ± 13 (5.0% ± 0.2%) | 763 ± 18 (11.6% ± 0.3%) | 470 ± 6 (7.2% ± 0.1%) |
Buffered bike lane | 12,479 | 1,965 ± 12 (31.4% ± 0.2%) | 1,888 ± 9 (30.1% ± 0.1%) | 394 ± 5 (6.3% ± 0.1%) | 528 ± 8 (8.4% ± 0.1%) | 885 ± 12 (14.1% ± 0.2%) | 389 ± 4 (6.2% ± 0.1%) |
Conventional bike lane | 71,253 | 18,051 ± 117 (50.5% ± 0.3%) | 14,840 ± 85 (41.5% ± 0.2%) | 4,356 ± 48 (12.2% ± 0.1%) | 6,962 ± 123 (19.5% ± 0.3%) | 5,178 ± 82 (14.5% ± 0.2%) | 4,355 ± 16 (12.2% ± 0.0%) |
Shared-travel lane | 55,959 | 11,179 ± 37 (39.8% ± 0.1%) | 9,794 ± 42 (34.9% ± 0.1%) | 3,261 ± 35 (11.6% ± 0.1%) | 3,494 ± 53 (12.4% ± 0.2%) | 2,778 ± 50 (9.9% ± 0.2%) | 3,204 ± 38 (11.4% ± 0.1%) |
SP, selection probability, where the value for all rides (leftmost column) is the denominator.
App use was set probabilistically according to the participants’ reported app-use patterns and the ride’s reported purpose (Supplement 1, Table S2). Values represent the mean ± standard deviation of ten simulations.
Utilitarian rides that were commutes were considered to occur a maximum of once per day, so a daily commute of one out trip and one return was counted as one ride. Its distance was calculated including both the out and return trip.
For each app type, the estimated selection probabilities differed by ride purpose. For example, an estimated 40.9% of leisure rides but 25.1% of commutes were recorded by a fitness app, whereas an estimated 4.7% of leisure rides but 9.4% of utilitarian rides were recorded by a planning app. The selection probabilities also varied by type of bike infrastructure. For example, of all the distance ridden on hard-surface multi-use trails, 39.4% was estimated to have been recorded in a fitness-oriented app, while 20.8% of distance ridden on protected bike lanes was estimated to have been recorded in a fitness-oriented app.
3.6. Illustration of use of infrastructure-specific selection probability to avoid selection bias
Finally, we consider a hypothetical scenario in which the infrastructure-specific selection probabilities can inform research. Suppose researchers are interested in assessing the relative rate of bike crashes per distance ridden on protected bike lanes compared with conventional bike lanes in City A and that the researchers use one year of data from Strava to estimate the distance ridden on these lanes, e.g., using values from Table 5. (Of the three most commonly used apps in our study, we choose Strava for the illustration because it is frequently used for research, e.g., 4–6,8,26.) The researchers also acquire one year of data with geo-located bike crashes from a registry maintained by the local Department of Transportation, allowing the crashes to be assigned to infrastructure. To account for the fact that the denominator for the crashes ought to represent all cyclists in the area, not solely app users, the researchers weight the bike-distance from the app by the inverse of the overall expected proportion of ridership recorded by the app, i.e., weighting by the inverse of the selection probability.25 Suppose they use data from the present study for this calculation, which estimates, overall, that 16.8% of distance ridden was recorded in Strava (Table 5). Under this scenario, the ratio comparing the rate of crashes on a conventional bike lane with that on a protected bike lane is 1.01 (Table 6).
Table 6.
Measure | Conventional bike lane | Protected bike lane |
---|---|---|
Crashes | 55 | 8 |
Bike-distance ridden (km) | 5,178 × (1/0.168) = 30,821 | 763 × (1/0.168) = 4,542 |
Crash rate per bike-km ridden | 0.00178 | 0.00176 |
Rate ratio (conventional vs. protected) | 0.00178/0.00176 = 1.01 | |
Although it may be plausible that protected bike lanes are as dangerous as or more dangerous than conventional bike lanes, the null result is surprising to the researchers, who expected protected bike lanes to have a lower rate of crashes.27,28 The researchers posit that the null association may be due to selection bias, that is, a distortion in the measure of association arising from the way in which the sample was selected. Accordingly, the researchers incorporate infrastructure-specific selection probabilities of bike distance (from Table 5) rather than the overall selection probability and re-tabulate the results (Table 7).
Table 7.
Measure | Conventional bike lane | Protected bike lane |
---|---|---|
Crashes | 55 | 8 |
Bike-distance ridden (km) | 5,178 × (1/0.145) = 35,710 | 763 × (1/0.116) = 6,578 |
Crash rate per bike-km ridden | 0.00154 | 0.00122 |
Rate ratio (conventional vs. protected) | 0.00154/0.00122 = 1.27 (computed from unrounded rates) | |
In the revised calculation, the researchers estimate that the rate of crashes per bike-kilometer ridden is 1.27 times as high on conventional bike lanes as on protected bike lanes, a result more consistent with the researchers’ hypothesis. The rate ratio changed because the app captured a smaller share of ridership on protected bike lanes (11.6%) than on conventional bike lanes (14.5%), so applying a single overall weight understated the protected-lane denominator relative to the conventional-lane denominator. Assuming the results from Table 7 reflect the hypothetical truth, the results in Table 6 are biased downward, toward the null.
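The arithmetic behind Tables 6 and 7 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function names `weighted_rate` and `rate_ratio` are hypothetical, and the inputs are the crash counts, Strava-recorded distances, and selection probabilities quoted in the two tables.

```python
def weighted_rate(crashes, recorded_km, selection_prob):
    """Crash rate per km after inverse-probability-of-selection weighting.

    recorded_km is the distance captured by the app; dividing it by the
    selection probability estimates the distance ridden by all cyclists.
    """
    if not 0 < selection_prob <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    estimated_total_km = recorded_km / selection_prob
    return crashes / estimated_total_km


def rate_ratio(conventional, protected):
    """Ratio of weighted crash rates, conventional vs. protected lanes.

    Each argument is a (crashes, recorded_km, selection_prob) tuple.
    """
    return weighted_rate(*conventional) / weighted_rate(*protected)


# Table 6: one overall selection probability (16.8%) for both lane types.
overall = rate_ratio((55, 5178, 0.168), (8, 763, 0.168))

# Table 7: infrastructure-specific selection probabilities.
stratified = rate_ratio((55, 5178, 0.145), (8, 763, 0.116))

print(f"overall weight: {overall:.2f}, stratum-specific weights: {stratified:.2f}")
```

Under these inputs the two ratios round to 1.01 and 1.27. Because a common selection probability cancels in the ratio, weighting by the overall probability cannot change the rate ratio at all; only weights that differ across levels of the exposure can move it, here by the factor 0.145/0.116 = 1.25.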
4. Discussion
In this sample of cyclists from Atlanta, Georgia, USA, cyclists who used GPS-based smartphone apps to record their bike rides were, on the whole, similar to those who did not by socio-demographic characteristics but differed in several bike-riding characteristics. App users rode more frequently and were more likely to self-describe as stronger and more fearless riders. In addition, by analyzing the reported routes of participating cyclists, we estimated that ridership captured by an app included a smaller proportion on some types of bike infrastructure, including protected bike lanes, compared with the overall proportion of ridership captured. As illustrated in a sample calculation, such differences, if ignored, may give rise to selection bias in cycling-infrastructure research relying on smartphone-generated data. The method applied in the sample calculation, inverse-probability-of-selection weighting24,25 with selection-probability estimates specified for each stratum of exposure, may be used to adjust for this bias.
4.1. Representativeness
4.1.1. Socio-demographics
As smartphone-generated data are increasingly used for surveillance in planning and public health, understanding their representativeness across socio-demographic factors is important. The socio-demographic distribution of the app users differed slightly by whether they reported using fitness or planning apps. Results from this study agreed with previous research suggesting users of fitness-oriented apps, such as Strava, tend to be majority male (77% in this study),4,5,8,10 although the proportion male was similar to that of non-app-using cyclists in this study (72% male), cycling commuters in the city’s county in a 2011 travel survey (79% male),20 and the U.S., generally.29 In contrast with previous literature, which has shown that fitness-oriented app users tend to over-represent the 25-44 age range,4,5,8,10 fitness-oriented app users were similar in age to both non-users in this study and to cyclists in the agency travel survey. Users of planning-oriented apps were also similar in age to the non-user groups, but in contrast with previous research, were majority female.7 Understanding the distribution of age and gender of app-using cyclists is important as app-generated data sources are considered for physical-activity surveillance.30
To our knowledge, this is the first study to assess the racial and socioeconomic distribution of users of fitness-oriented cycling apps. Users of fitness apps had a similar racial distribution to non-users and were more diverse than commuters in the 2011 travel survey, although individuals identifying as Hispanic were not represented among the fitness-oriented users. This result is somewhat reassuring for future public-health surveillance efforts which may use smartphone data to monitor racial disparities in physical activity.31 On the other hand, compared with previous research which showed that Hispanics were well represented among planning-app users,7 this study’s results suggest app data may not be sufficient to understand the patterns of Hispanic cyclists, a racial group with a disparately high prevalence of physical-activity-modifiable chronic diseases,32 which, in the U.S., is expected to cycle for utilitarian purposes at a level similar to other racial groups.33
With respect to SES, users of both app types had a similar, if slightly higher, household income than non-users in this study and cycling commuters in the local travel survey. This result suggests that app-generated sources of cycling data may serve as a fair but not completely adequate source with which to estimate cycling levels in lower-income populations and is consistent with results reporting that individuals of lower SES use wearable activity monitors less frequently than their higher-SES counterparts.30
4.1.2. Bike-riding frequency, cyclist type, and ride purpose
Previous research has found, in general, a fairly high correlation between patterns in app-generated data sources and those from traditional counting methods, in particular in dense, urban areas.4,5,8–10,26 Due to the anonymization of the data in these studies, person-level cycling characteristics were not examined. In the present sample of cyclists, app users, particularly fitness-app users, rode more frequently and rode a greater weekly distance than their non-app-user counterparts. This result is consistent with studies finding higher physical-activity levels among users of wearable activity trackers.30,34 Surveillance relying on such datasets may thus overestimate per-person cycling levels. These results, taken together with the fact that fitness-app users were more likely to self-classify as strong and fearless, support the notion that users of fitness apps such as Strava and MapMyRide tend to be experienced, frequent, and enthusiastic cyclists. Nonetheless, all groups of cyclists, including fitness-app users, were most likely to report a utilitarian ride as their most common, which serves as evidence against the perception that users of fitness apps exclusively ride for recreation or exercise. Still, a sizeable share of app users (21%) reported only using an app to record their exercise rides and not their commutes. This tendency suggests that app-generated ride data may over-represent recreational rides even if the individuals contributing to the data frequently also ride for utilitarian purposes.
4.2. Addressing selection bias due to reliance on smartphone-generated data
A lack of representativeness does not necessarily give rise to selection bias in an etiologic sense if the lack of representativeness does not distort the association between the exposure and outcome of interest.25 In this study, we assessed patterns by infrastructure, possibly an exposure of interest. Some studies have suggested that trips recorded with an app are less likely to deviate to take improved bike infrastructure, such as multi-use trails or protected bike lanes.5,8 In an earlier study from Atlanta comparing ridership from a fitness app and a planning app, both groups showed preferences for off-street infrastructure with a stronger preference in the planning-app ridership.11 The present study found similar results, with some variation between preferences at the person and the ride level. App users were as likely as non-users to verbally report deviating for better bike infrastructure, and the proportion of estimated ridership on infrastructure was generally similar between individuals in each user group. However, at the ride level, fitness apps, especially Map My Ride, captured less of the estimated ridership on protected bike lanes than on multi-use trails. This distinction emerged partly because many respondents stated that they recorded only leisure rides and not commutes, and leisure rides had a greater share of estimated distance on trails.
The application of the estimated infrastructure-specific selection probabilities to the hypothetical example of infrastructure on bike crashes illustrates how selection bias can manifest if selection probabilities specific to each level of the exposure are not considered. Although the empirical estimates of the selection probabilities by ride purpose and by infrastructure may serve as a useful starting point for other settings, the specific values may not be transportable. In general, app use was high in this sample (e.g., 18.1% of ridership was recorded in Map My Ride and 16.8% in Strava). Other studies have estimated that, for example, between 2% and 10% of ridership is captured in Strava.4,10,35,36 The high level of app use reported here may be because many of the participants were frequent cyclists (46% rode 5 or more times per week), and cyclists reporting riding more frequently also reported more app use per ride (Table 4). Still, although the absolute values may not be directly transportable, we conjecture that the relative rank of the ride-purpose and infrastructure-specific probabilities may be applicable to other settings, at least in low-cycling populations and in the absence of concerted app-promotion efforts. For example, if, in a given setting it can be determined that 5% of utilitarian rides are recorded in Strava, results from this study may serve as prior evidence that the proportion of leisure rides recorded in Strava is higher than 5%.
While the empirical results may not necessarily generalize to Atlanta or elsewhere, the framework we present ought to be broadly applicable, assuming selection probabilities can be estimated. If selection probabilities are inestimable, sensitivity analyses may be conducted, asking the question, how different would the selection probabilities have to be across the levels of the exposure of interest to meaningfully change the effect estimate?37,38
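One way to approach the sensitivity question is sketched below in Python. It relies on an algebraic fact of the weighting used in the hypothetical crash example: the corrected rate ratio equals the app-based (crude) rate ratio multiplied by the ratio of the two selection probabilities. The function names and the grid of candidate ratios are hypothetical choices for illustration, and the crude ratio reuses the crash counts and Strava-recorded distances from that example.

```python
def corrected_rate_ratio(crude_rr, sp_numerator, sp_denominator):
    """Rate ratio after stratum-specific inverse-probability-of-selection weighting.

    sp_numerator and sp_denominator are the selection probabilities for the
    exposure levels in the numerator and denominator of the rate ratio.
    """
    return crude_rr * sp_numerator / sp_denominator


def required_sp_ratio(crude_rr, target_rr):
    """Ratio of selection probabilities needed to move crude_rr to target_rr."""
    return target_rr / crude_rr


# Crude rate ratio from app-recorded distance alone
# (conventional: 55 crashes, 5,178 km; protected: 8 crashes, 763 km).
crude_rr = (55 / 5178) / (8 / 763)

# Hypothetical grid: selection probability on conventional lanes divided by
# that on protected lanes.
for sp_ratio in (0.80, 1.00, 1.25, 1.50):
    rr = corrected_rate_ratio(crude_rr, sp_ratio, 1.0)
    print(f"SP ratio {sp_ratio:.2f} -> corrected rate ratio {rr:.2f}")

# How unequal would the selection probabilities need to be for a rate ratio of 1.5?
print(f"needed SP ratio: {required_sp_ratio(crude_rr, 1.5):.2f}")
```

Only the ratio of the selection probabilities matters for a rate ratio, so an analyst without stratum-specific estimates can still report the range of ratios over which the conclusion would hold.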
On a related note, if an app were to capture the outcome of interest, e.g., by crowdsourcing the reporting of bike crashes,39 and the distribution of exposure were captured through another means, perhaps a separate app, then selection bias may manifest similarly with the outcome variable and could be adjusted using the same basic method (inverse-probability-of-selection weighting).25 A similar approach has been described when the goal is generalizability or transportability of results rather than internal validity.40,41
Although bias analyses are often appropriate, there may be situations in the bike-app context in which they are less necessary. For example, in the Netherlands, it was reported that over 50,000 cyclists used an app called Bike PRINT during the country’s National Bike Counting Week.42 The high use was attributed to a successful mass-media marketing campaign during the event by the government and other stakeholders. If the app is promoted widely, and all sectors of the bike-riding population have roughly equal access to it, then adjustment via stratum-specific selection probabilities may not be necessary. The key question to ask is whether the proportion of ridership captured in the app is expected to differ across levels of the exposure of interest. The answer may be yes even in high-cycling populations with high app use.
4.3. Strengths
A strength of this study is its venue-based-survey methodology, which allowed us to characterize the types of people who use smartphone apps. The resulting data can augment existing studies that used counts of ridership to assess the validity of data from app-generated sources.4,5,8–11 From these surveyed cyclists, we gathered information about complete routes, which is typically obscured in the data-delivery process of app-generated data to protect user privacy and may thus only be available through primary data collection. In addition, we drew a large portion of our sample from open-streets events whose attendees were, if not entirely representative of Atlanta, quite diverse.12
4.4. Limitations
The empirical results ought to be considered with the study’s limitations in mind. Although we endeavored to sample cyclists at a broad range of places, neighborhoods, and times, the sampling method was subject to bias in at least three ways. First, places with higher bike traffic were oversampled; due to time constraints, we did not always stay for long in places with no evident bike traffic. Thus, cyclists who visited places less traveled were less likely to be sampled. Second, surveyed cyclists were more likely to be frequent cyclists, as the data evince, because they had to be out riding their bike to be approached for the survey. Third, the venue-based sampling method precluded us from approaching cyclists who did not stop along their ride, possibly under-representing cyclists who rode only for leisure.
Sampling people by postal mail or via the internet may have partly avoided these biases, but we decided against those methods due to an expected low response rate. The third limitation may have been avoided by intercepting cyclists along their route. We chose not to survey this way, as others have,43,44 because we expected a lower response rate and would have had to shorten our questionnaire. Moreover, the method of intercepting cyclists would ideally be informed by expected bike-traffic data with which to plan survey time.45 We did not have such data, so that method may have been subject to the same frequent-cyclist bias as was our venue-based approach: effectively sampling time rather than people. Despite potential biases in the sampling method, the resulting sample was similar across many demographic characteristics to cycling commuters from a local travel survey,20 suggesting the results may cautiously be considered representative of frequent, relatively higher-SES cyclists in the Atlanta area.
Another limitation is the small sample size, which gives rise to the chance that the conclusions are subject to random error, particularly in subgroups. Similarly, protected bike lanes were not very common in the Atlanta area in late 2016, so reported patterns of use of this infrastructure type ought to be considered with caution. Nonetheless, app users and non-users lived a similar distance from each type of infrastructure (Table 3), so the low amount of protected bike lanes is unlikely to have altered conclusions. Finally, in simulating the ride-level data, we made a simplifying assumption that the reported ride-frequency and app-use patterns would remain constant over one year, which we expect is reasonable in aggregate given respondents reported their ‘typical’ rides.
5. Conclusions and implications for public health and policy
Data from smartphone apps are increasingly used for monitoring and evaluation of cycling, which brings about two chief concerns this study sought to address. First, on the issue of representativeness, results from this Atlanta sample suggested that app users were similarly diverse by race and had slightly higher SES compared with non-users, although Hispanic individuals were not represented among app users. Second, on the issue of selection bias in etiologic bike-infrastructure research, the varying infrastructure preferences between ridership estimated to have been captured and not captured by an app may give rise to bias of considerable magnitude. The method presented to adjust for this bias, inverse-probability-of-selection weighting,24–25 may be useful for future etiologic bicycling research using smartphone-generated data. To facilitate the use of this method, we encourage future studies relating app-generated ridership with stationary counters to report the overall selection probability and, if possible, selection probabilities across levels of exposures of interest, such as types of infrastructure.
Highlights.
App users were similar to non-users by gender and race and had slightly higher income.
App users were more frequent cyclists with a greater share of rides for exercise or leisure.
The estimated proportion of ridership captured by an app differed by some types of infrastructure.
An example calculation shows how these differences may induce selection bias in etiologic research.
Inverse-probability-of-selection weighting is illustrated as a method to adjust for selection bias.
Acknowledgments
We acknowledge Simon Berrebi, Alice Grossman, David Ederer, Rohit Ammanamanchi, Supriya Sarkar, and Sharanya Thummalapally for assistance with data collection.
Funding
Financial Disclosure
MG acknowledges funding support from Emory University Laney Graduate School and the National Heart, Lung, and Blood Institute (F31HL143900) and KW from the U.S. Department of Transportation University Transportation Center Southeastern Transportation Research, Innovation, Development, and Education (STRIDE) Center. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the U.S. Department of Transportation.
Footnotes
Data Statement
Results from the survey are confidential. Selected R code has been made available at a public URL. Please see the Methods section.
The measured length for each type of infrastructure corresponds to that which is inside of I-285.
References
- 1. Kelly P, Kahlmeier S, Götschi T, et al. Systematic review and meta-analysis of reduction in all-cause mortality from walking and cycling and shape of dose response relationship. Int J Behav Nutr Phys Act. 2014;11(1):132. doi: 10.1186/s12966-014-0132-x
- 2. Götschi T, Garrard J, Giles-Corti B. Cycling as a Part of Daily Life: A Review of Health Perspectives. Transp Rev. 2016;36(1):45–71. doi: 10.1080/01441647.2015.1057877
- 3. Romanillos G, Zaltz Austwick M, Ettema D, De Kruijf J. Big Data and Cycling. Transp Rev. 2016;36(1):114–133. doi: 10.1080/01441647.2015.1084067
- 4. Jestico B, Nelson T, Winters M. Mapping ridership using crowdsourced cycling data. J Transp Geogr. 2016;52:90–97. doi: 10.1016/j.jtrangeo.2016.03.006
- 5. Griffin GP, Jiao J. Where does bicycling for health happen? Analysing volunteered geographic information through place and plexus. J Transp Heal. 2015;2(2):238–247. doi: 10.1016/j.jth.2014.12.001
- 6. Heesch KKC, Langdon M. The usefulness of GPS bicycle tracking data for evaluating the impact of infrastructure change on cycling behaviour. Heal Promot J Aust. 2016;227(3):222–229. doi: 10.1071/HE16032
- 7. Blanc B, Figliozzi M, Clifton K. How Representative of Bicycling Populations Are Smartphone Application Surveys of Travel Behavior? Transp Res Rec J Transp Res Board. 2016;2587:78–89. doi: 10.3141/2587-10
- 8. Boss D, Nelson T, Winters M, Ferster CJ. Using crowdsourced data to monitor change in spatial patterns of bicycle ridership. J Transp Heal. 2018;9:226–233. doi: 10.1016/j.jth.2018.02.008
- 9. Lieske SN, Leao S, Conrow L, Pettit CJ. Validating Mobile Phone Generated Bicycle Route Data in Support of Active Transportation. SOAC 2017 – State Aust Cities Conf. 2017;(November).
- 10. Conrow L, Wentz E, Nelson T, Pettit C. Comparing spatial patterns of crowdsourced and conventional bicycling datasets. Appl Geogr. 2018;92:21–30. doi: 10.1016/j.apgeog.2018.01.009
- 11. Watkins K, Ammanamanchi R, LaMondia J, Le Dantec CA. Comparison of Smartphone-based Cyclist GPS Data Sources. Transp Res Board 95th Annu Meet. 2016;5(16-5309). https://trid.trb.org/view.aspx?id=1393960. Accessed August 30, 2017.
- 12. Torres A, Steward J, Lyn R, Stauber C, Serna R, Strasser S. Atlanta Streets Alive: A Movement Building a Culture of Health in an Urban Environment. J Phys Act Heal. 2015;13(2):239–246. doi: 10.1123/jpah.2015-0064
- 13. Sarmiento OL, Díaz del Castillo A, Acevedo MJ, Triana CA, Gonzalez SA, Pratt M. Reclaiming the streets for people: Insights from Ciclovias Recreativas in Latin America. Prev Med (Baltim). 2016;103:S34–S40. doi: 10.1016/j.ypmed.2016.07.028
- 14. Torres AD, Sarmiento OL, Stierling G, Enrique J, Pratt M, Schmid T. Recreational Ciclovias: An Urban Planning & Public Health Program Of The Americas With A Latin Flavor. Med Sci Sport Exerc. 2009;41:47. doi: 10.1249/01.mss.0000353411.00986.fd
- 15. Cohen D, Han B, Derose KP, Williamson S, Paley A, Batteate C. CicLAvia: Evaluation of participation, physical activity and cost of an open streets event in Los Angeles. Prev Med (Baltim). 2016;90:26–33. doi: 10.1016/j.ypmed.2016.06.009
- 16. Hipp JA, Bird A, van Bakergem M, Yarnall E. Moving targets: Promoting physical activity in public spaces via open streets in the US. Prev Med (Baltim). 2017;103:S15–S20. doi: 10.1016/j.ypmed.2016.10.014
- 17. Metro Atlanta Bicycle Facility Inventory 2014 | ARC Open Data & Mapping Hub. Atlanta Regional Commission. http://opendata.atlantaregional.com/datasets/metro-atlanta-bicycle-facility-inventory-2014?geometry=-84.56%2C33.721%2C-84.131%2C33.821. Accessed October 18, 2018.
- 18. Pebesma E. Simple Features for R [R package sf version 0.6-0]. https://cran.r-project.org/web/packages/sf/index.html. Accessed March 19, 2018.
- 19. Dill J, McNeil N. Four Types of Cyclists? Examination of Typology for Better Understanding of Bicycling Behavior and Potential. Transp Res Rec J Transp Res Board. 2013;2387:129–138. doi: 10.3141/2387-15
- 20. Atlanta Regional Commission. Atlanta Regional Travel Survey Final Report. 2011. https://cdn.atlantaregional.org/wp-content/uploads/tp-2011regionaltravelsurvey-030712.pdf.
- 21. Kahle D, Wickham H. ggmap: Spatial Visualization with ggplot2. R J. 2013;5(1):144–161. doi: 10.1023/A:1009843930701
- 22. Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31(4):337–350. doi: 10.1007/s10654-016-0149-3
- 23. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567(7748):305–307. doi: 10.1038/d41586-019-00857-9
- 24. Hernán MA, Robins JM. Selection Bias. In: Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming; 2017. https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2017/10/hernanrobins_v1.10.33.pdf.
- 25. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15(5):615–625. doi: 10.1097/01.ede.0000135174.63482.43
- 26. Whitfield GP, Ussery EN, Riordan B, Wendel AM. Association Between User-Generated Commuting Data and Population-Representative Active Commuting Surveillance Data — Four Cities, 2014–2015. MMWR Morb Mortal Wkly Rep. 2016;65(36):959–962. doi: 10.15585/mmwr.mm6536a4
- 27. Lusk AC, Furth PG, Morency P, Miranda-Moreno LF, Willett WC, Dennerlein JT. Risk of injury for bicycling on cycle tracks versus in the street. Inj Prev. 2011;17(2):131–135. doi: 10.1136/ip.2010.028696
- 28. Strauss J, Miranda-Moreno LF, Morency P. Mapping cyclist activity and injury risk in a network combining smartphone GPS data and bicycle counts. Accid Anal Prev. 2015. doi: 10.1016/j.aap.2015.07.014
- 29. Aldred R, Woodcock J, Goodman A. Does More Cycling Mean More Diversity in Cycling? Transp Rev. 2016;36(1):28–44. doi: 10.1080/01441647.2015.1014451
- 30. Omura JD, Carlson SA, Paul P, Watson KB, Fulton JE. National physical activity surveillance: Users of wearable activity monitors as a potential data source. Prev Med Reports. 5:124–126. doi: 10.1016/j.pmedr.2016.10.014
- 31. Brownson RC, Boehmer TK, Luke DA. Declining Rates of Physical Activity in the United States: What Are the Contributors? Annu Rev Public Health. 2005;26(1):421–443. doi: 10.1146/annurev.publhealth.26.021304.144437
- 32. Daviglus ML, Talavera GA, Avilés-Santa ML, et al. Prevalence of Major Cardiovascular Risk Factors and Cardiovascular Diseases Among Hispanic/Latino Individuals of Diverse Backgrounds in the United States. JAMA. 2012;308(17):1775. doi: 10.1001/jama.2012.14517
- 33. Pucher J, Buehler R, Merom D, Bauman A. Walking and cycling in the United States, 2001-2009: Evidence from the National Household Travel Surveys. Am J Public Health. 2011;101(SUPPL. 1):310–317. doi: 10.2105/AJPH.2010.300067
- 34. Evenson KR, Wen F, Furberg RD. Assessing Validity of the Fitbit Indicators for U.S. Public Health Surveillance. Am J Prev Med. 2017;53(6):931–932. doi: 10.1016/j.amepre.2017.06.005
- 35. McCrorie PR, Fenton C, Ellaway A. Combining GPS, GIS, and accelerometry to explore the physical activity and environment relationship in children and young people - a review. Int J Behav Nutr Phys Act. 2014;11(1):93. doi: 10.1186/s12966-014-0093-0
- 36. Heesch KC, James B, Washington TL, Zuniga K, Burke M. Evaluation of the Veloway 1: A natural experiment of new bicycle infrastructure in Brisbane, Australia. J Transp Heal. 2016;3(3):366–376. doi: 10.1016/j.jth.2016.06.006
- 37. Smith LH, VanderWeele TJ. Bounding bias due to selection. Epidemiology. 2019;30(4):509–516. doi: 10.1097/EDE.0000000000001032
- 38. Lash TL, Fox MP, Fink AK. Selection Bias. In: Lash TL, Fox MP, Fink AK, eds. Applying Quantitative Bias Analysis to Epidemiologic Data. Statistics for Biology and Health. New York, NY: Springer New York; 2009. doi: 10.1007/978-0-387-87959-8
- 39. Nelson TA, Denouden T, Jestico B, Laberee K, Winters M. BikeMaps.org: A Global Tool for Collision and Near Miss Mapping. Front Public Heal. 2015;3(March):1–8. doi: 10.3389/fpubh.2015.00053
- 40. Westreich D, Edwards JK, Lesko CR, Stuart E, Cole SR. Transportability of Trial Results Using Inverse Odds of Sampling Weights. Am J Epidemiol. 2017;186(8):1010–1014. doi: 10.1093/aje/kwx164
- 41. Lesko CR, Buchanan AL, Westreich D, Edwards JK, Hudgens MG, Cole SR. Generalizing study results: a potential outcomes perspective. Epidemiology. 2017;28(4):29–39. doi: 10.1016/j.artmed.2015.09.007.Information
- 42. Gathering data on national cycling patterns in the Netherlands | Eltis. http://www.eltis.org/discover/case-studies/gathering-data-national-cycling-patterns-netherlands. Accessed March 28, 2019.
- 43. Rose G. Combining Intercept Surveys and Self-Completion Questionnaire to Understand Cyclist Use of Off-Road Paths. 2007. https://trid.trb.org/view/801590. Accessed March 28, 2019.
- 44. Cohn J, Hadden Loh T, Götschi T. Development of a Survey Tool to Quantify Health Impacts of Trail Use. J Park Recreat Admi. 2016;34(3). doi: 10.18666/jpra-2016-v34-i3
- 45. Götschi T, Hadden Loh T. Advancing project-scale health impact modeling for active transportation: A user survey and health impact calculation of 14 US trails. J Transp Heal. 2017;4:334–347. doi: 10.1016/j.jth.2017.01.005