JMIR mHealth and uHealth
. 2021 Nov 16;9(11):e28857. doi: 10.2196/28857

Understanding the Predictors of Missing Location Data to Inform Smartphone Study Design: Observational Study

Anna L Beukenhorst 1,2, Jamie C Sergeant 1,3, David M Schultz 4,5, John McBeth 1, Belay B Yimer 1, Will G Dixon 1,6
Editor: Lorraine Buis
Reviewed by: Ian Barnett, Edmund Lee, Hamed Mehdizadeh
PMCID: PMC8663442  PMID: 34783661

Abstract

Background

Smartphone location data can be used for observational health studies (to determine participant exposure or behavior) or to deliver a location-based health intervention. However, missing location data are more common when using smartphones than when using research-grade location trackers. Missing location data can affect study validity and intervention safety.

Objective

The objective of this study was to investigate the distribution of missing location data and its predictors to inform design, analysis, and interpretation of future smartphone (observational and interventional) studies.

Methods

We analyzed hourly smartphone location data collected from 9665 research participants on 488,400 participant days in a national smartphone study investigating the association between weather conditions and chronic pain in the United Kingdom. We used a generalized linear mixed-effects model (logistic regression) to identify whether a successfully recorded geolocation was associated with the time of day, participants’ time in study, operating system, time since previous survey completion, participant age, sex, and weather sensitivity.

Results

The most common patterns were a median of 2 out of a maximum of 24 recorded locations per day (1760/9665, 18.2% of participants), no location data (1664/9665, 17.2%), and complete location data (1575/9665, 16.3%). The median number of locations per day differed by operating system: participants with an Android phone most often had complete data (a median of 24/24 locations), whereas iPhone users most often had a median of 2 out of 24 locations. The odds of a successfully recorded location were 22.91 times those for iPhones when participants used Android phones (95% CI 19.53-26.87). The odds of a successfully recorded location were lower during weekends (odds ratio [OR] 0.94, 95% CI 0.94-0.95) and nights (OR 0.37, 95% CI 0.37-0.38), if time in study was longer (OR 0.99 per additional day in study, 95% CI 0.99-1.00), and if a participant had not used the app recently (OR 0.96 per additional day since last survey entry, 95% CI 0.96-0.96). Participant age and sex did not predict missing location data.

Conclusions

The predictors of missing location data reported in our study could inform app settings and user instructions for future smartphone studies, both observational and interventional. These predictors also have implications for analysis methods to deal with missing location data, such as imputation of missing values or complete case analysis. Health studies using smartphones for data collection should assess the context-specific consequences of high levels of missing data, especially among iPhone users, during the night, and for disengaged participants.

Keywords: geolocation, global positioning system, smartphones, mobile phone, mobile health, environmental exposures, data analysis, digital epidemiology, missing data, location data, mobile application

Introduction

Smartphones offer opportunities to collect sensor data frequently from people’s daily lives and to determine their exposures or behaviors. Smartphone location data can be collected frequently (eg, daily, hourly, or continuously) over sustained periods of time [1]. Studies have used these data to quantify exposure to weather [2,3], air pollution [4], or proximity to tobacco outlets [5], or to deliver context-aware messages when participants visited health facilities [6,7]. Smartphones can provide complete and accurate location data, especially when participants are provided with study smartphones, studies are short, and data are collected nearly continuously [8,9]. However, in large-scale epidemiological studies, location data are often collected for longer periods, less frequently, and from participants’ own smartphones. In these cases, missing data are more common than when using research-grade location trackers [4,10,11]. In observational research studies, missing data can result in loss of power, selection bias, and misclassification of participants’ exposure or behavior [12]. In trials, missing data could hamper the safe and effective delivery of context-aware interventions that rely on location data [13].

To anticipate the potential impact of missing location data on study findings, we need to better understand how often, when, and why location data are missing. Previous smartphone studies have reported the amount of missing location data [4,10,14,15]. However, they typically did not investigate differences in missing data over time [4,10,14,15], between participants [4,10,14,15], or between operating systems [4,14]. In addition, they were limited by small sample sizes.

We therefore investigated the distribution of missing location data over time, predictors of missing location data, and between-participant differences. We used data from a longitudinal smartphone study with 9665 participants using Android phones or iPhones. We anticipate that understanding the predictors of missing location data could inform researchers who want to improve data completeness during study design and data collection.

Methods

Ethics Approval and Consent to Participate

The University of Manchester Research Ethics Committee (reference ethics/15522) and the National Health Service Integrated Research Application System (reference 23/NW/0716) approved this study. Participants were required to provide electronic consent for study inclusion. Further details are available elsewhere [2,3].

Study Design

We performed a secondary analysis of the data from an observational smartphone study that analyzed the association between weather conditions and chronic pain in the United Kingdom (study name: Cloudy with a Chance of Pain) [3]. In this study, we collected self-reported pain levels from a large group of people with chronic pain such as osteoarthritis, rheumatoid arthritis, or migraine. The exposure of interest was daily average weather conditions (ie, temperature, relative humidity, wind speed, and air pressure). To determine what daily average weather conditions participants were exposed to, the app recorded participants’ geolocation, which we could link to weather reports from local weather stations. The analysis of the weather and pain association and the details of data collection are described elsewhere [2,3].
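To make this linkage step concrete, the sketch below (in R, the language used for the study's analyses) shows how an hourly geolocation could be matched to its nearest weather station using great-circle distance. The data frames stations and locations and their column names are illustrative assumptions, not the study's actual pipeline.

```r
# Illustrative sketch: link each recorded geolocation to the nearest weather station.
# `stations` (station_id, lat, lon) and `locations` (participant_id, timestamp, lat, lon)
# are assumed data frames, not the study's actual data structures.

haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(sqrt(pmin(1, a)))  # Earth radius of approximately 6371 km
}

nearest_station <- function(lat, lon, stations) {
  d <- haversine_km(lat, lon, stations$lat, stations$lon)
  stations$station_id[which.min(d)]
}

# Assign a station to every hourly location record
locations$station_id <- mapply(nearest_station,
                               locations$lat, locations$lon,
                               MoreArgs = list(stations = stations))
```

Because UK weather stations are tens of kilometers apart (see the Discussion), a simple nearest-station match of this kind is usually insensitive to small errors or gaps in the recorded locations.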

Data Collection

People with chronic pain downloaded the app onto their Android phones or iPhones, provided informed consent, and reported baseline participant characteristics (eg, sex, year of birth, self-reported weather sensitivity). At 6:24 PM local time each day, participants received a push notification to complete a survey rating 10 aspects of symptoms, behavior, and well-being. To obtain weather data from the closest weather station, geolocation was required. The app was programmed to record geolocation each hour on the hour; thus, the app would ideally obtain 24 geolocations each day. The app used GPS (outdoors) and network signals (inside buildings) to determine the latitude and longitude. The app’s ability to record geolocations depended on (1) the participant granting the app access to their geolocation and (2) the participant switching on location services on their phone. Upon downloading the study app, participants were asked to grant the app access to their geolocation. Access was voluntary; participants who granted the app access to their geolocation could retract access at any time or switch off location services temporarily or permanently, in which case the app would not be able to record their location. The app recorded the operating system of the smartphone, but this feature was introduced 1 week after the recruitment launch, so the operating system was not recorded for early enrollers.

Data Preparation and Eligible Participants

We investigated location-data completeness on calendar days on which a participant completed the survey. Participants were eligible if they completed the survey at least once, excluding the day of enrollment. This exclusion ensured comparability of participant days, as recording 24 geolocations would be unlikely on the day of download. For each participant, we selected all days with survey data. For each full clock hour, we added indicators for (1) location data (1 if observed, 0 if missing), (2) number of days since the most recent survey completion (0 if less than 24 hours ago, 1 if 24-47 hours ago, and so on), (3) time in study (days since first survey submission), and (4) time (weekday or weekend; part of the day, with night defined as midnight to 5:59 AM, morning as 6 AM to 11:59 AM, afternoon as noon to 5:59 PM, and evening as 6 PM to 11:59 PM; and hour of the day). In addition, we added indicators for variables that did not change over time: (1) participant characteristics (eg, sex, age, self-reported weather sensitivity) and (2) operating system (ie, iPhone operating system, Android, or unknown).
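As an illustration of this preparation step, the following R sketch derives the hourly indicators described above from raw timestamps. The data frame hourly and its column names (participant_id, hour_ts, last_survey_ts, first_survey_ts) are assumptions for illustration, not the study's actual code.

```r
library(dplyr)
library(lubridate)

# Illustrative sketch: derive the hourly indicators described above.
# `hourly` is assumed to have one row per participant per full clock hour on eligible days.
hourly <- hourly %>%
  mutate(
    days_since_survey = as.integer(difftime(hour_ts, last_survey_ts, units = "days")),
    time_in_study     = as.integer(as.Date(hour_ts) - as.Date(first_survey_ts)),
    weekend           = wday(hour_ts) %in% c(1, 7),   # Sunday = 1, Saturday = 7
    part_of_day       = cut(hour(hour_ts),
                            breaks = c(-1, 5, 11, 17, 23),
                            labels = c("night", "morning", "afternoon", "evening"))
  )
```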

Data Analysis

We reported the number of eligible participants and their characteristics. We reported location-data completeness (1) per day (number of recorded locations during a day), (2) per clock hour (percentage of participant days with a recorded location at that hour), (3) per hour for the 4 hours before and after survey completion, and (4) per participant (median number of recorded locations per day), for all participants and stratified by operating system. We investigated predictors of the outcome “presence of a location data point” (0 if missing, 1 if observed for a given full clock hour) with a logistic regression model with a participant-specific random intercept to account for within-participant correlation between repeated measurements [16-18]. A multivariable model identified whether the likelihood of missing location data was associated with time indicators (ie, weekdays vs weekend days, part of the day), participant characteristics (ie, age, sex, self-reported weather sensitivity dichotomized around the median), operating system, survey compliance (ie, days since previous survey entry), or time in study (ie, days since first survey entry). Only participants with complete data for all covariates contributed information to the model. We estimated 95% CIs with 1000 simulations, as recommended in [19]. Models were fitted in R (R Core Team) version 3.6 with the package lme4 [18]; odds ratios (ORs) and CIs were estimated using the merTools package [20].
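A minimal sketch of the type of model described above follows, assuming the indicator data frame from the previous section; the variable names are illustrative, and the intervals below are a simple approximation from simulated fixed effects rather than the study's exact procedure.

```r
library(lme4)
library(merTools)

# Illustrative sketch of the mixed-effects logistic regression described above.
fit <- glmer(
  location_observed ~ weekend + part_of_day + time_in_study +
    days_since_survey + operating_system + age + sex + weather_sensitivity +
    (1 | participant_id),                 # participant-specific random intercept
  data = hourly, family = binomial
)

# Odds ratios from the fixed effects
exp(fixef(fit))

# Simulation-based summaries of the fixed effects (1000 simulations),
# exponentiated to the odds ratio scale; approximate 95% intervals
fe <- FEsim(fit, n.sims = 1000)
data.frame(term = fe$term,
           OR   = exp(fe$mean),
           lo   = exp(fe$mean - 1.96 * fe$sd),
           hi   = exp(fe$mean + 1.96 * fe$sd))
```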

Results

The app was downloaded by 13,207 participants, of whom 9665 were eligible for inclusion (mean age 49 [SD 13] years; females: 7211/9665, 74.6%). These participants contributed 488,400 participant days (median 14 eligible days per participant, IQR 4-60). Of the 9665 participants, 3109 (32.2%) used an Android phone, 1930 (19.9%) used an iPhone, and the operating system was unknown for the remaining 4626 (47.9%). We expected 11.72 million location data points (clock hours): 24 for each of the 488,400 participant days. Of these, the app collected only 4.36 million (37.2%), leaving the remaining 7.36 million clock hours missing. Data completeness per participant day varied from no location data (0/24) to fully complete data (24/24; median 3, IQR 1-19; Figure 1A). Location data were complete (24/24) for 17.5% (85,606/488,400) of participant days. Participant days with no location (93,255/488,400, 19.1%), 1 location (67,963/488,400, 13.9%), or 2 locations (64,207/488,400, 13.1%) were also common. Location was most often recorded at 7 PM (232,295/488,400, 47.5% of participant days; Figure 1B), just after the default notification at 6:24 PM. Locations were least often recorded between midnight and 6 AM. Location data were often recorded in the hour before survey completion (281,767/487,391, 57.8%) and the hour after survey completion (257,743/436,263, 59.1%; Figure 1C).
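For illustration, completeness summaries of the kind reported above could be computed along the following lines; the hourly data frame and its columns are assumed, as in the earlier sketches.

```r
library(dplyr)

# Illustrative sketch: completeness per participant day and per participant.
per_day <- hourly %>%
  group_by(participant_id, date = as.Date(hour_ts)) %>%
  summarise(n_locations = sum(location_observed), .groups = "drop")

# Distribution of recorded locations per participant day (0-24), as in Figure 1A
table(per_day$n_locations)

# Median locations per day for each participant, stratified by operating system (Figure 2)
per_participant <- per_day %>%
  left_join(distinct(hourly, participant_id, operating_system), by = "participant_id") %>%
  group_by(participant_id, operating_system) %>%
  summarise(median_locations = median(n_locations), .groups = "drop")

count(per_participant, operating_system, median_locations)
```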

Figure 1. Data completeness. A: Distribution of the number of recorded locations per participant day (N=488,400). B: Data completeness per hour of the day (N=24 x 488,400). C: Data completeness around the moment of survey completion (N=24 x 488,400). The red X marks app usage, and "1st" is the first full clock hour after data entry.

The most common patterns were a median of 2 out of a maximum of 24 recorded locations per day (1760/9665, 18.2% of participants), no location data (1664/9665, 17.2%), and complete location data (1575/9665, 16.3%; Figure 2A). Stratification by phone operating system and participant characteristics showed that 31.0% (965/3109) of Android users had a median of 24 recorded locations versus less than 1% (6/1930) of iPhone users (Figure 2B). Android users most often had medians of 24 (out of 24) locations (965/3109, 31.0%) or 0 (out of 24) locations (640/3109, 20.6%). iPhone users most often had medians of 2 (out of 24) locations (859/1930, 44.5%) or 1 (out of 24) location (362/1930, 18.8%), while only 0.3% (6/1930) had a median of 24 (out of 24) locations.

Figure 2. Median locations per day per participant. A: All participants (N=9665). B: Stratified by operating system for 3109 Android users and 1930 iPhone users. iOS: iPhone operating system.

The generalized linear mixed-effects model estimated whether time indicators, operating system, time since previous survey completion, or participant characteristics predicted the presence of a location data point (N=4435 participants). The presence of a location data point was strongly predicted by the operating system and the part of the day (Table 1). The odds of a recorded location were highest for Android phones (OR 22.91, 95% CI 19.53-26.87; referent: iPhone operating system) and during the afternoon (OR 1.19, 95% CI 1.18-1.20; referent: morning). The odds of a recorded location were lower on weekends (OR 0.94, 95% CI 0.94-0.95; referent: weekdays) and if the previous survey completion was longer ago (OR 0.96 per additional day, 95% CI 0.96-0.96), and were marginally lower if a participant’s time in study was longer (OR 0.998, 95% CI 0.9984-0.9985). Participant characteristics (ie, age, sex, self-reported weather sensitivity) did not predict the probability of a recorded location.

Table 1.

Results of the generalized linear mixed-effects model estimating the odds of having a recorded location (N=4435).

Variable and category: Odds ratio (95% CI)

Day of the week
  Weekdays: Referent
  Weekend: 0.94 (0.94-0.95)
Part of the day
  Morning: Referent
  Afternoon: 1.19 (1.18-1.20)
  Evening: 1.11 (1.10-1.11)
  Night: 0.37 (0.37-0.38)
Time in study
  Per day: 0.99 (0.99-1.00)
Operating system
  iPhone operating system: Referent
  Android: 22.91 (19.53-26.87)
Time since previous survey completion
  Per day: 0.96 (0.96-0.96)
Age
  Per 10 years: 1.00 (1.00-1.01)
Sex
  Female: Referent
  Male: 0.99 (0.80-1.22)
Weather sensitivity
  Weak: Referent
  Strong: 0.98 (0.84-1.15)

Discussion

Principal Findings

In our study, location data collected from participants’ smartphones were missing for 63% of the intended hours (7.36 million/11.72 million). This percentage is higher than that reported in 5 other studies, which reported 26% [4], 28% [14], and 50% [10,15,21] missing data. This difference may be due to analytical choices: 3 studies excluded participants with the highest amounts of missing data and only investigated Android users, possibly resulting in an underestimation of the overall percentage of missing location data [4,14,21]. The other 2 studies sampled location continuously for a few minutes, multiple times per hour, suggesting that our findings may not generalize to higher frequencies of location data collection [10,15].

Why Do Time Indicators and Operating System Predict Location-Data Completeness?

Missing data were predicted by the part of the day, the time since previous survey completion, and participants’ operating system. Missing data at night might be caused by people being indoors, where GPS signals are unavailable [11], or by their phones being switched off, in airplane mode, or out of battery [11,22]. Location data were most complete in the hour before and after survey completion, consistent with the app recording the last known location when it is reopened and a location at the next full clock hour. In addition, we found a small but significant reduction in the odds of a recorded location over time. Lower location-data completeness when people stay longer in a study is in line with findings reported previously [22]. Less than 1% of iPhone users had complete location data. Other studies of smartphone data corroborate our finding of more missing sensor data among iPhone users than among Android users. The iPhone operating system refuses geolocation requests from apps more often than Android does. Reasons for refusing geolocation requests include, for example, reducing the phone’s power consumption or prioritizing sensor data collection by other apps [10,15,23,24]. Of note, some studies have succeeded in obtaining more complete location data from iPhones than from Android phones in spite of these operating system–specific differences [22,25]. This finding suggests that the research app used to collect data, and the way this app interacts with the operating system, may influence the amount of missing data. Experimental studies could investigate this further, as we cannot exclude the role of other differences between those studies and our own, such as the investigated population (eg, mean age 48 years in our study but mean age 25 years in [22]) and the sampling frequency (once an hour in our study; continuously for 1 minute every 10 minutes in [15,22]).

Implications: Consequences of Missing Data Are Context-Specific

Although missing location data reduce precision, they do not necessarily reduce a study’s validity. For example, missing data during the night may not be a problem for a study interested in identifying daytime behaviors from location data. In our study, we calculated daily average exposure to the weather based on the 24 hourly weather reports from participants’ locations [3]. For days with missing data, we imputed participant location. As UK weather stations are approximately 40 km apart, missing information on small relocations would not result in assigning participants to the wrong weather station. Furthermore, misclassification would only occur if the weather conditions at the “wrong” weather station were sufficiently different to change a participant’s daily average exposure. Most previous studies investigating weather and pain measured participants’ location only once and used daily rather than hourly weather reports [26]. Compared with those studies, weather exposure in our study is less likely to be misclassified, even for participants with only 1 or 2 observed locations per day.
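As an illustration only, one simple form of such an imputation is to carry a participant's last observed daily location forward to days with no recorded location; the per_day_locations data frame below is hypothetical, and the study's exact imputation method is not detailed in this section.

```r
library(dplyr)
library(tidyr)

# Illustrative sketch: carry the last observed location forward for days with no
# recorded geolocation. `per_day_locations` (participant_id, date, lat, lon) is assumed;
# lat/lon are NA on days without a recorded location.
daily_loc <- per_day_locations %>%
  arrange(participant_id, date) %>%
  group_by(participant_id) %>%
  fill(lat, lon, .direction = "down") %>%   # last observation carried forward
  ungroup()
```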

Participant age and sex did not predict missing location data, suggesting that data completeness is not associated with those 2 demographic factors. However, the difference in location-data completeness between iPhone and Android users could be a source of bias. Just-in-time interventions that depend on location data could be less safe and effective for iPhone users compared to Android users. On average, Android users have a lower socioeconomic status than iPhone users—a factor that is related to many health outcomes and may be associated with health disparities in underprivileged groups [27-29]. In observational studies, this difference could introduce selection bias. For example, exclusion of participants with incomplete data (complete case analysis) could lead to results that do not generalize to wealthier iPhone users.

Observational studies could impute missing location data based on participants’ past behavior [30,31]. In that case, it is important to assess whether the imputation algorithm is also valid for iPhone users, who may have fewer past data points available. If imputation is not feasible, researchers may want to consider using different devices to collect location data, such as a GPS tracker, which may be more suitable for research questions that require complete location data over short periods of time [4,9]. Of note, although imputation would mitigate some threats to internal validity due to selection bias, it does not address external validity. Study results may still not be generalizable to the wider population, especially not to underserved communities that tend to use health technologies less and may have fewer financial resources to purchase smartphones and maintain connectivity [29].

Improving Location-Data Completeness

At the study design stage, researchers should optimize app settings and user instructions to improve location-data completeness. Our study showed that location was more often recorded around survey completion and around push notifications. Thus, encouraging participants to complete surveys and sending push notifications may improve location-data completeness as well as survey response. Because Android phone users have higher location-data completeness than iPhone users, restricting participation to Android users could improve location-data completeness. However, doing so could introduce important limitations to generalizability, given that many people have iPhones (market share of 27% worldwide [32] and 54% in the United States [33]).

Conclusion

Missing hourly smartphone location data are common: in our study, 63% of hourly data points were missing. Missing data were more likely for iPhone users, during the night, on weekend days, and if participants had not recently used the app to complete a survey. Participant age and sex did not predict missing location data. Differences in location-data completeness between iPhone and Android users may affect the validity of observational or interventional studies. The predictors of missing data can help researchers optimize app settings and user instructions at the study design stage to achieve higher location-data completeness. In addition, they may inform researchers’ assessment of the context-specific consequences of missing location data.

Acknowledgments

We thank Sabine van der Veer for providing feedback on this manuscript. ALB is supported by a Medical Research Council Doctoral Training Partnership grant (MR/N013751/1). Cloudy with a Chance of Pain was supported by Versus Arthritis (grant reference 21225). This research was further supported by the Centre for Epidemiology Versus Arthritis (grant 21755) and the National Institute for Health Research Manchester Biomedical Research Centre.

Abbreviations

OR

odds ratio

Footnotes

Conflicts of Interest: WGD has received consultancy fees from AbbVie and Google, which are unrelated to this work.

References

  • 1. Amor J, James C. Setting the scene: Mobile and wearable technology for managing healthcare and wellbeing. 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 25-29 Aug 2015; Milan, Italy. 2015. pp. 7752-5.
  • 2. Schultz DM, Beukenhorst AL, Yimer BB, Cook L, Pisaniello H, House T, Gamble C, Sergeant JC, McBeth J, Dixon WG. Weather Patterns Associated With Pain In Chronic-Pain Sufferers. Bull Am Meteorol Soc. 2020 May 01;101(5):E555-E566. doi: 10.1175/BAMS-D-19-0265.1.
  • 3. Dixon WG, Beukenhorst AL, Yimer BB, Cook L, Gasparrini A, El-Hay T, Hellman B, James B, Vicedo-Cabrera AM, Maclure M, Silva R, Ainsworth J, Pisaniello HL, House T, Lunt M, Gamble C, Sanders C, Schultz DM, Sergeant JC, McBeth J. How the weather affects the pain of citizen scientists using a smartphone app. NPJ Digit Med. 2019;2:105. doi: 10.1038/s41746-019-0180-3.
  • 4. Glasgow ML, Rudra CB, Yoo E, Demirbas M, Merriman J, Nayak P, Crabtree-Ide C, Szpiro AA, Rudra A, Wactawski-Wende J, Mu L. Using smartphones to collect time-activity data for long-term personal-level air pollution exposure assessment. J Expo Sci Environ Epidemiol. 2016 Jun;26(4):356-64. doi: 10.1038/jes.2014.78.
  • 5. Lee EW, Bekalu MA, McCloud R, Vallone D, Arya M, Osgood N, Li X, Minsky S, Viswanath K. The Potential of Smartphone Apps in Informing Protobacco and Antitobacco Messaging Efforts Among Underserved Communities: Longitudinal Observational Study. J Med Internet Res. 2020 Jul 07;22(7):e17451. doi: 10.2196/17451.
  • 6. Marsh A, Hirve S, Lele P, Chavan U, Bhattacharjee T, Nair H, Campbell H, Juvekar S. Validating a GPS-based approach to detect health facility visits against maternal response to prompted recall survey. J Glob Health. 2020 Jun;10(1):010602. doi: 10.7189/jogh.10.010602.
  • 7. Dale LP, White L, Mitchell M, Faulkner G. Smartphone app uses loyalty point incentives and push notifications to encourage influenza vaccine uptake. Vaccine. 2019 Jul 26;37(32):4594-4600. doi: 10.1016/j.vaccine.2018.04.018.
  • 8. Donaire-Gonzalez D, Valentín A, de Nazelle A, Ambros A, Carrasco-Turigas G, Seto E, Jerrett M, Nieuwenhuijsen MJ. Benefits of Mobile Phone Technology for Personal Environmental Monitoring. JMIR Mhealth Uhealth. 2016 Nov 10;4(4):e126. doi: 10.2196/mhealth.5771.
  • 9. Goodspeed R, Yan X, Hardy J, Vydiswaran VV, Berrocal VJ, Clarke P, Romero DM, Gomez-Lopez IN, Veinot T. Comparing the Data Quality of Global Positioning System Devices and Mobile Phones for Assessing Relationships Between Place, Mobility, and Health: Field Study. JMIR Mhealth Uhealth. 2018 Aug 13;6(8):e168. doi: 10.2196/mhealth.9771.
  • 10. Boonstra TW, Nicholas J, Wong QJ, Shaw F, Townsend S, Christensen H. Using Mobile Phone Sensor Technology for Mental Health Research: Integrated Analysis to Identify Hidden Challenges and Potential Solutions. J Med Internet Res. 2018 Jul 30;20(7):e10131. doi: 10.2196/10131.
  • 11. Vhaduri S, Poellabauer C, Striegel A, Lizardo O, Hachen D. Discovering places of interest using sensor data from smartphones and wearables. 2017 IEEE SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI; June 28, 2017; San Francisco, CA. IEEE; 2017.
  • 12. Krenn PJ, Titze S, Oja P, Jones A, Ogilvie D. Use of global positioning systems to study physical activity and the environment: a systematic review. Am J Prev Med. 2011 Nov;41(5):508-15. doi: 10.1016/j.amepre.2011.06.046.
  • 13. Marsh A, Hirve S, Lele P, Chavan U, Bhattacharjee T, Nair H, Campbell H, Juvekar S. Validating a GPS-based approach to detect health facility visits against maternal response to prompted recall survey. J Glob Health. 2020 Jun;10(1):010602. doi: 10.7189/jogh.10.010602.
  • 14. Do TMT, Dousse O, Miettinen M, Gatica-Perez D. A probabilistic kernel method for human mobility prediction with smartphones. Pervasive Mob Comput. 2015 Jul;20:13-28. doi: 10.1016/j.pmcj.2014.09.001.
  • 15. Torous J, Staples P, Barnett I, Sandoval LR, Keshavan M, Onnela JP. Characterizing the clinical relevance of digital phenotyping data quality with applications to a cohort with schizophrenia. NPJ Digit Med. 2018;1:15. doi: 10.1038/s41746-018-0022-8.
  • 16. Gelman A. Analysis of variance—why it is more important than ever. Ann Statist. 2005 Feb 1;33(1):1-53. doi: 10.1214/009053604000001048.
  • 17. Vittinghoff E, Glidden D, Shiboski SM. Regression Methods in Biostatistics. Boston, MA: Springer; 2012. Repeated measures and longitudinal data analysis; pp. 261-308.
  • 18. Bates D, Maechler M, Bolker B, Walker S, Christensen RHB, Singmann H, Dai B, Scheipl F, Grothendieck G. Package ‘lme4’: linear mixed-effects models using S4 classes. R package; 2021. URL: https://cran.r-project.org/web/packages/lme4/index.html [accessed 2021-12-10].
  • 19. Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press; 2006.
  • 20. Knowles J, Frederick C. merTools: tools for analyzing mixed effect regression models. R package version 0.5.0; 2019. URL: https://cran.r-project.org/web/packages/merTools/index.html [accessed 2021-10-12].
  • 21. Bähr S, Haas G, Keusch F, Kreuter F, Trappmann M. Missing Data and Other Measurement Quality Issues in Mobile Geolocation Sensor Data. Soc Sci Comput Rev. Published online August 6, 2020. doi: 10.1177/0894439320944118.
  • 22. Kiang MV, Chen J, Krieger N, Buckee C, Alexander M, Baker J, Buckner R, Coombs G, Rich-Edwards J, Carlson K, Onnela JP. Sociodemographic Characteristics of Missing Data in Digital Phenotyping. Sci Rep. 2021 Jul 29;11(1):15408. doi: 10.1101/2020.12.29.20249002.
  • 23. Beukenhorst AL, Schultz DM, McBeth J, Lakshminarayana R, Sergeant JC, Dixon WG. Using Smartphones for Research Outside Clinical Settings: How Operating Systems, App Developers, and Users Determine Geolocation Data Quality in mHealth Studies. Stud Health Technol Inform. 2017;245:10-14.
  • 24. Torous J, Kiang MV, Lorme J, Onnela JP. New Tools for New Research in Psychiatry: A Scalable and Customizable Platform to Empower Data Driven Smartphone Research. JMIR Ment Health. 2016 May 05;3(2):e16. doi: 10.2196/mental.5165.
  • 25. Torous J, Staples P, Barnett I, Sandoval LR, Keshavan M, Onnela JP. Characterizing the clinical relevance of digital phenotyping data quality with applications to a cohort with schizophrenia. NPJ Digit Med. 2018;1:15. doi: 10.1038/s41746-018-0022-8.
  • 26. Beukenhorst AL, Schultz DM, McBeth J, Sergeant J, Dixon W. Are weather conditions associated with chronic musculoskeletal pain? Review of results and methodologies. Pain. 2020 Apr;161(4):668-683. doi: 10.1097/j.pain.0000000000001776.
  • 27. Dorsey ER, Chan YF, McConnell MV, Shaw SY, Trister AD, Friend SH. The Use of Smartphones for Health Research. Acad Med. 2017 Feb;92(2):157-160. doi: 10.1097/ACM.0000000000001205.
  • 28. Onnela JP. Opportunities and challenges in the collection and analysis of digital phenotyping data. Neuropsychopharmacology. 2021 Jan;46(1):45-54. doi: 10.1038/s41386-020-0771-3.
  • 29. Lee E, Viswanath K. Big Data in Context: Addressing the Twin Perils of Data Absenteeism and Chauvinism in the Context of Health Disparities Research. J Med Internet Res. 2020 Jan 07;22(1):e16377. doi: 10.2196/16377.
  • 30. Barnett I, Onnela JP. Inferring mobility measures from GPS traces with missing data. Biostatistics. 2020 Apr 01;21(2):e98-e112. doi: 10.1093/biostatistics/kxy059.
  • 31. Song C, Qu Z, Blumm N, Barabási AL. Limits of predictability in human mobility. Science. 2010 Feb 19;327(5968):1018-21. doi: 10.1126/science.1177170.
  • 32. O'Dea S. Mobile operating systems: market share worldwide from January 2012 to October 2021. Statista. URL: https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009/ [accessed 2021-05-05].
  • 33. O'Dea S. Market share of mobile operating systems in North America from January 2018 to June 2021. Statista. URL: https://www.statista.com/statistics/1045192/share-of-mobile-operating-systems-in-north-america-by-month/ [accessed 2021-05-05].
