Abstract
Person-generated health data (PGHD) from smartphones/wearables are invaluable for precision health, a field promoting health equity through tailored disease prevention, detection, and intervention strategies. However, pervasive convenience sampling in extant PGHD research introduces selection biases that systematically underrepresent disadvantaged groups, limit model generalizability, and risk exacerbating health disparities. Benchmark PGHD (representative, validated, longitudinal, and frequently repeated) are urgently needed to support model equity. To address this fieldwide limitation, we established American Life in Realtime (ALiR), a longitudinal population health study involving PGHD collected from a probability-based, nationally representative cohort using study-provided Fitbits and (as needed) 4G tablets. As a result, ALiR's 1,038 participants are broadly representative across comprehensive sociodemographic, behavioral, and health-related US population norms, overcoming disparities in established convenience samples (e.g. NIH's All of Us; AoU). Only two sources of differential enrollment remained: older age (odds ratio [OR]: 1.27, 99% CI: 1.12–1.45) during consent and lower education (OR: 0.86, 99% CI: 0.79–0.94) during enrollment, though oversampling individuals without bachelor's degrees sufficiently counterbalanced the latter. An illustrative coronavirus disease 2019 classification model (chosen for global significance, known disparities in experience and outcomes, and methodological relevance) trained using ALiR performed equivalently when tested in sample (area under the curve [AUC] = 0.84, 95% CI: 0.79–0.89) and out of sample on AoU (AUC = 0.84, 95% CI: 0.78–0.89) overall, and in historically underserved subgroups (AUC = 0.82–1.0). Conversely, an identically trained classification model using AoU underperformed by 35% out of sample on ALiR (overall AUC = 0.68, 95% CI: 0.61–0.75 vs. AUC = 0.93, 95% CI: 0.91–0.96 in sample), with worse performance in older female and non-White subgroups (by 22–40%). Our results suggest that probability sampling and hardware provisioning enabled cohort inclusivity and generalizable model performance, supporting ALiR's benchmarking potential for equitable recruitment, PGHD collection, and precision health application.
Keywords: person-generated health data, digital health, precision health, health equity, model generalizability
Significance Statement.
Data from smartphones/wearables could help personalize healthcare, but existing research underrepresents disadvantaged groups, which risks worsening health disparities. We created American Life in Realtime (ALiR), a wearable-based population health study to address this systematic limitation. By using random, address-based sampling and providing Fitbits and 4G tablets, ALiR removed nearly all participation barriers within disadvantaged groups, resulting in broad representation of US adults compared with existing studies. Further, an illustrative coronavirus disease 2019 detection model—chosen for known disparities, global impact, and methodological comparability—trained using ALiR was equally effective across diverse populations, outperforming an identically trained model using All of Us (NIH's cohort to advance research in underrepresented groups). ALiR thus provides a potential benchmark for equitable recruitment, data collection, and modeling in wearable research.
Introduction
Precision health is an emerging field that seeks to characterize and address health disparities through approaches customized to an individual's specific context and needs (1, 2). Current applications, including individualized detection of the novel coronavirus disease 2019 (COVID-19) (3–8), involve AI and machine learning (ML) systems built upon large-scale person-generated health data (PGHD). Derived from routine digital technology engagement (e.g. smartphones and wearables) (9), PGHD can continuously capture everyday real-world social, structural, and environmental exposures, behaviors, and biometrics (10)—factors that account for 60–80% of the modifiable risk of adverse health outcomes (11–14)—providing a valuable opportunity to identify and effectively address unique pathological drivers in marginalized groups.
However, the field lacks “benchmark” PGHD (15, 16), a significant limitation that could exacerbate health disparities in those who may benefit most from precision health advances (17). Benchmarks are standardized datasets used to train, evaluate, and validate AI/ML models, ensuring transparency, reproducibility, and generalizability across populations. Without such resources, the field cannot systematically evaluate model performance, identify biases, or quantify uncertainty, resulting in underperformance in key subgroups and limited real-world applicability. The ideal PGHD benchmark should (i) reflect population diversity; (ii) pair passive data with frequently repeated, validated labels of social, structural, environmental, and health exposures and outcomes; (iii) span longitudinal and life-course designs; (iv) encompass sufficient data quantity, quality, and diversity for AI/ML applied to individuals, subgroups, or populations; and (v) provide wide accessibility (15, 16). No extant dataset fulfills all five criteria. PGHD cohorts within population studies like NIH's All of Us Research Program (AoU) (18) and UK Biobank (19, 20) systematically underrepresent Black, Indigenous, older, lower-educated, or lower-income groups, rely on unstructured labels from claims or electronic health records, and rarely employ repeated, longitudinal instruments (21–25). Most PGHD applications, including COVID-19 detection studies, lack reported demographic distributions, let alone social determinants (3–8). This lack of high-quality, inclusive PGHD and of transparent demographic reporting raises the risk that AI/ML applications may amplify existing disparities, causing underperformance in minority groups (26–28), patient harm (28, 29), and billions in societal costs (30).
COVID-19 highlighted the existence of profound, socially determined healthcare disparities in the United States (31), as racism, poverty, and inaccessible healthcare and social services doubled the morbidity and mortality risk in Black and low-income groups (32). PGHD-based COVID-19 detection thus illustrates both the promise of agile, individualized care that could help circumvent traditional access barriers, and the simultaneous risk of systematic underrepresentation and disparity exacerbation in minoritized groups (3–8).
Whether PGHD inequities stem from methodological selection biases or other barriers remains an open question. Pervasive convenience sampling of easy-to-reach populations (e.g. employers and health systems) and/or “bring-your-own-device” (BYOD) recruitment of self-motivated users afford few participation opportunities for underserved individuals (3–8, 21–25). When opportunities exist, competing time pressure, poor technological literacy, mistrust, privacy/security concerns, or simple disinterest may further selectively erode participation (25, 33–35).
American Life in Realtime (ALiR) was designed as a benchmark for fostering equitable precision health, incorporating wearable PGHD and comprehensive, validated, longitudinal surveys from a nationally representative sample (36). Here we characterize how ALiR's use of probability sampling and study-provided hardware—strategies that reduce methodological participation barriers—impacted cohort inclusivity, differential enrollment, and generalizability of an illustrative COVID-19 classification model compared with data from AoU (a broad convenience cohort designed to advance precision research in underrepresented populations) (18).
Results
Design
Overview
ALiR's design (36) (Fig. 1A; Online Methods) improved on methodologies of extant population health studies with PGHD collection (18, 20, 37) by incorporating best practices from multiple disciplines such as probability sampling, data benchmarking (15–17), and FAIR (38) (findable, accessible, interoperable, and reusable) standards (Fig. 1B). ALiR's participants were invited from the Understanding America Study (UAS) (39, 40), a nationally representative research cohort. Consenters were given a Fitbit Inspire 2 (Google, LLC/Fitbit Inc.; San Francisco, CA, United States) for study wear (on the wrist) as an incentive. For each individual, at least a year of data collection overlays (i) continuous Fitbit biometrics; (ii) frequent, longitudinal surveys eliciting sociodemographics, social, structural, and environmental exposures, personality, behaviors, and health measured via high-quality, validated instruments; and (iii) geospatially and temporally matched contextual linkages. A custom study app—broadly compatible across devices, operating systems, and firmware—facilitates enrollment, data collection, and long-term engagement through earned points redeemable up to $126 in annual compensation. ALiR's infrastructure, data, and study app code will be publicly available (anticipated late 2025). ALiR was approved by USC's Social and Behavioral IRB (UP-21-00181) and follows STROBE guidelines.
Fig. 1.
ALiR design and attributes vs. other population health PGHD cohorts. A) Schematic representation of ALiR infrastructure, including an established, connected cohort, study app with current and future integrations, and benchmark PGHD dataset. B) Comparison of design features across benchmarking and FAIR criteria.
Population
ALiR participants were randomly sampled from UAS (IRB: UP-14-00148), a probability-based panel representative of US adults (18+). UAS is considered one of the richest sociodemographic data sources (39, 40), generating important, generalizable findings across multidisciplinary domains (41–43). UAS members (n = 9,281 at the time of enrollment, but actively growing to 20,000 by 2026) are recruited by address-based sampling (14.4% average empanelment rate across recruitment batches) and given a 4G tablet if they lack Internet, minimizing selection biases (e.g. random digit dialing suffers from diminishing landline use; digital advertising excludes those with access barriers). UAS members answer English/Spanish surveys via mobile/web interface and are compensated ($40/h). Since inception in 2014, a comprehensive survey core has been fielded to the whole panel and repeated biennially (40). Response rates are high (>75% cumulative) and attrition is modest (<8% annually). Sample weights (sex, race/ethnicity, age, education, and geography) match UAS demographic distributions to population norms from the Current Population Survey (CPS) (40, 44).
Study activities
Study activities occurred remotely for at least 1 year (Fig. 2). A 5-min informed consent survey was fielded through the customary UAS electronic interface. During enrollment, consenting individuals were sent follow-up emails (or phone calls as needed) to confirm current mailing addresses. Fitbits and enrollment instructions were mailed to participants via signature-required, 2-day FedEx. Participants were instructed to download/install the study app from the relevant iOS/Android store, which then facilitated downloading and setting up the Fitbit device and app and permitting/syncing data exchange. Ongoing Fitbit activities involve regularly charging and continuously wearing the provided device, including during sleep. ALiR surveys are short but frequent: they are fielded every 1–3 days, consist of 10–15 questions, and take 1–2 minutes to complete (Appendix 1). The UAS helpdesk is available to assist on demand.
Fig. 2.
ALiR participant journey: participant onboarding steps including study app screenshots.
Data collection
Data collection, designed for label validity in supervised learning (36), uses a comprehensive array of frequently repeated consensus measures (e.g. Health and Retirement Study; HRS (45), NIH Phenotypes and Exposures; PhenX (46), Patient Reported Outcomes Measurement Information System; PROMIS (47)) to longitudinally characterize each participant along demographic, socioeconomic, personality, behavioral, and health factors (Fig. S1). UAS core data, repeated biennially, comprise 5,000+ variables across 10+ years and 600+ surveys (40), including (but not limited to) (i) sociodemographics; (ii) all variables from HRS (45); (iii) comprehensive cognitive tests and impairment classifications (48–56); (iv) psychological, well-being, and mental health scales (e.g. CES-D (57)); (v) personality (Big Five (58)); (vi) economic well-being and financial planning; (vii) competencies and literacies; (viii) COVID-related experiences; and (ix) behaviors/perceptions. ALiR-specific surveys (Appendix 1) assess multiple dynamic health-related factors, repeated monthly (or annually where relevant), including (x) social/structural determinants (e.g. food/housing security, racism/discrimination, neighborhood safety, and adverse childhood events); (xi) behaviors (e.g. drugs/alcohol, diet, internet competence, health literacy, health motivations, medical mistrust, and vaccine hesitancy); and (xii) health/well-being (e.g. mental/physical outcomes, infectious disease, and healthcare utilization). (xiii) ALiR Fitbit data include both minute-level sensor data and Fitbit's aggregated metrics (currently heart rate, heart rate variability, activity, steps, sleep, temperature, and blood oxygenation, with others to be added as Fitbit releases them) (59). (xiv) UAS linkages matched spatially/temporally at relevant frequencies include air/water quality, weather, crime, neighborhood characteristics, Centers for Medicare and Medicaid Services claims, Social Security Administration earnings/benefits, and the National Death Index (40). Finally, (xv) UAS prospective studies are added as relevant, such as whole-genome sequencing and personal wearable-based pollution monitoring (Atmotube).
Enrollment
Sample selection
From 2021 August 3 to 2022 March 1, we randomly invited 2,468 UAS members to ALiR using 16 demographic groups defined by intersections of biological sex, age, race/ethnicity, and education (Table S1). Relative to population benchmarks, non-Whites were oversampled by 40% and noncollege graduates by 5%. We excluded individuals with computers only (7% of UAS; Fitbit required either smartphone or tablet), and individuals needing mobility assist devices (6% of UAS; there were few validated Fitbit measures for this population at the time).
Nonresponse
The consent survey achieved a response rate of 87.1% (n = 2,150). There were no major differences in the likelihood of completing the consent survey across demographic groups (Tables S2 and S3).
Consent
Of consent survey respondents, 64.4% (n = 1,386) agreed to participate in ALiR (Fig. 3). The most common reason for nonconsent in survey comments—which were only provided by 12% of respondents and may not be generalizable—was already having an “Apple Watch” or “better” device (23%). Most of these individuals (85%) were White with household incomes above $60 K. An additional 20% of commenters cited logistics (e.g. work environments that cannot accommodate wearables) or unwillingness to continuously wear a device. Only 16% cited barriers like mobile network coverage or privacy (Table S4).
Fig. 3.
ALiR enrollment: distribution of enrollment status among consent survey respondents within sociodemographic subgroups: orange = did not consent; shades of blue = consented but lost to follow-up; green = enrolled. Data labels = counts. AI, AN, HI, or PI = American Indian, Alaska Native, Hawaiian, or Pacific Islander.
Enrollment
As of 2022 April 4, 74.9% (n = 1,038) of the consenting sample enrolled in ALiR and 25.1% (n = 348) were lost to follow-up. Of this latter group, 75.3% (n = 262) did not confirm their address and could not be sent their device; 12.1% (n = 42) received materials but never enrolled; and 12.6% (n = 44) withdrew. Almost all withdrawals (86%; n = 38) occurred before enrollment (Table S5). Reasons for withdrawal varied but included repeated unavailability to sign for packages due to work or other commitments, study time commitment, phone/tablet incompatibility with the Fitbit app, and device-related skin irritation.
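The enrollment percentages above follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the variable and key names are illustrative only and are not part of the study's codebase.

```python
# Counts quoted in the Sample selection, Nonresponse, Consent, and Enrollment text.
invited = 2468
responded = 2150
consented = 1386
enrolled = 1038

# Reasons for loss to follow-up among consenters (key names are illustrative).
lost = {
    "address_not_confirmed": 262,
    "received_materials_never_enrolled": 42,
    "withdrew": 44,
}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


lost_total = sum(lost.values())  # should equal consented - enrolled

funnel = {
    "response_rate": pct(responded, invited),
    "enrollment_rate": pct(enrolled, consented),
    "lost_rate": pct(lost_total, consented),
}
lost_breakdown = {reason: pct(n, lost_total) for reason, n in lost.items()}

print(funnel)
print(lost_breakdown)
```

Each stage uses the preceding stage as its denominator, so the rates describe conditional progress through the funnel rather than shares of the invited sample.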
Barriers and differential enrollment
We used logistic regression and random forests (RFs) to identify observed factors associated with nonconsent and with nonenrollment after consenting across combinations of 79 relevant variables (sociodemographics, health, behaviors, personality, skills, cognitive ability, etc.; Online Methods, Appendix S2A). Odds ratio (OR) estimates below refer to the most parsimonious regression (13 variables: biological sex, race/ethnicity, education, age, marital status, labor status, income, Census region, urbanicity, household size, self-reported health, body mass index (BMI), and number of preexisting chronic conditions). See Figs. S2–S4 and Appendix S2B for additional results and sensitivity analyses.
Older age (OR: 1.27, 99% CI: 1.12–1.45) was the only observed historically underrepresented or disparity-related factor consistently associated with nonconsent (i.e. unwillingness to participate) across all models after correction for multiple hypotheses (False Discovery Rate; FDR < 0.05; Fig. S2). Other factors—more commonly associated with higher sociodemographic strata—included smaller household size (OR: 0.90, 99% CI: 0.82–0.98) and lower BMI (OR: 0.76, 99% CI: 0.62–0.92). RFs identified income and nonhousing wealth as barriers with nonlinear interactions. Heterogeneity analysis (60) suggested that participants with the highest total wealth and picture-vocabulary scores were also less likely to consent (both factors are more prevalent within higher sociodemographic strata).
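To illustrate the kind of multiple-hypothesis correction referred to above, the following minimal Python sketch implements the Benjamini-Hochberg step-up procedure, one common way to control the FDR. The text does not state which FDR procedure was used, so the choice of Benjamini-Hochberg, the function name, and the p-values are all assumptions for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level `alpha`.

    Step-up rule: find the largest rank k (1-based, p-values sorted ascending)
    with p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six candidate predictors (not taken from the study).
p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(p))
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.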
Lower educational attainment (OR: 0.86, 99% CI: 0.79–0.94) was the only observed disparity-related variable consistently associated with nonenrollment after consenting (i.e. barriers beyond willingness to participate; Fig. S3). Heterogeneity analysis (Fig. S5) suggested that individuals with the lowest educational attainment (<9 years) were least likely to enroll.
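For readers less familiar with how ORs and their intervals arise from a logistic regression, the sketch below converts a coefficient and its standard error into an OR with a 99% Wald interval. The Wald construction, the function name, and the example coefficient and standard error are assumptions for illustration; the text does not state how the reported intervals were computed.

```python
import math

Z_99 = 2.5758  # two-sided 99% quantile of the standard normal distribution


def odds_ratio_ci(beta, se, z=Z_99):
    """Convert a log-odds coefficient and standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, chosen only for illustration.
odds_ratio, lower, upper = odds_ratio_ci(beta=0.24, se=0.05)
print(f"OR: {odds_ratio:.2f}, 99% CI: {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a coefficient whose interval excludes 0 on the log-odds scale.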
Systematic univariate testing for nonenrollment across all 5,000+ UAS variables did not yield any additional factors not related to those in our main models (e.g. age; Tables S6 and S7).
ALiR's representation vs. population norms
Overall, ALiR's unweighted (i.e. raw) cohort resembled the US adult population when compared with a holistic set of population norms (see Online Methods), including demographics, socioeconomics, education, personality, behaviors, and health (Fig. 4, Tables S8 and S9). The most statistically significant differences relative to the population were expected given the sampling strategy and/or the composition of the parent UAS cohort. Racial/ethnic minorities (oversampled by 40%) are overrepresented by 42% in ALiR (54% vs. 38% in the population; Fig. 4A), while White individuals are underrepresented by 26% (46 vs. 62%). Since racial/ethnic minorities are more likely to be lower on the income distribution, ALiR overrepresents those with annual household incomes <$30,000 by 16% (22 vs. 19%) and underrepresents those with ≥$100,000 by 19% (26 vs. 32%; Fig. 4B). ALiR's geographic composition mirrors UAS, where the Northeast and South are underrepresented, while West is overrepresented due to large (deliberate) California and Los Angeles County subsamples (Fig. 4D). Finally, individuals with functional limitations are slightly underrepresented due to the exclusion of those needing mobility assist devices (Fig. 4I).
Fig. 4.
ALiR characteristics: differences between US adult population norms (black), weighted ALiR (light green), unweighted ALiR (dark green), and UAS-BYOD (orange) across (A) demographics; (B) socioeconomics; (C) digital inclusion; (D) geography; (E) personality; (F–I) health; and (J) cognition. Error bars = 95% CI. Population norms from Basic Monthly CPS (October 2021–March 2022) where available, literature-based epidemiological estimates where available, and from the UAS Comprehensive File weighted to the CPS for remaining variables. Demographics are from the “My Household” survey (April 2021). HRS health/cognitive outcomes are from the UAS Comprehensive File. Chronic conditions = self-reported diagnoses. AI/AN or HI/PI = American Indian/Alaska Native or Hawaiian/Pacific Islander. Fn’l Limits = Functional Limitations. Rx Drugs = Prescription Drugs. See Tables S8–S11 for details.
Unweighted ALiR's age discrepancy was not anticipated. The fraction of individuals aged 60+ in ALiR is 32% below population norms (21 vs. 31%). Consequently, the fraction of retirees is 40% lower (12 vs. 20%).
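The relative over- and underrepresentation figures quoted above can be reproduced from the paired percentages. The Python sketch below does so, assuming that each figure is the difference between the sample and population shares divided by the population share; the function name and group labels are illustrative, and the inputs are the rounded percentages as printed.

```python
def relative_difference(sample_pct, population_pct):
    """Percent over- (+) or under- (-) representation relative to the population share."""
    return round(100 * (sample_pct - population_pct) / population_pct)


# (unweighted ALiR %, population %) pairs quoted in the text.
reported = {
    "racial/ethnic minorities": (54, 38),
    "White": (46, 62),
    "household income < $30,000": (22, 19),
    "household income >= $100,000": (26, 32),
    "aged 60+": (21, 31),
    "retired": (12, 20),
}

differences = {
    group: relative_difference(sample_pct, population_pct)
    for group, (sample_pct, population_pct) in reported.items()
}
for group, diff in differences.items():
    print(f"{group}: {diff:+d}%")
```

Because the inputs are already rounded, small discrepancies from figures computed on unrounded shares are possible.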
While our sampling strategy focused on achieving sociodemographic representation, ALiR's unweighted sample is also representative of US adults across personality (Fig. 4E), skills (Table S8), and several dimensions of health (Fig. 4F–J, Table S9). Those with digital access barriers are also included: 77.2% did not have a wearable and 2.3% did not have Internet access prior to receiving UAS- or ALiR-provisioned hardware, resembling respective norms (Fig. 4C).
Application of sample weights—generated to align ALiR's distributions of basic demographics to the adult US population (biological sex, race/ethnicity, age, education, and geography; Online Methods)—corrected for most observed discrepancies, including variables not used in weighting. Three notable discrepancies remained. First, the weights improve but do not fully correct for the underrepresentation of retired individuals. Second, weights do not correct for the overrepresentation of unemployed, on leave, and disabled individuals. Third, individuals with diagnosed hypertension are underrepresented while those with diabetes are slightly overrepresented (61).
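As a simplified illustration of how sample weights realign a cohort with population norms, the Python sketch below applies single-variable post-stratification to the unweighted race/ethnicity split quoted earlier (54% vs. 46% in ALiR; 38% vs. 62% in the population). This is not the study's weighting procedure, which uses five variables and is described in the Online Methods; the function name and toy sample are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight each respondent by (population share) / (sample share) of their group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


# Toy sample of 100 people mirroring the unweighted split quoted in the text.
sample = ["minority"] * 54 + ["white"] * 46
population = {"minority": 0.38, "white": 0.62}  # population shares quoted in the text

weights = poststratification_weights(sample, population)
weighted_minority_share = (
    sum(w for w, g in zip(weights, sample) if g == "minority") / sum(weights)
)
print(round(weighted_minority_share, 2))
```

Weights of this kind match the weighting variable exactly by construction. Variables that were not used in weighting move toward population norms only to the extent that they are associated with the weighting variables, which is why some discrepancies can persist.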
ALiR's representation vs. convenience-based cohorts
To evaluate improvement in representation, we compared ALiR's unweighted sample to two comparison cohorts that isolate benefits from different aspects of our approach. (i) UAS-BYOD, a sample of wearable owners within the UAS probability sample, is used to assess the impact of hardware provisioning vs. BYOD (Fig. 4, Tables S10 and S11). (ii) AoU's BYOD, a clinical convenience sample of Fitbit owners, is used to assess the combined benefits of probability sampling and hardware provisioning (Table S12).
Compared with population norms, UAS-BYOD (n = 1,631) significantly underrepresented individuals who are: (i) male by 32%; (ii) Black by 48%; (iii) Asian by 29%; (iv) American Indian, Alaska Native, Hawaiian, and Pacific Islander by 36%; (v) Hispanic/Latino by 19%; (vi) older adult by 15%; (vii) without bachelor's degree by 33%; (viii) unemployed by 18%; (ix) with household income <$30 K by 49%; (x) with household income between $30 and 60 K by 30%; (xi) without UAS-provided Internet access by 85%; and (xii) living in rural areas by 23%. The UAS-BYOD sample would also significantly underrepresent people with diagnosed chronic conditions (obesity, hypertension, lung disease, and stroke), functional limitations, and with poorer overall health, cognitive function, health utilization, health behaviors, financial literacy, and digital competencies.
AoU's BYOD Fitbit dataset (n = 14,133) underrepresented males by 41%, non-White participants by 66%, individuals without bachelor's degrees by 57%, and individuals with household incomes <$75 K by 29% compared to population norms.
Generalizability of models trained on unweighted ALiR vs. BYOD
To illustrate ALiR's generalizability compared with AoU-Fitbit BYOD, respective classification models of COVID-19 infection were trained on each (unweighted) dataset (Fig. 5A; Online Methods). COVID-19 was chosen due to: (i) comparability of available variables across datasets; (ii) model robustness in the evidence base (3–8); (iii) timeliness and relevance of respiratory infectious disease to the general population; and (iv) known symptom heterogeneities and racial/ethnic disparities in outcomes (6, 31, 32). Drawing on published models (3–8), identical methods were used for data curation, training, and hyperparameter tuning. XGBoost was chosen as the best-performing model (by area under the curve; AUC) for both datasets. Performance was compared in sample (ALiR model on ALiR holdout data; AoU model on AoU holdout data) and out of sample (ALiR model on AoU data; AoU model on ALiR data), both overall and for subgroups defined by intersections of race/ethnicity, age, and sex.
Fig. 5.
Generalizability of COVID-19 classification model performance using ALiR vs. AoU datasets: A) Schematic representation of COVID-19 classification model training and testing procedure for ALiR (green) and AoU (orange) datasets. COVID-19-positive cases = COVID+; COVID-19-negative cases = COVID−. B) Sample composition of ALiR (unweighted) and AoU datasets. C) Overall COVID-19 classification model performance (area under curve; AUC) in-sample (ALiR-trained on ALiR holdout data: light green and AoU-trained on AoU holdout data: light orange) and out of sample (ALiR-trained on AoU data: dark green and AoU-trained on ALiR data: dark orange). D) Out-of-sample COVID-19 classification model performance (AUC) for sociodemographic subgroups for ALiR-trained model tested on AoU data (green) and AoU-trained model tested on ALiR data (orange).
During curation, ALiR's COVID-19 dataset retained 71% of its initial sample vs. AoU's 7%. Demographic distributions, representative for ALiR vs. primarily female, higher-educated, and White for AoU (Fig. 5B; Table S13), remained largely consistent between initial and training samples for both datasets. The prevalence of COVID-19 cases was higher in ALiR (27 vs. 8%), leading to comparable positive training samples (n = 181 vs. 237) despite the 14× difference in initial sample size.
ALiR-trained classification performed equally well when tested in sample (AUC = 0.84, 95% CI: 0.79–0.89) and out of sample, both overall (AUC = 0.84, 95% CI: 0.78–0.89; Fig. 5C) and for subgroups (AUC = 0.82–1.0; Fig. 5D). Conversely, while overall AoU-trained classification performed marginally better in sample (AUC = 0.93, 95% CI: 0.91–0.96), it underperformed by 35% (vs. in sample) when tested on ALiR (AUC = 0.68, 95% CI: 0.61–0.75). Further, the AoU-trained model's performance was worse in historically underserved groups (Table S14), including older White females (by 40%, despite also having the largest representation in AoU's training dataset) and non-White individuals (by 22–37%) compared with White males (by 17–20%) and younger White females (by 14%).
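To make the reported metric concrete, the Python sketch below computes an AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and attaches a percentile bootstrap interval. The percentile bootstrap is an assumption (the text does not state how the intervals were derived), the function names are hypothetical, and the labels and scores are synthetic rather than study data.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the AUC; one-class resamples are redrawn."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # need both classes present
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((1 - level) / 2 * n_boot)]
    high = stats[int((1 + level) / 2 * n_boot) - 1]
    return low, high


# Synthetic example: positives tend to score higher than negatives.
rng = random.Random(1)
labels = [1] * 40 + [0] * 40
scores = [rng.gauss(1.0, 1.0) for _ in range(40)] + [rng.gauss(0.0, 1.0) for _ in range(40)]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores, n_boot=500)
print(f"AUC = {point:.2f}, 95% CI: {low:.2f}-{high:.2f}")
```

The same two functions can be applied to any subset of cases (for example, one sociodemographic subgroup at a time) to obtain subgroup-level estimates, with wider intervals expected where subgroups are small.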
Discussion
To the best of our knowledge, ALiR is the first longitudinal population study that combines wearable PGHD with comprehensive, frequently repeated, validated labels, meeting all relevant benchmarking criteria for precision health research (15, 16, 36, 62–64). With only two observed sources of differential enrollment (identified from 5,000+ variables), ALiR's 1,038 participants were broadly representative across demographic, socioeconomic, and health factors, a significant improvement in inclusivity over convenience samples. ALiR's illustrative COVID-19 model (AUC = 0.84) matched or exceeded in-sample performance of others (AUC = 0.52–0.92) (3–8) and maintained effectiveness when tested out of sample and in diverse sociodemographic groups. ALiR's model used a similar number of positive COVID-19 training cases as AoU and other models despite having a 10–100× smaller initial sample (3–8), due to higher (and more consistent with population estimates (65)) disease prevalence and curation-related case retention. In contrast, AoU's BYOD model significantly underperformed on ALiR data, with 22–40% worse performance in older female and non-White groups. Surprisingly, AoU's performance was weakest in its largest training subgroup, likely due to diminished diversity of healthcare-engaged device owners relative to the within-group general population on other dimensions (e.g. education, income, comorbidity, and vaccination) (66). These results suggest that for promoting model generalizability, smaller samples of high-quality, representative data can outperform large convenience samples, as the latter, due to selection effects, may fail to adequately capture the diversity of observed and unobserved characteristics in the target population (67–69). While COVID-19 represents a single conceptual demonstration, its significance as an infectious pandemic that transcended worldwide sociodemographic boundaries is especially pertinent. ALiR's ability to account for heterogeneity in COVID-19 dynamics showcases its potential for improving equity in health applications that have previously suffered from disparities (6, 31, 32). The focus on COVID-19 was further dictated by the availability of analogous data, underscoring the field-wide need for more comprehensive datasets with standardized variables to explore drivers of equitable model performance in different health applications. To that end, probability sampling, hardware provisioning, and strategic oversampling were each critical to ALiR's benchmarking ability, providing a roadmap for inclusive prospective data collection that enhances model generalizability.
Probability sampling of US addresses promoted broad, balanced representation in ALiR's invited sample and minimized expected sources of selection bias (17, 39). In contrast, studies that recruit from clinical encounters (e.g. UK Biobank, AoU) are more likely to underrepresent people experiencing healthcare access barriers, who also tend to be non-White, under-resourced, and/or of poorer health (17–20, 70, 71). Further, probability sampling likely better accounted for unobserved factors (66, 72, 73), leading to ALiR's stronger performance on nonrepresentative data and AoU's poorer performance on representative data, even within larger subgroups. This suggests that in the context of convenience samples, use of subgroup targets and geography-based recruitment to achieve cohort inclusivity, as in AoU's sampling approach (18), may be insufficient for model generalizability.
Universal Fitbit provisioning enabled invited individuals to remain eligible regardless of self-motivated technology access, which minimized differential enrollment and preserved inclusivity of the probability sample. ALiR's experience suggested that 64% of invited individuals were willing to participate, though only 23% of enrollees owned a wearable at the time, consistent with a recent study suggesting cost and device-related education gaps explained a similar discrepancy in federally qualified health center patients (58 vs. 20%) (25). It is not surprising then, that BYOD designs have systematically underrepresented populations experiencing health disparities (21). Indeed, had ALiR relied on UAS device owners, the invited sample would have substantially underrepresented people of color, older adults (22–25), those with lower income or education, and those with poorer internet literacy, financial literacy, health, and cognitive ability—disparities that would likely grow along age, education, and associated dimensions during enrollment. AoU's BYOD Fitbit cohort was even less representative, highlighting the compounding effect of combining convenience sampling methods.
Strategic oversampling, on one hand, led to overrepresentation of racial/ethnic minorities and socioeconomically disadvantaged individuals (who did not exhibit lower consent or enrollment compared with their counterparts) in ALiR's unweighted cohort. On the other hand, oversampling counterbalanced relatively poorer enrollment rates in individuals with lower education who were equally willing to participate but faced barriers beyond device access after consenting. Unconfirmed mailing addresses and recurring unavailability to sign for packages were the top reasons for loss to follow-up, which may reflect greater housing instability or time pressures within this population (74). Remote recruitment is a primary strategy in large-scale PGHD research, and planned investigations will elucidate how enrollment steps may inadvertently introduce additional barriers in underserved populations.
Older adults were the only underserved or disparity-facing group also underrepresented in ALiR's unweighted cohort as we did not anticipate their relative unwillingness to participate despite Fitbit provisioning. This disparity, apparent even after controlling for many other age-related factors (employment, income, cognitive ability, chronic disease, disability, Internet skills, etc.), is consistent with the BYOD literature (22). Again, barriers beyond device access are likely involved, including mistrust of, disinterest in, or lack of perceived benefit from using wearables or participating in wearable studies (33–35). Facing a globally aging population, precision health offers an important opportunity to support older adult health and well-being (e.g. PGHD-based detection of depression (75), cognitive decline and dementia (76, 77), and falls (78)), but systematically improving their research participation must be a priority. As we grow ALiR's cohort to ∼10,000 members by 2027 alongside UAS's ongoing expansion, we will continue to characterize and quantify the mechanisms driving differential study participation to maintain and refine inclusivity. Potential interventions include: (i) oversampling older adults in future recruitment; (ii) strategically inviting preexisting UAS device owners; and (iii) employing evidence-based strategies like community-based participation efforts that improve trust, education, and engagement in minoritized and/or underrepresented communities (79–81).
This study is not without limitations. First, we focus on consent and enrollment; longitudinal engagement—the third component of participation—is the subject of ongoing work. Second, while ALiR substantially improves representativeness relative to existing PGHD cohorts, some selection biases remain. Weighting did not fully correct underrepresentation among older adults not in the labor force (e.g. retired, unemployed, on leave, or disabled), highlighting the potential for residual bias from unobserved factors, despite extensive coverage of observed characteristics. Third, while our illustrative COVID-19 model demonstrates promise for improved generalizability, it represents a single example. Future work will assess model performance across additional health domains.
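The point about weighting and residual bias can be made concrete with a minimal sketch of simple post-stratification. This is a generic textbook technique shown for illustration only; it is not necessarily the weighting procedure used for ALiR or UAS, and the cell labels, population shares, and outcome values are hypothetical.

```python
from collections import Counter

def poststrat_weights(sample_cells, pop_shares):
    """One weight per respondent: population share of the respondent's cell
    divided by that cell's share of the sample."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [pop_shares[c] / (counts[c] / n) for c in sample_cells]

def weighted_mean(values, weights):
    """Weighted average of a respondent-level outcome."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample in which the older cell is underrepresented
# (20% of the sample vs. an assumed 22% of the population).
cells = ["under65"] * 8 + ["65plus"] * 2
pop_shares = {"under65": 0.78, "65plus": 0.22}

# Hypothetical binary outcome that differs by observed cell.
outcome = [1] * 8 + [0] * 2

weights = poststrat_weights(cells, pop_shares)
unweighted = sum(outcome) / len(outcome)
weighted = weighted_mean(outcome, weights)
print(unweighted, round(weighted, 3))
```

Weights of this kind rebalance the sample only on the cells that are observed and included in the adjustment. If enrollees within a cell differ from nonenrollees on a characteristic that is not part of the weighting scheme, the weighted estimate can remain biased, which is the kind of residual bias noted above.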
Collectively, ALiR provides the scientific community with a public benchmarking toolkit: a methodological framework and connected cohort for prospective research, and an inclusive dataset for training and validating generalizable precision health models in diverse populations.
Acknowledgments
The authors are grateful to UAS-ALiR participants who agreed to contribute their time and digital data for research. The authors acknowledge Luca Foschini and thank Ernesto Ramirez and Evidation Health, Inc. for their collaboration in securing funding and designing and building the study app. In addition, we gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s All of Us Research Program for making available the participant data examined in this study.
Contributor Information
Ritika R Chaturvedi, Center for Economic and Social Research, University of Southern California, 635 Downey Way, Verna and Peter Dauterive Hall (VPD), Los Angeles, CA 90089, USA; Leonard D. Schaeffer Center for Health Policy and Economics, University of Southern California, 635 Downey Way, Verna and Peter Dauterive Hall (VPD), Los Angeles, CA 90089, USA.
Marco Angrisani, Center for Economic and Social Research, University of Southern California, 635 Downey Way, Verna and Peter Dauterive Hall (VPD), Los Angeles, CA 90089, USA.
Wendy M Troxel, Division of Social and Economic Wellbeing, RAND Corporation, 4570 Fifth Avenue, Pittsburgh, PA 15213, USA.
Monika Jain, Evidation Health, Inc., 63 Bovet Road #146, San Mateo, CA 94402, USA.
Tania Gutsche, Center for Economic and Social Research, University of Southern California, 635 Downey Way, Verna and Peter Dauterive Hall (VPD), Los Angeles, CA 90089, USA.
Eva Ortega, Center for Economic and Social Research, University of Southern California, 635 Downey Way, Verna and Peter Dauterive Hall (VPD), Los Angeles, CA 90089, USA.
Adrien Boch, Evidation Health, Inc., 63 Bovet Road #146, San Mateo, CA 94402, USA.
Citina Liang, Viterbi School of Engineering, University of Southern California, 3650 McClintock Ave, Los Angeles, CA 90089, USA.
Shiyang Sima, Daniels School of Business, Purdue University, 403 Mitch Daniels Blvd, West Lafayette, IN 47907, USA.
Aziz Mezlini, Evidation Health, Inc., 63 Bovet Road #146, San Mateo, CA 94402, USA.
Eric J Daza, Evidation Health, Inc., 63 Bovet Road #146, San Mateo, CA 94402, USA.
Miad Boodaghidizaji, College of Engineering, Purdue University, 701 W Stadium Ave #3000, West Lafayette, IN 47907, USA.
Sze-chuan Suen, Viterbi School of Engineering, University of Southern California, 3650 McClintock Ave, Los Angeles, CA 90089, USA.
Alok R Chaturvedi, Daniels School of Business, Purdue University, 403 Mitch Daniels Blvd, West Lafayette, IN 47907, USA.
Hossein Ghasemkhani, Daniels School of Business, Purdue University, 403 Mitch Daniels Blvd, West Lafayette, IN 47907, USA.
Arezoo M Ardekani, College of Engineering, Purdue University, 701 W Stadium Ave #3000, West Lafayette, IN 47907, USA.
Arie Kapteyn, Center for Economic and Social Research, University of Southern California, 635 Downey Way, Verna and Peter Dauterive Hall (VPD), Los Angeles, CA 90089, USA.
Supplementary Material
Supplementary material is available at PNAS Nexus online.
Funding
This work was funded by a grant from the National Library of Medicine at the National Institutes of Health awarded to R.R.C. (R01LM013237). ALiR relies on the Understanding America Study (supported by the Social Security Administration and the National Institute on Aging at the National Institutes of Health: U01AG054580), which is maintained by the Center for Economic and Social Research (CESR) at the University of Southern California. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276.
Author Contributions
Ritika R. Chaturvedi (Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Visualization, Writing – original draft, Writing – review & editing), Marco Angrisani (Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing), Wendy M. Troxel (Funding acquisition, Writing – review & editing), Monika Jain (Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing), Tania Gutsche (Data curation, Funding acquisition, Methodology, Project administration, Resources, Writing – review & editing), Eva Ortega (Data curation, Project administration), Adrien Boch (Data curation, Project administration), Citina Liang (Data curation, Formal analysis), Shiyang Sima (Data curation, Formal analysis), Aziz Mezlini (Data curation, Formal analysis, Writing – review & editing), Eric J. Daza (Data curation, Formal analysis, Writing – review & editing), Miad Boodaghidizaji (Data curation, Formal analysis), Sze-chuan Suen (Formal analysis, Methodology, Supervision, Writing – review & editing), Alok R. Chaturvedi (Methodology, Supervision, Writing – review & editing), Hossein Ghasemkhani (Methodology, Supervision, Writing – review & editing), Arezoo M. Ardekani (Methodology, Supervision, Writing – review & editing), and Arie Kapteyn (Funding acquisition, Methodology, Resources, Writing – review & editing).
Data Availability
As UAS makes deidentified study data available to registered researchers (who sign the Data Use Agreement), ALiR PGHD will be available upon curation of one full year of data for the entire panel (anticipated in late 2025) and will be updated frequently thereafter. This will include additional study information, such as data dictionaries, protocols, informed consent documentation, and app code. COVID modeling code developed for this study is also available. All above information can be obtained from the UAS website at https://uasdata.usc.edu/. A portion of this work relies on data from the All of Us Research Program; data and relevant analysis code are available to select US universities on the Researcher Workbench (https://www.researchallofus.org/data-tools/workbench/).
References
- 1. Collins FS, Varmus H. 2015. A new initiative on precision medicine. N Engl J Med. 372(9):793–795.
- 2. Khoury MJ, Galea S. 2016. Will precision medicine improve population health? JAMA. 316(13):1357–1358.
- 3. Jenkins T. 2022. Wearable medical sensor devices, machine and deep learning algorithms, and internet of things-based healthcare systems in COVID-19 patient screening, diagnosis, monitoring, and treatment. Am J Med Res. 9(1):49–64.
- 4. Conroy B, et al. 2022. Real-time infection prediction with wearable physiological monitoring and AI to aid military workforce readiness during COVID-19. Sci Rep. 12(1):3797.
- 5. Radin JM, et al. 2022. Sensor-based surveillance for digitising real-time COVID-19 tracking in the USA (DETECT): a multivariable, population-based, modelling study. Lancet Digit Health. 4(11):e777–e786.
- 6. Mitratza M, et al. 2022. The performance of wearable sensors in the detection of SARS-CoV-2 infection: a systematic review. Lancet Digit Health. 4(5):e370–e383.
- 7. Quer G, et al. 2021. Wearable sensor data and self-reported symptoms for COVID-19 detection. Nat Med. 27(1):73–77.
- 8. Alavi A, et al. 2022. Real-time alerting system for COVID-19 and other stress events using wearable data. Nat Med. 28(1):175–184.
- 9. Gambhir SS, et al. 2018. Toward achieving precision health. Sci Transl Med. 10(430):eaao3612.
- 10. Jim HS, et al. 2020. Innovations in research and clinical care using patient-generated health data. CA Cancer J Clin. 70(3):182–199.
- 11. Marmot M, Friel S, Bell R, Houweling TA, Taylor S, Commission on Social Determinants of Health. 2008. Closing the gap in a generation: health equity through action on the social determinants of health. Lancet. 372(9650):1661–1669.
- 12. Hood CM, Gennuso KP, Swain GR, Catlin BB. 2016. County health rankings: relationships between determinant factors and health outcomes. Am J Prev Med. 50(2):129–135.
- 13. Bradley EH, Elkins BR, Herrin J, Elbel B. 2011. Health and social services expenditures: associations with health outcomes. BMJ Qual Saf Health Care. 20(10):826–831.
- 14. Galea S, Abdalla SM, Sturchio JL. 2020. Social determinants of health, data science, and decision-making: forging a transdisciplinary synthesis. PLoS Med. 17(6):e100317.
- 15. Panch T, et al. 2020. “Yes, but will it work for my patients?” Driving clinically relevant research with benchmark datasets. NPJ Digit Med. 3:87.
- 16. Mincu D, Roy S. 2022. Developing robust benchmarks for driving forward AI innovation in healthcare. Nat Mach Intell. 4(11):916–921.
- 17. Brayne C, Moffitt TE. 2022. The limitations of large-scale volunteer databases to address inequalities and global challenges in health and aging. Nat Aging. 2(9):775–783.
- 18. Ramirez AH, et al. 2022. The All of Us Research Program: data quality, utility, and diversity. Patterns. 3(8):100570.
- 19. Huang JY. 2021. Representativeness is not representative: addressing major inferential threats in the UK Biobank and other big data repositories. Epidemiology. 32(2):189–193.
- 20. Sudlow C, et al. 2015. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12(3):e1001779.
- 21. Cho PJ, et al. 2022. Demographic imbalances resulting from the bring-your-own-device study design. JMIR Mhealth Uhealth. 10(4):e29510.
- 22. Pratap A, et al. 2020. Indicators of retention in remote digital health studies: a cross-study evaluation of 100,000 participants. NPJ Digit Med. 3(1):1–10.
- 23. Chandrasekaran R, Katthula V, Moustakas E. 2020. Patterns of use and key predictors for the use of wearable health care devices by US adults: insights from a national survey. J Med Internet Res. 22(10):e22443.
- 24. Holko M, et al. 2020. Fitbit “Bring Your Own Device” data in the All of Us Research Program. AMIA Annual Meeting, Virtual, USA. https://knowledge.amia.org/72332-amia-1.4602255/t004-1.4605866/t004-1.4605867/3414532-1.4606075/3414532-1.4606076?qr=1
- 25. Holko M, et al. 2022. Wearable fitness tracker use in federally qualified health center patients: strategies to improve the health of all of us using digital health devices. NPJ Digit Med. 5(1):1–6.
- 26. Zou J, Schiebinger L. 2018. AI can be sexist and racist—It's time to make it fair. Nature. 559:324–326.
- 27. Huang J, Galal G, Etemadi M, Vaidyanathan M. 2022. Evaluation and mitigation of racial bias in clinical machine learning models: scoping review. JMIR Med Inform. 10(5):e36388.
- 28. Larrazabal AJ, et al. 2020. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proc Natl Acad Sci U S A. 117(23):12592–12594.
- 29. Obermeyer Z, et al. 2019. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 366(6464):447–453.
- 30. National Academies of Sciences, Engineering, and Medicine; Policy and Global Affairs; Committee on Women in Science, Engineering, and Medicine; Committee on Improving the Representation of Women and Underrepresented Minorities in Clinical Trials and Research; Bibbins-Domingo K, Helman A, editors. Improving representation in clinical trials and research: building research equity for women and underrepresented groups. Washington (DC): National Academies Press, US, 2022.
- 31. Mackey K, et al. 2021. Racial and ethnic disparities in COVID-19–related infections, hospitalizations, and deaths: a systematic review. Ann Intern Med. 174(3):362–373.
- 32. Green H, Fernandez R, MacPhail C. 2021. The social determinants of health and health outcomes among adults during the COVID-19 pandemic: a systematic review. Public Health Nurs. 38(6):942–952.
- 33. Grande D, et al. 2022. Consumer willingness to share personal digital information for health-related uses. JAMA Netw Open. 5(1):e2144787–e2144787.
- 34. Grande D, Mitra N, Shah A, Wan F, Asch DA. 2013. Public preferences about secondary uses of electronic health information. JAMA Intern Med. 173(19):1798–1806.
- 35. Grande D, et al. 2020. Health policy and privacy challenges associated with digital technology. JAMA Netw Open. 3(7):e208285–e208285.
- 36. Chaturvedi RR, et al. 2023. American Life in Realtime: a benchmark registry of health data for equitable precision health. Nat Med. 29:283–286.
- 37. Fuezeki E, Engeroff T, Banzer W. 2017. Health benefits of light-intensity physical activity: a systematic review of accelerometer data of the National Health and Nutrition Examination Survey (NHANES). Sports Med. 47(9):1769–1793.
- 38. Wilkinson M, et al. 2016. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 3:160018.
- 39. Alattar L, Messel M, Rogofsky D. 2018. An introduction to the Understanding America Study internet panel. Soc Sec Bull. 78:13.
- 40. Understanding America Study. Center for Economic and Social Research. University of Southern California. Details, methods, protocols, data, and data dictionaries [accessed 2025 Jan]. https://uasdata.usc.edu/
- 41. Szilagyi PG, et al. 2021. National trends in the US public's likelihood of getting a COVID-19 vaccine—April 1 to December 8, 2020. JAMA. 325(4):396–398.
- 42. Samek A, Kapteyn A, Gray A. 2021. Using vignettes to improve understanding of social security and annuities. J Pension Econ Financ. 21(3):326–343.
- 43. Galesic M, et al. 2018. Asking about social circles improves election predictions. Nat Hum Behav. 2:187–193.
- 44. Flood S, et al. 2024. Integrated Public Use Microdata Series, Current Population Survey: Version 12.0 [Dataset]. IPUMS. doi:10.18128/D030.V12.0
- 45. Sonnega A, et al. 2014. Cohort profile: the Health and Retirement Study (HRS). Int J Epidemiol. 43(2):576–585.
- 46. Hamilton CM, et al. 2011. The PhenX Toolkit: get the most from your measures. Am J Epidemiol. 174(3):253–260.
- 47. Cella D, et al. 2010. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J Clin Epidemiol. 63(11):1179–1194.
- 48. Liu Y, et al. 2022. Self-administered web-based tests of executive functioning and perceptual speed: measurement development study with a large probability-based survey panel. J Med Internet Res. 24(5):e34347.
- 49. McArdle J, Rodgers W, Willis R. Cognition and aging in the USA (CogUSA) 2007–2009. Inter-University Consortium for Political and Social Research, 2015.
- 50. McArdle JJ, Fisher GG, Kadlec KM. 2007. Latent variable analyses of age trends of cognition in the Health and Retirement Study, 1992–2004. Psychol Aging. 22(3):525–545.
- 51. Gatz M, et al. 2023. Identifying cognitive impairment among older participants in a nationally representative internet panel. J Gerontol B Psychol Sci Soc Sci. 78(2):201–209.
- 52. Sliwinski MJ, et al. 2018. Reliability and validity of ambulatory cognitive assessments. Assessment. 25(1):14–30.
- 53. Thompson LI, et al. 2022. A highly feasible, reliable, and fully remote protocol for mobile app-based cognitive assessment in cognitively healthy older adults. Alzheimers Dement (Amst). 14(1):e12283.
- 54. Nicosia J, et al. 2023. Unsupervised high-frequency smartphone-based cognitive assessments are reliable, valid, and feasible in older adults at risk for Alzheimer's disease. J Int Neuropsychol Soc. 29(5):459–471.
- 55. Cerino ES, et al. 2021. Variability in cognitive performance on mobile devices is sensitive to mild cognitive impairment: results from the Einstein Aging Study. Front Digit Health. 3:758031.
- 56. Oravecz Z, et al. 2022. Accounting for retest effects in cognitive testing with the Bayesian double exponential model via intensive measurement burst designs. Front Aging Neurosci. 14:897343.
- 57. Eaton WW, Smith C, Ybarra M, Muntaner C, Tien A. Center for epidemiologic studies depression scale: review and revision (CESD and CESD-R). 2004. In: Maruish ME, editor. The use of psychological testing for treatment planning and outcomes assessment: instruments for adults. 3rd ed. Lawrence Erlbaum Associates Publishers. p. 363–377.
- 58. Goldberg LR. 1992. The development of markers for the Big-Five factor structure. Psychol Assess. 4(1):26–42.
- 59. Fitbit application programming interface (API) Web Reference. Fitbit Developers [accessed 2025 Jan]. https://dev.fitbit.com/build/reference/web-api/
- 60. Mezlini AM, Das S, Goldenberg A. 2021. Finding associations in a heterogeneous setting: statistical test for aberration enrichment. Genome Med. 13(1):68.
- 61. Wall HK, Hannan JA, Wright JS. 2014. Patients with undiagnosed hypertension: hiding in plain sight. JAMA. 312:1973–1974.
- 62. Abdel-Salam R, Mostafa R, Hadhood M. Human activity recognition using wearable sensors: review, challenges, evaluation benchmark. International workshop on deep learning for human activity recognition. Springer Singapore, Singapore, 2021.
- 63. Antar AD, Ahmed M, Ahad MAR. Challenges in sensor-based human activity recognition and a comparative analysis of benchmark datasets: a review. 2019 Joint 8th international conference on informatics, electronics & vision (ICIEV) and 2019 3rd international conference on imaging, vision & pattern recognition (icIVPR). IEEE, 2019.
- 64. Böttcher S, et al. 2022. Data quality evaluation in wearable monitoring. Sci Rep. 12(1):21412.
- 65. Elflein J. Cumulative COVID cases in the U.S. from 2020 to 2022. Statista [accessed 2022 Nov 17]. https://www.statista.com/statistics/1103185/cumulative-coronavirus-covid19-cases-number-us-by-day/#statisticContainer
- 66. Liang W, et al. 2023. Accuracy on the curve: on the nonlinear correlation of ML performance between data subpopulations. International Conference on Machine Learning in Hawaii, USA. PMLR.
- 67. Budach L, et al. 2022. The effects of data quality on machine learning performance. arXiv, arXiv:2207.14529, preprint: not peer reviewed.
- 68. Jain A, et al. 2020. Overview and importance of data quality for machine learning tasks. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining in California, USA.
- 69. Kariluoto A, et al. 2021. Quality of data in machine learning. 2021 IEEE 21st International Conference on Software Quality, Reliability and Security Companion (QRS-C). IEEE.
- 70. Fry A, et al. 2017. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am J Epidemiol. 186(9):1026–1034.
- 71. Doherty A, et al. 2017. Large scale population assessment of physical activity using wrist worn accelerometers: the UK Biobank study. PLoS One. 12(2):e0169649.
- 72. Bradley VC, et al. 2021. Unrepresentative big surveys significantly overestimated US vaccine uptake. Nature. 600:695–700. doi:10.1038/s41586-021-04198-4
- 73. Farrokhi F, Mahmoudi-Hamidabad A. 2012. Rethinking convenience sampling: defining quality criteria. Theory Pract Lang Stud. 2(4):784–792. doi:10.4304/tpls.2.4.784-792
- 74. Taylor L. 2018. Housing and health: an overview of the literature. Health Affairs Health Policy Brief. doi:10.1377/hpb20180313.396577
- 75. De Angel V, et al. 2022. Digital health tools for the passive monitoring of depression: a systematic review of methods. NPJ Digit Med. 5(1):1–14.
- 76. Piau A, et al. 2019. Current state of digital biomarker technologies for real-life, home-based monitoring of cognitive function for mild cognitive impairment to mild Alzheimer disease and implications for clinical care: systematic review. J Med Internet Res. 21(8):e12785.
- 77. Kourtis LC, et al. 2019. Digital biomarkers for Alzheimer's disease: the mobile/wearable devices opportunity. NPJ Digit Med. 2(1):1–9.
- 78. Pang I, et al. 2019. Detection of near falls using wearable devices: a systematic review. J Geriatr Phys Ther. 42(1):48–56.
- 79. Shalowitz MU, et al. 2009. Community-based participatory research: a review of the literature with strategies for community engagement. J Dev Behav Pediatr. 30(4):350–361.
- 80. Witham MD, McMurdo ME. 2007. How to get older people included in clinical studies. Drugs Aging. 24(3):187–196.
- 81. Weil J, Mendoza AN, McGavin E. 2017. Recruiting older adults as participants in applied social research: applying and evaluating approaches from clinical studies. Educ Gerontol. 43(12):662–673.