Abstract
Objective
With its size and diversity, the All of Us Research Program has the potential to power and improve representation in clinical trials through ancillary studies like Nutrition for Precision Health. We sought to characterize high-level trial opportunities for the diverse participants and sponsors of future trial investment.
Materials and Methods
We matched All of Us participants with available trials on ClinicalTrials.gov based on medical conditions, age, sex, and geographic location. Based on the number of matched trials, we (1) developed the Trial Opportunities Compass (TOC) to help sponsors assess trial investment portfolios, (2) characterized the landscape of trial opportunities in a phenome-wide association study (PheWAS), and (3) assessed the relationship between trial opportunities and social determinants of health (SDoH) to identify potential barriers to trial participation.
Results
Our study included 181 529 All of Us participants and 18 634 trials. The TOC identified opportunities for portfolio investment and gaps in currently available trials across federal, industrial, and academic sponsors. PheWAS results revealed an emphasis on mental disorder-related trials, with anxiety disorder having the highest adjusted increase in the number of matched trials (59% [95% CI, 57-62]; P < 1e-300). Participants from certain communities underrepresented in biomedical research, including self-reported racial and ethnic minorities, had more matched trials after adjusting for other factors. Living in a nonmetropolitan area was associated with up to 13.1 times fewer matched trials.
Discussion and Conclusion
All of Us data are a valuable resource for identifying trial opportunities to inform trial portfolio planning. Characterizing these opportunities with consideration for SDoH can provide guidance on prioritizing the most pressing barriers to trial participation.
Keywords: clinical trial diversity, electronic health records, social determinants of health, phenome-wide association study, All of Us Research Program
Introduction
Improving clinical trial participation among diverse populations is crucial for advancing equitable health research. Over the last 30 years, promoting diversity and inclusion in clinical trials has emerged as a key priority for U.S. federal agencies, such as the National Institutes of Health (NIH) and the Food and Drug Administration.1,2 Despite efforts to address long-standing inequities in trial accrual, little progress has been made in increasing participation among historically marginalized populations (eg, minoritized racial and ethnic groups, older adults, rural residents, and women).3
A promising strategy is recruitment from ongoing large-scale health cohorts. Successful nationwide trials like VITAL4 and COSMOS5 that enrolled participants from the Women’s Health Study6 and Women’s Health Initiative,7 respectively, demonstrated the potential of embedded recruitment to enable timely accrual of participants who have largely and historically been underrepresented in biomedical research. Specifically, underrepresented groups in current health cohort studies have already demonstrated interest and willingness to participate in health-related research, reducing key barriers such as mistrust and lack of awareness.8 In addition, the availability of participant-level data, such as routine demographics and electronic health records (EHRs), can help investigators quickly identify potential participants from diverse populations to accelerate start-up times and shorten enrollment phases.9
The All of Us Research Program is an ongoing national initiative with the potential to inform and power clinical trials through ancillary studies aimed at returning value to the Program participants and enriching the All of Us data.10 Created following the announcement of the U.S. Precision Medicine Initiative in 2015, the Program seeks to enroll at least 1 million participants to help build one of the most diverse biomedical research repositories in the world. Since its inception, the All of Us Research Program has prioritized recruitment of participants from communities historically underrepresented in biomedical research, including minority racial and ethnic groups, sexual and gender minorities, older adults, people with disabilities, people with barriers in access to care, people with low income or educational attainment, and rural residents.11 The Program’s community and engagement partners focus on educating communities and supporting enduring relationships with participants.12 As a result, more than 80% of participants come from communities historically underrepresented in biomedical research, making the All of Us dataset the most diverse of its kind.13,14 In 2022, the NIH awarded $170 million for Nutrition for Precision Health, powered by the All of Us Research Program, an ancillary study that began enrolling All of Us participants for precision nutrition studies, including interventional clinical trials, on March 30, 2023.15 Ancillary studies like Nutrition for Precision Health will return value by generating new biomedical data accessible to the wider research community, providing engagement opportunities for underrepresented groups to participate in clinical trials, and enabling long-term discoveries to improve health for diverse populations. Thus, sponsors of future ancillary studies and participants of the All of Us Research Program represent assets that are capable of advancing equitable health research. 
The Program’s potential to power a wide range of clinical trials, however, has not been studied to date. Furthermore, opportunities for sponsors of future trial investment are currently under-explored.
To this end, the objective of this study is to characterize at a high level the availability of recruiting clinical trials in the U.S. for the All of Us participant population. Our work aims to return value to communities in several ways. First, illuminating high-level trial availability can help sponsors evaluate portfolios and identify potential investment opportunities in the current clinical trial landscape to inform policy making and planning for All of Us ancillary studies. These studies will not only catalyze short-term engagement opportunities for All of Us participants, but also enable long-term health discoveries. Second, characterizing high-level trial opportunities with consideration for social determinants of health (SDoH) can identify barriers to trial participation. Assessing the interplay between trial opportunities and socioenvironmental factors in a national health cohort like All of Us can shed light on these barriers across diverse populations. Third, our flexible, data-driven approach can readily generalize to other large-scale cohorts to illuminate the trial landscape for the broader scientific community.
Methods
Study design and participants
This cross-sectional study included All of Us participants with a 3-digit zip code and at least one condition in their EHR from the All of Us June 2022 Controlled Tier dataset. The presence of a condition was defined as having at least 2 instances of a relevant International Classification of Diseases (ICD) code to maximize phenotyping accuracy.16,17 We grouped ICD codes into phecodes (version 1.2),18 which are physician-curated codes intended to capture clinically meaningful concepts or conditions. For each condition, we used ClinicalTrials.gov’s application programming interface (API) (v1.01.05) (https://clinicaltrials.gov/api/gui) to query study records. We restricted our query to actively recruiting (as of February 14, 2023), U.S.-based studies, and retrieved the following fields: the study’s unique identifier, age limit, sex at birth, location, study type, and lead sponsor information. We identified relevant adult trials with the study type “interventional” and minimum age 18 years. Figure 1 provides a schema detailing the query and matching process. A description of each field as defined on ClinicalTrials.gov and a sample API query are provided in Table S1. The All of Us Controlled Tier dataset has a nonhuman subjects designation, and ClinicalTrials.gov data are publicly available, aggregate trial data; thus, institutional review board approval was not required. This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline.19
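The condition definition above (at least 2 instances of a relevant ICD code, grouped into phecodes) can be illustrated with a minimal Python sketch. It is not the authors' code: the lookup table `ICD_TO_PHECODE` and its placeholder labels are hypothetical stand-ins for the phecode map (version 1.2), and counting instances at the phecode level (so that two different ICD codes mapping to the same phecode count together) is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical ICD-to-phecode lookup with placeholder labels; a real analysis
# would load the full phecode map (version 1.2) instead.
ICD_TO_PHECODE = {
    "F41.1": "PHE_ANXIETY",
    "F41.9": "PHE_ANXIETY",
    "I10": "PHE_HYPERTENSION",
    "E11.9": "PHE_T2D",
}

MIN_INSTANCES = 2  # "at least 2 instances" rule from the text


def conditions_present(icd_events):
    """Return {participant_id: set of phecode labels} for conditions that
    appear at least MIN_INSTANCES times in the participant's EHR.

    icd_events: iterable of (participant_id, icd_code), one row per instance.
    Assumption: instances are counted per phecode, not per individual ICD code.
    """
    counts = defaultdict(Counter)
    for participant_id, icd_code in icd_events:
        phecode = ICD_TO_PHECODE.get(icd_code)
        if phecode is not None:  # unmapped ICD codes are ignored
            counts[participant_id][phecode] += 1
    return {
        pid: {phe for phe, n in per_phe.items() if n >= MIN_INSTANCES}
        for pid, per_phe in counts.items()
    }
```

Requiring two instances trades some sensitivity for specificity: a single stray code (for example, from a rule-out visit) does not by itself mark a condition as present.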
Figure 1.
Overview of matching procedure.
Characterization of trial opportunities for All of Us participants and sponsors of future trial investment
For every All of Us participant, we calculated the number of matched trials based on medical conditions, age, sex at birth, and geographic location (3-digit zip code) (Figure 1). To help sponsors navigate the current trial landscape, we developed the Trial Opportunities Compass (TOC), a generalizable, data-driven framework that identifies opportunities for future portfolio investment and gaps in currently available trials. Specifically, the TOC assigns each medical condition to 1 of 9 regions, each defined by a combination of few, some, or many participants and few, some, or many matched trials (ie, a 3x3 grid). Because sponsors can have different portfolio investment interests, the TOC is designed so that users can specify the cut-offs for these regions (eg, <20th percentile = few, 20th to 80th percentile = some, >80th percentile = many). The rationale for having a gradient of regions is to help sponsors quickly and easily identify potential opportunities or gaps in the trial landscape. For example, conditions that belong to the region with “many participants and few trials” (ie, prevalent but under-studied conditions) represent potential trial investment opportunities, whereas those that belong to “few participants and many trials” (ie, less prevalent but well-studied conditions) may correspond to saturated research areas.
To characterize the trial opportunities landscape in the Program, we conducted a phenome-wide association study (PheWAS) to assess the relationship between the number of matched trials and medical conditions. Specifically, we sought to characterize the diversity of trial investment portfolios in the current landscape and identify conditions that were associated with more matched trials, representing trial opportunities. Identifying trial opportunities can help guide sponsors on resource allocation and portfolio planning.
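The participant-level matching count that opens this subsection can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the published implementation: the `Trial` and `Participant` records and their field names are hypothetical, geographic matching is assumed to mean that the participant's 3-digit zip code equals that of at least one recruiting site, and a trial retrieved under several conditions is assumed to be counted once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trial:
    trial_id: str            # unique study identifier
    condition: str           # condition label the trial was retrieved for
    min_age: int
    max_age: Optional[int]   # None means no upper age limit
    sex: str                 # "All", "Female", or "Male"
    site_zip3: frozenset     # 3-digit zip codes of recruiting sites


@dataclass(frozen=True)
class Participant:
    conditions: frozenset
    age: int
    sex_at_birth: str
    zip3: str


def is_match(p: Participant, t: Trial) -> bool:
    """Broad match on condition, age, sex at birth, and 3-digit zip code."""
    if t.condition not in p.conditions:
        return False
    if p.age < t.min_age:
        return False
    if t.max_age is not None and p.age > t.max_age:
        return False
    if t.sex != "All" and t.sex != p.sex_at_birth:
        return False
    # Assumption: location matches when any site shares the participant's zip3.
    return p.zip3 in t.site_zip3


def count_matched_trials(p: Participant, trials) -> int:
    """Count distinct trials matched to a participant (deduplicated by ID)."""
    return len({t.trial_id for t in trials if is_match(p, t)})
```

Deduplicating by trial identifier matters because the query is run once per condition, so a participant with several conditions could otherwise be credited with the same trial more than once.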
In a secondary analysis, we assessed the relationship between the number of matched trials and SDoH to better understand the interplay between trial opportunities and socioenvironmental factors across the diverse participant population.
Statistical analysis
We calculated the average number of matched trials stratified by age, sex at birth, self-reported race and ethnicity, geographic location, highest educational level attained, and income level across different phecode condition domains. For geographic location, we categorized participants into living in a metropolitan or nonmetropolitan area according to the U.S. Department of Agriculture20 based on their 3-digit zip code. For income status, participants were categorized into the low-income group if their annual household income did not exceed the low-income limit defined by the U.S. Department of Housing and Urban Development based on household size, 3-digit zip code, and fiscal year.21 For our PheWAS, we performed multivariable negative binomial regressions to assess the association between the number of matched trials (outcome) and the presence of each medical condition (present or absent) adjusting for age, sex at birth, location, self-reported race and ethnicity, highest educational level attained, income status (low income vs not low income), disability (have one or more disability vs no disability), and number of medical conditions. We chose the negative binomial distribution to account for overdispersion in the outcome data and used Bonferroni’s correction to adjust for multiple testing based on a significance threshold of 0.05. For our demographic and SDoH analysis, we used multivariable negative binomial regression to assess the association between the number of matched trials and various factors, including demographics (self-reported race and ethnicity, sex at birth, age), socioenvironmental factors (geographic location, income status, highest educational level attained, English proficiency, food insecurity, discrimination, social support, housing quality, and neighborhood cohesion),22 and number of medical conditions. All statistical analyses were performed using R version 4.2.2 on the All of Us Researcher Workbench.23
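To make the reporting of the regression results concrete, the sketch below shows how a coefficient from a negative binomial model translates into a percent change in the number of matched trials, and how a Bonferroni-corrected threshold is obtained. It is a hedged illustration: it assumes the usual log link, assumes Wald-type 95% CIs (the text does not state how the intervals were computed), and the function names and example inputs are hypothetical.

```python
import math


def percent_change(beta: float, se: float, z: float = 1.96):
    """Convert a log-link coefficient and its standard error into a percent
    change in the expected count, with a Wald-type confidence interval.

    Assumption: log link, so exp(beta) is the ratio of expected counts.
    """
    estimate = (math.exp(beta) - 1.0) * 100.0
    lower = (math.exp(beta - z * se) - 1.0) * 100.0
    upper = (math.exp(beta + z * se) - 1.0) * 100.0
    return estimate, lower, upper


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni's correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical inputs, not values from the study.
    est, lo, hi = percent_change(beta=0.30, se=0.02)
    print(f"{est:.0f}% [95% CI, {lo:.0f}-{hi:.0f}]")
    print(bonferroni_threshold(0.05, n_tests=1000))
```

Under this reading, a count ratio of 1.59 corresponds to a 59% increase in matched trials, and a ratio of 1/13.1 corresponds to 13.1 times fewer matched trials.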
Results
The June 2022 release of the All of Us Controlled Tier data included 210 491 participants with at least one medical condition in their EHR; among those, 181 529 had a 3-digit zip code. Based on our ClinicalTrials.gov query from February 14, 2023, there were 18 634 actively recruiting, adult trials with at least one U.S.-based recruiting site. The majority (95%) of these trials had at least one recruiting site in a metropolitan area, whereas only 19% had a recruiting site in a nonmetropolitan area. The phecode condition domains with the largest number of trials were mental, cancer, neurological, endocrine/metabolic, circulatory, and respiratory. To keep the table and figures legible, we consolidated the remaining domains as “other.”
Overview of trial opportunities in the All of Us research program
Table 1 provides an overview of the cohort across sociodemographic factors and condition domains. The majority of the cohort was aged 50 years or older (65%), female (64%), self-reported white (52%), and lived in metropolitan areas (93%). Mental, neurological, circulatory, endocrine/metabolic, and respiratory conditions were common, with at least 50% of participants having at least one corresponding phecode entry in their EHR. Overall, the average number of matched trials was 58.4 [95% CI, 57.9-58.9] per participant and was higher among self-reported racial and ethnic minorities (72.3 [95% CI, 71.5-73.1]). Figure 2 shows the average number of matched trials stratified by sociodemographic factors across 7 condition domains. Participants from communities historically underrepresented in biomedical research, including self-reported Black and Hispanic participants, participants with low income, participants with one or more disabilities, and participants without a high school diploma or equivalent, had more matched trials compared to their respective counterparts across all condition domains. Conversely, participants living in nonmetropolitan areas had substantially fewer matched trials (ie, 3.9-12.2 times fewer) than those in metropolitan areas. Within each sociodemographic factor, participants generally matched with more mental health-related trials than with trials from other condition domains.
Table 1.
Overview of demographic, social, and condition profiles.
Characteristic | N (%)a |
---|---|
Self-reported demographic and social factors | |
Age | |
<50 | 63 727 (35) |
≥50 | 117 802 (65) |
Sex at birth | |
Male | 63 951 (36.2) |
Female | 112 730 (63.8) |
Intersex | 37 (<1) |
Race and ethnicity | |
White | 94 814 (52.2) |
Black, African American, or African | 35 953 (19.8) |
Hispanic, Latino, or Spanish | 34 693 (19.1) |
Asian | 4382 (2.41) |
Middle Eastern or North African | 950 (0.52) |
Native Hawaiian or other Pacific Islander | 207 (0.11) |
Multiple | 2671 (1.5) |
Remaining | 7859 (4.3) |
Geographic location | |
Metropolitan | 167 920 (92.5) |
Nonmetropolitan | 13 609 (7.5) |
Highest educational level attained | |
11 or below | 17 611 (10.1) |
12 or GED | 84 354 (48.2) |
College graduate | 38 424 (21.9) |
Advanced degree | 34 609 (19.8) |
Have a disability | |
Yes | 14 754 (30.2) |
No | 34 113 (69.8) |
Low income | |
Yes | 49 584 (35.2) |
No | 91 318 (64.8) |
Condition domainsb | |
Cancer | 60 919 (33.8) |
Mental | 89 826 (50) |
Neurological | 94 810 (52.6) |
Circulatory | 111 720 (62) |
Endocrine/metabolic | 117 157 (65) |
Respiratory | 99 887 (55.4) |
Other | 166 538 (92.4) |
a N (%) denotes the number of participants (percentage).
b Participants with at least one corresponding phecode were counted in each phecode condition domain. Note that a participant can be counted towards multiple condition domains. The condition domain “Other” consists of digestive, hematopoietic, sense organs, symptoms, musculoskeletal, genitourinary, injuries & poisoning, pregnancy complications, dermatologic, congenital anomalies, and infectious diseases. The self-reported race and ethnicity category “Remaining” consists of participants who skipped the survey question or selected “prefer not to answer.”
Figure 2.
Average number of matched trials and 95% CI per All of Us participant by sociodemographic factors. Within each category, the dots and horizontal bars represent the average number of matched trials and 95% CI, respectively. These are color coded by phecode-derived medical domains: Mental (pink), Respiratory (purple), Other (blue), Cancer (green), Neurological (dark green), Endocrine/Metabolic (gold), and Circulatory (red). The phecode domain “Other” consists of digestive, hematopoietic, sense organs, symptoms, musculoskeletal, genitourinary, injuries & poisoning, pregnancy complications, dermatologic, congenital anomalies, and infectious diseases. Self-reported race and ethnicity categories: Hispanic = Hispanic, Latino or Spanish; Black = Black, African American, or African; MENA = Middle Eastern or North African; NHOPI = Native Hawaiian or other Pacific Islander. The self-reported race and ethnicity category “Remaining” consists of participants who skipped the survey question or selected “prefer not to answer.”
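The stratified averages plotted in Figure 2 can be approximated with a short Python sketch. It is offered only as an illustration: the normal-approximation interval (mean ± 1.96 standard errors) is an assumption, since the text does not specify how the 95% CIs were computed, and the function name and input layout are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def mean_ci_by_group(rows, z: float = 1.96):
    """Average matched-trial count and normal-approximation CI per group.

    rows: iterable of (group_label, n_matched_trials), one row per participant.
    Returns {group_label: (mean, lower, upper)}.
    Assumption: CI = mean +/- z * sample_sd / sqrt(n).
    """
    groups = defaultdict(list)
    for label, n_matched in rows:
        groups[label].append(n_matched)

    summary = {}
    for label, values in groups.items():
        avg = mean(values)
        if len(values) > 1:
            half_width = z * stdev(values) / math.sqrt(len(values))
        else:
            half_width = float("nan")  # CI undefined for a single observation
        summary[label] = (avg, avg - half_width, avg + half_width)
    return summary
```

Because matched-trial counts are overdispersed, this simple interval is best read as a descriptive summary; the adjusted comparisons in the paper rely on negative binomial regression instead.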
TOC for navigating the current trial landscape
Figure 3 shows the TOC for NIH, industrial, and academic sponsors across the most common medical conditions in All of Us. For illustration, we divided the landscape into {few, some, many} participants/matched trials based on {the lowest 20th, middle 60th, and top 20th} percentile cutoffs. Overall, the distribution of medical conditions in the TOC is comparable across all sponsors, revealing similar focal areas in their trial portfolio investment. Essential hypertension, hyperlipidemia, and abdominal pain are among the most common medical conditions (many participants) but are under-studied (few to some matched trials) by all 3 sponsors compared to other medical conditions, including anxiety disorder and type 2 diabetes (many matched trials); they therefore represent potential trial investment opportunities. In particular, anxiety disorder had the largest number of matched trials, with 217 from the NIH, 152 from industry, and 835 from academia. Type 2 diabetes had 50, 62, and 189 matched trials across the NIH, industrial, and academic sponsors, respectively.
Figure 3.
Trial Opportunities Compass for different sponsors across the most common medical conditions in All of Us. Ppts = participants; {few, some, many} = {lowest 20th, middle 60th, and top 20th percentiles}.
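The 3x3 grid underlying the TOC can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the percentile method (`statistics.quantiles` with `method="inclusive"`) is an assumption, and values exactly at a cutoff are assigned to "some".

```python
from statistics import quantiles


def tier(value, low_cut, high_cut):
    """Label a value as few / some / many relative to two cutoffs."""
    if value < low_cut:
        return "few"
    if value > high_cut:
        return "many"
    return "some"


def percentile_cuts(values, low_pct, high_pct):
    """Return the low_pct-th and high_pct-th percentiles of values."""
    # quantiles(..., n=100) returns 99 cut points; index i is percentile i + 1.
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[low_pct - 1], cuts[high_pct - 1]


def trial_opportunities_compass(conditions, low_pct=20, high_pct=80):
    """Assign each condition to one of 9 regions.

    conditions: {name: (n_participants, n_matched_trials)}
    Returns {name: (participant_tier, trial_tier)}.
    """
    p_low, p_high = percentile_cuts(
        [v[0] for v in conditions.values()], low_pct, high_pct
    )
    t_low, t_high = percentile_cuts(
        [v[1] for v in conditions.values()], low_pct, high_pct
    )
    return {
        name: (tier(n_ppts, p_low, p_high), tier(n_trials, t_low, t_high))
        for name, (n_ppts, n_trials) in conditions.items()
    }
```

Conditions that land in `("many", "few")` are the prevalent but under-studied ones, whereas `("few", "many")` flags potentially saturated areas. Exposing `low_pct` and `high_pct` as parameters mirrors the user-specified cutoffs described for the TOC.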
PheWAS of the trial opportunities landscape
Figure 4 shows the Manhattan plot for the entire All of Us participant population. For legibility, we labeled the 20 most statistically significant medical conditions. Overall, the PheWAS results revealed a wide range of trial opportunities across different clinical domains. Specifically, we discovered an emphasis on mental disorder-related trials in the current landscape, as 4 of the top 20 hits belonged to this domain. Adjusting for age, sex at birth, location, self-reported race and ethnicity, education, income, disability, and the number of medical conditions, all 4 were significantly associated with more matched trials, with anxiety disorder having the highest increase (ie, 59% [95% CI, 57-62]; P < 1e-300) followed by depression (42% [95% CI, 39-44]; P = 6e-252), major depressive disorder (34% [95% CI, 32-37]; P = 3e-282), and tobacco use disorder (24% [95% CI, 22-26]; P = 7e-124). Other focal areas included neoplasms and respiratory conditions. Specifically, malignant neoplasms and secondary malignant neoplasms were associated with a 158% (95% CI, 144-174; P = 5e-233) and 122% (95% CI, 107-138; P = 8e-114) increase in the number of matched trials, respectively, followed by prostate cancer (100% [95% CI, 91-108]; P = 4e-219). Among respiratory conditions, asthma was associated with the highest increase (37% [95% CI, 35-40]; P = 3e-256), followed by shortness of breath (30% [95% CI, 29-33]; P = 2e-192), and cough (25% [95% CI, 23-27]; P = 4e-128).
Figure 4.
Manhattan plot of the trial opportunities PheWAS for all participants. N = 181 529 participants. Red horizontal line indicates the threshold for statistical significance.
Relationship between trial opportunities and socioenvironmental factors
Figure 5 shows the statistically significant adjusted matched trial ratios from our demographic and SDoH analysis. Living in a metropolitan area was associated with substantially more matched trials across all condition domains, with metropolitan participants matching with 3.1-13.1 times more trials than their nonmetropolitan counterparts, adjusting for other factors. Compared to self-reported white participants, Black and Hispanic participants had more matched trials across all condition domains, with endocrine/metabolic trials having the highest adjusted ratio (1.65 times more [95% CI, 1.46-1.86] P = 3.5e-16 and 1.45 [95% CI, 1.27-1.65] P = 2.19e-8, respectively). Substandard housing quality, more neighborhood discord, low income, lower educational attainment, and limited English proficiency were associated with more matched trials across multiple condition domains after adjusting for other factors.
Figure 5.
Matched trials count ratio (95% CI) of statistically significant covariates in the adjusted analysis. Reference levels of categorical covariates include: Self-reported race and ethnicity (White), Sex at birth (Female), Location (Metropolitan), Housing Quality (Good), Low Income (No), Food Insecurity (No), Highest Education (11 or Below), and English Proficient (No). The domain “Other” consists of digestive, hematopoietic, sense organs, symptoms, musculoskeletal, genitourinary, injuries & poisoning, pregnancy complications, dermatologic, congenital anomalies, and infectious diseases. Self-reported race and ethnicity categories: Hispanic = Hispanic, Latino or Spanish; Black = Black, African American, or African; MENA = Middle Eastern or North African; NHOPI = Native Hawaiian or other Pacific Islander. The self-reported race and ethnicity category “Remaining” consists of participants who skipped the survey question or selected “prefer not to answer.”
Discussion
Our high-level analysis illuminated potential opportunities in the current clinical trial landscape. The PheWAS enabled discovery of a wide range of trial opportunities for All of Us participants, reflecting the diversity of trial investment portfolios. Specifically, among the 20 conditions with the strongest statistical associations, half were mental disorders, cancers, or respiratory conditions. Our adjusted analysis showed that self-reported racial and ethnic minorities and participants with certain SDoH, ie, environmental stressors and limited healthcare access, had more clinical trial opportunities in the current landscape. It is worth noting, however, that trial availability is not the same as trial accessibility. Trials may not be truly accessible to underrepresented communities due to various reasons, including under-designed access logistics and lack of investment in community engagement.24 Thus, significant investment in concurrent initiatives is required to increase trial accessibility among marginalized communities.3
While prior work characterized the clinical trial landscape by analyzing trends in trial completion, participant enrollment, and focal areas,25–27 ours is the first to pair ClinicalTrials.gov studies with data from a large-scale health cohort to return value to communities by illuminating opportunities for trial planning. Specifically, the TOC can be applied to combinations of different sponsors and medical conditions to assess broad-to-specific scientific portfolios. As such, it may provide meaningful information to inform All of Us ancillary study design planning and policy making for sponsors interested in embedded recruitment from large-scale health cohorts. Based on our adjusted analysis of demographic factors and SDoH, living in a nonmetropolitan area was a major limiting factor associated with up to 13.1 times fewer matched trials. This finding supports the observation from qualitative studies that geographic location is a key barrier to trial participation.28–32 Therefore, quantifying the magnitude of potential barriers by leveraging a diverse, large-scale health cohort like All of Us can not only supplement qualitative studies, but also return value to the biomedical community by providing guidance on prioritizing the most pressing barriers.
Our study has several strengths and limitations. Because phecodes rely on data that are ubiquitous, standardized, and easy to manipulate,16 our approach is readily generalizable to other large-scale health cohorts that contain participant-level EHR data. Limitations of phecodes, or ICD-based phenotyping in general, include incomplete or incorrect ascertainment of medical conditions. As a result, our analysis may have missed conditions. While our study focuses on providing a high-level characterization of the national clinical trial landscape based on a broad matching strategy, we recognize that sponsors may also be interested in matching participants based on more granular trial criteria. Specifically, using a broad matching strategy can be helpful for sponsors that need to identify a large group of potential participants; on the other hand, others may prefer a granular matching strategy to identify participants with specific traits (eg, specific genetic markers). Therefore, the definition of a trial opportunity depends on the sponsors’ interests and consequently, their matching strategy. To this end, our framework can be easily tailored to different matching strategies. For sponsors interested in a more granular characterization of the trial landscape, a potential approach is to leverage information from multiple EHR data types, such as lab results and clinical notes, to increase the accuracy and granularity of phenotypes.33–35 Natural language processing models that can extract computable phenotypes from ClinicalTrials.gov data may serve as a useful tool for future work in this direction.36–38
Our findings support the potential of the All of Us Research Program to power a wide range of clinical trials. The diversity and infrastructure of the Program can catalyze much-needed improvement in clinical trial participation from underrepresented populations through ancillary studies like Nutrition for Precision Health. As such, these studies provide opportunities for participants and sponsors to address long-standing inequities in trial accessibility and recruitment. This requires building trust between participants and researchers, promoting fairness for participants and their communities, and generating unbiased biomedical knowledge. Ultimately, reaching this goal will take all of us.
Acknowledgments
The All of Us Research Program would not be possible without the partnership of its participants. All of Us and Nutrition for Precision Health, powered by the All of Us Research Program, are service marks of the U.S. Department of Health and Human Services (HHS).
Contributor Information
Cathy Shyr, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States.
Lina Sulieman, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States.
Paul A Harris, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, United States; Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37240, United States.
Author contributions
Cathy Shyr had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Concept and design: Paul A. Harris, Cathy Shyr. Acquisition, analysis, or interpretation of data: Paul A. Harris, Cathy Shyr. Drafting of the manuscript: Cathy Shyr. Critical revision of the manuscript for important intellectual content: Paul A. Harris, Cathy Shyr, Lina Sulieman. Statistical analysis: Cathy Shyr. Obtained funding: Paul A. Harris. Administrative, technical, or material support: Paul A. Harris. Supervision: Paul A. Harris.
Supplementary material
Supplementary material is available at Journal of the American Medical Informatics Association online.
Funding
This work was funded by grants 1OT2OD035404 from the National Institutes of Health (P.A.H., C.S., L.S.), 1U24TR004432-01 from the National Center for Advancing Translational Sciences (P.A.H., C.S.), and 1K99LM014429-01 from the National Library of Medicine (C.S.). The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Conflicts of interest
None declared.
Data availability
To ensure privacy of All of Us participants, deidentified data used for this study are available to approved researchers following registration, completion of ethics training, and attestation of a data use agreement through the All of Us Research Program website, which can be accessed at https://workbench.researchallofus.org/login. The code used to produce the results in this article can be accessed at https://github.com/cathyshyr/AllOfUs_SDOH_Clinical_Trials.
References
- 1. U.S. Food and Drug Administration. Diversity Plans to Improve Enrollment of Participants from Underrepresented Racial and Ethnic Populations in Clinical Trials; Draft Guidance for Industry. U.S. Food and Drug Administration; 2022. https://www.fda.gov/media/157635/download
- 2. National Institutes of Health Office of Research on Women’s Health. NIH Inclusion Policies. Office of Research on Women’s Health; 2016. https://orwh.od.nih.gov/womens-health-research/clinical-research-trials/nih-inclusion-policies
- 3. Committee on Improving the Representation of Women and Underrepresented Minorities in Clinical Trials and Research, Committee on Women in Science, Engineering, and Medicine, and Policy and Global Affairs. Improving the Representation of Women and Underrepresented Minorities in Clinical Trials and Research: Building Research Equity for Women and Underrepresented Groups. The National Academies of Sciences, Engineering, and Medicine; 2022. https://nap.nationalacademies.org/catalog/26479/improving-representation-in-clinical-trials-and-research-building-research-equity
- 4. Manson JE, Cook NR, Lee IM, et al.; VITAL Research Group. Vitamin D supplements and prevention of cancer and cardiovascular disease. N Engl J Med. 2019;380(1):33-44. doi:10.1056/NEJMoa1809944
- 5. Sesso HD, Manson JE, Aragaki AK, et al.; COSMOS Research Group. Effect of cocoa flavanol supplementation for the prevention of cardiovascular disease events: the COcoa Supplement and Multivitamin Outcomes Study (COSMOS) randomized clinical trial. Am J Clin Nutr. 2022;115(6):1490-1500. doi:10.1093/ajcn/nqac055
- 6. Harvard Medical School and Brigham and Women’s Hospital. Women's Health Study; 1993. https://whs.bwh.harvard.edu
- 7. National Heart, Lung, and Blood Institute. Women’s Health Initiative; 1992. https://www.whi.org
- 8. Bae AS. Key barriers against racial and ethnic minority participation in U.S. clinical trials. Int J Clin Trials. 2022;9(3):227. doi:10.18203/2349-3259.ijct20221876
- 9. Rist PM, Sesso HD, Manson JE. Innovation in the design of large-scale hybrid randomized clinical trials. Contemp Clin Trials. 2020;99:106178. doi:10.1016/j.cct.2020.106178
- 10. Denny JC, Rutter JL, Goldstein DB, et al.; All of Us Research Program Investigators. The “All of Us” research program. N Engl J Med. 2019;381(7):668-676. doi:10.1056/NEJMsr1809937
- 11. Ramirez AH, Gebo KA, Harris PA. Progress with the All of Us research program. JAMA. 2021;325(24):2441-2442. doi:10.1001/jama.2021.7702
- 12. National Institutes of Health All of Us Research Program. All of Us Funding and Program Partners. National Institutes of Health; 2016. https://allofus.nih.gov/funding-and-program-partners/communications-and-engagement-partners
- 13. Mapes BM, Foster CS, Kusnoor SV, et al.; All of Us Research Program. Diversity and inclusion for the All of Us research program: a scoping review. PLoS One. 2020;15(7):e0234962. doi:10.1371/journal.pone.0234962
- 14. Ramirez AH, Sulieman L, Schlueter DJ, et al.; All of Us Research Program. The All of Us research program: data quality, utility, and diversity. Patterns. 2022;3(8):100570. doi:10.1016/j.patter.2022.100570
- 15. National Institutes of Health Office of Strategic Coordination - the Common Fund. Nutrition for Precision Health, powered by the All of Us Research Program. National Institutes of Health; 2022. https://commonfund.nih.gov/nutritionforprecisionhealth/fundedresearch
- 16. Bastarache L. Using phecodes for research with the electronic health record: from PheWAS to PheRS. Annu Rev Biomed Data Sci. 2021;4(1):1-19. 10.1146/annurev-biodatasci-122320-112352 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Bastarache L, Delozier S, Pandit A, et al. The phenotype-genotype reference map: improving biobank data science through replication. Am J Hum Genet. 2023;110(9):1522-1533. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Denny JC, Ritchie MD, Basford MA, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26(9):1205-1210. 10.1093/bioinformatics/btq126 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Vandenbroucke JP, von Elm E, Altman DG, et al. ; STROBE Initiative. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. PLoS Med. 2007;4(10):e297. 10.1371/journal.pmed.0040297 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Economic Research Service. U.S. Department of Agriculture. U.S. Department of Agriculture Economic Research Service: Rural-Urban Commuting Area Codes. U.S. Department of Agriculture; 2010. https://www.ers.usda.gov/data-products/rural-urban-commuting-area-codes/
- 21. The Department of Housing and Urban Development. Office of Policy Development and Research. The Department of Housing and Urban Development Income Limits. The Department of Housing and Urban Development; 2022. https://www.huduser.gov/portal/datasets/il.html#2021_data
- 22. Tesfaye S, Cronin R, Lopez-Class M, et al. Measuring social determinants of health in the All of Us research program. Technical Document. medRxiv. Published Online June. 2023;5. 10.1101/2023.06.01.23290404 [DOI] [Google Scholar]
- 23. Mayo KR, Basford MA, Carroll RJ, et al. The All of Us data and research center: creating a secure, scalable, and sustainable ecosystem for biomedical research. Annu Rev Biomed Data Sci. 2023;6(1):443-464. 10.1146/annurev-biodatasci-122120-104825 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Cook SK, Kennedy N, Boone L, et al. What we wish every investigator knew: top 4 recruitment and retention recommendations from the Recruitment Innovation Center. J Clin Transl Sci. 2022;6(1):e31. 10.1017/cts.2022.370 [DOI] [Google Scholar]
- 25. Shadbolt C, Naufal E, Bunzli S, et al. Analysis of rates of completion, delays, and participant recruitment in randomized clinical trials in surgery. JAMA Netw Open. 2023;6(1):e2250996. 10.1001/jamanetworkopen.2022.50996 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Gresham G, Meinert JL, Gresham AG, Meinert CL. Assessment of trends in the design, accrual, and completion of trials registered in ClinicalTrials.gov by sponsor type, 2000-2019. JAMA Netw Open. 2020;3(8):e2014682. 10.1001/jamanetworkopen.2020.14682 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Anand V, Cahan A, Ghosh S. Clinical Trials.Gov: a topical analyses. AMIA Jt Summits Transl Sci Proc. 2017;2017:37-47. [PMC free article] [PubMed] [Google Scholar]
- 28. Clark LT, Watkins L, Piña IL, et al. Increasing diversity in clinical trials: overcoming critical barriers. Curr Probl Cardiol. 2019;44(5):148-172. 10.1016/j.cpcardiol.2018.11.002 [DOI] [PubMed] [Google Scholar]
- 29. Reopell L, Nolan TS, Gray DM, et al. Community engagement and clinical trial diversity: navigating barriers and co-designing solutions—a report from the “Health Equity through Diversity” seminar series. PLoS One. 2023;18(2):e0281940. 10.1371/journal.pone.0281940 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Meropol NJ, Buzaglo JS, Millard J, et al. Barriers to clinical trial participation as perceived by oncologists and patients. J Natl Compr Canc Netw. 2007;5(8):655-664. 10.6004/jnccn.2007.0067 [DOI] [PubMed] [Google Scholar]
- 31. Kumar G, Chaudhary P, Quinn A, Su D. Barriers for cancer clinical trial enrollment: a qualitative study of the perspectives of healthcare providers. Contemp Clin Trials Commun. 2022;28:100939. 10.1016/j.conctc.2022.100939 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Wong AR, Sun V, George K, et al. Barriers to participation in therapeutic clinical trials as perceived by community oncologists. JCO Oncol Pract. 2020;16(9):e849-e858. 10.1200/JOP.19.00662 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Zhang Y, Cai T, Yu S, et al. High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc. 2019;14(12):3426-3444. 10.1038/s41596-019-0227-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Liao KP, Sun J, Cai TA, et al. High-throughput multimodal automated phenotyping (MAP) with application to PheWAS. J Am Med Inform Assoc. 2019;26(11):1255-1262. 10.1093/jamia/ocz066 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Yu S, Ma Y, Gronsbell J, et al. Enabling phenotypic big data with PheNorm. J Am Med Inform Assoc. 2018;25(1):54-60. 10.1093/jamia/ocx111 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Fang Y, Idnay B, Sun Y, et al. Combining human and machine intelligence for clinical trial eligibility querying. J Am Med Inform Assoc. 2022;29(7):1161-1171. 10.1093/jamia/ocac051 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Yuan C, Ryan PB, Ta C, et al. Criteria2Query: a natural language interface to clinical databases for cohort definition. J Am Med Inform Assoc. 2019;26(4):294-305. 10.1093/jamia/ocy178 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Datta S, Lee K, Paek H, et al. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. J Am Med Inform Assoc. 2024;31(2):375-385. 10.1093/jamia/ocad218 [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
To ensure privacy of All of Us participants, deidentified data used for this study are available to approved researchers following registration, completion of ethics training, and attestation of a data use agreement through the All of Us Research Program website, which can be accessed at https://workbench.researchallofus.org/login. The code used to produce the results in this article can be accessed at https://github.com/cathyshyr/AllOfUs_SDOH_Clinical_Trials.