Abstract
Objective
We aim to demonstrate the versatility of the All of Us database as an important source of rare and undiagnosed disease (RUD) data, because of its large size and range of data types.
Materials and Methods
We searched the public data browser, electronic health record (EHR), and several surveys to investigate the prevalence, mental health, healthcare access, and other data of select RUDs.
Results
Several RUDs have participants in All of Us [eg, 75 of 100 rare infectious diseases (RIDs)]. We generated health-related data for the undiagnosed disease, sickle cell disease (SCD), and cystic fibrosis (CF) groups and for infectious (2 diseases) and chronic (4 diseases) disease pools.
Conclusion
Our results highlight the potential value of All of Us with both data breadth and depth to help identify possible solutions for shared and disease-specific biomedical and other problems such as healthcare access, thus enhancing diagnosis, treatment, prevention, and support for the RUD community.
Keywords: rare disease, rare and undiagnosed diseases, healthcare access, mental health, newborn screening
Introduction
To support individuals with rare and undiagnosed diseases (RUDs), a range of approaches is used, such as policies,1 registries (eg,2), networks (advocacy groups, research networks, eg,3,4), expert medical centers,5,6 rare disease (RD) databases (eg,7), artificial intelligence-directed data collation,8 integrative medicine,9 and study design algorithms.10 Despite these efforts, RUDs continue to present unique challenges, such as long or indefinite diagnostic odysseys, limited or absent treatments, expensive healthcare, and mental health and other concerns (11, reviewed in Chung et al12). Patients, clinicians, researchers, policymakers, RUD advocacy groups, and other support groups all face the same basic issue: solutions are harder to find because these diseases are rare. Data are a critical factor that can help advance solutions across all these stakeholders (eg,13–16), and we propose that the All of Us Research Program database is uniquely poised to complement current RUD research due to its large size and range of data.
The goal of All of Us is to collect biomedical, behavioral, genomic, and other types of data from 1 million individuals, especially from underrepresented groups.17 Zeng et al,18 in their sweeping survey of prevalence measures across all types of diseases, found that both common and RDs are present in All of Us and that some RDs are enriched in All of Us. Similar large-scale efforts such as the UK Biobank19 and the RD and cancer-focused 100,000 Genomes Project20,21 have yielded progress. All of Us-based RD publications (eg,22) further support the idea that All of Us data is a potential substantial resource for RUDs.
This descriptive epidemiology study aims to build on these studies by highlighting how All of Us data could be used by various stakeholders in the RUD community. We determine the presence of select RDs in the publicly available All of Us data browser (PDB,23) and in controlled-access data, namely electronic health records (EHRs) and the personal and family health history survey (PFHHS),24 to show the different ways that RUD stakeholders (eg, family vs clinician/researcher) can explore All of Us. The extent to which RUDs, as identified by the All of Us program,25 are represented in the database was determined by finding lifetime prevalence, that is, the presence of a disease at any point during the course of life. There are estimated to be 6000+ RDs,26 and to demonstrate All of Us RUD data, we focused on the prevalence of 3 different RUD subcategories: (a) rare infectious diseases (RIDs), (b) newborn screening diseases, and (c) undiagnosed diseases (UD). Further, we identify several All of Us RUD-relevant data types (eg, employment, mental health, healthcare access, and social parameters,27 reviewed in Chung et al12 and Khan et al28) to demonstrate how the data can be used to develop solutions for multiple RDs, fulfilling a translational medicine principle that can be used with RDs.29–31 We hope that this study’s delineation of the many versatile ways in which All of Us data can be helpful will prompt future use by various stakeholders within the RUD community.
Methods
Extended methods are available in the Supplementary Materials. RDs are primarily defined by the Genetic and Rare Disease (GARD) February 2024 List32 with possible additions. Please note that some diseases may be missing due to differences in categorization or naming, and some discretion was used in our classification. Time of diagnosis was not factored into our calculations, so all prevalence data represent lifetime prevalence. This cross-sectional study uses data from the All of Us Research Program’s Controlled Tier Access v7 Dataset (CD), available to authorized users on the Researcher Workbench. PDB data are from the 2/15/2023 release.23
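As an illustration of the lifetime prevalence calculation described above, the following minimal Python sketch counts each participant once if they have any record of a condition, regardless of diagnosis date, and divides by the cohort size. The record layout, the function name `lifetime_prevalence`, and the toy records are hypothetical and are not taken from the Researcher Workbench; the final line simply repeats the arithmetic for the sickle cell count and denominator reported in the PDB column of Table 1.

```python
def lifetime_prevalence(records, condition, cohort_size):
    """Share of the cohort with at least one record of `condition`, at any date."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    # A set counts each participant once, however many records they have.
    # The diagnosis date is deliberately ignored (lifetime prevalence).
    cases = {pid for pid, cond, _date in records if cond == condition}
    return len(cases) / cohort_size


# Hypothetical toy records: (participant_id, condition, diagnosis_date)
toy_records = [
    (1, "Lyme disease", "2009-06-01"),
    (1, "Lyme disease", "2015-03-12"),  # repeat diagnosis, same participant
    (2, "Tuberculosis", "1998-11-30"),
    (3, "Lyme disease", "2021-01-05"),
]
toy_prevalence = lifetime_prevalence(toy_records, "Lyme disease", cohort_size=10)

# Count and denominator as reported for sickle cell in the PDB column of Table 1.
pdb_sickle_cell = round(600 / 254700, 6)
```

Because the date field is never read, a participant diagnosed decades ago and one diagnosed last year contribute equally, which is the sense of "lifetime prevalence" used in this study.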
Prevalence of RD present in the PFHHS
RDs in All of Us PFHHS were identified with the exception of “Cancer Conditions.” Using CD, cohorts were built based on these inclusion criteria: completion of the PFHHS and diagnosis of a specific RUD found in EHR or PFHHS data.
RID prevalence
The “Conditions” category of the All of Us public data browser was used to determine the number of individuals with each of the 100 RIDs as defined by GARD.32 This is different from the Genetic and Rare Disease February 2024 List used for other parts of this study. Some RIDs may have been missed due to differences in naming and classification and some discretion was used.
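The grouping of RIDs by participant count can be sketched as follows. This is a minimal Python example under stated assumptions: the bin labels follow those used in Table 1, the handling of the boundary values (exactly 20 and exactly 200) is our assumption, and the disease names and counts are hypothetical placeholders rather than PDB values.

```python
from collections import Counter


def count_bin(n):
    """Assign a participant count to one of the bins used in Table 1."""
    if n < 0:
        raise ValueError("participant count cannot be negative")
    if n == 0:
        return "0"
    if n <= 20:
        return "<=20"
    if n <= 200:  # assumption: a count of exactly 200 falls in this bin
        return "20-200"
    return ">200"


# Hypothetical counts per disease (placeholders, not PDB values).
toy_counts = {
    "Disease A": 0,
    "Disease B": 12,
    "Disease C": 20,
    "Disease D": 180,
    "Disease E": 340,
    "Disease F": 0,
}
bin_totals = Counter(count_bin(n) for n in toy_counts.values())
```

Summing the non-zero bins gives the number of diseases with at least one participant, which is how a figure such as "75 of 100 RIDs" can be read off the binned counts.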
RUSP prevalence
Using CD, 23 cohorts were created based on conditions listed in the recommended uniform screening panel (RUSP),33 released in 2023. Hearing loss and generalized sickle cell disease (SCD) were excluded. Data were gathered using the EHR information; due to a lack of distinction in the database, some cohorts include multiple types of the same condition. For some of the diseases with cohort sizes greater than 20, the prevalence in the US population was obtained.
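For readers who wish to express a cohort count as a "1:N" ratio, as is done for the RUSP conditions, a minimal Python sketch is given below. The function name `one_in_n` is hypothetical, and rounding N to the nearest whole number is an assumption; published ratios may have been derived with a different rounding convention. The two example calls use counts and the denominator reported in Table 1.

```python
def one_in_n(cases, cohort_size):
    """Express a case count as a '1:N' ratio, with N rounded to a whole number."""
    if cases <= 0:
        raise ValueError("a ratio is undefined without at least one case")
    # Assumption: N is rounded to the nearest integer.
    return f"1:{round(cohort_size / cases)}"


# Counts and denominator as reported in Table 1 (n = 287 012).
congenital_hypothyroidism = one_in_n(294, 287012)
cystic_fibrosis = one_in_n(449, 287012)
```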
Undiagnosed prevalence and overlap
The UD population was defined using the CD EHR Condition domain data. To determine the overlap of this subgroup with other RD subgroups, cohorts were built to include participants with UD and an additional RD per the conditions domain of their EHR.
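The overlap between the UD cohort and another RD cohort amounts to a set intersection on participant identifiers. The Python sketch below illustrates this with hypothetical identifiers and a hypothetical function name, `overlap_report`; it also masks counts of 1 to 20 as "<=20", which is our reading of how small counts are reported in this study and not a statement of the exact All of Us dissemination rules.

```python
def overlap_report(ud_ids, rd_cohorts, threshold=20):
    """Count participants in both the UD cohort and each RD cohort.

    Counts from 1 to `threshold` are reported as "<=20" (assumed masking rule);
    zero and larger counts are reported as plain numbers.
    """
    ud = set(ud_ids)
    report = {}
    for name, ids in rd_cohorts.items():
        n = len(ud & set(ids))  # participants present in both cohorts
        report[name] = f"<={threshold}" if 0 < n <= threshold else str(n)
    return report


# Hypothetical participant identifiers (placeholders only).
toy_ud_ids = range(1, 101)
toy_rd_cohorts = {
    "Infectious Pool": range(90, 140),
    "Chronic Pool": range(50, 200),
    "CF": range(500, 520),
}
toy_report = overlap_report(toy_ud_ids, toy_rd_cohorts)
```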
Multidimensional investigation of health and lifestyle topics related to RDs
Cohorts were designed within the Cohort Builder of the Researcher Workbench using CD. Multiple All of Us data sources were used to describe many different types of healthcare and lifestyle-related data. Specifically, 5 surveys and 2 domains of EHR data were used to construct multiple cohorts, which were then used to mine these data. For specific cohort design parameters, see Table S3.
Results
All of Us has different data sources34 such as EHR and surveys, like the PFHHS, which asks about specific medical conditions, including at least 10 RDs, indicating All of Us’s inclusivity of the RD population. Aggregate data can be accessed by anyone on the internet through the PDB and individuals can become All of Us researchers for further access.35 Table 1 shows that the prevalence of select RDs is similar between the PDB, controlled PFHHS, controlled EHR data, and as reported by Zeng et al.18 The PDB flagged participants for 75 of the 100 RIDs assessed, while EHR data are present for several RUSP diseases (Table 1, Tables S1 and S2). Please note that, depending on categorization, some RDs overlapped subcategories (eg, SCD). Participants were identified with both UD and RD data (Figure 1).
Table 1.
Prevalence of RUDs in All of Us.
All of Us prevalence from different sources

| Rare disease | Prevalence: PDB EHR conditions (n = 254 700) | Prevalence: controlled access PFHHS (n = 185 232) | Prevalence: controlled access EHR conditions (n = 250 242) | Prevalence: All of Us per Zeng et al 18 |
|---|---|---|---|---|
| Sickle cell | 0.002356 (600) | 0.0016 (301) | 0.001007 (252) | 0.002 |
| Systemic lupus | 0.014998 (3820) | 0.0091 (1685) | 0.015217 (3808) | 0.013 |
| Dengue fever | 0.000157 (40) | 0.0040 (741) | 0.000152 (38) | 0.000146 |
| West Nile virus | 0.000236 (60) | 0.0011 (204) | 0.000180 (45) | N/A |
| Zika virus | <=20 | 0.0005 (94) | <=20 | 9.8e−05 |
| SARS | 0.000707 (180) | 0.0027 (499) | 0.000707 (177) | 0.060 |
| Lou Gehrig’s/Amyotrophic Lateral Sclerosis | 0.000707 (180) | 0.0003 (60) | 0.000663 (166) | N/A |
| Tuberculosis | 0.007774 (1980) | 0.0093 (1720) | 0.007880 (1972) | 0.004989 |
| Spinal cord injury | N/D | 0.0232 (4296) | 0.004304 (1077) | N/D |
| Lyme disease | 0.008324 (2120) | 0.0225 (4175) | 0.008468 (2119) | 0.007 |

RID (from PDB EHR)

| # Participants | # RIDs with participant data (100 RIDs total) |
|---|---|
| 0 | 25 |
| <=20 | 48 |
| 20-200 | 21 |
| >200 | 6 |

| RID condition >200 participants | Prevalence in PDB (n = 254 700) | Prevalence in All of Us per Zeng et al 18 |
|---|---|---|
| Actinomycosis | 0.0013 (340) | N/A |
| Aspergillosis | 0.0017 (440) | 0.002 |
| Bacterial endocarditis | 0.0012 (300) | N/D |
| Chronic Epstein-Barr virus | 0.0009 (220) | N/A |
| Coccidioidomycosis | 0.0027 (700) | N/A |
| Pneumocystosis | 0.0014 (360) | N/A |

Prevalence of RUSP (from controlled access EHR)

| # Participants | # Primary conditions (#total = 35) | # Secondary conditions (#total = 26) |
|---|---|---|
| 0 | 14 | 20 |
| <=20 | 15 | 6 |
| >20 | 6 | 0 |

| RUSP condition >20 | All of Us prevalence (n = 287 012) | U.S. prevalence | Prevalence in All of Us per Zeng et al 18 |
|---|---|---|---|
| Congenital adrenal hyperplasia | 1:4283 (67) | 1:166636 | N/A |
| Congenital hypothyroidism | 1:976 (294) | 1:14 28536 | 0.001 |
| CF | 1:639 (449) | ∼40 00037 | 0.002 |
| Homocystinuria | 1:1112 (258) | 1:10 00038 | N/A |
| Sickle cell anemia | 1:349 (823) | ∼120 00039 | 0.002 |
| Sickle cell β-thalassemia | 1:2080 (138) | 1:100 00040 | 0.000 |
The All of Us prevalence from multiple All of Us sources and subcategories RID and RUSP.
Abbreviations: CF, cystic fibrosis; EHR, electronic health record; N/A, not present; N/D, not distinct; PFHHS, personal and family health history survey; RID, rare infectious disease; RUD, rare and undiagnosed disease; RUSP, recommended uniform screening panel.
Figure 1.
Rare disease (RD) subcategories overlap with undiagnosed subgroups. “Chronic Pool” includes participants with systemic lupus, muscular dystrophy (MD), multiple sclerosis, or amyotrophic lateral sclerosis. “Infectious Pool” includes participants with tuberculosis or Lyme disease. “<=20” denotes a count of less than or equal to 20 participants. All RD subgroups were found to overlap with the undiagnosed disease (UD) population, meaning all subgroups were found to have some participants with a UD in addition to specific RDs.
To show the versatility of All of Us data to serve the RUD population, we then identified RUD participants with different types of All of Us data (Table 2). While we recognize that all RUD stakeholders could find all the data valuable, we suggest possible primary stakeholders for our diverse findings. The UD population had higher employment than the general population. Social satisfaction is an indicator of social quality of health,41 and the UD population reported higher social satisfaction. Health insurance coverage data were available across all studied groups. Depression prevalence was similar to or higher than that of the reference population, with the highest subgroup being the family of muscular dystrophy (MD) (Table 2). Participants have records showing the use of medications for SCD (hydroxyurea42) and CF (elexacaftor, tezacaftor, or ivacaftor43). For some RD subgroups, the expense of mental health care and prescription medicine was a limiting factor to treatment in the last 12 months, but at similar levels to the reference population. Insurance acceptance data were available for all of the studied groups as well, although some groups had counts of 20 or fewer. Finally, we describe levels of community support, which required the data to be coarsened to conform to All of Us data dissemination policy; however, we note that RUD subgroups had less support than the reference population.
Table 2.
Demonstrating the breadth and depth of All of Us.
| Primary All of Us data source | Parameter selected | Cohort | Results | Likely primary stakeholders |
|---|---|---|---|---|
| Basics survey | Unemployment status | Surveys (n = 178 102) | 13.5% | Social workers, policymakers, advocacy groups |
| | | UD (n = 420) | 7.6% | |
| | Covered by insurance | Surveys (n = 178 102) | 94% | Social workers, policymakers, advocacy groups |
| | | UD (n = 420) | >85%* | |
| | | SCD (n = 182) | >85%* | |
| | | CF (n = 193) | >85%* | |
| | | Infectious (n = 5737) | 96% | |
| | | Chronic (n = 3417) | 96% | |
| Overall health survey | Social satisfaction (social indicators of health) | Surveys (n = 178 102) | 56.8% | Families, advocacy groups, social workers, policymakers |
| | | UD (n = 420) | 65.5% | |
| PFHH survey | Depression prevalence (mental health) | Surveys (n = 178 102) | 29.9% | Mental health professionals, social workers, policymakers |
| | | UD (n = 420) | 31.9% | |
| | | SCD (n = 182) | 33.0% | |
| | | CF (n = 193) | 33.2% | |
| PFHH survey | Depression prevalence (mental health) | PFHH survey (n = 185 232) | 29.4% | |
| | | Family of SCD (n = 702) | 32.1% | |
| | | Family of MD (n = 404) | 43.1% | |
| Drug domain of EHR data | Medication record | SCD (n = 182) hydroxyurea | 15.9% | Clinicians, social workers |
| | | CF (n = 193) ivacaftor, tezacaftor, or elexacaftor | 12.4% | |
| Healthcare Access and Utilization (HAU) survey | Mental healthcare too expensive in last 12 months | Surveys (n = 178 102) | 8% | Social workers, advocacy groups, mental health professionals, policymakers |
| | | UD (n = 420) | <=20 | |
| | | SCD (n = 182) | 12% | |
| | | CF (n = 193) | <=20 | |
| | | Infectious (n = 5737) | 8% | |
| | | Chronic (n = 3417) | 12% | |
| | Prescriptions too expensive in last 12 months | Surveys (n = 178 102) | 11% | Social workers, advocacy groups, clinicians, policymakers |
| | | UD (n = 420) | 8% | |
| | | SCD (n = 182) | 20% | |
| | | CF (n = 193) | 19% | |
| | | Infectious (n = 5737) | 11% | |
| | | Chronic (n = 3417) | 21% | |
| | Insurance acceptance problems | Surveys (n = 178 102) | 11% | Social workers, advocacy groups, clinicians, policymakers |
| | | UD (n = 420) | 5% | |
| | | SCD (n = 182) | 18% | |
| | | CF (n = 193) | <=20 | |
| | | Infectious (n = 5737) | 13% | |
| | | Chronic (n = 3417) | 16% | |
| Social determinants of health survey (SDH) | Community support with medical visits or meal preparation when needed | SDH survey (n = 117 783) | >99%* | Families, social workers |
| | | UD (n = 338) | >75%* | |
| | | SCD (n = 98) | >75%* | |
| | | CF (n = 107) | >75%* | |
RUD participant data from multiple All of Us data categories. “Surveys” under the cohort column denotes the All of Us population that completed the PFHH, basics, and HAU surveys; “*” denotes data that were coarsened to comply with All of Us data dissemination policy; “Infectious Pool” includes individuals with Lyme disease or tuberculosis; “Chronic Pool” includes individuals with systemic lupus, MD, multiple sclerosis, or amyotrophic lateral sclerosis.
Abbreviations: CF, cystic fibrosis; EHR, electronic health record; MD, muscular dystrophy; PFHH survey, personal and family health history; RUD, rare and undiagnosed disease; SCD, sickle cell disease; UD, undiagnosed disease.
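The survey percentages in Table 2, including the "<=20" entries, can be thought of as a proportion of respondents within a cohort that is only reported when the underlying count is large enough. The Python sketch below is one hedged way to express that logic. The function name `percent_reporting`, the masking of counts from 1 to 20, and the toy responses are all assumptions made for illustration; the sketch does not reproduce the All of Us dissemination policy or the coarsening applied to the starred values.

```python
def percent_reporting(responses, threshold=20):
    """Percent of respondents answering yes, masked when the yes-count is small.

    `responses` is an iterable of booleans, one per respondent in the cohort.
    Yes-counts from 1 to `threshold` are reported as "<=20" (assumed rule).
    """
    responses = list(responses)
    if not responses:
        raise ValueError("cohort has no respondents")
    yes = sum(1 for answer in responses if answer)
    if 0 < yes <= threshold:
        return f"<={threshold}"
    return f"{100 * yes / len(responses):.0f}%"


# Hypothetical toy cohorts (placeholders, not All of Us data).
toy_large_cohort = [True] * 50 + [False] * 150   # 50 of 200 answered yes
toy_small_count = [True] * 15 + [False] * 405    # only 15 of 420 answered yes

large_result = percent_reporting(toy_large_cohort)
small_result = percent_reporting(toy_small_count)
```

Masking on the numerator and not on the cohort size is a design choice made here so that a large cohort with very few affirmative answers is still protected.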
Discussion
Our descriptive epidemiology study shows that All of Us is a valid resource for RUD data since All of Us (a) contains data for many RUDs and (b) has individuals with multiple accompanying types of actionable RUD-relevant data. These data complement earlier All of Us RD work18 with comparable prevalence data. This database therefore reflects what would normally be collected by several registries, overcoming certain registry limitations44 since data are streamlined and participant protection is carefully monitored. Further, because these diseases are rare, it can be difficult to find others with the same disease45 and the PDB provides an opportunity for anyone (eg, patients) to easily know if there are individuals with specific RDs in All of Us. Indeed, it is unusual to have a single data source that includes so many different types of data for each participant, and even for relatives.
This study demonstrates the versatility of All of Us RUD data by identifying participants with data that could be actionable for multiple stakeholders. The employment, insurance, social satisfaction, mental health, medication records, healthcare access, and community support data from All of Us have implications for many stakeholders (Table 2). We hope that this highlights that All of Us is like a combination of registries and that this prompts further characterization of All of Us to advance understanding of the natural history of RUDs, chronicle diagnostic and treatment strategies, increase awareness, and give direction for addressing financial and other challenges.44 There are also policy implications for All of Us RUD data. For example, states choose which of the nationally recommended RUSP newborn screening diseases to adopt,46 and so the presence of RUSP data in All of Us could be used to support state policy campaigns. Mental health, healthcare access, and other data for the RUDs assessed in All of Us show varied trends, some of which are consistent with the literature27,47–52 (reviewed in Chung et al12). This study adds to mental health and other research53,54 indicating that All of Us can be used for advocacy. Artificial intelligence has been used to mine All of Us to identify new uses for drugs,55 and RD data in All of Us could be used similarly and add to RD-focused and broader efforts (eg,56). While we only addressed a few survey questions, there are many others that can highlight more nuances in future studies. The diagnostic odyssey is a problem for the RUD community12 and All of Us has several UD participants with data that can be explored. A closer investigation of the subset of UD participants with comorbid RD diagnoses may yield insights into the diagnostic odyssey. All of Us genomic data have shown the presence of new variants57 and thus hold promise for solutions for the UD population.58
While these findings indicate that All of Us is indeed a compelling source of RUD data, we recognize that there are limitations. For example, All of Us currently has data only from adults, and so childhood RDs are not included59 (also noted by Zeng et al18), which may soon be resolved with the anticipated inclusion of children.60 Many other RUDs are absent from All of Us, including some investigated in this study that had no participants (Table 1, Tables S1 and S2). Additionally, a large amount of the data gathered in this study was collected via survey, introducing potential self-reporting bias. Nevertheless, the existence of this rich source of RUD data holds much promise. For example, while our conclusions are limited by the cross-sectional nature of this study, especially for the RID, mental health, and comorbidity of undiagnosed and RDs, All of Us does contain temporal data that could be explored in future studies. It has been highlighted that All of Us does not reflect disease prevalence in the US population because some groups are over- or underrepresented,18,61–64 limiting generalizability. Our findings (Table 1) and those by Zeng et al18 suggest that the prevalence of some RDs in All of Us may be higher than in the broader population, and we posit that this is a considerable advantage of this resource, given the limited data on RUDs.
Finally, this study highlights some of the RUD descriptive epidemiology present in All of Us and hopefully will encourage others to conduct similar work and further pursue analytic epidemiology using this rich data source. RUD stakeholders can find data for undiagnosed and specific RDs, or data relevant to RDs collectively from All of Us to help advance the work to support this community.
Acknowledgments
We thank the All of Us Biomedical Research Scholars Program (Baylor College of Medicine All of Us Evenings with Genetics) and the CURing All Of Us Team for supporting AT. We also appreciate the assistance provided by the All of Us Evenings with Genetics Office Hours.
We gratefully acknowledge All of Us participants for their contributions, without whom this study would not be possible. We also thank the National Institutes of Health’s All of Us Research Program for making available the participant and cohort data examined in this study.
The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD 026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276.
Contributor Information
Drenen J Magee, Department of Biological and Health Sciences, Crown College, St Bonifacius, MN 55375, United States.
Sierra Kicker, Department of Biological and Health Sciences, Crown College, St Bonifacius, MN 55375, United States.
Aeisha Thomas, Department of Biological and Health Sciences, Crown College, St Bonifacius, MN 55375, United States.
Author contributions
Drenen J. Magee and Sierra Kicker created cohorts, analyzed data, and edited the manuscript. Aeisha Thomas supervised the project, did some data collation, wrote the original draft, and edited the manuscript.
Supplemental material
Supplementary material is available at Journal of the American Medical Informatics Association online.
Funding
Support for AT was from the All of Us Evenings with Genetics Research Program under award number EWG-23-CE-313. The program, in part, is funded by the NIH All of Us Research Program, 1 OT2 OD031932-01.
Conflicts of interest
The authors have no competing interests.
Data availability
This study uses data from the All of Us Research Program’s Controlled Tier version 7, available to authorized users on the Researcher Workbench. The data are available in the All of Us Workspace and can be shared with registered users on request.
References
- 1. Lopes-Júnior LC, Ferraz VEF, Lima RAG, et al. Health policies for rare disease patients: a scoping review. Int J Environ Res Public Health. 2022;19(22):15174. 10.3390/ijerph192215174.
- 2. De Antonio M, Dogan C, Daidj F, et al. ; The Filnemus Myotonic Dystrophy Study Group. The DM-scope registry: a rare disease innovative framework bridging the gap between research and medical care. Orphanet J Rare Dis. 2019;14(1):122. 10.1186/s13023-019-1088-3.
- 3. Advancing Sickle Cell Advocacy Project Inc. Accessed April, 2024. https://www.advancingsicklecelladvocacyproject.org/
- 4. NIH Undiagnosed Diseases Network. Accessed May, 2024. https://www.ninds.nih.gov/current-research/focus-disorders/focus-undiagnosed-diseases-network
- 5. Syed AM, Camp R, Mischorr-Boch C, et al. Policy recommendations for rare disease centres of expertise. Eval Program Plann. 2015;52:78-84. 10.1016/j.evalprogplan.2015.03.006.
- 6. NORD Rare Disease Centers of Excellence. Accessed April, 2024. https://rarediseases.org/center-of-excellence/
- 7. Orphanet. Accessed April 5, 2024. https://www.orpha.net
- 8. Kariampuzha WZ, Alyea G, Qu S, et al. Precision information extraction for rare disease epidemiology at scale. J Transl Med. 2023;21(1):291. 10.1186/s12967-023-04011-y.
- 9. Pinto e Vairo F, Kemppainen JL, Vitek CRR, et al. Implementation of genomic medicine for rare disease in a tertiary healthcare system: Mayo Clinic Program for Rare and Undiagnosed Diseases (PRaUD). J Transl Med. 2023;21(1):410. 10.1186/s12967-023-04183-7.
- 10. Whicher D, Philbin S, Aronson N. An overview of the impact of rare disease characteristics on research methodology. Orphanet J Rare Dis. 2018;13(1):14. 10.1186/s13023-017-0755-5.
- 11. Nunn R. “It’s not all in my head!”—the complex relationship between rare diseases and mental health problems. Orphanet J Rare Dis. 2017;12(1):29. 10.1186/s13023-017-0591-7.
- 12. Chung CCY, Chu ATW, Chung BHY, et al. ; Hong Kong Genome Project. Rare disease emerging as a global public health priority. Front Public Health. 2022;10(1):1028545. 10.3389/fpubh.2022.1028545.
- 13. Stoller JK. The challenge of rare diseases. Chest. 2018;153(6):1309-1314. 10.1016/j.chest.2017.12.018.
- 14. Richards D. Seeing is believing: invisibility exacerbates inequality for patients living with rare disease. EMJ. 2022;7(3):17-22. 10.33590/emj/10149519.
- 15. Ekins S, Perlstein EO. Doing it all—how families are reshaping rare disease research. Pharm Res. 2018;35(10):192. 10.1007/s11095-018-2481-7.
- 16. Yoon S, Lee M, Jung HI, et al. Prioritization of research engaged with rare disease stakeholders: a systematic review and thematic analysis. Orphanet J Rare Dis. 2023;18(1):363. 10.1186/s13023-023-02892-2.
- 17. All of Us Research Program Strategic Goals. Accessed April, 2024. https://allofus.nih.gov/about/program-goals
- 18. Zeng C, Schlueter DJ, Tran TC, et al. Comparison of phenomic profiles in the All of Us Research Program against the US general population and the UK Biobank. J Am Med Inform Assoc. 2024;31(4):846-854. 10.1093/jamia/ocad260.
- 19. Patrick MT, Bardhi R, Zhou W, et al. Enhanced rare disease mapping for phenome-wide genetic association in the UK Biobank. Genome Med. 2022;14(1):85. 10.1186/s13073-022-01094-y.
- 20. Turro E, Astle WJ, Megy K, et al. ; NIHR BioResource for the 100,000 Genomes Project. Whole-genome sequencing of patients with rare diseases in a national health system. Nature. 2020;583(7814):96-102. 10.1038/s41586-020-2434-2.
- 21. Greene D, Pirri D, Frudd K, et al. ; Genomics England Research Consortium. Genetic association analysis of 77,539 genomes reveals rare disease etiologies. Nat Med. 2023;29(3):679-688. 10.1038/s41591-023-02211-z.
- 22. Murphy MJ, Leasure AC, Damsky W, et al. Association of sarcoidosis with psoriasis: a cross-sectional study in the All of Us research program. Arch Dermatol Res. 2023;315(5):1439-1441. 10.1007/s00403-022-02488-z.
- 23. All of Us Research Hub - Data Browser. Accessed July, 2024. https://databrowser.researchallofus.org/
- 24. All of Us Research Hub - Survey Explorer. Accessed April, 2024. https://www.researchallofus.org/data-tools/survey-explorer/
- 25. All of Us Research Hub - Data Methods. Accessed May, 2024. https://www.researchallofus.org/data-tools/methods/
- 26. Haendel M, Vasilevsky N, Unni D, et al. How many rare diseases are there? Nat Rev Drug Discov. 2020;19(2):77-78. 10.1038/d41573-019-00180-y.
- 27. Bogart K, Hemmesch A, Barnes E, et al. ; Chloe Barnes Advisory Council on Rare Diseases. Healthcare access, satisfaction, and health-related quality of life among children and adults with rare diseases. Orphanet J Rare Dis. 2022;17(1):196. 10.1186/s13023-022-02343-4.
- 28. Khan H, Krull M, Hankins JS, et al. Sickle cell disease and social determinants of health: a scoping review. Pediatr Blood Cancer. 2023;70(2):e30089. 10.1002/pbc.30089.
- 29. NCATS - Translational Science Principles. Accessed April, 2024. https://ncats.nih.gov/about/about-translational-science/principles
- 30. NCATS - Our Impact on Rare Diseases. Accessed November, 2024. https://ncats.nih.gov/research/our-impact/our-impact-rare-diseases#:~:text=NCATS%20is%20the%20heart%20of%20rare%20diseases%20research,address%20more%20than%20one%20disease%20at%20a%20time
- 31. Network of the National Library of Medicine. Rare Diseases are Not Rare - A Training on Rare Disease Resources [Video]. Youtube; 2022. Accessed November 2024. https://www.youtube.com/watch?v=kLBq6V1_wc4
- 32. NIH Genetic and Rare Diseases Information Center. Accessed March-April, 2024. https://rarediseases.info.nih.gov/
- 33. HRSA Recommended Uniform Screening Panel. Accessed January-April, 2024. https://www.hrsa.gov/advisory-committees/heritable-disorders/rusp
- 34. All of Us Research Hub - Data Sources. Accessed May, 2024. https://www.researchallofus.org/data-tools/data-sources/
- 35. All of Us Research Hub - Data Access Tiers. Accessed May, 2024. https://researchallofus.org/data-tools/data-access/
- 36. Sontag MK, Yusuf C, Grosse SD, et al. Infants with congenital disorders identified through newborn screening—United States, 2015–2017. MMWR Morb Mortal Wkly Rep. 2020;69(36):1265-1268. https://www.cdc.gov/mmwr/volumes/69/wr/mm6936a6.htm
- 37. Singh H, Jani C, Marshall DC, et al. Cystic fibrosis-related mortality in the United States from 1999 to 2020: an observational analysis of time trends and disparities. Sci Rep. 2023;13(1):15030. 10.1038/s41598-023-41868-x.
- 38. Sellos-Moura M, Glavin F, Lapidus D, et al. Prevalence, characteristics, and costs of diagnosed homocystinuria, elevated homocysteine, and phenylketonuria in the United States: a retrospective claims-based comparison. BMC Health Serv Res. 2020;20(1):183. 10.1186/s12913-020-5054-5.
- 39. Fu Y, Andemariam B, Herman C. Estimating sickle cell disease prevalence by state: a model using US-born and foreign-born state-specific population data. Blood. 2023;142(Supplement 1):3900. 10.1182/blood-2023-189287.
- 40. National Organization for Rare Diseases - Beta Thalassemia. Accessed March, 2024. https://rarediseases.org/rare-diseases/thalassemia-major/#affected
- 41. Abbott P, Wallace C. Social quality: a way to measure the quality of society. Soc Indic Res. 2012;108(1):153-167. 10.1007/s11205-011-9871-0.
- 42. Agrawal RK, Patel RK, Shah V, et al. Hydroxyurea in sickle cell disease: drug review. Indian J Hematol Blood Transfus. 2014;30(2):91-96. 10.1007/s12288-013-0261-4.
- 43. Ridley K, Condren M. Elexacaftor-tezacaftor-ivacaftor: the first triple-combination cystic fibrosis transmembrane conductance regulator modulating therapy. J Pediatr Pharmacol Ther. 2020;25(3):192-197. 10.5863/1551-6776-25.3.192.
- 44. Kölker S, Gleich F, Mütze U, et al. Rare disease registries are key to evidence-based personalized medicine: highlighting the European experience. Front Endocrinol (Lausanne). 2022;13:832063. 10.3389/fendo.2022.832063.
- 45. Groft SC, Posada de la Paz M. Rare diseases: joining mainstream research and treatment based on reliable epidemiological data. In: Posada de la Paz M, Taruscio D, Groft S, eds. Rare Diseases Epidemiology: Update and Overview. Springer; 2017:3-21.
- 46. Brower A, Chan K, Williams M, et al. Population-based screening of newborns: findings from the NBS expansion study (part one). Front Genet. 2022;13:867337. 10.3389/fgene.2022.867337.
- 47. Mund M, Uhlenbusch N, Rillig F, et al. Psychological distress of adult patients consulting a center for rare and undiagnosed diseases: a cross-sectional study. Orphanet J Rare Dis. 2023;18(1):82. 10.1186/s13023-023-02669-7.
- 48. Spencer-Tansley R, Meade N, Ali F, et al. Mental health care for rare disease in the UK—recommendations from a quantitative survey and multi-stakeholder workshop. BMC Health Serv Res. 2022;22(1):648. 10.1186/s12913-022-08060-9.
- 49. Oudin Doglioni D, Chabasseur V, Barbot F, et al. Depression in adults with sickle cell disease: a systematic review of the methodological issues in assessing prevalence of depression. BMC Psychol. 2021;9(1):54. 10.1186/s40359-021-00543-4.
- 50. Lord L, McKernon D, Grzeskowiak L, et al. Depression and anxiety prevalence in people with cystic fibrosis and their caregivers: a systematic review and meta-analysis. Soc Psychiatry Psychiatr Epidemiol. 2023;58(2):287-298. 10.1007/s00127-022-02307-w.
- 51. Guta MT, Tekalign T, Awoke N, Fite RO, Dendir G, Lenjebo TL. Global burden of anxiety and depression among cystic fibrosis patient: systematic review and meta-analysis. Int J Chronic Dis. 2021;2021:6708865. 10.1155/2021/6708865.
- 52. Yousif M, Abdelrahman A, Al Jamea LH, et al. Psychosocial impact of sickle cell disease and diabetes mellitus on affected children and their parents in Khartoum State, Sudan. J Trop Pediatr. 2022;68(3):1-12. 10.1093/tropej/fmac042.
- 53. Lee TC, Radha-Saseendrakumar B, Delavar A, et al. Evaluation of depression and anxiety in a diverse population with thyroid eye disease using the nationwide NIH All of Us database. Ophthalmic Plast Reconstr Surg. 2023;39(3):281-287. 10.1097/IOP.0000000000002318.
- 54. Siebold D, Denton J, Hurst ACE, Moss I, Korf B. A qualitative evaluation of patient and parent experiences with an undiagnosed diseases program. Am J Med Genet A. 2024;194(2):131-140. 10.1002/ajmg.a.63417.
- 55. Yan C, Grabowska ME, Dickson AL, et al. Leveraging generative AI to prioritize drug repurposing candidates for Alzheimer’s disease with real-world clinical validation. NPJ Digit Med. 2024;7(1):46. 10.1038/s41746-024-01038-3.
- 56. Every Cure. About Us – Every Cure. Accessed December 2024. https://everycure.org/about/
- 57. The All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature. 2024;627(8003):340-346. 10.1038/s41586-023-06957-x.
- 58. Marwaha S, Knowles JW, Ashley EA. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med. 2022;14(1):23. 10.1186/s13073-022-01026-w.
- 59. All of Us Research Program - Who Can Join? Accessed April, 2024. https://www.joinallofus.org/who-can-join
- 60. All of Us Research Program - FAQ. Accessed May, 2024. https://allofus.nih.gov/about/faq
- 61. Patil MK. Enhancing study designs of disease prevalence investigations conducted with the All of Us Research Program. J Am Acad Dermatol. 2024;90(5):e181-e182. 10.1016/j.jaad.2023.12.059.
- 62. Kam O, Osborne S, Wescott R, et al. Prevalence of calcinosis cutis in the United States using the All of Us research database. J Am Acad Dermatol. 2024;90(2):405-406. 10.1016/j.jaad.2023.09.076.
- 63. Joshi TP, Calderara GA, Lipoff JB. Prevalence of pityriasis rosea in the United States: a cross-sectional study using the All of Us database. JAAD Int. 2022;8:45-46. 10.1016/j.jdin.2022.04.006.
- 64. Leasure AC, Cohen JM. Prevalence of lichen planus in the United States: a cross-sectional study of the All of Us research program. J Am Acad Dermatol. 2022;87(3):686-687. 10.1016/j.jaad.2021.12.013.
Data Availability Statement
This study uses data from the All of Us Research Program’s Controlled Tier version 7, available to authorized users on the Researcher Workbench. The data are available in the All of Us Workspace and can be shared with registered users on request.

