Author manuscript; available in PMC: 2024 Sep 1.
Published in final edited form as: Mov Disord. 2023 Jun 15;38(9):1679–1687. doi: 10.1002/mds.29507

Adjusting for underrepresentation reveals widespread underestimation of Parkinson disease symptom burden

Ali G Hamedani 1,2,3, Peggy Auinger 4, Allison W Willis 1,2,3, Delaram Safarpour 5, David Shprecher 6, Natividad Stover 7, Thyagarajan Subramanian 8, Leslie Cloud 9
PMCID: PMC10524668  NIHMSID: NIHMS1904753  PMID: 37318322

Abstract

Background:

Clinical research is limited by underrepresentation, but the impact of underrepresentation on patient-reported outcomes in PD is unknown.

Objectives:

To produce nationwide estimates of non-motor symptoms (NMS) prevalence and PD-related quality of life (QOL) limitations while accounting for underrepresentation.

Methods:

We performed a cross-sectional analysis of data from the Fox Insight (FI) study, an ongoing prospective longitudinal study of persons with self-reported PD. Using epidemiologic literature and U.S. Census Bureau, Medicare, and National Health and Aging Trends Study data, we simulated a “virtual census” of the PD population. To compare the PD census to the FI cohort, we used logistic regression to model the odds of study participation and calculate predicted probabilities of participation for inverse probability weighting.

Results:

There are an estimated 849,488 persons living with PD in the U.S. Compared to 22,465 eligible FI participants, non-participants are more likely to be older, female, and non-White; live in rural regions; have more severe PD; and have lower levels of education. When these predictors were incorporated into a multivariable regression model, predicted probability of participation was much higher for FI participants than non-participants, indicating a significant difference in the underlying populations (propensity score distance 2.62). Estimates of NMS prevalence and QOL limitation were greater when analyzed using inverse probability of participation weighting compared to unweighted means and frequencies.

Conclusions:

PD-related morbidity may be underestimated because of underrepresentation, and inverse probability of participation weighting can be used to give greater weight to underrepresented groups and produce more generalizable estimates.

INTRODUCTION:

Non-motor symptoms (NMS) are a significant source of patient and caregiver distress and reduced quality of life (QOL) in Parkinson disease (PD)1,2. However, the total burden of NMS in the PD population is unknown, in large part because these symptoms are frequently underreported and underdiagnosed in clinical practice3. Screening questionnaires have been shown to improve the detection of NMS in PD and have been deployed in cohort studies and clinical trials4, but these studies are largely limited to tertiary academic centers and are susceptible to selection bias from underrepresentation in clinical research. Specifically, patients from racial and ethnic minority backgrounds are less likely to be enrolled in clinical trials of PD5–7, and because of known racial and socioeconomic disparities in PD severity and treatment outcomes8–10, extrapolating data from White and higher socioeconomic status patients to other patient populations is not valid.

Efforts to improve diversity and inclusion in PD research are ongoing11,12; however, these efforts will take years due to numerous systemic barriers. In the meantime, it is important to understand the impact of underrepresentation on NMS and QOL research in PD and to account for it when using existing research resources. Because all research studies are, by definition, non-random samples of the underlying population, statistical weighting is a promising method of improving generalizability when the study sample suffers from underrepresentation. By design, national health surveys such as the National Health and Nutrition Examination Survey (NHANES) are conducted as complex stratified rather than simple random samples, and design weights are used to weight results to the general population13. When the source population for a research study is defined – for example, a cohort of patients of whom a subset participates in a randomized controlled trial – the probability of study participation can be modeled, and outcomes can be adjusted using inverse probability of participation weighting to reduce selection bias and estimate what would have been observed had the entire population participated14–16. However, there is currently no way of defining the entire population with Parkinson disease or any other neurologic disorder in the U.S.

In this study, we analyzed data from the Fox Insight (FI) Study, an ongoing prospective longitudinal study of more than 35,000 PD patients17, and using a simulated “virtual census” of the PD population in the U.S., we employed inverse probability of participation weighting to produce national estimates of non-motor symptom prevalence, daily activity limitations, and QOL. We hypothesized that underrepresentation biases patient-reported outcomes (PROs) in favor of milder symptoms and fewer QOL limitations and that placing greater emphasis on underrepresented groups at risk for negative PD-related outcomes through sample weighting would yield higher estimates that better approximate the health of the general PD population.

METHODS:

Standard Protocol Approvals, Registrations, and Patient Consents:

The University of Pennsylvania Institutional Review Board approved this study. The FI Study was approved by the New England IRB, and informed consent was obtained at the time of enrollment.

Study overview:

We constructed a “virtual census” of the U.S. PD population using a combination of epidemiologic, Medicare, and national health survey data. We then merged the virtual census with the FI dataset according to demographic and clinical characteristics such that each real FI participant was matched to a hypothetical individual in the virtual census. Using this combined dataset, we constructed a logistic regression model with FI participation as the outcome and calculated the predicted probability that each hypothetical PD patient in the virtual census had participated in FI. Using PROs and predicted probabilities of participation for the FI cohort, we calculated inverse probability of participation-weighted estimates of NMS prevalence and QOL. Each of these steps is discussed in detail below.

Virtual PD census derivation:

We simulated a “virtual census” of the U.S. PD population, with each hypothetical individual characterized according to age, gender, race/ethnicity, geographic region, urban/rural residence, level of education, and PD severity. We began with published data from the Parkinson’s Foundation P4 project, which combined data from multiple population-based cohort studies and administrative claims datasets to estimate the prevalence of PD in North America stratified by age and gender18, and multiplied this by the age- and gender-stratified U.S. population from the 2019 American Community Survey to calculate the estimated number of people in the U.S. with PD (Table 1)19. Age was subsequently categorized as 45–64, 65–74, or ≥75 years. For each stratum of age and gender, we calculated the number of PD patients who had mild (Hoehn & Yahr [HY] 1–2), moderate (HY 3), or severe (HY 4–5) disease using estimates from population-based epidemiologic data20.

Table 1:

Age- and gender-specific prevalence of Parkinson disease in the U.S.

Gender and age group | Meta-estimated prevalence of PD per 100,000* | U.S. population** | Estimated number of persons with PD in the U.S.
Female 45–54 yrs | 46 | 21320518 | 9807
Female 55–64 yrs | 184 | 21610185 | 39763
Female 65–74 yrs | 616 | 15739252 | 96954
Female 75–84 yrs | 1638 | 8455176 | 138496
Female ≥85 yrs | 2284 | 4070765 | 92976
Male 45–54 yrs | 68 | 20752102 | 14111
Male 55–64 yrs | 273 | 20146229 | 54999
Male 65–74 yrs | 1022 | 13803014 | 141067
Male 75–84 yrs | 2658 | 6517337 | 173231
Male ≥85 yrs | 4007 | 2198252 | 88084

* From Marras et al., NPJ PD 2018 (reference 18)

** According to 5-year estimates from the 2019 American Community Survey (reference 19)
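
The arithmetic behind Table 1 can be reproduced in a few lines. The sketch below copies the prevalence and population figures from the table and multiplies them stratum by stratum; rounding each stratum to the nearest whole person is an assumption, although under that assumption the strata sum to the reported national total.

```python
# Prevalence per 100,000 and U.S. population, copied from Table 1.
TABLE_1 = {
    ("Female", "45-54"): (46, 21_320_518),
    ("Female", "55-64"): (184, 21_610_185),
    ("Female", "65-74"): (616, 15_739_252),
    ("Female", "75-84"): (1638, 8_455_176),
    ("Female", ">=85"): (2284, 4_070_765),
    ("Male", "45-54"): (68, 20_752_102),
    ("Male", "55-64"): (273, 20_146_229),
    ("Male", "65-74"): (1022, 13_803_014),
    ("Male", "75-84"): (2658, 6_517_337),
    ("Male", ">=85"): (4007, 2_198_252),
}


def estimated_cases(prevalence_per_100k, population):
    """Expected number of persons with PD in one age/gender stratum."""
    # Rounding per stratum is an assumption, not a documented step.
    return round(prevalence_per_100k * population / 100_000)


cases = {stratum: estimated_cases(*row) for stratum, row in TABLE_1.items()}
total_cases = sum(cases.values())
female_cases = sum(n for (gender, _), n in cases.items() if gender == "Female")
male_cases = total_cases - female_cases

print(total_cases, female_cases, male_cases)
```

The gender subtotals from this calculation correspond to the gender rows of the virtual census in Table 2.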

Next, we used Medicare data to calculate a series of conditional probabilities describing the U.S. PD population according to race, geographic region, and urban/rural residence21,22. Using the 2015–2017 carrier files, we identified all Medicare beneficiaries aged 65 or older who had at least two claims for PD (ICD-10-CM code G20) on different dates. We excluded individuals with claims for other neurologic disorders that could complicate or preclude a diagnosis of idiopathic PD (ICD-10-CM codes G21.11, G21.19, G21.8, G23.0, G23.1, G23.2, G23.8, A52.19, G12.21, or G31.83). This case definition has very good sensitivity and positive predictive value (89.2% and 79.4%, respectively) to identify PD in administrative claims data23. We merged the carrier file data with the Master Beneficiary Summary File (MBSF) to obtain age, gender, race, ethnicity, and county. Rural-Urban Continuum Codes were used to convert county to census division and metropolitan vs. non-metropolitan residence24. Using the combined carrier-MBSF dataset, we calculated a series of conditional probabilities for each demographic characteristic according to strata defined by other variables in the dataset. We tabulated race/ethnicity within strata of age and gender; census division within strata of age, gender, and race/ethnicity; and rural/urban residence within strata of age, gender, race/ethnicity, and census division.
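
The claims-based case definition described above can be sketched as a small predicate. This is only an illustration under stated assumptions: the input layout (a list of service date and ICD-10-CM code pairs per beneficiary) and the function name are hypothetical, codes are assumed to be stored with their decimal points, and the age restriction (65 or older) is assumed to have been applied upstream.

```python
from datetime import date

PD_CODE = "G20"
# Exclusion codes as listed in the text.
EXCLUSION_CODES = {
    "G21.11", "G21.19", "G21.8", "G23.0", "G23.1",
    "G23.2", "G23.8", "A52.19", "G12.21", "G31.83",
}


def meets_pd_case_definition(claims):
    """claims: iterable of (service_date, icd10_code) for one beneficiary.

    Hypothetical helper: requires at least two PD claims on different
    dates and no claim carrying any of the exclusion codes.
    """
    claims = list(claims)
    pd_dates = {service_date for service_date, code in claims if code == PD_CODE}
    has_exclusion = any(code in EXCLUSION_CODES for _, code in claims)
    return len(pd_dates) >= 2 and not has_exclusion


# Hypothetical beneficiaries, for illustration only.
two_visits = [(date(2016, 1, 5), "G20"), (date(2016, 7, 9), "G20")]
same_day = [(date(2016, 1, 5), "G20"), (date(2016, 1, 5), "G20")]
excluded = two_visits + [(date(2017, 2, 1), "G31.83")]

print(meets_pd_case_definition(two_visits))
print(meets_pd_case_definition(same_day))
print(meets_pd_case_definition(excluded))
```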

Education level is an important determinant of research participation25 but is not captured in epidemiologic or Medicare data, so we determined the distribution of educational background in the PD population using Medicare-linked data from the National Health and Aging Trends Study (NHATS). NHATS is a nationally representative sample of Medicare beneficiaries aged 65 or older who have been surveyed annually since 2011, with replenishment of the sample in 201526. Through linked Medicare claims data, we identified all NHATS participants who were in the 2011 cohort and met our Medicare criteria for PD diagnosis. Using NHATS sample weights, we categorized self-reported education for the PD population as follows: less than high school, high school/some college, bachelor/associate degree, or master/professional/doctoral degree. Of note, education may be associated with a number of sociodemographic factors including race/ethnicity, gender, geographic region, and rural/urban residence, but the NHATS PD sample was too small to permit stratification on all of these variables. To determine which were significantly associated with level of education, we constructed an ordinal generalized linear model of education as a function of age, gender, race/ethnicity, census division, and rural/urban residence. This model accounted for the complex survey design of NHATS by using sample weights (which are adjusted for survey non-response) and stratum and primary sampling unit variables. The results of this model indicated that gender and race/ethnicity were associated with education level but other demographics were not, so we calculated conditional probabilities for each level of education within strata defined by these two variables.

We multiplied the total number of PD patients by age group and gender from the Parkinson’s Foundation P4 project by the conditional probabilities of demographic characteristics and disease severity obtained from epidemiologic, Medicare, and NHATS data to simulate a “virtual census” of the PD population in the U.S. Because Medicare and NHATS did not include individuals younger than 65, we assumed that the conditional probabilities for those younger than 65 were the same as for those aged 65–74. The FI study contains few individuals of non-White and non-Black ancestry, so to improve the degree of overlap between the two datasets, we also categorized race as White, Black, or other.
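
As an illustration of this multiplication step, the sketch below splits age- and gender-specific counts across race and rural/urban residence using a chain of conditional probabilities. The female counts come from Table 1; every probability is a made-up placeholder rather than a value from the Medicare data, the function names are hypothetical, and only two of the conditioning variables are shown. Cells are left as fractional counts because the handling of rounding is not described.

```python
# Female PD counts from Table 1, collapsed to the paper's age groups.
stratum_counts = {
    ("45-64", "Female"): 9807 + 39763,
    ("65-74", "Female"): 96954,
}

# HYPOTHETICAL conditional probabilities (placeholders, not study values).
# Race is conditioned on age and gender; residence on age, gender, and race.
p_race = {("65-74", "Female"): {"White": 0.85, "Black": 0.07, "Other": 0.08}}
p_residence = {
    ("65-74", "Female", "White"): {"Metropolitan": 0.80, "Non-metropolitan": 0.20},
    ("65-74", "Female", "Black"): {"Metropolitan": 0.88, "Non-metropolitan": 0.12},
    ("65-74", "Female", "Other"): {"Metropolitan": 0.90, "Non-metropolitan": 0.10},
}


def source_age(age_group):
    """Borrow the 65-74 probabilities for people younger than 65."""
    return "65-74" if age_group == "45-64" else age_group


def build_cells(stratum_counts, p_race, p_residence):
    """Expected number of persons in each cell of the simulated census."""
    cells = {}
    for (age, gender), n in stratum_counts.items():
        src = source_age(age)
        for race, pr in p_race[(src, gender)].items():
            for residence, pu in p_residence[(src, gender, race)].items():
                cells[(age, gender, race, residence)] = n * pr * pu
    return cells


cells = build_cells(stratum_counts, p_race, p_residence)
print(len(cells), round(sum(cells.values())))
```

Because each set of conditional probabilities sums to one, the cell counts add back up to the stratum totals they were derived from.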

FI study:

The FI study is an online longitudinal health survey of more than 35,000 self-identified PD patients administered by the Michael J. Fox Foundation17. Participants are recruited from neurology clinics, patient education/research events, and online through digital marketing. Following online registration and informed consent, participants receive screening questionnaires and PRO assessments every 90 days. We used data collected between March 2015 and September 2021. Each 90-day survey contains several validated PRO measures and PD-specific questionnaires, of which the most recently available response was used for analysis. These include the Movement Disorders Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) part II, Non-Motor Symptom Questionnaire (NMS-Quest), Parkinson’s Disease Questionnaire (PDQ-8), and Penn Parkinson’s Daily Activity Questionnaire (PDAQ-15). Demographic variables such as age, gender, and race/ethnicity were self-reported at the time of study enrollment. MDS-UPDRS part II score was categorized as mild, moderate, or severe PD using published cutoffs27. ZIP code, which was optionally provided at the time of enrollment and made available by the Michael J. Fox Foundation, was converted to census division and urban/rural residence using published algorithms from the Department of Housing and Urban Development28 and the U.S. Department of Agriculture’s Economic Research Service24.

Statistical Analysis:

We merged the virtual census and FI datasets according to age group, gender, race/ethnicity, census division, urban/rural residence, level of education, and PD severity such that each real FI participant could merge with (i.e. “represent”) one hypothetical member of the virtual census. Next, we constructed a multivariable logistic regression model of FI participation (where the outcome variable equaled 1 if someone was present in both the virtual census and FI and equaled 0 if they were present in the virtual census but not in FI) as a function of the above covariates. Using the intercept and regression coefficients from this model, we calculated the predicted probability of FI participation for every individual in the virtual PD census. To quantify the generalizability of FI relative to the virtual census, we calculated a propensity score distance, which is equal to the average difference in predicted probabilities between FI and the virtual census divided by the standard deviation of the propensity scores. Previous research has suggested a propensity score distance of 0.2 or greater as indicative of a threat to generalizability between the sample and population15.
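
The two quantities defined in this paragraph can be sketched as follows. The first function applies the standard logistic formula to an intercept and coefficients; the second computes the propensity score distance as described. Several details are assumptions: the standard deviation is taken over all scores pooled together, the comparison group is treated as a separate list of scores, the intercept is a placeholder (the fitted intercept is not reported), and the example scores are invented.

```python
import math
from statistics import mean, pstdev


def predicted_probability(intercept, coefficients, covariates):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_k * x_k))))."""
    linear = intercept + sum(
        coefficients[name] * value for name, value in covariates.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def propensity_score_distance(participant_scores, comparison_scores):
    """Difference in mean predicted probability divided by the score SD.

    Assumption: the SD is computed over both groups pooled together.
    """
    pooled_sd = pstdev(list(participant_scores) + list(comparison_scores))
    return (mean(participant_scores) - mean(comparison_scores)) / pooled_sd


# Coefficients are log odds ratios; the intercept is a HYPOTHETICAL value.
coefficients = {"metropolitan": math.log(2.3), "age_under_65": math.log(1.6)}
p = predicted_probability(-5.0, coefficients, {"metropolitan": 1, "age_under_65": 1})

# HYPOTHETICAL scores, for illustration only.
distance = propensity_score_distance([0.6, 0.8], [0.2, 0.4])
print(round(p, 4), round(distance, 2))
```

A distance above the 0.2 threshold cited in the text would flag a potential generalizability problem under this definition.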

We calculated mean NMS-Quest, PDQ-8, and PDAQ-15 scores and the prevalence of each NMS in the FI sample. In addition to these unweighted summary statistics, we also calculated weighted estimates using inverse probability of participation weighting to adjust for differences between FI participants and non-participants in the virtual census. In a standard unweighted analysis, all individuals in the dataset count equally, but a weighted analysis allows certain individuals – namely, those with a higher weight – to contribute more strongly to the analysis than others. In inverse probability weighting, the weight is equal to the inverse of the predicted probability of participation. Thus, individuals who are relatively underrepresented in FI compared to the general PD population and therefore have lower predicted probabilities of participation are given a higher weight than individuals who are overrepresented and have higher predicted probabilities. Prior to weighting, we examined whether extreme weights influenced the distributions of the covariates and subsequently excluded individuals with weights of 1,000 or greater (n=222, 0.99%). We compared weighted and unweighted estimates of nonmotor symptom prevalence and mean NMS-Quest, PDQ-8, and PDAQ-15 scores. For unweighted estimates, we used standard variance estimation to produce 95% confidence intervals. For weighted estimates, we used bootstrap variance estimation based on 1,000 bootstrap resamples with replacement to estimate weighted 95% confidence intervals based on the percentile method. We did not compare weighted and unweighted mean MDS-UPDRS part II scores because this variable is already included as part of the weighting scheme. Statistical analyses were performed using Stata/IC 15.1 (College Station, TX) and SAS version 9.4 (Cary, NC), and statistical significance was defined at the p<0.05 level.
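
The weighting and interval estimation described above can be sketched in plain Python. This is a minimal illustration rather than the Stata or SAS code used for the analysis: the function names, the toy outcomes, and the toy participation probabilities are all hypothetical, and the percentile cut points are a simple index-based approximation.

```python
import random


def inverse_probability_weights(probabilities, max_weight=1000.0):
    """Return (index, weight) pairs, dropping weights of max_weight or more."""
    return [
        (i, 1.0 / p) for i, p in enumerate(probabilities) if 1.0 / p < max_weight
    ]


def weighted_mean(pairs):
    """pairs: iterable of (value, weight)."""
    pairs = list(pairs)
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


def bootstrap_percentile_ci(pairs, n_resamples=1000, seed=0):
    """95% percentile interval from resampling individuals with replacement."""
    rng = random.Random(seed)
    pairs = list(pairs)
    estimates = sorted(
        weighted_mean(rng.choice(pairs) for _ in pairs) for _ in range(n_resamples)
    )
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[max(int(0.975 * n_resamples) - 1, 0)]
    return lower, upper


# HYPOTHETICAL toy data: 1 = symptom present, 0 = absent.
symptom = [1, 0, 0, 1, 0, 1]
p_participation = [0.02, 0.5, 0.5, 0.25, 0.4, 0.0005]

kept = inverse_probability_weights(p_participation)  # last person is excluded
data = [(symptom[i], w) for i, w in kept]

unweighted = sum(symptom[i] for i, _ in kept) / len(kept)
weighted = weighted_mean(data)
ci = bootstrap_percentile_ci(data)
print(round(unweighted, 3), round(weighted, 3), ci)
```

In the toy data, the person with a 2% predicted probability of participation carries a weight of 50, so their symptom status dominates the weighted prevalence; this is the mechanism by which underrepresented groups gain influence.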

Data Availability Statement:

The virtual PD census dataset is available to researchers upon request. FI data is available for free download to all registered users. For up-to-date information on the study, visit https://foxinsight-info.michaeljfox.org/insight/explore/insight.jsp.

RESULTS:

Based on Parkinson’s Foundation P4 project and U.S. Census Bureau data, there are an estimated 849,488 persons with PD living in the United States. The age- and gender-stratified prevalence of PD is shown in Table 1, and after simulating the joint distribution of demographic and clinical variables, the baseline characteristics of the virtual PD census are shown in Table 2. There were 37,753 FI participants with self-reported PD as of September 28, 2021, of whom 23,965 (63.5%) had complete geographic and other matching variable data. When merged with the virtual census according to age, gender, race/ethnicity, census division, urban/rural residence, level of education, and PD severity, 340 FI participants had variable combinations that were not present in the virtual census, and 1,160 had variable combinations that were overrepresented compared to the virtual census. The remaining 22,465 of 23,965 eligible FI participants (93.7%) could be merged with a hypothetical counterpart in the virtual census. The baseline characteristics of this cohort are summarized in Table 2, and characteristics of the unmatched FI cohort are found in Supplemental Table 1.

Table 2:

Baseline characteristics of the Fox Insight cohort and virtual PD census

Characteristic | Virtual census (n=849488) | Fox Insight (n=22465) | Association with Fox Insight participation (OR, 95% CI)*
Age
<65 | 118680 (14.0%) | 8479 (37.7%) | 1.6 (1.6, 1.7)
65–74 | 238021 (28.0%) | 9735 (43.3%) | REF
≥75 | 492787 (58.0%) | 4251 (18.9%) | 0.15 (0.15, 0.16)
Gender
Male | 471492 (55.5%) | 12720 (56.6%) | REF
Female | 377996 (44.5%) | 9745 (43.4%) | 1.7 (1.6, 1.7)
Race/ethnicity
White | 737352 (86.8%) | 21303 (94.8%) | REF
Black | 45903 (5.4%) | 148 (0.7%) | 0.12 (0.10, 0.14)
Other | 66233 (7.8%) | 1014 (4.5%) | 0.74 (0.69, 0.80)
Census division
Northeast | 140168 (16.5%) | 1518 (6.8%) | REF
Mid-Atlantic | 78062 (9.2%) | 2943 (13.1%) | 4.0 (3.7, 4.2)
East North Central | 86509 (10.2%) | 3154 (14.0%) | 3.9 (3.7, 4.2)
West North Central | 75338 (8.9%) | 1432 (6.4%) | 1.8 (1.6, 1.9)
South Atlantic | 199401 (23.5%) | 4722 (21.0%) | 2.3 (2.1, 2.4)
East South Central | 46257 (5.4%) | 993 (4.4%) | 2.3 (2.1, 2.5)
West South Central | 114716 (13.5%) | 1993 (8.9%) | 1.5 (1.4, 1.6)
Mountain | 66977 (7.9%) | 2183 (9.7%) | 3.5 (3.3, 3.7)
Pacific | 42060 (5.0%) | 3527 (15.7%) | 10.9 (10.2, 11.6)
Rural/urban residence
Non-metropolitan | 160805 (18.9%) | 2750 (12.2%) | REF
Metropolitan | 688683 (81.1%) | 19715 (87.8%) | 2.3 (2.2, 2.4)
Education
Less than high school | 200718 (23.6%) | 155 (0.7%) | REF
High school/some college | 399107 (47.0%) | 6496 (28.9%) | 20.5 (17.5, 24.1)
Bachelor/associate degree | 134267 (15.8%) | 8153 (36.3%) | 111.3 (94.8, 130.6)
Master/professional/doctoral degree | 115396 (13.6%) | 7661 (34.1%) | 130.4 (111.1, 153.2)
PD severity
Mild | 231278 (27.2%) | 11607 (51.7%) | REF
Moderate | 355828 (41.9%) | 9521 (42.4%) | 0.42 (0.40, 0.43)
Severe | 262382 (30.9%) | 1337 (6.0%) | 0.07 (0.07, 0.08)

* Odds ratios are from a multivariable logistic regression model adjusted for all covariates in the table.

From the merged virtual census and FI dataset, the results of a logistic regression model of FI participation are shown in Table 2. Compared to the unmatched census, FI participants were more likely to be younger than 65 years of age (OR 1.6, 95% CI: 1.6–1.7), live in metropolitan regions (OR 2.3, 95% CI: 2.2–2.4), and be female (OR 1.7, 95% CI: 1.6–1.7), and less likely to be Black (OR 0.12, 95% CI: 0.10–0.14) or of other non-White race/ethnicity (OR 0.74, 95% CI: 0.69–0.80) (Table 2). Significant regional geographic differences were also present. FI participants were less likely to have moderate (OR 0.42, 95% CI: 0.40–0.43) or severe (OR 0.07, 95% CI: 0.07–0.08) PD compared to the virtual census. Differences between FI and the virtual census were most pronounced for education: relative to less than a high school education, FI participants had more than one hundred times the odds of holding a bachelor/associate degree (OR 111.3, 95% CI: 94.8–130.6) or a master/professional/doctoral degree (OR 130.4, 95% CI: 111.1–153.2). From this multivariable regression model, we calculated the probability that a given individual in the virtual census would be represented within FI. The average predicted probability of participation was much higher for FI participants than non-participants (propensity score distance 2.62), indicating that the two populations are significantly different (more than two standard deviations apart) with regard to predictors of research participation.

Unweighted and inverse probability of participation-weighted measures of NMS prevalence and QOL are shown in Table 3. Mean weight was 33.0 (SD 82.9, range 1.3–973.7). Weighting shifted virtually all measures of NMS burden and QOL toward greater impairment. This was especially pronounced for the PDQ-8 (37.1 weighted vs. 26.5 unweighted), which measures PD-specific QOL, and the PDAQ-15 (40.7 weighted vs. 47.9 unweighted), a measure of limitations in cognitive activities of daily living on which lower scores indicate greater limitation. Among NMS, inverse probability of participation weighting had the greatest effect on the prevalence of drooling (54.4% weighted vs. 38.6% unweighted) and hallucinations (28.4% weighted vs. 16.3% unweighted). Standardized differences for each predictor of study participation between the weighted FI cohort and virtual census are found in Supplemental Table 2.

Table 3:

Unweighted and inverse probability of participation-weighted measures of non-motor symptom prevalence and quality of life in the Fox Insight study

Measure | Unweighted | Weighted | Difference between weighted and unweighted
NMS-Quest, mean (95% CI), n=21978 | 11.7 (11.6, 11.8) | 13.7 (13.5, 13.9) | 2.0 (1.8, 2.2)
Drooling, % (95% CI), n=21978 | 38.6 (37.9, 39.2) | 54.4 (52.9, 55.9) | 15.8 (14.1, 17.4)
Hyposmia, % (95% CI), n=21976 | 29.9 (29.3, 30.5) | 36.9 (35.6, 38.5) | 7.0 (5.4, 8.5)
Dysphagia, % (95% CI), n=21974 | 38.7 (38.1, 39.4) | 49.2 (47.8, 50.8) | 10.5 (8.9, 12.1)
Nausea/vomiting, % (95% CI), n=21974 | 23.3 (22.8, 23.9) | 24.4 (23.2, 25.7) | 1.1 (−0.3, 2.5)
Constipation, % (95% CI), n=21972 | 59.3 (58.6, 59.9) | 62.5 (61.0, 64.0) | 3.2 (1.7, 4.8)
Bowel incontinence, % (95% CI), n=21970 | 17.1 (16.6, 17.6) | 28.4 (27.0, 29.9) | 11.3 (9.7, 12.8)
Incomplete bowel emptying, % (95% CI), n=21967 | 51.1 (50.4, 51.7) | 53.4 (51.9, 54.9) | 2.3 (0.7, 4.0)
Urinary urgency, % (95% CI), n=21965 | 71.1 (70.5, 71.7) | 76.0 (74.8, 77.4) | 4.9 (3.6, 6.2)
Nocturia, % (95% CI), n=21963 | 73.9 (73.3, 74.5) | 77.2 (76.0, 78.4) | 3.3 (1.9, 4.6)
Pain, % (95% CI), n=21963 | 40.5 (39.8, 41.1) | 42.1 (40.7, 43.5) | 1.6 (0.1, 3.2)
Weight change, % (95% CI), n=21963 | 14.6 (14.1, 15.1) | 20.4 (19.1, 21.8) | 5.8 (4.6, 7.1)
Forgetfulness, % (95% CI), n=21961 | 56.0 (55.4, 56.7) | 66.0 (64.6, 67.4) | 10.0 (8.5, 11.4)
Loss of interest, % (95% CI), n=21960 | 43.5 (42.8, 44.1) | 53.7 (52.2, 55.1) | 10.2 (8.5, 11.7)
Hallucinations, % (95% CI), n=21960 | 16.3 (15.8, 16.8) | 28.4 (27.0, 30.0) | 12.1 (10.5, 13.7)
Difficulty concentrating, % (95% CI), n=21958 | 54.3 (53.6, 54.9) | 63.0 (61.5, 64.4) | 8.7 (7.2, 10.4)
Depression, % (95% CI), n=21957 | 59.7 (59.1, 60.4) | 64.2 (62.8, 65.7) | 4.5 (2.9, 6.0)
Anxiety, % (95% CI), n=21954 | 42.7 (42.0, 43.3) | 45.8 (44.4, 47.3) | 3.1 (1.5, 4.8)
Altered interest in sex, % (95% CI), n=21952 | 37.7 (37.1, 38.4) | 41.0 (39.6, 42.6) | 3.3 (1.7, 4.9)
Difficulty with sex, % (95% CI), n=21950 | 36.9 (36.2, 37.5) | 44.2 (42.8, 45.7) | 7.3 (5.8, 8.9)
Dizziness, % (95% CI), n=21948 | 51.1 (50.5, 51.8) | 61.1 (59.6, 62.5) | 10.0 (8.3, 11.5)
Fall, % (95% CI), n=21947 | 27.1 (26.6, 27.7) | 40.2 (38.8, 41.8) | 13.1 (11.4, 14.6)
Difficulty staying awake, % (95% CI), n=21947 | 23.1 (22.6, 23.7) | 31.7 (30.1, 33.2) | 8.6 (7.0, 10.1)
Difficulty falling/staying asleep, % (95% CI), n=21946 | 65.2 (64.6, 65.8) | 64.5 (63.0, 65.9) | −0.7 (−2.2, 0.9)
Vivid dreams, % (95% CI), n=21945 | 36.9 (36.3, 37.6) | 40.1 (38.5, 41.5) | 3.2 (1.5, 4.7)
Dream enactment, % (95% CI), n=21945 | 37.0 (36.4, 37.6) | 41.0 (39.7, 42.4) | 4.0 (2.5, 5.5)
Restless legs, % (95% CI), n=21945 | 50.5 (49.9, 51.2) | 56.9 (55.5, 58.3) | 6.4 (4.8, 7.8)
Leg edema, % (95% CI), n=21944 | 23.4 (22.8, 23.9) | 34.4 (33.0, 35.8) | 11.0 (9.4, 12.5)
Excessive sweating, % (95% CI), n=21943 | 24.6 (24.1, 25.2) | 24.9 (23.5, 26.2) | 0.3 (−1.2, 1.7)
Double vision, % (95% CI), n=21941 | 22.4 (21.9, 23.0) | 30.1 (28.7, 31.6) | 7.7 (6.3, 9.1)
Delusions, % (95% CI), n=21940 | 5.7 (5.4, 6.0) | 14.3 (13.2, 15.6) | 8.6 (7.4, 9.9)
PDAQ-15, mean (95% CI), n=21485 | 47.9 (47.7, 48.1) | 40.7 (40.2, 41.2) | −7.2 (−7.7, −6.6)
PDQ-8, mean (95% CI), n=22240 | 26.5 (26.2, 26.7) | 37.1 (36.5, 37.8) | 10.6 (10.0, 11.3)

DISCUSSION:

In this study, we characterized the PD population at the individual level and used inverse probability of participation weighting to produce national estimates of the burden of NMS and reduced QOL in PD. We found that PD-related symptoms and quality of life limitations may be underestimated in studies such as FI, and that this is due to a significant degree of underrepresentation in PD research on the basis of age, gender, race/ethnicity, and other socioeconomic indicators such as level of education. These findings highlight the importance of inclusion and generalizability in clinical research and the potential for statistical methods such as inverse probability of participation weighting to give greater weight to underrepresented groups and produce more generalizable results.

NMS are known to be common in PD, but because of differences in healthcare access and clinical research participation, single-center studies from tertiary movement disorders cohorts may not be representative of the entire U.S. population. By combining large-scale health survey data with a simulated census of the PD population, we have estimated the symptom prevalence and QOL limitation that would have been seen had the entire PD population in the U.S. participated in Fox Insight. These estimates provide important context for the availability and allocation of health care resources. For example, knowing that approximately half of PD patients currently living in the U.S. have dysphagia means that there should be an adequate number of speech therapists and other resources to meet their needs. The relative prevalence of different NMS also informs guidelines about routine screening for specific symptoms in clinical practice29 and market share and sample size calculations for pharmaceutical development and clinical trials, respectively.

Our results also highlight the effect of underrepresentation on clinical research results in PD. Like many research populations, FI is limited by the underrepresentation of minority and socioeconomically disadvantaged groups, with nearly 95% being White and more than half with college degrees. Knowing about these differences raises critical questions about generalizability, but it has previously not been possible to quantify the effects of underrepresentation on research results. Statistical weighting is the cornerstone of population-based national health survey research, and propensity scores with inverse probability weighting are frequently used in observational epidemiologic studies to adjust for confounding (e.g. non-random allocation to two different treatment groups)30,31. Similar approaches have been used to estimate population average treatment effects from clinical trials accounting for selection bias14,15, but these typically require a well-defined source population (e.g. cancer registry), and alternatives such as direct standardization are limited by the inability to account for correlations between multiple different variables32,33. In this study, we utilize a novel approach by first simulating a “virtual census” of the PD population in the U.S. using epidemiologic and administrative claims data and then analyzing FI data with inverse probability of participation weighting. The virtual PD census can be used to adjust the results of any observational or interventional PD study, and similar approaches could easily be applied to other neurologic disease populations.

While our virtual census was derived using multiple high-quality sources of data and accounts for complex interrelationships between different demographic and clinical variables, it is important to acknowledge that it is only an approximation and is therefore subject to error. Specifically, population-based data on race, geographic region, urban/rural residence, and level of education in the PD population was only available for individuals 65 years and older, so an important limitation is that we were required to assume that these characteristics were similar for individuals younger than 65. Characteristics such as disease severity were also limited in granularity. However, because of the large size of the virtual census, comparisons to the FI cohort convey a high degree of statistical precision (as indicated by the relatively narrow confidence intervals in Table 2) despite this inherent uncertainty. Furthermore, inverse probability of participation weighting assumes that all of the factors associated with research participation are known and that everyone in the population has some probability of being selected. While FI’s innovative recruitment strategy has made it possible to enroll PD patients at an unprecedented scale, the nature of the study does limit participation for some individuals. For example, internet access is required to complete the questionnaires, so individuals without internet access are unable to be represented in FI. Patients with very severe disease (e.g. permanent nursing home residents) may also be unable to participate. Inverse probability of participation weighting is also sensitive to model misspecification, including potential residual selection bias associated with categorizing variables such as age and disease severity, and while weighting drastically reduced the differences between FI and the virtual census, some small differences persisted (Supplemental Table 2). 
Because PD diagnoses in FI are self-reported, diagnoses cannot be confirmed through in-person examination, though previous studies have shown that the accuracy of self-reported PD in this context is high34. Our results apply to the FI dataset in its current form, but underrepresentation within an ongoing study may change over time. Because we limited our analysis to active study participants, we were unable to incorporate information about participants who enrolled and were subsequently lost to follow-up, and this could have affected the factors associated with study participation. Finally, we lacked population-level data on other determinants of research participation, which could have been used to improve the validity of our weighting scheme. Ultimately, post hoc weighting is not a substitute for prospective sampling and inclusive recruitment12, but rather informs these efforts by quantifying the effects of underrepresentation and improving the generalizability of current resources until future efforts are implemented.

ACKNOWLEDGEMENT:

This work was funded by the Michael J. Fox Foundation (grant no. 020058 to AGH) and NINDS (grant no. R01 NS099129-05 to AWW). The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Data used in the preparation of this article were obtained from the Fox Insight database (https://foxinsight-info.michaeljfox.org/insight/explore/insight.jsp) on September 28, 2021. For up-to-date information on the study, visit https://foxinsight-info.michaeljfox.org/insight/explore/insight.jsp. The Fox Insight Study (FI) is funded by The Michael J. Fox Foundation for Parkinson’s Research. We would like to thank the Parkinson’s community for participating in this study to make this research possible.

The National Health and Aging Trends Study (NHATS) is sponsored by the National Institute on Aging (grant number NIA U01AG032947) through a cooperative agreement with the Johns Hopkins Bloomberg School of Public Health.

The authors wish to thank Dr. Danielle Abraham for her assistance with Medicare data programming.

FINANCIAL DISCLOSURES OF ALL AUTHORS:

Dr. Hamedani reports grants from the National Eye Institute and Michael J. Fox Foundation, clinical trial support from Biogen and Biohaven, and honoraria from Westchester Medical Center.

Ms. Auinger reports no financial disclosures.

Dr. Willis reports grants from the National Institute on Aging, National Institute of Neurological Disorders and Stroke, and Acadia Pharmaceuticals.

Dr. Safarpour reports grants from the Parkinson Study Group and Medtronic; consultancies from Clear View and Ryan, Swanson, & Cleveland PLLC; advisory boards for AbbVie and Boston Scientific; and honoraria from Boston Scientific, Abbott, and AbbVie.

Dr. Shprecher reports grants from the Arizona Alzheimer’s Consortium, AbbVie, Biogen, Cognition Therapeutics, Eisai, Jazz Pharmaceuticals, the Michael J Fox Foundation, Neuraly, Roche, and Sanofi-Aventis; consultancies for Amneal, AbbVie, Emalex, and US World Meds/Supernus; and honoraria from Amneal, Teva, and Neurocrine.

Dr. Stover reports grants from F. Hoffmann-LaRoche Ltd, Neuraly, UCB Biopharma SPRL, Cerevel Therapeutics, AbbVie, Praxis Precision Medicines, the Dystonia Medical Research Foundation, and the National Institutes of Health; and consultancies for the Huntington Study Group and Parkinson Study Group EARSTIM Data Safety Monitoring Board.

Dr. Subramanian reports grants from the National Institutes of Health, American Parkinson Disease Association, Enterin Inc., Pharma2B, Ann and Phillip Gladfetler III Foundation, and Ron and Pratima Gatehouse Foundation; and honoraria from Neurocrine, Supernus, Teva, and the National Institutes of Health.

Dr. Cloud reports grants from the National Institute of Neurological Disorders and Stroke, Michael J. Fox Foundation, Virginia Catalyst Fund, and Parkinson’s Foundation; consultancies from AbbVie; advisory board membership from Kyowa Kirin; contracts from Bukwang, Cerevel, and Neuro-point alliance; honoraria from Medlink neurology, HMP global, Colontown, M3 global research team, Qessential research, and Kyowa Kirin; inventions of the Gastrointestinal Symptoms in Neurodegenerative Disease (GIND) scale; and a provisional patent for Vibration for Mitigation of Freezing of Gait [#PRE-21-137F (322203-8130)].

Funding:

This study was funded by the Michael J. Fox Foundation and NINDS.

Footnotes

Financial Disclosure/Conflict of Interest: The authors report no conflicts of interest.

REFERENCES:

  • 1. Seppi K, Ray Chaudhuri K, Coelho M, et al. Update on treatments for nonmotor symptoms of Parkinson’s disease-an evidence-based medicine review. Mov Disord. 2019;34(2):180–198. doi: 10.1002/mds.27602
  • 2. Sieber BA, Landis S, Koroshetz W, et al. Prioritized research recommendations from the National Institute of Neurological Disorders and Stroke Parkinson’s Disease 2014 conference. Ann Neurol. 2014;76(4):469–472. doi: 10.1002/ana.24261
  • 3. Chaudhuri KR, Prieto-Jurcynska C, Naidu Y, et al. The nondeclaration of nonmotor symptoms of Parkinson’s disease to health care professionals: an international study using the nonmotor symptoms questionnaire. Mov Disord. 2010;25(6):704–709. doi: 10.1002/mds.22868
  • 4. Kim HS, Cheon SM, Seo JW, Ryu HJ, Park KW, Kim JW. Nonmotor symptoms more closely related to Parkinson’s disease: comparison with normal elderly. J Neurol Sci. 2013;324(1–2):70–73. doi: 10.1016/j.jns.2012.10.004
  • 5. Di Luca DG, Sambursky JA, Margolesky J, et al. Minority Enrollment in Parkinson’s Disease Clinical Trials: Meta-Analysis and Systematic Review of Studies Evaluating Treatment of Neuropsychiatric Symptoms. J Parkinsons Dis. 2020;10(4):1709–1716. doi: 10.3233/JPD-202045
  • 6. Schneider MG, Swearingen CJ, Shulman LM, Ye J, Baumgarten M, Tilley BC. Minority enrollment in Parkinson’s disease clinical trials. Parkinsonism Relat Disord. 2009;15(4):258–262. doi: 10.1016/j.parkreldis.2008.06.005
  • 7. Gilbert RM, Standaert DG. Bridging the gaps: More inclusive research needed to fully understand Parkinson’s disease. Mov Disord. 2020;35(2):231–234. doi: 10.1002/mds.27906
  • 8. Hemming JP, Gruber-Baldini AL, Anderson KE, et al. Racial and socioeconomic disparities in parkinsonism. Arch Neurol. 2011;68(4):498–503. doi: 10.1001/archneurol.2010.326
  • 9. Dahodwala N, Siderowf A, Xie M, Noll E, Stern M, Mandell DS. Racial differences in the diagnosis of Parkinson’s disease. Mov Disord. 2009;24(8):1200–1205. doi: 10.1002/mds.22557
  • 10. Dahodwala N, Xie M, Noll E, Siderowf A, Mandell DS. Treatment disparities in Parkinson’s disease. Ann Neurol. 2009;66(2):142–145. doi: 10.1002/ana.21774
  • 11. Jackson J, Sanchez A, Ison J, Hemley H, Siddiqqi B. Importance of Diversity in Parkinson’s Disease Research. Applied Clinical Trials. 2020;29(12).
  • 12. Dobkin RD, Amondikar N, Kopil C, et al. Innovative Recruitment Strategies to Increase Diversity of Participation in Parkinson’s Disease Research: The Fox Insight Cohort Experience. J Parkinsons Dis. 2020;10(2):665–675. doi: 10.3233/JPD-191901
  • 13. Fuller W. Sampling Statistics. John Wiley & Sons; 2009.
  • 14. Stuart EA, Cole SR, Bradshaw CP, Leaf PJ. The use of propensity scores to assess the generalizability of results from randomized trials. J R Stat Soc Ser A Stat Soc. 2011;174(2):369–386. doi: 10.1111/j.1467-985X.2010.00673.x
  • 15. Stuart EA, Bradshaw CP, Leaf PJ. Assessing the generalizability of randomized trial results to target populations. Prev Sci. 2015;16(3):475–485. doi: 10.1007/s11121-014-0513-z
  • 16. Dahabreh IJ, Robertson SE, Tchetgen EJ, Stuart EA, Hernán MA. Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics. 2019;75(2):685–694. doi: 10.1111/biom.13009
  • 17. Smolensky L, Amondikar N, Crawford K, et al. Fox Insight collects online, longitudinal patient-reported outcomes and genetic data on Parkinson’s disease. Sci Data. 2020;7(1):67. doi: 10.1038/s41597-020-0401-2
  • 18. Marras C, Beck JC, Bower JH, et al. Prevalence of Parkinson’s disease across North America. NPJ Parkinsons Dis. 2018;4:21. doi: 10.1038/s41531-018-0058-0
  • 19. U.S. Census Bureau. American Community Survey, 2018: ACS 1-Year Estimates Detailed Tables, Table S0101. Accessed June 14, 2021. https://data.census.gov/cedsci/
  • 20. Riedel O, Heuser I, Klotsche J, Dodel R, Wittchen HU, GEPAD Study Group. Occurrence risk and structure of depression in Parkinson disease with and without dementia: results from the GEPAD Study. J Geriatr Psychiatry Neurol. 2010;23(1):27–34. doi: 10.1177/0891988709351833
  • 21. Wright Willis A, Evanoff BA, Lian M, Criswell SR, Racette BA. Geographic and ethnic variation in Parkinson disease: a population-based study of US Medicare beneficiaries. Neuroepidemiology. 2010;34(3):143–151. doi: 10.1159/000275491
  • 22. Mantri S, Fullard ME, Beck J, Willis AW. State-level prevalence, health service use, and spending vary widely among Medicare beneficiaries with Parkinson disease. NPJ Parkinsons Dis. 2019;5:1. doi: 10.1038/s41531-019-0074-8
  • 23. Szumski NR, Cheng EM. Optimizing algorithms to identify Parkinson’s disease cases within an administrative database. Mov Disord. 2009;24(1):51–56. doi: 10.1002/mds.22283
  • 24. Rural-Urban Continuum Codes [online]. Economic Research Service, United States Department of Agriculture; 2016. https://www.ers.usda.gov/data-products/rural-urban-continuum-codes/
  • 25. Moore SF, Guzman NV, Mason SL, Williams-Gray CH, Barker RA. Which patients with Parkinson’s disease participate in clinical trials? One centre’s experiences with a new cell based therapy trial (TRANSEURO). J Parkinsons Dis. 2014;4(4):671–676. doi: 10.3233/JPD-140432
  • 26. NHATS Public Use Data. Rounds 1–7. Sponsored by the National Institute on Aging (grant number NIA U01AG032947) through a cooperative agreement with the Johns Hopkins Bloomberg School of Public Health. Available at www.nhats.org.
  • 27. Martínez-Martín P, Rodríguez-Blázquez C, Alvarez M, et al. Parkinson’s disease severity levels and MDS-Unified Parkinson’s Disease Rating Scale. Parkinsonism Relat Disord. 2015;21(1):50–54. doi: 10.1016/j.parkreldis.2014.10.026
  • 28. Wilson R, Din A. Understanding and Enhancing the U.S. Department of Housing and Urban Development’s ZIP Code Crosswalk Files. Cityscape: A Journal of Policy Development and Research. 2018;20(2):277–294.
  • 29. Chou KL, Martello J, Atem J, et al. Quality Improvement in Neurology: 2020 Parkinson Disease Quality Measurement Set Update. Neurology. 2021;97(5):239–245. doi: 10.1212/WNL.0000000000012198
  • 30. Mansournia MA, Altman DG. Inverse probability weighting. BMJ. 2016;352:i189. doi: 10.1136/bmj.i189
  • 31. Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med. 1997;127(8 Pt 2):757–763.
  • 32. Bonander C, Nilsson A, Björk J, Bergström GML, Strömberg U. Participation weighting based on sociodemographic register data improved external validity in a population-based cohort study. J Clin Epidemiol. 2019;108:54–63. doi: 10.1016/j.jclinepi.2018.12.011
  • 33. Copas A, Burkill S, Conrad F, Couper MP, Erens B. An evaluation of whether propensity score adjustment can remove the self-selection bias inherent to web panel surveys addressing sensitive health behaviours. BMC Med Res Methodol. 2020;20(1):251. doi: 10.1186/s12874-020-01134-4
  • 34. Kim HM, Leverenz JB, Burdick DJ, et al. Diagnostic Validation for Participants in the Washington State Parkinson Disease Registry. Parkinsons Dis. 2018;2018:3719578. doi: 10.1155/2018/3719578

Data Availability Statement

The virtual PD census dataset is available to researchers upon request. FI data is available for free download to all registered users. For up-to-date information on the study, visit https://foxinsight-info.michaeljfox.org/insight/explore/insight.jsp.
