Table. Associations Between 30 Socioeconomic and Demographic Features and Claims Database Sampling Fraction at the Zip Code Level, Accounting for State-Level Variation in Sampling Fraction in 2 Models.
Characteristic | Partial correlation coefficient, %<sup>a</sup> | Multivariable regression coefficient (SD), %<sup>b</sup> | P value |
---|---|---|---|
Population | |||
Total population (millions) | 0.01 | −0.0005 (0.00018) | <.001 |
Log10 population density (1/square mile) | 0.14 | 0.67 (0.06) | <.001 |
Female sex | 0.12 | 0.050 (0.010) | <.001 |
Race and ethnicity | |||
Asian (non-Hispanic) | 0.16 | −0.010 (0.004) | <.001 |
Black (non-Hispanic) | −0.15 | −0.008 (0.002) | <.001 |
Hispanic | −0.19 | 0.001 (0.004) | .64 |
White (non-Hispanic) | 0.20 | [Reference] | NA |
Other<sup>c</sup> | −0.08 | −0.026 (0.008) | <.001 |
Age, y | |||
<18 | −0.09 | [Reference] | NA |
18-40 | −0.21 | −0.019 (0.010) | <.001 |
40-60 | 0.32 | 0.084 (0.014) | <.001 |
60-80 | 0.14 | 0.002 (0.012) | .71 |
>80 | 0.12 | 0.055 (0.024) | <.001 |
Household income, $ | |||
<15 000 | −0.34 | [Reference] | NA |
15 000-30 000 | −0.40 | −0.016 (0.014) | .01 |
30 000-45 000 | −0.37 | −0.009 (0.012) | .16 |
45 000-60 000 | −0.23 | 0.002 (0.014) | .78 |
60 000-100 000 | 0.10 | 0.007 (0.010) | .15 |
100 000-125 000 | 0.33 | 0.033 (0.018) | <.001 |
125 000-200 000 | 0.43 | 0.016 (0.012) | .01 |
>200 000 | 0.42 | 0.071 (0.014) | <.001 |
Work and insurance | |||
Unemployed | −0.29 | −0.001 (0.014) | .89 |
No health insurance | 0.31 | −0.030 (0.008) | <.001 |
Education | |||
Less than high school | −0.33 | [Reference] | NA |
High school | −0.27 | 0.038 (0.01) | <.001 |
Some college | −0.09 | −0.010 (0.008) | .01 |
College | 0.41 | 0.071 (0.010) | <.001 |
Graduate | 0.30 | −0.021 (0.010) | <.001 |
Housing | |||
Houses that are owner occupied | 0.25 | −0.003 (0.004) | .14 |
Median house price (millions), $ | 0.36 | −0.00001 (0.00003) | .43 |
Abbreviation: NA, not applicable.
<sup>a</sup> The first set of models considers each covariate of interest separately, along with state-level fixed effects. Partial correlation coefficients derived from these models are presented; a positive correlation indicates that zip codes with higher values of the covariate of interest tend to have a higher zip code-level sampling fraction in the claims database, even after adjusting for state-level clustering in sampling.
<sup>b</sup> The second model is a full multivariable model that includes all 29 covariates of interest in addition to state-level fixed effects. For example, for a 10-percentage point increase in a zip code’s fraction of households earning greater than $200 000, the model suggests the claims database sampling fraction will increase by 0.6 percentage points, on average.
<sup>c</sup> Other race and ethnicity includes persons identifying as non-Hispanic American Indian and/or Alaska Native, non-Hispanic Native Hawaiian and Other Pacific Islander, non-Hispanic other races, and 2 or more races.
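The partial correlations in the first set of models can be illustrated with a minimal sketch. It assumes one common way of adjusting for state-level fixed effects: subtract each state's mean from both the covariate and the sampling fraction, then correlate the residuals. The function names and the toy data below are hypothetical and are not taken from the study, whose exact computation may differ.

```python
from collections import defaultdict
from math import sqrt


def demean_within_groups(values, groups):
    """Subtract each group's mean from its members (absorbs group fixed effects)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def partial_correlation(y, x, groups):
    """Pearson correlation of y and x after removing group means from both."""
    ry = demean_within_groups(y, groups)
    rx = demean_within_groups(x, groups)
    sxy = sum(a * b for a, b in zip(rx, ry))
    sxx = sum(a * a for a in rx)
    syy = sum(b * b for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: two states with different baseline sampling fractions.
states = ["A", "A", "A", "B", "B", "B"]
pct_college = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
sampling_fraction = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]

r = partial_correlation(sampling_fraction, pct_college, states)
print(round(r, 3))
```

In this toy example the two states differ only in their baseline sampling fraction, so removing the state means leaves a perfect within-state linear relationship. A correlation computed on the raw values would be lower because the between-state gap adds variation unrelated to the covariate.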