JAMA Netw Open. 2023 Jan 6;6(1):e2249804. doi: 10.1001/jamanetworkopen.2022.49804

Table. Associations Between 30 Socioeconomic and Demographic Features and Claims Database Sampling Fraction at the Zip Code Level, Accounting for State-Level Variation in Sampling Fraction in 2 Models.

| Characteristic | Partial correlation coefficient^a (%) | Multivariable regression coefficient (SD)^b (%) | P value |
|---|---|---|---|
| Population | | | |
| Total population (millions) | 0.01 | −0.0005 (0.00018) | <.001 |
| Log10 pop density (1/square mile) | 0.14 | 0.67 (0.06) | <.001 |
| Female sex | 0.12 | 0.050 (0.010) | <.001 |
| Race and ethnicity | | | |
| Asian (non-Hispanic) | 0.16 | −0.010 (0.004) | <.001 |
| Black (non-Hispanic) | −0.15 | −0.008 (0.002) | <.001 |
| Hispanic | −0.19 | 0.001 (0.004) | .64 |
| White (non-Hispanic) | 0.20 | [Reference] | NA |
| Other^c | −0.08 | −0.026 (0.008) | <.001 |
| Age, y | | | |
| <18 | −0.09 | [Reference] | NA |
| 18-40 | −0.21 | −0.019 (0.010) | <.001 |
| 40-60 | 0.32 | 0.084 (0.014) | <.001 |
| 60-80 | 0.14 | 0.002 (0.012) | .71 |
| >80 | 0.12 | 0.055 (0.024) | <.001 |
| Household income, $ | | | |
| <15 000 | −0.34 | [Reference] | NA |
| 15 000-30 000 | −0.40 | −0.016 (0.014) | .01 |
| 30 000-45 000 | −0.37 | −0.009 (0.012) | .16 |
| 45 000-60 000 | −0.23 | 0.002 (0.014) | .78 |
| 60 000-100 000 | 0.10 | 0.007 (0.010) | .15 |
| 100 000-125 000 | 0.33 | 0.033 (0.018) | <.001 |
| 125 000-200 000 | 0.43 | 0.016 (0.012) | .01 |
| >200 000 | 0.42 | 0.071 (0.014) | <.001 |
| Work and insurance | | | |
| Unemployed | −0.29 | −0.001 (0.014) | .89 |
| No health insurance | 0.31 | −0.030 (0.008) | <.001 |
| Education | | | |
| Less than high school | −0.33 | [Reference] | NA |
| High school | −0.27 | 0.038 (0.01) | <.001 |
| Some college | −0.09 | −0.010 (0.008) | .01 |
| College | 0.41 | 0.071 (0.010) | <.001 |
| Graduate | 0.30 | −0.021 (0.010) | <.001 |
| Housing | | | |
| Houses that are owner occupied | 0.25 | −0.003 (0.004) | .14 |
| Median house price (millions), $ | 0.36 | −0.00001 (0.00003) | .43 |

Abbreviation: NA, not applicable.

^a The first set of models considers each covariate of interest separately, along with state-level fixed effects. Partial correlation coefficients derived from these models are presented; a positive correlation indicates that zip codes with higher values of the covariate of interest tend to have higher zip code-level sampling in the claims database, even after adjusting for state-level clustering in sampling.

^b The second model is a full multivariable model that includes all 29 covariates of interest in addition to state-level fixed effects. For example, for a 10 percentage point increase in a zip code’s fraction of households earning greater than $200 000, the model suggests the claims database sampling fraction will increase by 0.6 percentage points, on average.

^c Other race and ethnicity includes persons identifying as non-Hispanic American Indian and/or Alaska Native, non-Hispanic Native Hawaiian and Other Pacific Islander, non-Hispanic other races, and 2 or more races.
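
The two model types described in footnotes a and b can be illustrated with a minimal sketch. This is not the authors' code. It assumes that state-level fixed effects are absorbed by subtracting each state's mean from every variable (the "within" transformation), which gives the same slopes and partial correlations as including one indicator variable per state. All function names, variable names, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (absorbs group fixed effects)."""
    values = np.asarray(values, dtype=float)
    if values.ndim == 2:
        # Demean each column separately.
        return np.column_stack(
            [demean_by_group(values[:, j], groups) for j in range(values.shape[1])]
        )
    _, inverse = np.unique(groups, return_inverse=True)
    group_means = np.bincount(inverse, weights=values) / np.bincount(inverse)
    return values - group_means[inverse]


def partial_correlation(x, y, groups):
    """Correlation of x and y after removing group (state) means from both."""
    x_res = demean_by_group(x, groups)
    y_res = demean_by_group(y, groups)
    return float(np.corrcoef(x_res, y_res)[0, 1])


def fixed_effects_coefficients(X, y, groups):
    """Least-squares slopes of y on the columns of X with group fixed effects."""
    X_res = demean_by_group(X, groups)
    y_res = demean_by_group(y, groups)
    coef, *_ = np.linalg.lstsq(X_res, y_res, rcond=None)
    return coef


# Synthetic data for illustration only; these are NOT the study's data.
rng = np.random.default_rng(0)
n_states, zips_per_state = 5, 200
state = np.repeat(np.arange(n_states), zips_per_state)
state_effect = rng.normal(0.0, 5.0, n_states)[state]
pct_high_income = rng.uniform(0.0, 30.0, state.size)  # hypothetical covariate, %
pct_uninsured = rng.uniform(0.0, 25.0, state.size)    # hypothetical covariate, %
sampling_fraction = (
    10.0
    + state_effect
    + 0.07 * pct_high_income
    - 0.03 * pct_uninsured
    + rng.normal(0.0, 0.5, state.size)
)

# Footnote a style: one covariate at a time, adjusted for state.
r = partial_correlation(pct_high_income, sampling_fraction, state)

# Footnote b style: all covariates together, adjusted for state.
coef = fixed_effects_coefficients(
    np.column_stack([pct_high_income, pct_uninsured]), sampling_fraction, state
)
print(f"partial correlation: {r:.2f}")
print(f"multivariable slopes: {coef.round(3)}")
```

In this synthetic setup, a slope is read the same way as in footnote b: multiplying it by a 10 percentage point change in the covariate gives the expected change in sampling fraction, in percentage points, with the other covariates and the state held fixed.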