Author manuscript; available in PMC: 2022 Jun 1.
Published in final edited form as: Environ Res. 2021 Mar 6;197:110986. doi: 10.1016/j.envres.2021.110986

Evaluation of a commercial database to estimate residence histories in the Los Angeles Ultrafines Study

Danielle N Medgyesi 1, Jared A Fisher 1, Abigail R Flory 2, Richard B Hayes 3,4, George D Thurston 3,4, Linda M Liao 5, Mary H Ward 1, Debra T Silverman 1, Rena R Jones 1
PMCID: PMC8187285  NIHMSID: NIHMS1680840  PMID: 33689822

Abstract

Background.

Commercial databases can be used to identify participant addresses over time, but their quality and impact on environmental exposure assessment is uncertain.

Objective.

To evaluate the performance of a commercial database to find residences and estimate environmental exposures for study participants.

Methods.

We searched LexisNexis® for participant addresses in the Los Angeles Ultrafines Study, a prospective cohort of men and women aged 50–71 years. At enrollment (1995–1996) and follow-up (2004–2005), we evaluated attainment (address found for the corresponding time period) and match rates to survey addresses by participant characteristics. We compared geographically-referenced predictors and estimates of ultrafine particulate matter (UFP) exposure from a land use regression model using LexisNexis and survey addresses at enrollment.

Results.

LexisNexis identified an address for 69% of participants at enrollment (N=50,320) and 95% of participants at follow-up (N=24,432). Attainment rate at enrollment modestly differed (≥5%) by age, smoking status, education, and residential mobility between surveys. The match rate at both survey periods was high (82–86%) and similar across characteristics. When using LexisNexis versus survey addresses, correlations were high for continuous values of UFP exposure and its predictors (rho=0.86–0.92).

Significance.

Time period and population characteristics influenced the attainment of addresses from a commercial database, but accuracy and subsequent estimation of specific air pollution exposures were high in our older study population.

Keywords: Analytic Methods, Geospatial Analyses, Air Pollution, Epidemiology, Cancer

INTRODUCTION

Many epidemiologic studies use addresses collected in surveys to estimate environmental exposures near the home [15]. However, for outcomes such as cancer, the exposure period of interest may date back several years or decades before study enrollment, especially for participants enrolled later in life. Often, studies do not ask participants to report the duration at the current address or collect additional information about historical residences. When residential addresses for the time period of interest are unconfirmed, the accuracy of exposure assessment can be impacted by unknown changes in residential location. Subsequently, exposure misclassification may lead to bias in epidemiologic risk estimates [6-9].

Commercial databases can be useful for obtaining residential histories for participants, but the quality of such information is uncertain, especially in the context of environmental exposure assessment. Several studies have evaluated residential information collected for study participants using the commercial database LexisNexis® [10-13]. LexisNexis is a tracing service that aggregates public records from several sources such as real estate/tax records, property deeds and mortgages, criminal courts, and state death registries [14]. Previous studies have reported that LexisNexis found at least one address for most participants, but the availability of addresses varied over time. For example, the California Teachers’ Study (CTS) found at least one LexisNexis address for 98% of participants at any time point up to the end of follow-up in 2011, but fewer participants (80%) had an address with a date before enrollment (1995–1996) [13]. Several studies have found that LexisNexis addresses correspond well to self-reported information. Most LexisNexis addresses could be matched to study addresses ascertained at the time of survey completion for CTS participants (85%). A similarly high match rate (87%) to study addresses was observed in a small sample of NIH-AARP Diet and Health Study participants in eight U.S. states, including California [10, 13]. Despite the growing use of commercial databases to augment address histories, few studies have evaluated how their accuracy may influence the classification of environmental exposures. One study of childhood cancer that used LexisNexis to collect participant residences at the time of birth up to cancer diagnosis found moderate to strong correlation in pesticide exposures assigned at LexisNexis versus birth certificate and cancer registry addresses (Spearman’s rho= 0.76–0.83) [11].

The Los Angeles (LA) Ultrafines Study is a cohort of individuals living in three counties of Southern California for whom participant exposure to black carbon and fine and ultrafine particulate matter (UFP) has been previously estimated with a land use regression model [15]. As neither duration at the current address, nor residential histories prior to enrollment, were ascertained as part of the study questionnaires, we sought to obtain this information for participants using LexisNexis. The goals of this effort were to describe attainment and accuracy of information from LexisNexis and to evaluate potential exposure misclassification. These efforts will inform the use of LexisNexis addresses to estimate historical exposures in future epidemiologic analyses of cancer incidence and other chronic health outcomes.

MATERIALS AND METHODS

Study population

The LA Ultrafines Study includes NIH-AARP Diet and Health Study participants who were aged 50–71 years and lived in Southern California (LA county and parts of Orange and Riverside counties) at enrollment in 1995–1996 (n= 53,833) [15]. Participants were mailed surveys to collect information on health outcomes and lifestyle and demographic characteristics at enrollment and during a follow-up effort in 2004–2005. The follow-up survey was sent to living participants at their last known address (either the enrollment address or updated address identified through tracing efforts). Participants were linked to statewide cancer registries and the National Death Index (NDI) for cancer incidence and mortality, respectively (follow-up through December 2011). The cohort uses cancer registries from eight enrollment states and three additional states to which participants frequently moved after enrollment. The National Cancer Institute Special Studies Institutional Review Board approved the LexisNexis linkage.

Commercial database search and evaluation

We initiated a LexisNexis search for residences of LA Ultrafines Study participants. We provided LexisNexis with the following information for each participant: first and last name, social security number, and the last known address. LexisNexis returned full addresses and the month and year that person was first and last ‘seen’ at each residence (e.g., dates from LexisNexis-acquired databases linking an individual to an address). We geocoded LexisNexis addresses and survey addresses at enrollment and follow-up using the same street map database (ESRI's StreetMap Premium 2019). Addresses that were not P.O. boxes and had a geocode match status of point, address, or street segment are hereafter referred to as ‘well-geocoded addresses’.

To ensure accurate comparisons between LexisNexis and survey addresses, we restricted our analysis to participants with a well-geocoded survey address at enrollment (N=50,320; 93%). Any comparisons at follow-up were restricted to those who completed the follow-up survey and had a well-geocoded survey address at this time (N=24,432; 45%).

Of the 50,320 participants at enrollment, LexisNexis found 49,374 (98%) participants and returned a total of 118,400 addresses for these individuals. LexisNexis addresses were excluded if the date first seen was after the date last seen (n=8,127), after a known date of death (DOD; n=6,044) or was missing (n=105). We replaced the date last seen with DOD if it was later than the DOD (n=987). We excluded LexisNexis addresses that were not well-geocoded (n=5,556). Among the remaining LexisNexis addresses (n=105,237), we excluded duplicate addresses with identical coordinates and matching dates first seen (n=1,800).
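
The exclusion rules above can be sketched as a small filter over one participant's records. This is an illustration only, not the authors' code: the `AddressRecord` fields and the `clean_addresses` function are hypothetical names, dates are simplified to (year, month) tuples, and a "missing" date is assumed to mean a missing date first seen.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

YearMonth = Tuple[int, int]  # (year, month); tuples compare chronologically


@dataclass(frozen=True)
class AddressRecord:
    """Hypothetical layout for one returned address."""
    participant_id: str
    lat: float
    lon: float
    first_seen: Optional[YearMonth]
    last_seen: Optional[YearMonth]
    well_geocoded: bool


def clean_addresses(records: List[AddressRecord],
                    date_of_death: Optional[YearMonth] = None) -> List[AddressRecord]:
    """Apply the exclusion rules from the text to one participant's records."""
    kept = []
    seen_keys = set()
    for rec in records:
        # Exclude records with a missing date first seen (assumed meaning of "missing").
        if rec.first_seen is None:
            continue
        # Exclude records first seen after they were last seen.
        if rec.last_seen is not None and rec.first_seen > rec.last_seen:
            continue
        # Exclude records first seen after a known date of death.
        if date_of_death is not None and rec.first_seen > date_of_death:
            continue
        # Truncate a date last seen that falls after the date of death.
        if (date_of_death is not None and rec.last_seen is not None
                and rec.last_seen > date_of_death):
            rec = replace(rec, last_seen=date_of_death)
        # Exclude addresses that are not well-geocoded.
        if not rec.well_geocoded:
            continue
        # Drop duplicates: identical coordinates and matching date first seen.
        key = (rec.lat, rec.lon, rec.first_seen)
        if key in seen_keys:
            continue
        seen_keys.add(key)
        kept.append(rec)
    return kept
```

The order of the checks follows the order in which the text lists the exclusions; a different order could change the counts attributed to each rule but not the final set of retained addresses.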

LexisNexis addresses: attainment and match rate at enrollment and follow-up

First, we computed the total number of addresses returned by LexisNexis for each participant. Next, we enumerated the percent of participants with addresses on or before the date of the enrollment or the follow-up survey that were within the same state as the respective survey addresses (i.e., attainment). We then selected the LexisNexis address with the closest date first seen to the survey date and described the percent of participants with an address that matched the survey address. A match between the LexisNexis and survey address was defined as: linear distance ≤250m between the address geocodes and a string match of ≥4 address attributes (street number, street name, city, ZIP code, and state). We also employed an alternative match criterion of identical U.S. 2000 Census tract identifiers and described how often it agreed with matches determined by distance and address attributes.
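
The selection and match rule described above can be sketched as follows. This is a minimal sketch under stated assumptions: the text specifies a "linear distance" of ≤250m without naming the formula, so great-circle (haversine) distance is used here as a stand-in; the dictionary keys, the case-insensitive exact string comparison, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
ATTRIBUTES = ("street_number", "street_name", "city", "zip_code", "state")


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (assumed stand-in for 'linear distance')."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def normalize(value):
    """Trim and upper-case an attribute so comparison ignores case and padding."""
    return str(value or "").strip().upper()


def count_matching_attributes(a, b):
    """Count the address attributes that are non-empty and identical in both records."""
    return sum(
        1 for key in ATTRIBUTES
        if normalize(a.get(key)) and normalize(a.get(key)) == normalize(b.get(key))
    )


def closest_on_or_before(addresses, survey_date):
    """Pick the address whose date first seen is closest to, and not after, the survey date."""
    eligible = [a for a in addresses if a["first_seen"] <= survey_date]
    return max(eligible, key=lambda a: a["first_seen"]) if eligible else None


def is_match(lexis, survey, max_distance_m=250.0, min_attributes=4):
    """Match rule from the text: within 250 m and at least 4 of 5 attributes agree."""
    distance = haversine_m(lexis["lat"], lexis["lon"], survey["lat"], survey["lon"])
    return distance <= max_distance_m and count_matching_attributes(lexis, survey) >= min_attributes
```

Requiring both conditions means a nearby address with a different street number and ZIP code fails the rule, as does an identically written address that geocodes far away.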

Differences in attainment and match rates by participant characteristics

We calculated rates of LexisNexis address attainment and match to the survey address at enrollment and follow-up by strata of participant characteristics including race/ethnicity, age, gender, education, smoking status, and body mass index (BMI), as well as prospective health outcomes including any cancer or lung cancer diagnosis, mortality or cardiovascular disease-specific mortality.

We also computed attainment and match rates by residential mobility status occurring between enrollment and follow-up. Mobility was evaluated for the 24,432 participants included in our follow-up analysis who had address information for both time periods. Movers were defined as those with >250m between survey address geocodes and non-movers were those with geocodes within 250m. Finally, we calculated rates at follow-up by California residency status, as determined by the state of the follow-up survey address.

Differences in attainment and match rates across characteristics were evaluated by computing pairwise differences in proportions between each subgroup and the reference group (e.g., Non-Hispanic Black vs. Non-Hispanic White), and 95% confidence intervals (95%CI). We noted characteristics with 95%CI proportional differences of at least 5%.
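
A pairwise difference in proportions with a 95% confidence interval can be computed as in the sketch below. The text does not state which interval method was used, so the Wald (normal approximation) interval here is an assumption, and `proportion_difference` is a hypothetical helper name.

```python
import math


def proportion_difference(x_group, n_group, x_ref, n_ref, z=1.96):
    """Difference in proportions (group minus reference) with a Wald interval.

    The Wald interval is an assumed method; the source does not name one.
    Returns (difference, lower bound, upper bound) as proportions.
    """
    p_group = x_group / n_group
    p_ref = x_ref / n_ref
    diff = p_group - p_ref
    se = math.sqrt(p_group * (1 - p_group) / n_group + p_ref * (1 - p_ref) / n_ref)
    return diff, diff - z * se, diff + z * se


# Counts taken from Table 1: address found among current smokers vs. never smokers.
diff, lower, upper = proportion_difference(3507, 5518, 13468, 18638)
print(f"{100 * diff:.1f} ({100 * lower:.1f}, {100 * upper:.1f})")
```

With the Table 1 counts for current versus never smokers, this gives a difference of about −8.7 percentage points with an interval of roughly −10.1 to −7.3, close to the −8 (−10, −7) shown in Table 2; the small discrepancy could reflect rounding or a different interval method.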

Estimating historical duration at enrollment address

We described the distribution of year first seen of the matched LexisNexis enrollment addresses. Historical duration at the enrollment address was estimated by subtracting the year first seen from the year of enrollment.

Identification of residence(s) before address at study enrollment

We enumerated participants’ historical residences prior to the enrollment by identifying LexisNexis address(es) with a date first seen before the date first seen of the matched LexisNexis enrollment address.
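
The duration estimate and the identification of earlier residences described in the two preceding sections reduce to simple date comparisons. The sketch below assumes each address is a dictionary with a hypothetical `year_first_seen` key; the function names are illustrative.

```python
def historical_duration_years(enrollment_year, year_first_seen):
    """Years at the enrollment address before enrollment (year-level resolution)."""
    return enrollment_year - year_first_seen


def prior_residences(addresses, matched_enrollment_address):
    """Addresses first seen before the matched enrollment address was first seen."""
    cutoff = matched_enrollment_address["year_first_seen"]
    return [a for a in addresses if a["year_first_seen"] < cutoff]


# Illustrative participant: enrolled in 1996, matched address first seen in 1983.
matched = {"address": "A", "year_first_seen": 1983}
history = [matched, {"address": "B", "year_first_seen": 1978}]
print(historical_duration_years(1996, matched["year_first_seen"]))  # 13
print([a["address"] for a in prior_residences(history, matched)])   # ['B']
```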

Validity of LexisNexis date first seen as an estimate for move-in date

To assess the validity of using LexisNexis date first seen to approximate the move-in date at a residence, we evaluated whether the year first seen of LexisNexis follow-up addresses fell sometime between enrollment and follow-up for those known to have moved during this time period. We calculated the sensitivity of year first seen as the percent of movers with a year first seen at follow-up on or after the enrollment year. Specificity was calculated as the percent of non-movers with the same LexisNexis address matched to follow-up and enrollment (i.e., year first seen on or before the enrollment year).
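
The sensitivity and specificity definitions above can be sketched as follows. The record keys (`moved`, `year_first_seen`, `enrollment_year`) and the function name are hypothetical, and specificity is computed from the year first seen condition given in parentheses in the text rather than from a direct comparison of addresses.

```python
def sensitivity_specificity(records):
    """Sensitivity and specificity of year first seen as an indicator of a move.

    Each record describes the matched follow-up address of one participant:
    'moved' is the survey-based mover status, 'year_first_seen' comes from the
    database, and 'enrollment_year' is the survey enrollment year.
    """
    movers = [r for r in records if r["moved"]]
    non_movers = [r for r in records if not r["moved"]]
    # Movers correctly flagged: follow-up address first seen on or after enrollment.
    true_pos = sum(r["year_first_seen"] >= r["enrollment_year"] for r in movers)
    # Non-movers correctly flagged: follow-up address first seen on or before enrollment.
    true_neg = sum(r["year_first_seen"] <= r["enrollment_year"] for r in non_movers)
    sensitivity = true_pos / len(movers) if movers else float("nan")
    specificity = true_neg / len(non_movers) if non_movers else float("nan")
    return sensitivity, specificity
```

Note that, following the wording in the text, a year first seen equal to the enrollment year counts as consistent for both movers and non-movers.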

Evaluation of exposures assessed at the enrollment address

For both LexisNexis and survey addresses at enrollment, we estimated geographically-referenced predictors (e.g., land use, traffic) used in a land use regression model and thereafter calculated UFP at participant residences, as previously described [15]. We characterized the distribution of predictors and estimated UFP exposure at both the LexisNexis and survey addresses at enrollment. We evaluated the correlation between important predictors and UFP exposure assigned at the two sets of addresses using Spearman’s rho. We also categorized predictors and UFP levels into quintiles for each set of addresses and calculated their agreement.
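
The two agreement measures named above, Spearman’s rho and quintile agreement, can be illustrated with a short pure-Python sketch. The rank-based quintile assignment is an assumption, since the text does not describe how cut points were derived, and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def quintiles(values):
    """Assign quintiles 1..5 from rank position (assumed cut-point method)."""
    n = len(values)
    return [min(5, max(1, math.ceil(5 * r / n))) for r in average_ranks(values)]


def quintile_agreement(x, y):
    """Share of observations assigned the same quintile in both sets."""
    qx, qy = quintiles(x), quintiles(y)
    return sum(a == b for a, b in zip(qx, qy)) / len(x)
```

Here `x` would hold the exposure values assigned at survey addresses and `y` the values assigned at the corresponding LexisNexis addresses, with quintiles derived separately within each set as the text describes.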

RESULTS

Commercial database search and evaluation

After excluding LexisNexis addresses with unreliable information, we identified 103,437 LexisNexis addresses among 48,379 of the 50,320 participants (96%) at enrollment. A median of 2 addresses were found for each participant LexisNexis could locate (interquartile range; IQR: 1–3 addresses), and 10% of participants had 4 or more addresses.

There were 34,909 (69%) participants with LexisNexis addresses in California with a date first seen on or before enrollment (Figure 1A). The address with date first seen closest to the enrollment date matched the survey address 86% of the time (i.e., ≤250m distance and ≥4 matched address attributes), and 64% of addresses were a complete match (i.e., geocodes and all address attributes matched exactly); Table S1. At follow-up, LexisNexis returned at least one address for 95% of participants, and the match rate to the survey address was 82% (Figure 1B). We observed high agreement (98%) as to whether a LexisNexis address matched based on distance and attributes or was within the same census tract as the survey address at both the enrollment and follow-up time periods (Table S2).

Figure 1.

Percent of those with well-geocoded addresses that completed surveys at (A) enrollment (N=50,320) and (B) follow-up (N=24,432) that had LexisNexis address(es) on or before the respective survey dates. The outer rim shows among participants with LexisNexis address(es), the percent of those with an address that matched the survey address with a date first seen closest to the survey date by 4 or more address attributes (street number, street name, city, ZIP code, and state) and within 250m geocoded distance.

The percent of participants with LexisNexis addresses at enrollment (69%) was similar by gender and across most demographic categories, but greater by 8% for those who reported other race/ethnicity (versus Non-Hispanic White), 8 or 9% for those with a college or postgraduate education, respectively (versus less than high school), and 7% for those 65–69 years of age (versus 50–54 years of age); Tables 1-2. Compared to never smokers, current smokers had a lower attainment rate (−8%, 95%CI: −10 to −7%). Address attainment at enrollment was 11% higher for those who participated in follow-up, and 64% lower for those who moved between surveys (compared to non-movers). Address attainment was similar for those with cancer diagnoses determined by statewide registries, and 18 or 26% lower for those whose death was attributed to any cancer or lung cancer, respectively, according to the NDI, but not confirmed in a registry. The match rate among those with a LexisNexis address at enrollment (86%) was similar across most participant characteristics but 46% lower (95%CI: −48 to −43%) for those who moved. At follow-up, the rates of address attainment (95%) and match to the survey address (82%) were similar across characteristics, including movers and non-movers (Tables S3-S4).

Table 1.

Percent of participants with LexisNexis address(es) on or before enrollment and the subset of those with a match to the survey address, by participant characteristics

Columns: characteristic; N participants; LexisNexis address found (a), N (%) (c); LexisNexis address matched (b), N (%) (d)

Total 50320 34909 (69) 30122 (86)
Race/Ethnicity
 Non-Hispanic White 42693 29352 (69) 25320 (86)
 Non-Hispanic Black 2297 1637 (71) 1409 (86)
 Hispanic 2146 1557 (73) 1347 (87)
 Other 2348 1802 (77) 1572 (87)
 Unknown 836 561 (67) 474 (84)
Age at enrollment
 50–54 6253 4114 (66) 3454 (84)
 55–59 11933 7846 (66) 6576 (84)
 60–64 14633 10182 (70) 8786 (86)
 65–69 15733 11492 (73) 10153 (88)
 70+ 1768 1275 (72) 1153 (90)
Gender
 Male 28706 20167 (70) 17136 (85)
 Female 21614 14742 (68) 12986 (88)
Education
 Less than high school 1826 1142 (63) 1013 (89)
 12 years/completed high school 6149 4137 (67) 3690 (89)
 Some college or training 17771 12085 (68) 10516 (87)
 College graduate 10701 7627 (71) 6588 (86)
 Postgraduate 12399 8958 (72) 7488 (84)
 Unknown 1474 960 (65) 827 (86)
Smoking status at enrollment
 Never smoked 18638 13468 (72) 11513 (85)
 Current smoker 5518 3507 (64) 3105 (89)
 Former Smoker 24307 16674 (69) 14437 (87)
 Unknown 1857 1260 (68) 1067 (85)
Body mass index
 Normal (18.5–<25) 18873 13279 (70) 11558 (87)
 Underweight (<18.5) 597 398 (67) 357 (90)
 Overweight (25–<30) 19860 13794 (69) 11801 (86)
 Obese (>=30) 9759 6582 (67) 5664 (86)
 Unknown 1231 856 (70) 742 (87)
Follow-up participation
 No 21566 13557 (63) 11518 (85)
 Yes 28754 21352 (74) 18604 (87)
Mobility from enrollment to follow-upe
 No move 18313 16742 (91) 15352 (92)
 Move 6119 1671 (27) 768 (46)
 Not assessed 25888 16496 14002
Mortality
 No 38911 27456 (71) 23412 (85)
 Yes-all causes 11409 7453 (65) 6710 (90)
 Yes- cardiovascular 3782 2468 (65) 2229 (90)
Any cancer diagnosis
 No cancer 36888 25279 (69) 21723 (86)
 Death only from NDI 778 393 (51) 350 (89)
 Registry confirmed 11813 8630 (73) 7508 (87)
 Diagnosis before baseline 841 607 (72) 541 (89)
Lung cancer diagnosis
 No cancer 48531 33745 (70) 29086 (86)
 Death only from NDI 235 104 (44) 93 (89)
 Registry confirmed 1499 1015 (68) 900 (89)
 Diagnosis before baseline 55 45 (82) 43 (96)
(a) LexisNexis address(es) found for participant on or before survey date

(b) LexisNexis addresses with a date first seen closest to the survey date that matched the survey address by 4 or more address attributes (street number, street name, city, ZIP code, and state) and within 250m geocoded distance

(c) Percent of participants with LexisNexis address(es)

(d) Percent of participants with LexisNexis address(es) with an address matched to enrollment

(e) Among those who participated and had a well-geocoded address at follow-up (n= 24,432)

Table 2.

Difference in rates of LexisNexis address attainment and match to the survey address at enrollment by participant characteristics (a), and 95% confidence intervals (CI)

Columns: characteristic; LexisNexis address found (b), percent difference (95% CI); LexisNexis address matched (c), percent difference (95% CI)

Race/Ethnicity
 Non-Hispanic White, % 69 (Ref.) 86 (Ref.)
 Non-Hispanic Black 2 (1, 4) 0 (−2, 2)
 Hispanic 4 (2, 6) 1 (−2, 2)
 Other 8 (6, 10) 1 (−1, 3)
 Unknown −2 (−5, 2) −2 (−5, 1)
Age at enrollment
 50–54, % 66 (Ref.) 84 (Ref.)
 55–59 0 (−2, 1) 0 (−2, 1)
 60–64 4 (2, 5) 2 (1, 4)
 65–69 7 (6, 9) 4 (3, 6)
 70+ 6 (4, 9) 6 (4, 8)
Gender
 Male, % 70 (Ref.) 85 (Ref.)
 Female −2 (−3, −1) 3 (2, 4)
Education
 Less than high school, % 63 (Ref.) 89 (Ref.)
 12 years/completed high school 4 (2, 7) 0 (−2, 3)
 Some college or training 5 (3, 8) −2 (−4, 0)
 College graduate 8 (6, 11) −3 (−4, 0)
 Postgraduate 9 (7, 12) −5 (−7, −3)
 Unknown 2 (−1, 6) −3 (−6, 0)
Smoking status at enrollment
 Never smoked, % 72 (Ref.) 85 (Ref.)
 Current smoker −8 (−10, −7) 4 (2, 4)
 Former Smoker −3 (−5, −3) 2 (0, 2)
 Unknown −4 (−7, −2) 0 (−3, 1)
Body mass index
 Normal (18.5–<25), % 70 (Ref.) 87 (Ref.)
 Underweight (<18.5) −3 (−8, 0) 3 (−1, 6)
 Overweight (25–<30) −1 (−2, 0) −1 (−2, −1)
 Obese (>=30) −3 (−4, −2) −1 (−2, 0)
 Unknown 0 (−4, 2) 0 (−3, 2)
Follow-up participation
 No, % 63 (Ref.) 85 (Ref.)
 Yes 11 (11, 12) 2 (1, 3)
Mobility from enrollment to follow-up
 No move, % 91 (Ref.) 92 (Ref.)
 Move −64 (−65, −63) −46 (−48, −43)
Mortality
 No, % 71 (Ref.) 85 (Ref.)
 Yes-all causes −6 (−6, −4) 5 (4, 6)
 Yes-cardiovascular −6 (−6, −3) 5 (3, 6)
Any cancer diagnosis
 No cancer, % 69 (Ref.) 86 (Ref.)
 Death only from NDI −18 (−22, −14) 3 (0, 6)
 Registry confirmed 4 (4, 5) 1 (0, 2)
 Diagnosis before baseline 3 (1, 7) 3 (1, 6)
Lung cancer diagnosis
 No cancer, % 70 (Ref.) 86 (Ref.)
 Death only from NDI −26 (−32, −19) 3 (−3, 10)
 Registry confirmed −2 (−4, 1) 3 (0, 5)
 Diagnosis before baseline 12 (1, 23) 10 (2, 17)
(a) Pairwise differences between the proportion of participants in each category compared to the reference (Ref.)

(b) Percent of participants with LexisNexis address(es) with a date first seen on or before survey date

(c) Percent of participants with a LexisNexis address that matched the survey address by 4 or more address attributes (street number, street name, city, ZIP code, and state) and within 250m geocoded distance

The median year first seen for matched LexisNexis addresses at enrollment was 1983 (IQR: 1975–1989); Figure 2. The median estimated historical duration at this address was 13 years prior to study enrollment (IQR: 7–21 years). Six percent of participants had one or more LexisNexis address(es) with a year first seen before that of the matched LexisNexis enrollment address.

Figure 2.

Distribution of year first seen for LexisNexis addresses matched to enrollment address and estimated duration at the address prior to enrollment (N=30,122). Red line indicates the cumulative percent of participants with a residential duration that began on or before the year referenced on the x-axis.

Among those who moved between surveys, sensitivity of date first seen was 88% (i.e., matched LexisNexis address at follow-up had a year first seen ≥ enrollment year); Table S5. Specificity, as calculated among non-movers, was 96% (i.e., matched LexisNexis addresses at enrollment and follow-up were the same).

Evaluation of exposures assessed at the enrollment address

Distributions of UFP exposure and LUR model predictors assigned at LexisNexis and survey addresses at enrollment were similar (Table S6). The median estimated UFP concentrations were 12,580 and 12,635 particles/cm3, respectively. Continuous UFP levels were highly correlated (Spearman’s rho= 0.91) and assigned quintiles were the same for 86% of participants (Table 3). Correlation and assigned quintiles for UFP predictors including distances to the major airport (LAX), nitrogen dioxide concentrations, land use, and traffic were also similar. The percent of developed medium intensity land within a 50m buffer had the lowest correlation (rho= 0.86) and quintile agreement (79%), whereas the percent of highly developed land and mixed forest within a 5km buffer had the highest correlations (rho=0.92).

Table 3.

Accuracy of exposure classifications for predictors of a land use regression model and estimated ultrafine particulates (UFP) exposure at LexisNexis compared to survey address at enrollment (N= 34,909) (a)

Columns: exposure or predictor; Spearman’s rho; correct quintile (%)

Ultrafine particles (#/cm3) (b) 0.91 86.34
Inverse distance to LAX airport (km) 0.91 86.04
Nitrogen dioxide (ppm) 0.90 85.68
Highly developed land (%, 5km buffer) 0.92 87.75
Weekday vehicle miles traveled (log/yr, 1km buffer) (b) 0.90 88.72
Land that is mixed forest (%, 5 km buffer) 0.92 91.88
Developed, medium intensity land (%, 50m buffer) 0.86 78.72
(a) Among those with LexisNexis address(es) on or before the enrollment date. For each participant, the LexisNexis address with a date first seen closest to the survey date was selected for this analysis regardless of match status to the survey address

(b) Weekday vehicle miles traveled and consequently UFP exposure missing for 529 participants

DISCUSSION

Overall, a LexisNexis search returned at least one residential address for almost all participants in the LA Ultrafines Study; however, the percent of participants with LexisNexis-identified addresses corresponding to the enrollment period in 1995–1996 was 69% compared to 95% at follow-up 10 years later. We found that participants with or without LexisNexis addresses were generally similar across most sociodemographic characteristics but differed at enrollment by age, smoking status, education, and residential mobility between surveys. Over 80% of LexisNexis addresses were verified against known addresses for participants at both the time of enrollment and at follow-up. Using information provided by LexisNexis, we were able to estimate residential duration at the enrollment address. We observed low rates of misclassification when estimating UFP exposure and its predictors at the LexisNexis address versus study-identified address at enrollment.

The use of LexisNexis to identify participant addresses in epidemiologic studies has been described previously [10-13]. We followed similar data processing procedures described in these studies to resolve commonly encountered problems, including addresses with inaccurate dates, poorly geocoded addresses, and duplicate addresses. LexisNexis’ ability to identify one or more addresses for 96% of participants in our study was similar to the 98% reported in the CTS [13]. However, address attainment before or during enrollment for LA Ultrafines Study participants (69%) was lower than in the CTS (80%) during the same time period (1995–1996). This difference could be due in part to population demographics; for example, the CTS is comprised of teachers from public schools, who may be easier to track through local data sources. Address attainment in our study was considerably higher at follow-up (95%), as compared to enrollment. This is consistent with findings from other studies that indicate LexisNexis performs better in more recent decades [10-13]. Consequently, only 6% of participants had LexisNexis address(es) that came before the address at enrollment, limiting our ability to evaluate historical residences prior to enrollment. However, the higher attainment rate at follow-up indicates that LexisNexis address acquisition could be beneficial for prospective address collection and for cohorts enrolled in the 2000s or later.

Our match rate to survey addresses at enrollment (86%) and follow-up (82%) was high, indicating that LexisNexis is a reliable source of address information for most of our study population in Southern California. Our results are consistent with the overall 85% match rate reported for CTS participants across four surveys between 1995–2006 [13]. A prior analysis for a sample of AARP participants reported match rates that were slightly lower for enrollment years (78–83%) but greater for follow-up years (87–89%) compared to our study [10]. However, this evaluation did not employ temporal restrictions to LexisNexis addresses when matching (e.g., address must have a date first seen ≤ the date when the survey address was reported), unlike our study and the CTS.

We did not find address attainment differences greater than 5% across gender, a characteristic that was not evaluated in previous studies. A case-control study of childhood leukemia in California found LexisNexis was less likely to find addresses for Hispanic mothers compared to non-Hispanic mothers, and those with less than a high school education [11]. LexisNexis was less likely to find addresses for Native Americans compared to other race/ethnicity groups in the CTS [13]. We similarly observed a lower attainment rate at enrollment for those with less than a high school education. Since over 80% of LA Ultrafines Study participants are non-Hispanic white, we were limited in our ability to interpret differences by race/ethnicity. Similar to our findings, a study of residential pesticide exposure and childhood leukemia reported LexisNexis was less likely to find complete residential histories for mothers who moved [11]. Notably, the differences in address attainment across strata of participant characteristics tended to be smaller (<5%) at follow-up versus enrollment. The match rate to survey addresses at both time periods was similar across demographics, suggesting that the quality of available information is not biased by participant characteristics.

Attainment rate at enrollment (69%) was lower for those whose death was attributed to any cancer (51%) or lung cancer (44%) according to the NDI, but not found in cancer registries. The California Cancer Registry procedure for linkage entails entering address information and other identifiers to find cancer diagnoses [16]. Cancer-attributed deaths reported by the NDI with unconfirmed cancer diagnoses via registries may include people that were more difficult to trace with addresses (e.g., moved outside the study catchment areas). Among those with LexisNexis addresses, we saw no differences in the match rate to survey addresses by mortality status or cancer diagnosis. Therefore, our findings suggest that incomplete information (i.e., lower address attainment) may disproportionately affect a subgroup of people with unconfirmed cancer diagnoses but that exposure misclassification due to LexisNexis address inaccuracies is likely to be non-differential with respect to chronic health outcomes.

We found the LexisNexis-estimated median historical duration at the enrollment address was 13 years. Time lived at address(es) where exposure is assessed is critical for epidemiologic analyses of cancer incidence and other chronic health outcomes, where the exposure period of interest may span long periods of time or predate diagnosis by many years for diseases with long latent periods. There are several artifacts of the LexisNexis address database that are worth noting in this context. We observed that a large proportion of participants had a year first seen of 1983 relative to other years, which may reflect acquisition of additional data sources during this time (e.g., agreements with New York Times Information Service and IBM to access information) [17]. LexisNexis does not make details on their search algorithms available, but we speculate that addresses found for this time period may have been artificially assigned a year first seen of 1983 if information was not available prior to that year. To this end, any date first seen may be later than the true move-in date, and thus underestimate residential duration, if there is lag time between when a person moves into a new residence and when a data source acquires address information linked to that individual. Since environmental exposures in studies of cancer and other chronic diseases are commonly assessed over long time scales, such as annually, we only used the year first seen (rather than the exact month and year as reported by LexisNexis) to estimate residential duration, which minimized some of the uncertainty in dates in our evaluation. We were limited in evaluating the accuracy of year first seen as a proxy for the exact residence move-in date, since this information was not collected for participants in our study. However, we were able to demonstrate LexisNexis addresses at follow-up accurately reported a year first seen that fell sometime between the survey periods for those who moved during this time.

When evaluating exposure classification using LexisNexis enrollment addresses, we observed high correlations for estimated UFP exposure and its predictors, a reflection of the high match rate between LexisNexis and survey addresses. The aforementioned case-control study of childhood leukemia similarly found moderate to strong correlation in pesticide exposures assessed at LexisNexis addresses compared to addresses from birth and cancer registries [11]. Our results suggest that addresses attained from LexisNexis can be used to assign environmental exposures when residential information is unknown, with the acknowledgement of some potential misclassification. Like many older cohorts, our study did not collect residential information for participants prior to enrollment in 1995 to 1996, a time window also corresponding to limited information available from LexisNexis. Although we could not reconstruct full residential histories, our results suggest that studies that enrolled participants more recently or wish to gather prospective residential information may have greater success in using LexisNexis for such efforts.

In these analyses, we described the capacity of LexisNexis to identify addresses for most participants in an older cohort and demonstrated improvements in performance over calendar time. We verified that LexisNexis accurately identified the residence of a high percentage of participants at the time of enrollment and follow-up. Accordingly, we found low rates of misclassification of UFP exposure and its predictors when relying on LexisNexis addresses and were able to estimate historical address duration not directly reported by study participants for a large proportion of the cohort. These results show promise for the use of LexisNexis address information to support epidemiologic analyses of cancer incidence and other chronic health outcomes in our cohort. Future studies considering the use of commercial address sources like LexisNexis should be cognizant of time period and some population characteristics that could influence the completeness and accuracy of information obtained.

Supplementary Material

1

HIGHLIGHTS.

  • Residential addresses were found for most participants in a commercial database search

  • Information was validated for ≥80% of participants compared to self-reported addresses

  • Misclassification of air pollution exposures was <15% when relying on commercial addresses

  • Study highlights utility of a commercial database to obtain residential histories for participants to support epidemiologic analyses of long-term environmental exposures

ACKNOWLEDGEMENTS

We thank Matt Airola of Westat, Inc. for GIS support for this work.

FUNDING

This work was supported by the Intramural Research Program of the National Cancer Institute.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

CONFLICTS OF INTEREST

The authors declare no conflicts of interest.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

REFERENCES

  • 1. Turner MC, et al., Ambient Air Pollution and Cancer Mortality in the Cancer Prevention Study II. Environmental Health Perspectives, 2017. 125(8): p. 087013–087013.
  • 2. Beelen R, et al., Long-term effects of traffic-related air pollution on mortality in a Dutch cohort (NLCS-AIR study). Environmental Health Perspectives, 2008. 116(2): p. 196–202.
  • 3. Thurston GD, et al., Ambient Particulate Matter Air Pollution Exposure and Mortality in the NIH-AARP Diet and Health Cohort. Environ Health Perspect, 2016. 124(4): p. 484–90.
  • 4. Lim CC, et al., Association between long-term exposure to ambient air pollution and diabetes mortality in the US. Environ Res, 2018. 165: p. 330–336.
  • 5. Lim CC, et al., Long-Term Exposure to Ozone and Cause-Specific Mortality Risk in the United States. Am J Respir Crit Care Med, 2019. 200(8): p. 1022–1031.
  • 6. Joseph A-C, Fuentes M, and Wheeler DC, The impact of population mobility on estimates of environmental exposure effects in a case-control study. Statistics in Medicine, 2020. n/a(n/a).
  • 7. Hystad P, et al., Spatiotemporal air pollution exposure assessment for a Canadian population-based lung cancer case-control study. Environ Health, 2012. 11: p. 22.
  • 8. Crouse DL, et al., Ambient PM2.5, O3, and NO2 Exposures and Associations with Mortality over 16 Years of Follow-Up in the Canadian Census Health and Environment Cohort (CanCHEC). Environ Health Perspect, 2015. 123(11): p. 1180–6.
  • 9. Brokamp C, LeMasters GK, and Ryan PH, Residential mobility impacts exposure assessment and community socioeconomic characteristics in longitudinal epidemiology studies. J Expo Sci Environ Epidemiol, 2016. 26(4): p. 428–34.
  • 10. Wheeler DC and Wang A, Assessment of Residential History Generation Using a Public-Record Database. International Journal of Environmental Research and Public Health, 2015. 12(9): p. 11670–11682.
  • 11. Ling C, et al., Residential mobility in early childhood and the impact on misclassification in pesticide exposures. Environ Res, 2019. 173: p. 212–220.
  • 12. Jacquez GM, et al., Accuracy of commercially available residential histories for epidemiologic studies. Am J Epidemiol, 2011. 173(2): p. 236–43.
  • 13. Hurley S, et al., Tracing a Path to the Past: Exploring the Use of Commercial Credit Reporting Data to Construct Residential Histories for Epidemiologic Studies of Environmental Exposures. Am J Epidemiol, 2017. 185(3): p. 238–246.
  • 14. LexisNexis®. Public Records. 2020; Available from: https://www.lexisnexis.com/enus/products/public-records.page.
  • 15. Jones RR, et al., Land use regression models for ultrafine particles, fine particles, and black carbon in Southern California. Science of The Total Environment, 2020. 699: p. 134234.
  • 16. California Cancer Registry. Procedures for Conducting Data Linkages with the California Cancer Registry. 2020; Available from: https://www.ccrcal.org/download/82/site-pdf-links/7437/description_of_linkages.pdf.
  • 17. LexisNexis®. The LexisNexis Timeline. 2003. [cited 2020 Sept 17]; Available from: http://www.lexisnexis.com/anniversary/30th_timeline_fulltxt.pdf.
