International Journal of Epidemiology. 2018 Jan 16;47(2):368–368j. doi: 10.1093/ije/dyx268

Data Resource Profile: Expansion of the Rochester Epidemiology Project medical records-linkage system (E-REP)

Walter A Rocca 1,2,3, Brandon R Grossardt 4, Scott M Brue 5, Cynthia M Bock-Goodner 5, Alanna M Chamberlain 1,6, Patrick M Wilson 4,6, Lila J Finney Rutten 6,7, Jennifer L St Sauver 1,6
PMCID: PMC5913632  PMID: 29346555

Data resource basics

History

In a series of methodological papers, we have described the Rochester Epidemiology Project (REP) medical records-linkage system as it has existed for more than 50 years in Olmsted County, Minnesota.1–4 Further details about the major events and protagonists of the history of the original REP are available elsewhere.1 Starting in 2010, we have expanded the population captured by the REP from a single county in south-eastern Minnesota to a geographical region including 27 counties in southern Minnesota and western Wisconsin. In this paper, we provide a profile of the expanded medical records-linkage system, which we name the Expanded-REP (E-REP) to distinguish it from the original REP. Because the data became available for electronic linkage and storage starting in 2010, we can consider 2010 the birth year for the E-REP.

General description

The E-REP was established to provide longitudinal medical data for a population residing in a well-defined geographical region. The E-REP captures a large percentage of the persons who have resided in a 27-county region of southern Minnesota and western Wisconsin at some time from 1 January 2010 to the present, regardless of age, sex, ethnicity and disease status. Depending on the needs for a specific study, the region can also be partitioned into smaller segments. For example, some studies have targeted a seven-county region or an 11-county region because these regions have a higher percentage of population capture; therefore, the non-participation percentage is lower.5–7 The electronic indexes of the E-REP include not only demographic information, diagnostic and procedure codes, health services utilization data and outpatient drug prescriptions, but also results of laboratory tests and information about smoking, height, weight and body mass index. Table 1 shows a list of data that are currently included in the electronic indexes and their definitions.

Table 1.

List of characteristics captured electronically within the electronic indexes of the E-REP

Characteristic | Details
Sociodemographic and life habits
 Name | First and last; both current and history of name changes
 Age | Date of birth
 Sex | Male, female
 Address | Full address information; both current and all historical addresses; latitude and longitude [a]
 Social security number | Unique identification number issued by the US government for each resident
 Race | Per US Census: White; Black; Asian; American Indian or Alaskan Native; Native Hawaiian or Pacific Islander. We also used the categories ‘Other and mixed’ and ‘Unknown’ [b]
 Ethnicity | Per US Census: Hispanic; non-Hispanic [c]
 Education | Years of education [d]
 Smoking | Never, former, current [e]
Medical [f]
 Diagnostic codes | ICD-9 and ICD-10 codes
 Surgical procedures | ICD-9 and ICD-10 codes
 Diagnostic and procedure codes | ICD-9 and ICD-10 codes, CPT codes and HCPCS codes
 Outpatient drug prescriptions | RxNorm codes and NDF-RT codes [g]
 Laboratory results | LOINC codes [h]
 Height | Centimetres
 Weight | Kilograms
 Body mass index | Kilograms/metres squared
 Blood pressure | Systolic and diastolic pressure in mmHg
 Hospitalizations | Admit and discharge dates
 Emergency department visits | Admit and discharge dates
 Death information | Date and causes of death; all causes of death included; ICD-9 and ICD-10 codes

ICD, International Classification of Diseases; CPT, Current Procedural Terminology; HCPCS, Healthcare Common Procedure Coding System.

a

The full address is also available as longitude and latitude (geocoding) for linkage with data from the US Census and the American Community Survey.

b

Other and mixed race includes those persons who specified ‘Two or more races’ in the US Census and persons who specifically reported their race as ‘Other’ or ‘Mixed’ in the E-REP. No category for ‘Unknown’ race exists in the US Census. The persons in the E-REP population with Unknown race include 940 persons who refused to specify a race, and 28 311 persons for whom no race information was available from any of their medical records.

c

All persons who did not declare themselves Hispanic were considered non-Hispanic.

d

Education is available for 46.1% of the E-REP population aged 25 years or older.

e

Patient-provided smoking status is collected in different ways across institutions. Therefore, smoking status is normalized to meaningful use categories before incorporation into the REP indexes.

f

All medical data include the date on which each piece of information was collected.

g

We use the National Library of Medicine RxNorm standard nomenclature [http://www.nlm.nih.gov/research/umls/rxnorm]. Drug prescriptions are also assigned to a National Drug File Reference Terminology (NDF-RT) category. Once standardized, prescriptions can be retrieved by RxNorm code, NDF-RT category or specific ingredient using REP tools.

h

The standard LOINC nomenclature system is complex and not intuitive. Therefore, we have developed customized retrieval software for 55 common laboratory tests based on test name. We have validated our methods against the CPT codes for the same tests to ensure that investigators receive the results for all of the laboratory tests that were actually performed.

Methods for linkage across institutions

The E-REP includes medical record data from multiple health care institutions, and these institutions currently use four distinct electronic health record (EHR) systems: the Mayo Clinic GE Centricity EHR, the Olmsted Medical Center IC Chart EHR, the Mayo Clinic Health System Cerner EHR and the Olmsted County Public Health Services PH-Doc EHR. The Mayo Clinic and the Mayo Clinic Health System are currently transitioning to a shared EPIC EHR system that will simplify the linkage greatly. In addition, the Olmsted Medical Center is also transitioning to an EPIC EHR (similar but less customized than the Mayo Clinic version), which will make the data more homogeneous.

We have more than 50 years of experience in linking data derived from different health care institutions and different EHRs, and the general methods used have been described in detail elsewhere.4 The studies conducted to validate the linkage methods in the original REP showed a frequency of failure to link two records when the records belonged to the same person (under-inclusion rate) of 1.3% [95% confidence interval (CI): 0.2-2.4%], and a frequency of incorrect linkage of two records when the records did not belong to the same person (over-inclusion rate) of 2.5% (95% CI: 1.0-4.0%).4 We also validated the census enumeration based on address information.4 A team with information technology and statistical expertise is dedicated to the E-REP (approximately three full-time equivalent positions).

Data collected

Distribution by age, sex, and county

All unique persons who had at least one health-related visit captured by the system after 1 January 2010 were considered to enumerate the 27-county population using the individual timeline methods described elsewhere.4 Table 2 shows the population captured by the E-REP on 1 January 2014 and the percentage capture compared with the US Census estimates for 2014, overall and in strata by county and sex. The counties are grouped under Minnesota (19 counties) and Wisconsin (eight counties). In 2014, the E-REP captured 694 506 persons, 337 241 men (48.6%) and 357 265 women (51.4%; Table 2). The largest captured population resides in Olmsted County (n = 150 013) followed by Eau Claire County (n = 56 104) and La Crosse County (n = 51 804). Table 2 also shows the population captured and the percentage capture for a seven-county region with higher coverage.

Table 2.

Population captured within the Expanded Rochester Epidemiology Project (E-REP) and percentage capture as compared with the 2014 US Census estimates (1 January 2014) [a]

State, County | Men, all ages: N | Men, all ages: % capture [b] | Women, all ages: N | Women, all ages: % capture [b] | Both sexes, all ages: N | Both sexes, all ages: % capture [b]
Minnesota
 Olmsted [c] | 71 512 | 97.3 | 78 501 | 102.4 | 150 013 | 99.9
 Dodge [c] | 8926 | 87.7 | 9175 | 90.2 | 18 101 | 89.0
 Mower [c] | 18 349 | 93.6 | 19 037 | 97.1 | 37 386 | 95.3
 Goodhue | 16 091 | 69.7 | 16 992 | 73.0 | 33 083 | 71.3
 Fillmore | 7156 | 68.6 | 7541 | 72.6 | 14 697 | 70.6
 Wabasha [c] | 9071 | 85.4 | 9455 | 88.5 | 18 526 | 86.9
 Winona | 6777 | 26.8 | 7047 | 27.3 | 13 824 | 27.0
 Houston | 4068 | 43.6 | 4130 | 43.9 | 8198 | 43.7
 Freeborn [c] | 13 525 | 88.5 | 14 133 | 91.1 | 27 658 | 89.8
 Steele [c] | 14 924 | 82.7 | 15 507 | 83.6 | 30 431 | 83.1
 Rice | 7650 | 23.0 | 9080 | 28.5 | 16 730 | 25.7
 Blue Earth | 15 101 | 45.8 | 16 394 | 50.4 | 31 495 | 48.1
 Waseca [c] | 7475 | 83.5 | 8049 | 80.4 | 15 524 | 81.9
 Faribault | 4823 | 68.8 | 5178 | 72.5 | 10 001 | 70.6
 Martin | 7124 | 71.6 | 7755 | 75.6 | 14 879 | 73.6
 Watonwan | 3523 | 64.5 | 3770 | 68.5 | 7293 | 66.5
 Brown | 2589 | 20.5 | 2821 | 22.3 | 5410 | 21.4
 Nicollet | 7502 | 44.9 | 8343 | 50.5 | 15 845 | 47.6
 Le Sueur | 6520 | 46.6 | 7173 | 52.2 | 13 693 | 49.3
Wisconsin
 Eau Claire | 27 257 | 54.3 | 28 847 | 56.0 | 56 104 | 55.2
 Trempealeau | 7956 | 53.5 | 8045 | 55.0 | 16 001 | 54.2
 La Crosse | 25 658 | 44.5 | 26 146 | 43.4 | 51 804 | 44.0
 Buffalo | 3866 | 57.9 | 3871 | 59.3 | 7737 | 58.6
 Pepin | 2052 | 55.8 | 2108 | 57.4 | 4160 | 56.6
 Dunn | 15 792 | 70.7 | 16 174 | 73.6 | 31 966 | 72.2
 Barron | 9598 | 42.3 | 9419 | 41.3 | 19 017 | 41.8
 Chippewa | 12 356 | 37.5 | 12 574 | 41.2 | 24 930 | 39.3
Seven-county region [c] | 143 782 | 92.0 | 153 857 | 95.4 | 297 639 | 93.8
All counties | 337 241 | 59.4 | 357 265 | 62.4 | 694 506 | 60.9
a

This table includes only persons who have given permission for all or part of their medical record information to be used for research purposes (participants in the E-REP). The complete population enumerated by the E-REP on 1 January 2014 comprised 763 695 persons (369 403 men and 394 292 women); therefore, the participation was 90.9% overall, 91.3% for men, and 90.6% for women.

b

Percentage captured is calculated by dividing the E-REP population by the US Population Census Estimates (Vintage 2015) for 1 July 2014, as reported in Supplementary Table 3 (available at IJE online).

c

These are the counties included in the seven-county region.

Supplementary Tables 1 and 2 (available as Supplementary data at IJE online) show the additional stratification by age (strata by county, sex and age). Supplementary Table 3 (available as Supplementary data at IJE online) shows the US Census population estimates used to calculate the percentage capture. The capture was 60.9% overall, was higher in women (62.4%) than in men (59.4%; Table 2) and increased monotonically with older age for both men and women (Supplementary Table 2). There are two primary factors influencing the percentage capture. First, not all of the care facilities providing care to the population residing in the 27-county region collaborate with the E-REP. Second, persons who receive care from any of the participating care facilities located in Minnesota are asked to sign a research authorization form as required by Minnesota law (Minnesota State privacy law, statute 144.335).2,4 The E-REP only includes persons who have given permission for all or part of their medical record information to be used for research purposes (participants in the E-REP); overall, 90.9% of the eligible population provided this authorization (participation of 91.3% for men and 90.6% for women; see footnote a in Table 2). A similar law is not active in Wisconsin.
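The two percentages discussed above have different denominators, and a short calculation may make the distinction concrete. The sketch below recomputes the overall capture (E-REP participants divided by the US Census total for the 27 counties, as reported in Table 3) and the participation (participants divided by the complete population enumerated by the E-REP, from footnote a of Table 2). The variable and function names are illustrative only and are not part of any E-REP tool.

```python
# Counts copied from Table 2 (footnote a) and Table 3; names are illustrative.
erep_participants = 694_506    # persons who authorized research use (1 January 2014)
erep_enumerated = 763_695      # complete population enumerated by the E-REP
census_27_county = 1_139_548   # US Census total for the 27 counties (Table 3)


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Capture: how much of the Census population appears in the E-REP.
capture = percent(erep_participants, census_27_county)

# Participation: how many enumerated persons authorized research use.
participation = percent(erep_participants, erep_enumerated)

print(f"capture: {capture}%  participation: {participation}%")
```

The same helper applied to the sex-specific counts in footnote a of Table 2 (337 241 of 369 403 men; 357 265 of 394 292 women) gives the sex-specific participation percentages.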

Figure 1 is a map of the 27-county region showing the geographical location and the percentage capture for each county. The region can be subdivided into a maximum capture segment, Olmsted County (99.9%; blue border), a high capture segment, including Olmsted County and six additional contiguous counties (93.8%; seven-county region, pink border) and the complete 27-county region (60.9%). The percentage capture may vary in the coming years with new health care institutions joining the E-REP and with changes in the population. Therefore, the high capture segment may vary across studies conducted at different times.

Figure 1.

Geographical map of the 27-county region of the E-REP showing the geographical location and the percentage capture for each county (black numbers or white numbers). The region can be subdivided into a maximum capture segment, Olmsted County (blue border), a high capture segment, including Olmsted County and six additional contiguous counties (pink border), and the overall 27-county region. The colour shading of the counties is proportional to the percentage capture of the E-REP as compared with the US Census estimates.

Figure 2 shows the distribution in the 27 counties by percentage of persons below poverty level (panel A), percentage of persons of non-White race (panel B), percentage of college-educated persons (panel C) and percentage of county area considered urban (panel D). These data were gathered from the US Census Bureau and the American Community Survey.8–11 The expansion from the traditional REP in Olmsted County (highlighted in blue) to the 27-county region will allow the inclusion of greater numbers of people living in poverty, with lower education, of non-White race, and living in more rural areas.

Figure 2.

Maps with distribution of counties by poverty level, race, education and rurality. Geographical map of the 27-county region showing the distribution by percentage of persons below the poverty level (panel A), percentage of persons of non-White race (panel B), percentage of persons with college degrees (panel C) and percentage of county area considered urban (panel D). These data were gathered from the US Census Bureau and the American Community Survey.8–11 The colour shading of the counties is proportional to the percentage values. The scale varies across the four panels to maximize the visual contrast. Olmsted County is highlighted in blue to facilitate the comparison of the traditional REP with the E-REP.

Distribution by race, ethnicity, education and income

Table 3 shows the percentage capture of the E-REP stratified by race and ethnicity, as specified by the US Census. Overall, the capture rate was higher for Blacks and lower for Asians compared with Whites. However, the E-REP classification of people by race and ethnicity was somewhat different from the US Census and allowed for more people to self-identify as ‘other or mixed’. In addition, 4.2% of the population in the E-REP refused to specify a race, or no race/ethnicity information was recorded in any of their medical records. Overall, the Hispanic population had a capture percentage higher than the non-Hispanic population.

Table 3.

Racial and ethnic composition of the population captured in the Expanded Rochester Epidemiology Project (E-REP) on 1 January 2014

Race or ethnicity | US Census [a]: N | US Census [a]: % | E-REP: N | E-REP: % | % capture [b]
Total population, by race
White | 1 062 654 | 93.3 | 608 142 | 87.6 | 57.2
Non-White | 76 894 | 6.7 | 57 113 | 8.2 | 74.3
 Black | 24 675 | 2.2 | 19 312 | 2.8 | 78.3
 Asian | 29 133 | 2.6 | 14 203 | 2.0 | 48.8
 AIAN | 5958 | 0.5 | 1716 | 0.2 | 28.8
 NHPI | 597 | 0.1 | 1039 | 0.1 | 174.0
 Other and mixed [c] | 16 531 | 1.5 | 20 843 | 3.0 | 126.1
Unknown [d] |  |  | 29 251 | 4.2 | 
  All races | 1 139 548 | 100.0 | 694 506 | 100.0 | 60.9
Total population, by ethnicity
Non-Hispanic | 1 091 649 | 95.8 | 662 704 | 95.4 | 60.7
Hispanic | 47 899 | 4.2 | 31 802 | 4.6 | 66.4
  Both ethnicities | 1 139 548 | 100.0 | 694 506 | 100.0 | 60.9

AIAN, American Indian or Alaska Native; NHPI, Native Hawaiian or Pacific Islander.

a

The estimates for 2014 are from the US Census.9

b

The capture percentage is calculated by dividing the number of persons in the E-REP by the corresponding number in the US Census. Some of the capture percentages higher than 100% are due to differences in classification between the US Census and the E-REP. In general, the E-REP counted more people who self-identified as Native Hawaiian or Pacific Islander, Other and mixed or had unknown race.

c

Other and mixed race includes those persons who specified ‘Two or more races’ in the US Census and persons who specifically reported their race as ‘Other’ or ‘Mixed’ in the E-REP.

d

No category for ‘Unknown’ race exists in the US Census. The persons in the E-REP population with Unknown race include 940 persons who refused to specify a race, and 28 311 persons for whom no race information was available from any of their medical records.
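The capture percentages by race in Table 3 follow directly from the calculation described in footnote b. As a small illustration, the sketch below recomputes them from the counts in the table and lists the groups whose capture exceeds 100%, which the footnote attributes to classification differences between the US Census and the E-REP. The dictionary layout and function name are assumptions made for this example.

```python
# (US Census N, E-REP N) pairs copied from Table 3; the layout is illustrative.
table3 = {
    "White": (1_062_654, 608_142),
    "Black": (24_675, 19_312),
    "Asian": (29_133, 14_203),
    "AIAN": (5_958, 1_716),
    "NHPI": (597, 1_039),
    "Other and mixed": (16_531, 20_843),
}


def capture_by_group(counts):
    """Divide the E-REP count by the Census count for each group (percent, 1 decimal)."""
    return {
        group: round(100 * erep / census, 1)
        for group, (census, erep) in counts.items()
    }


capture = capture_by_group(table3)

# Groups above 100% cannot reflect true capture; they signal that the two
# sources classify some persons differently.
over_100 = sorted(group for group, pct in capture.items() if pct > 100)

for group, pct in capture.items():
    print(f"{group}: {pct}%")
print("capture above 100%:", over_100)
```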

Supplementary Table 4 (available as Supplementary data at IJE online) compares the demographic, racial, ethnic and socioeconomic characteristics of the 27-county population captured by the E-REP with data from the US Census for the entire 27-county population, the Upper Midwest population and for the entire US population in 2014. The comparisons are also visualized in Figure 3. The top part of Figure 3 shows a map of the USA with the five states of the Upper Midwest highlighted in yellow (North Dakota, South Dakota, Minnesota, Iowa and Wisconsin). The geographical location of the 27-county region is shown as an orange insert superimposed on the maps of Minnesota and Wisconsin. The 27-county region is also projected out of the map in blue. The orange colour refers to the 27-county region as reported by the US Census, and the blue colour refers to the 27-county region as currently captured by the E-REP.

Figure 3.

Comparison of the E-REP population with US Census data for the 27-county region, the Upper Midwest, and the entire USA. The top part of the figure shows a geographical map and the colour coding for the three regions that were compared in this study: the 27-county region, the Upper Midwest and the entire USA. The 27-county region captured by E-REP included 60.9% of the total 27-county population and 4.4% of the Upper Midwest population as estimated by the US Census for 2014.9 In turn, the Upper Midwest population represented 5.0% of the total US population. The left column of the figure compares the 27-county region captured in E-REP (blue-line profile) with the population pyramids for the entire population of the 27-county region (in orange), for the Upper Midwest (in yellow), and for the entire USA (in white). The pyramids were constructed using 5-year age groups. The bar graphs in the central and right columns compare the study populations for race, ethnicity, ageing and education. This figure compares E-REP data with data from the US Census Bureau and the American Community Survey.9,10 Alaska and Hawaii are included in the entire US data, but are not shown in the map.

The left column of Figure 3 shows the population pyramid profile (blue-line) for the 27-county region captured by the E-REP, superimposed on the pyramid from the US Census for the 27-county region (orange pyramid), for the Upper Midwest population (yellow pyramid) and for the entire US population (white pyramid). The centre and right columns of Figure 3 show bar graph comparisons of the 27-county population captured by the E-REP (blue bars) with the three comparison populations for racial, ethnic, ageing, and educational characteristics. As expected, the distributions by race, ethnicity, ageing and education were similar for the E-REP population (blue bars) compared with the US Census 27-county population (orange bars) and with the US Census Upper Midwest population (yellow bars). However, the entire US population (white bars) included higher percentages of non-Whites and Hispanics compared with the E-REP population.

The population of the 27-county region captured by the E-REP has a demographic, ethnic and socioeconomic distribution similar to the entire 27-county region as reported by the US Census, suggesting that the 60.9% capture does not cause a major distortion for these characteristics. In addition, the population of the 27-county region captured in the E-REP is similar to the entire population of the Upper Midwest (Supplementary Table 4, available as Supplementary data at IJE online; and Figure 3). These similarities in distribution suggest that some extrapolations of findings from the E-REP population to the Upper Midwest and to a large segment of the US population may be reasonable. However, extrapolations should be made on a case-by-case basis and should involve careful judgment.3

Data resource use

Although the E-REP started to link and store data only in 2010, a few studies have used the new data resource. These early studies demonstrate the potential for use in future studies. In a first study, Rutten et al. used the E-REP to study the delivery of human papillomavirus (HPV) vaccination among persons 9–18 years old in the 27-county region (n = 68 272). The results were linked to the results of a survey among 280 primary care physicians practising in 52 clinical sites of the same 27-county region. The study showed that clinician knowledge, clinician barriers and perceived parental barriers regarding HPV vaccination were associated with the rates of vaccine initiation and completion at the clinical site level. These data can guide efforts to improve HPV vaccine delivery in clinical settings.12

In a second study, Rutten et al. used the E-REP to study HPV vaccination rates in a seven-county region in southern Minnesota between 2010 and 2015. Information from patient address (geolocation) was used to link patients with socioeconomic data from the American Community Survey at the census block group level. Older age and female sex were associated with higher HPV vaccination rates. Residing in areas with low socioeconomic status was associated with lower rates of vaccine initiation, completion of second dose and completion of third dose. HPV vaccine rates also varied geographically across the region and were higher in urban areas. The identification of geographical areas with low HPV vaccination rates may help to target local interventions.5
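The address-based linkage used in this study can be pictured as a join between person-level records and area-level data on a shared geographical key. The sketch below is a minimal illustration under stated assumptions: the record layout, the field names, the identifiers and the income values are all hypothetical, and the geocoding step that assigns a block group to an address is taken as already done. The only external fact relied upon is that a Census block group identifier has 12 digits (state, county, tract and block group).

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    block_group_geoid: str  # 12-digit block group identifier (hypothetical values)
    hpv_doses: int


# Hypothetical area-level table keyed by block group identifier.
acs_by_block_group = {
    "271090001001": {"median_household_income": 41_000},
    "271090002002": {"median_household_income": 78_000},
}

patients = [
    Patient("p1", "271090001001", 3),
    Patient("p2", "271090002002", 1),
    Patient("p3", "270390099001", 0),  # no area record in this toy table
]


def attach_area_data(patients, area_table):
    """Pair each patient with the area record for their block group (None if absent)."""
    return [(p, area_table.get(p.block_group_geoid)) for p in patients]


linked = attach_area_data(patients, acs_by_block_group)
for patient, area in linked:
    income = area["median_household_income"] if area else "not linked"
    print(patient.patient_id, patient.hpv_doses, income)
```

Keeping unlinked persons in the output, rather than dropping them, makes it possible to report how many addresses could not be matched to area-level data.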

Chamberlain et al. used an 11-county region in south-eastern Minnesota to study the association between social isolation and functional status or mental health in patients with heart failure. The E-REP electronic indexes were used to identify all of the persons in the system who received an ICD-9 code for heart failure (ICD-9 code 428) between 1 January 2013 and 31 March 2014. These persons were invited to participate in a mail survey to measure social isolation, functional status and mental health. The study showed that greater social isolation in patients with heart failure was associated with worse functional status and worse mental health, independent of other comorbid conditions.6
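The case-finding step in this study (all persons with an ICD-9 code 428 within a date window) can be sketched as a filter over a diagnosis index. The example below is only an illustration: the tuple layout, the person identifiers and the function name are hypothetical and do not describe the actual E-REP retrieval tools. The date window is the one reported for this study.

```python
from datetime import date

# Hypothetical index rows: (person_id, code_system, code, date_of_service).
diagnoses = [
    ("p1", "ICD-9", "428.0", date(2013, 3, 5)),
    ("p1", "ICD-9", "428.22", date(2013, 9, 1)),
    ("p2", "ICD-9", "401.9", date(2013, 6, 10)),   # different diagnosis
    ("p3", "ICD-9", "428.0", date(2012, 12, 31)),  # before the window
    ("p4", "ICD-9", "428.9", date(2014, 3, 31)),   # last day of the window
]


def persons_with_code(rows, system, category, start, end):
    """Return sorted unique person IDs with a code in `category` between start and end (inclusive)."""
    return sorted({
        person_id
        for person_id, code_system, code, when in rows
        # Compare the part before the decimal point so that 428.0, 428.22 and
        # 428.9 all count as category 428.
        if code_system == system
        and code.split(".")[0] == category
        and start <= when <= end
    })


cohort = persons_with_code(
    diagnoses, "ICD-9", "428", date(2013, 1, 1), date(2014, 3, 31)
)
print(cohort)
```

Because each person appears once in the result regardless of how many qualifying codes they received, the list can serve directly as a mailing list for a survey.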

Fabbri et al. used the same 11-county region in south-eastern Minnesota to study the association between health literacy and prognosis in patients with heart failure. The E-REP electronic indexes were used to identify all persons in the system who received an ICD-9 code for heart failure (ICD-9 code 428) between 1 January 2013 and 31 March 2015. These persons were invited to participate in a mail survey to measure health literacy. Health literacy was dichotomized as adequate or low, and persons were followed using the E-REP electronic indexes to study mortality and hospitalization. The study showed that low health literacy was associated with increased risk of hospitalization and death among patients affected by heart failure.7

These four studies illustrate four important characteristics of the E-REP: (i) the flexibility of the E-REP to include a convenient number of counties as dictated by the specific study question and by the resources available; (ii) the ability to link medical record data with data collected from care providers or from patients via survey; (iii) the ability to link the patient address with longitude and latitude data, and to use the geolocation to obtain data from the US Census, from the American Community Survey or from other sources; and (iv) the ability to identify people who have a particular disease, surgical procedure or laboratory test result, or who take a particular drug, so that they can be invited to participate in a survey, an observational study or a clinical trial. The combination of passive research methods (using electronic medical record data or electronic data plus manual chart review) and active research methods (recruiting people for surveys, observational studies or interventions) may be particularly important for the future.

Strengths and weaknesses

Strengths

The major strength of the E-REP is the larger sample size and the broader geographical representation compared with the original REP. We decided to expand the REP from the original single-county region to 27 counties, to increase the population size from approximately 150 000 people to approximately 700 000 people. This almost 5-fold increase allows us to study less common conditions (e.g. pancreatic cancer), specific segments of the population (e.g. children 5–15 years old or women 70–79 years old) and specific ethnic/race groups (e.g. non-Hispanic Asian), and allows for the comparison of persons residing in rural vs urban settings (counties with high vs low percentage of area considered urban). Although the percentage representation of minorities in the E-REP region is smaller than in some other parts of the country (e.g. Florida, southern California or Mississippi), the E-REP includes a sizeable number of minorities, and these populations can be oversampled for specific studies (Table 3). On the other hand, for some of the minority groups (e.g. Blacks) and for some less common diseases, the numbers may remain inadequate even after oversampling. This limitation may attenuate over time with the expansion of the window of capture of incident diseases or outcomes. Therefore, the E-REP addresses, to some extent, an important limitation of the original REP, which does not include large numbers of minorities.

The E-REP also addresses a second limitation of the original REP population, which includes an unusually high percentage of health professionals and their families, who may have privileged access to medical care and a higher level of health literacy. We have shown repeatedly that the findings of studies conducted in Olmsted County are replicated in other populations in the USA and internationally. For example, the initial report of a declining trend in the incidence of dementia observed in Olmsted County has now been replicated in the USA and in several European countries.13,14 However, the inclusion of 26 additional counties in rural Minnesota and rural Wisconsin should more completely address the concern. For example, the percentage of persons aged ≥ 65 years and the educational level are similar in the 27-county region captured by the E-REP and in the total US population (Figure 3).

An important strength of the E-REP is the more than 50 years of experience linking data from different institutions and involving different EHRs.2,4 On the other hand, there are continuing and evolving challenges to linkage activities that will require flexibility and innovation.

Weaknesses

An important potential limitation of the E-REP is the 60.9% population capture. The capture is higher for women than men, for older persons and in some counties, including Olmsted County and six contiguous counties. The incomplete capture may cause a selection bias when estimating prevalence, incidence or the natural history of diseases (e.g. outcomes or survival). In addition, factors such as socioeconomic status, education, occupation, insurance coverage or county of residence may influence the capture rate and may bias the results of observational epidemiological studies (case-control studies or cohort studies). The concern is that the population captured may be systematically different from the population not captured.15,16

On the other hand, a similar problem is encountered in most studies that involve active recruitment and participation of subjects in a cross-sectional survey or in a cohort study involving multiple contacts (baseline contact and follow-up contacts). The percentage participation in other studies at baseline is almost invariably less than the E-REP 61%, and further attrition is experienced during follow-up. Participants in surveys or cohort studies may differ systematically from non-participants. In addition, some authors have argued that representativeness is not indispensable, or not even desirable, in observational studies testing the association between two variables.17–19

Investigators using the 27-county region for studies in which a selection bias may be an important limitation can use an internal comparison strategy. The 27-county region can be partitioned into three segments: Olmsted County, where the capture is virtually complete; a seven-county region, where the capture is 93.8%; and the overall region, where the capture is 60.9%. For example, if the incidence rate of a given disease is similar in the three segments (after accounting for age and sex differences), it is likely that the overall results in the 27-county region are valid. The advantage of using the larger region for a study is the increased number of persons and the improved stability of the incidence rates. Similarly, a case-control study or a cohort study can be stratified across the three regions. If the odds ratio (OR) or the hazard ratio (HR) is the same in Olmsted County, the seven-county region and the 27-county region, we can reasonably assume that there are no major sampling biases. However, if the ORs or the HRs are noticeably different, we have evidence of a possible sampling bias and the ORs or the HRs should be reported separately.20 A similar strategy was used to explore how the geographical area of residency and the distance of referral may influence the capture in studies using series of patients referred to the Mayo Clinic from the Upper Midwest region or from further away.21–23
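One way to make the comparison of ORs across segments more formal is a heterogeneity statistic. The sketch below computes Cochran's Q from stratum-specific 2 × 2 tables using inverse-variance weights on the log OR; this is a standard technique chosen for illustration, not a method described in this paper. All counts are hypothetical. The three segments named in the text are nested, so the example assumes that the region has been split into three non-overlapping strata (Olmsted County, the six other high-capture counties and the remaining 20 counties) to keep the stratum estimates independent.

```python
import math


def log_or_and_var(a, b, c, d):
    """Log odds ratio and its variance for one 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def cochran_q(tables):
    """Return (Q, pooled OR) using inverse-variance weights on the log OR."""
    estimates = [log_or_and_var(*t) for t in tables]
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, estimates))
    return q, math.exp(pooled)


# Hypothetical counts for three non-overlapping strata.
strata = {
    "Olmsted County": (40, 60, 30, 90),
    "six other high-capture counties": (35, 65, 28, 92),
    "remaining 20 counties": (80, 120, 55, 170),
}

q, pooled_or = cochran_q(list(strata.values()))

# With three strata, Q is compared with a chi-square distribution on 2 degrees
# of freedom, whose upper tail probability is exp(-Q/2). This closed form
# holds for 2 degrees of freedom only.
p_value = math.exp(-q / 2)

print(f"Q = {q:.3f}, pooled OR = {pooled_or:.2f}, p = {p_value:.3f}")
```

A small Q (large p-value) would be compatible with similar ORs across strata, whereas a large Q would support reporting the stratum-specific estimates separately, as suggested above. The test has limited power when strata are small, so a non-significant result does not rule out a sampling bias.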

Because the E-REP started to link and store data only in 2010, the maximum depth of longitudinal data is currently 7 years. Therefore, studies involving long-term associations of one variable with another variable cannot be conducted (e.g. historical cohort studies with long-term follow-up). Similarly, diseases that are relatively uncommon may require a longer time frame to collect an adequate number of incident cases. However, the E-REP will mature with time.

The E-REP involves a single geographically defined US population, and the findings of our studies may differ from those of studies conducted in other populations. However, the demographic and socioeconomic characteristics of our population are similar to those of the Upper Midwest and of a large segment of the entire US population, and replication of the studies in other populations in the USA and worldwide will allow for useful comparisons.3 In addition, the focus on a geographically defined population allows for the study of contextual factors, local variability and local needs.24,25 The E-REP can be leveraged to identify priorities for local public health, clinical practice and research. Indeed, data from the original REP have been consistently used to prepare a local collaborative Community Health Needs Assessment, which informs local health improvement plans.26 The E-REP provides opportunities to expand these efforts. As suggested by experts from the Centers for Disease Control and Prevention, our population can serve as a model for studies in other communities throughout the country.26–28 This local and contextual perspective complements and enriches the national and international perspectives.

Data resource access

We developed an interactive, open access web-based tool that we named the REP Data Exploration Portal (REP-DEP), to explore patterns of prevalence and co-occurrence of diseases (aggregate-level data). The REP-DEP was described in detail elsewhere and can be accessed through the REP website at [http://www.rochesterproject.org/portal/].29 Individual-level data can be obtained upon request. However, like other large and complex records-linkage systems, the use of the data in the E-REP is complex. We encourage investigators interested in using the E-REP to test specific hypotheses to develop a collaboration with our research team. Queries should be sent via e-mail to [info@rochesterproject.org], with a one-page outline of the intended project.

E-REP in a nutshell

  • The E-REP medical records-linkage system is a new data resource to support population-based epidemiological studies. In 2014, the E-REP captured approximately 61% of the persons residing in a 27-county region of southern Minnesota and western Wisconsin.

  • In 2014, the E-REP included a total of 694 506 persons: 337 241 men (48.6%) and 357 265 women (51.4%). Approximately 4.6% of the population is Hispanic and 8.2% is of a non-White race.

  • The E-REP electronic indexes include demographic information, diagnostic and procedure codes, health services utilization data, outpatient drug prescriptions, results of laboratory tests and information about smoking, height, weight and body mass index.

  • The demographic, racial, ethnic and socioeconomic characteristics of the E-REP population are similar to the characteristics of the entire 27-county region, of the Upper Midwest, and of a large segment of the entire US population. However, generalizations of findings to other populations in the USA or worldwide should be considered on a case-by-case basis.

  • Aggregate-level data are openly available to browse via the REP Data Exploration Portal [REP-DEP; http://www.rochesterproject.org/portal/]. Investigators interested in using the E-REP to test specific hypotheses can contact us via e-mail [info@rochesterproject.org].

Supplementary Data

Supplementary data are available at IJE online.

Funding

The Rochester Epidemiology Project is supported by the National Institute on Aging of the National Institutes of Health [grant numbers R01 AG034676, R01 AG052425]. However, the content of this article is solely the responsibility of the authors and does not necessarily represent the official view of the National Institutes of Health. This study was also supported by funds from the Mayo Clinic Robert D and Patricia E Kern Center for the Science of Health Care Delivery. W.A.R. was partly supported by the National Institutes of Health [P50 AG044170, U01 AG006786, and P01 AG004875].

Acknowledgement

We thank Ms Robin M Adams for her assistance in typing and formatting the manuscript.

Author Contributions

W.A.R. and J.L.S. are the co-Principal Investigators of the REP and direct all of the activities. B.R.G., S.M.B., C.M.B.G. and P.M.W. are responsible for data cleaning, harmonization, linkage and analyses. B.R.G. also provides statistical support. A.M.C. provides support to users of the E-REP. L.J.F.R. provides expertise in community health and cancer research. W.A.R. drafted the manuscript and all of the remaining authors provided critical revisions of the manuscript.

Conflict of interest: None declared.

References

  • 1. Rocca WA, Yawn BP, St Sauver JL, Grossardt BR, Melton LJ 3rd. History of the Rochester Epidemiology Project: half a century of medical records linkage in a US population. Mayo Clin Proc 2012;87:1202–13.
  • 2. St Sauver JL, Grossardt BR, Yawn BP. et al. Data Resource Profile: the Rochester Epidemiology Project (REP) medical records-linkage system. Int J Epidemiol 2012;41:1614–24.
  • 3. St Sauver JL, Grossardt BR, Leibson CL, Yawn BP, Melton LJ 3rd, Rocca WA. Generalizability of epidemiological findings and public health decisions: an illustration from the Rochester Epidemiology Project. Mayo Clin Proc 2012;87:151–60.
  • 4. St Sauver JL, Grossardt BR, Yawn BP, Melton LJ 3rd, Rocca WA. Use of a medical records linkage system to enumerate a dynamic population over time: the Rochester Epidemiology Project. Am J Epidemiol 2011;173:1059–68.
  • 5. Finney Rutten LJ, Wilson PM, Jacobson DJ. et al. A population-based study of sociodemographic and geographic variation in HPV vaccination. Cancer Epidemiol Biomarkers Prev 2017;26:533–40.
  • 6. Chamberlain AM, Finney Rutten LJ, Manemann SM. et al. Association of social isolation with functional status and mental health in patients with heart failure. Circulation 2016;133:AP092.
  • 7. Fabbri M, Yost K, Finney Rutten LJ. et al. Health literacy and outcomes in heart failure: a prospective community study. Circulation 2017;135:AP123.
  • 8. US Census. 2015 Poverty and Median Household Income Estimates - Counties, States, and National 2016. https://www.census.gov/did/www/saipe/data/statecounty/data/2015.html (18 July 2017, date last accessed).
  • 9. U.S. Census Bureau, Population Division. Annual County Resident Population Estimates by Age, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2015 U.S. Census Bureau, Population Division, 2016. https://www.census.gov/programs-surveys/popest.html (18 July 2017, date last accessed).
  • 10. Educational attainment for adults age 25 and older for the U.S., States, and Counties. American Community Survey 5-year average, 2011-15. https://www.ers.usda.gov/data-products/county-level-data-sets/county-level-data-sets-download-data/ (18 July 2017, date last accessed).
  • 11. US Census. 2010 Census Urban and Rural Classification and Urban Area Criteria https://www.census.gov/geo/reference/ua/urban-rural-2010.html; Data layout: https://www.census.gov/geo/reference/ua/ualists_layout.html (19 July 2017, date last accessed).
  • 12. Rutten LJ, St Sauver JL, Beebe TJ. et al. Clinician knowledge, clinician barriers, and perceived parental barriers regarding human papillomavirus vaccination: association with initiation and completion rates. Vaccine 2017;35:164–69.
  • 13. Rocca WA, Petersen RC, Knopman DS. et al. Trends in the incidence and prevalence of Alzheimer's disease, dementia, and cognitive impairment in the United States. Alzheimers Dement 2011;7:80–93.
  • 14. Rocca WA. Time, sex, gender, history, and dementia. Alzheimer Dis Assoc Disord 2017;31:76–79.
  • 15. Sackett DL. Bias in analytic research. J Chronic Dis 1979;32:51–63.
  • 16. Ellenberg JH. Observational data bases in neurological disorders: selection bias and generalization of results. Neuroepidemiology 1994;13:268–74.
  • 17. Rothman KJ, Gallacher JE, Hatch EE. Why representativeness should be avoided. Int J Epidemiol 2013;42:1012–14.
  • 18. Rothman KJ. Six persistent research misconceptions. J Gen Intern Med 2014;29:1060–64.
  • 19. Elwood JM. Commentary: On representativeness. Int J Epidemiol 2013;42:1014–15.
  • 20. Porta M. A Dictionary of Epidemiology. 6th edn. New York, NY: Oxford University Press, 2014.
  • 21. Rocca WA, Grossardt BR, Peterson BJ. et al. The Mayo Clinic Cohort Study of Personality and Aging: design and sampling, reliability and validity of instruments, and baseline description. Neuroepidemiology 2006;26:119–29.
  • 22. Rocca WA, Peterson BJ, McDonnell SK. et al. The Mayo Clinic Family Study of Parkinson's Disease: study design, instruments, and sample characteristics. Neuroepidemiology 2005;24:151–67.
  • 23. Kokmen E, Ozsarfati Y, Beard CM, O'Brien PC, Rocca WA. Impact of referral bias on clinical and epidemiological studies of Alzheimer's disease. J Clin Epidemiol 1996;49:79–83.
  • 24. Bayliss EA, Bonds DE, Boyd CM. et al. Understanding the context of health for persons with multiple chronic conditions: moving from what is the matter to what matters. Ann Fam Med 2014;12:260–69.
  • 25. Goodman RA, Bunnell R, Posner SF. What is ‘community health’? Examining the meaning of an evolving field in public health. Prev Med 2014;67(Suppl 1):S58–61.
  • 26. Olmsted County Public Health Services. Community Health Needs Assessment: A Collaborative Effort Lead by: Olmsted County Public Health Services, Olmsted Medical Center, Mayo Clinic 2013. http://www.co.olmsted.mn.us/OCPHS/reports/Documents/Community%20Health%20Needs%20Assessment%202013.pdf (6 July 2017, date last accessed).
  • 27. Posner SF, Goodman RA. Multimorbidity at the local level: implications and research directions. Mayo Clin Proc 2014;89:1321–23.
  • 28. U.S. Department of Health and Human Services. The Community as a Learning System: Using Local Data to Improve Local Health. National Committee on Vital and Health Statistics; 2011. http://www.ncvhs.hhs.gov/wp-content/uploads/2014/05/110512sm.pdf (6 July 2017, date last accessed).
  • 29. St Sauver JL, Grossardt BR, Finney Rutten LJ. et al. Rochester Epidemiology Project Data Exploration Portal (REP-DEP). Prev Chronic Dis 2017. (in press).

Articles from International Journal of Epidemiology are provided here courtesy of Oxford University Press
