Author manuscript; available in PMC: 2016 Jan 31.
Published in final edited form as: Am J Prev Med. 2015 Feb;48(2):234–240. doi: 10.1016/j.amepre.2014.10.020

Electronic Health Records and Community Health Surveillance of Childhood Obesity

Tracy L Flood 1, Ying-Qi Zhao 1, Emily J Tomayko 1, Aman Tandias 1, Aaron L Carrel 1, Lawrence P Hanrahan 1
PMCID: PMC4435797  NIHMSID: NIHMS685263  PMID: 25599907

Abstract

Background

Childhood obesity remains a public health concern, and tracking local progress may require local surveillance systems. Electronic health record data may provide a cost-effective solution.

Purpose

To demonstrate the feasibility of estimating childhood obesity rates using de-identified electronic health records for the purpose of public health surveillance and health promotion.

Methods

Data were extracted from the Public Health Information Exchange (PHINEX) database. PHINEX contains de-identified electronic health records from patients primarily in south central Wisconsin. Data on children and adolescents (aged 2–19 years, 2011–2012, n=93,130) were transformed in a two-step procedure that adjusted for missing data and weighted for a national population distribution. Weighted and adjusted obesity rates were compared to the 2011–2012 National Health and Nutrition Examination Survey (NHANES). Data were analyzed in 2014.

Results

The weighted and adjusted obesity rate was 16.1% (95% CI=15.8, 16.4). Non-Hispanic white children and adolescents (11.8%, 95% CI=11.5, 12.1) had lower obesity rates compared to non-Hispanic black (22.0%, 95% CI=20.7, 23.2) and Hispanic (23.8%, 95% CI=22.4, 25.1) patients. Overall, electronic health record–derived point estimates were comparable to NHANES, revealing disparities from preschool onward.

Conclusions

Electronic health records that are weighted and adjusted to account for intrinsic bias may create an opportunity for comparing regional disparities with precision. In PHINEX patients, childhood obesity disparities were measurable from a young age, highlighting the need for early intervention for at-risk children. The electronic health record is a cost-effective, promising tool for local obesity prevention efforts.

Introduction

In the past 30 years, childhood obesity has emerged as a major health concern in the U.S.1 Rates of childhood obesity began to rise in the 1990s2 with signs of stabilization in recent years.3 Obesity may still be increasing in some racial and ethnic subgroups.4 National data provide insight into disparities but may not reflect regional trends,5 which continue to diverge by location,6 age,7,8 and measures of poverty,8,9 as well as across racial/ethnic divides.7,9 Therefore, local data are increasingly necessary as an adjuvant to national public health surveillance systems. Indeed, local childhood obesity rates may guide the planning and tracking of community-based interventions. Despite local data being pivotal for progress,10 traditionally, there has been a prohibitive time and cost burden associated with the collection, storage, and analysis of local data.

The widespread adoption of electronic health records (EHRs), as incentivized by the 2010 “meaningful use” initiative,11,12 has resulted in the digitization of vast amounts of health data collected during regular clinic visits. Meaningful use has also catalyzed the secure sharing of health data across institutions for the purpose of population health improvement. The examination of EHR data for the purposes of health promotion and public health surveillance, beyond the use for tracking individual patient health, may represent a paradigm shift for population health.13–18 The EHR contains many variables with public health utility, and childhood obesity data (as measured by BMI) are among them. A multi-institutional study that examined BMI data from multiple EHR systems reported acceptable data quality.5 Reasons for high data quality are likely threefold: First, according to census data, over half of all children aged <18 years utilize health services at least yearly.19 Second, the American Academy of Pediatrics (AAP) recommends an annual BMI measurement for every child aged 2–19 years.20 Third, measuring BMI in the EHR is considered to be a core measure of meaningful use21; therefore, financial incentive is provided for its collection.21 These factors may be contributing to the high quality of BMI data. Additionally, once collaborations and systems are in place, the cost and time commitment needed for EHR data extraction is reportedly minimal.5 Therefore, the EHR is a potentially cost-effective, emergent tool for public health surveillance.

Despite potential advantages of utilizing EHRs in local health surveillance, methodologic concerns22–24 remain that may challenge their utility for childhood obesity surveillance. These concerns arise from the reality that an EHR is a convenience sample of people seeking health care for various reasons, including both sick visits and visits for preventive services (i.e., well-child visits, immunizations). Therefore, the captured data are a biased sample of clinic-goers and may be systematically missing the heights and weights necessary to calculate BMI. Each health system likely carries unique biases, limiting comparability. The authors hypothesize that standard statistical methods may address these concerns by: (1) weighting for missing data; and (2) adjusting the population distribution to be a nationally representative sample. In this study, this two-step weighting procedure was applied to childhood obesity rates derived from EHR data of a multisite health system. Weighted estimates were then compared to the National Health and Nutrition Examination Survey (NHANES).4 Through describing methods for transformation as well as data limitations, this study contributes to the growing knowledge of how EHR-derived data may be feasibly used for childhood obesity surveillance in public health efforts. The hope is that disease estimates derived from different healthcare systems may one day be compared between cities, counties, states, and even nationally.

Methods

The University of Wisconsin Public Health Information Exchange (PHINEX) database25 contains de-identified EHR data from a multicenter healthcare system located primarily in south central Wisconsin.26 The PHINEX database comprises the de-identified records of patients with a documented primary care encounter (family medicine, pediatrics, and internal medicine) since EHR implementation in 2007. All PHINEX data were derived from the Epic EHR Clarity Database (EpicCare Electronic Medical Record, Epic Systems Corp., Verona WI). During de-identification, personal identifiers were removed, birth date was centralized to month of birth, and address was linked to census block group. The study was reviewed and approved by the University of Wisconsin–Madison School of Medicine and Public Health IRB (Research Protocol M-2009-1273, titled Family Medicine/Public Health Data Exchange).

Subject Selection

A 2-year time period (2011–2012) was selected to align with NHANES.4 Patients within the PHINEX database were selected if they were aged between 2 and 19 years in 2011–2012 (N=108,171) with complete data on race/ethnicity (n=102,154) and census block group (n=93,130).

Measures

All individual-level covariates were derived from the EHR and were as follows: sex, month and year of birth, race/ethnicity, health service payer (i.e., insurance type), and census block group of residence as of 2012. Age was calculated using month and year of birth in two definitions: age as of 2012 was used for the weighting procedure, and age as of latest BMI measurement was used to estimate obesity prevalence. Age was further categorized as 2–5 years (i.e., preschool-aged children), 6–11 years (i.e., school-aged children), and 12–19 years (i.e., adolescents) to be consistent with 2011–2012 NHANES categorizations.4 Race/ethnicity as recorded in the EHR was defined as non-Hispanic white, non-Hispanic black, Hispanic, and non-Hispanic other. Health service payer was categorized as commercial, Medicaid, or no insurance.

The two community-level covariates were urbanicity and economic hardship index (EHI). Both were calculated at the census block group level and linked to patients’ location of residence. EHI was based on 2007–2011 U.S. Census American Community Survey 5-year estimates. Urbanicity was defined as urban, suburban, or rural based on the 11 Urbanization Summary Groups of Esri’s Tapestry,27 which were derived from data on population density, city size, location in relationship to metropolitan area, and economic/social centrality. Urbanicity was categorized as follows: 1–4 was defined as urban, 5–8 as suburban, and 9–11 as rural.27 EHI28,29 is a measure of community SES calculated from six variables: crowded housing (percentage of housing units with more than one person/room), poverty (percentage of households below the federal poverty level), unemployment (percentage of people aged >16 years who are unemployed), education (percentage of people aged >25 years without a high school education), dependency (percentage of the population aged <18 or >64 years), and per capita income. EHI was calculated using the methods of Nathan and Adams,30 normalizing for all Wisconsin census block groups. Scores ranged from 0 to 100, with 100 indicating the highest hardship. Continuous EHI scores were used in the analysis.
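The two community-level covariates can be illustrated with a short Python sketch. The urbanicity mapping follows the cut points stated above. The hardship index portion is an assumption: the text names the six components and the 0 to 100 range but does not spell out the formula, so the sketch assumes each component is min-max rescaled to 0–100 across all block groups, per capita income is reversed so that lower income means higher hardship, and the six rescaled values are averaged with equal weight. The function and key names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def urbanicity_category(summary_group: int) -> str:
    """Map an Urbanization Summary Group (1-11) to the three study categories."""
    if 1 <= summary_group <= 4:
        return "urban"
    if 5 <= summary_group <= 8:
        return "suburban"
    if 9 <= summary_group <= 11:
        return "rural"
    raise ValueError(f"summary group out of range: {summary_group}")


def min_max_scale(values: Sequence[float], reverse: bool = False) -> List[float]:
    """Rescale values to 0-100; reverse=True flips the direction (assumed for income)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant component carries no information, so score it 0.
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) * 100 for v in values]
    return [100 - s for s in scaled] if reverse else scaled


def hardship_index(
    components: Dict[str, Sequence[float]],
    reversed_keys: Sequence[str] = ("per_capita_income",),
) -> List[float]:
    """Equal-weight average of rescaled components, one score per block group."""
    scaled = [
        min_max_scale(vals, reverse=(key in reversed_keys))
        for key, vals in components.items()
    ]
    return [mean(column) for column in zip(*scaled)]
```

Under these assumptions, a block group with the highest poverty and the lowest per capita income in the state would score near 100, which matches the direction of the scale described in the text.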

BMI was calculated from height and weight measurements (weight/height² [kg/m²]) that were collected on the same date. There was no imputation for missing values. If a patient had multiple BMI values available in 2011–2012, the latest one was used. If a patient had no BMI available, the patient was coded as missing BMI. All BMI values were plotted on age- and sex-specific CDC 2000 growth charts according to recommendations.31 Obesity was defined as having a BMI greater than or equal to the 95th percentile.32
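A minimal Python sketch of the BMI rules described above follows. It is illustrative only and is not the SAS code used in the study. The `Measurement` record and function names are hypothetical, each record is assumed to hold the height and weight taken on a single date, and the lookup of a BMI percentile on the CDC 2000 growth charts is not implemented: the percentile is taken as an input.

```python
from datetime import date
from typing import NamedTuple, Optional, Sequence


class Measurement(NamedTuple):
    """Height and weight recorded on one visit date (hypothetical record layout)."""
    taken_on: date
    weight_kg: Optional[float]
    height_m: Optional[float]


def bmi(weight_kg: float, height_m: float) -> float:
    """BMI in kg/m^2."""
    return weight_kg / height_m ** 2


def latest_bmi(
    measurements: Sequence[Measurement],
    start: date = date(2011, 1, 1),
    end: date = date(2012, 12, 31),
) -> Optional[float]:
    """Return the most recent BMI in the study window, or None if none is calculable."""
    usable = [
        m for m in measurements
        if m.weight_kg is not None
        and m.height_m is not None
        and start <= m.taken_on <= end
    ]
    if not usable:
        return None  # coded as missing BMI; no imputation
    latest = max(usable, key=lambda m: m.taken_on)
    return bmi(latest.weight_kg, latest.height_m)


def is_obese(bmi_percentile: float) -> bool:
    """Obesity: BMI at or above the 95th age- and sex-specific percentile."""
    return bmi_percentile >= 95.0
```

A visit with a weight but no same-date height contributes nothing here, which is how a patient with clinic encounters can still end up coded as missing BMI.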

Statistical Analysis

The statistical analysis began with the two-step weighting procedure, producing a weighted obesity rate for PHINEX, which could then be compared to crude rates and NHANES. The weighting procedure was as follows: Step 1 was inverse probability weighting to account for PHINEX patients that were missing BMI data. Weighting was based on the covariates of age category (as of 2012), race/ethnicity, health service payer, urbanicity, and EHI. A logistic regression was conducted for BMI missing versus BMI not missing, and the final model was selected via stepwise selection, with the inclusion of only significant variables (p<0.05). More specifically, for any individual randomly chosen from PHINEX with covariate value X, the probability that such an individual will have BMI not missing is p(X). Any individual with covariate value X and with BMI not missing can represent 1/p(X) individuals from PHINEX. Thus, those with complete data may represent similar patients with missing data.33 Step 2 adjusted the population distribution to a nationally representative sample using post-stratification correction.34 Adjustment was made using 2012 national census data.35 The inverse probability weight (Step 1) and post-stratification correction (Step 2) were multiplied to create a final weight for each patient.
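The two-step procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' SAS implementation: the probability of having a BMI recorded, p(X), is supplied directly in a hypothetical `p_observed` field rather than estimated by logistic regression, the post-stratification strata and national shares are placeholders, and only patients with a recorded BMI are passed in.

```python
from collections import defaultdict
from typing import Dict, List


def final_weights(patients: List[dict], national_share: Dict[str, float]) -> List[float]:
    """Combine an inverse probability weight with a post-stratification factor.

    Each patient dict needs 'stratum' (str) and 'p_observed' (float in (0, 1]).
    national_share maps each stratum to its share of the reference population.
    """
    # Step 1: each patient with BMI stands in for 1/p(X) similar patients.
    ipw = [1.0 / p["p_observed"] for p in patients]

    # Weighted share of each stratum in the sample after Step 1.
    total = sum(ipw)
    stratum_total: Dict[str, float] = defaultdict(float)
    for p, w in zip(patients, ipw):
        stratum_total[p["stratum"]] += w

    # Step 2: scale each stratum so its weighted share matches the reference share.
    weights = []
    for p, w in zip(patients, ipw):
        sample_share = stratum_total[p["stratum"]] / total
        weights.append(w * national_share[p["stratum"]] / sample_share)
    return weights


def weighted_rate(patients: List[dict], weights: List[float]) -> float:
    """Weighted prevalence of the boolean 'obese' field."""
    obese_weight = sum(w for p, w in zip(patients, weights) if p["obese"])
    return obese_weight / sum(weights)
```

In a toy sample where a high-prevalence stratum is both under-measured and under-represented, the weighted rate rises above the crude rate, which is the direction of correction reported for PHINEX.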

The weighted PHINEX obesity rates and 95% CIs were reported by sex, age (as of BMI measurement), and race/ethnicity. Rates were compared to NHANES and crude PHINEX obesity rates. Subject selection, variable calculations, and all statistical analyses were performed in 2014 using SAS, version 9.3 (SAS Institute Inc., Cary NC). Alpha was set a priori at 0.05.

Results

Overall, 93,130 PHINEX patients aged 2–19 years had complete data on covariates during the 2011–2012 time period. Of those patients, a total of 34,852 (37.4%) were missing a valid BMI, leaving 58,278 (62.6%) patients in the final sample from whom weighted obesity rates were calculated. The final sample was 48.8% female. Most patients were non-Hispanic white (81.1%); fewer were non-Hispanic black (7.2%), Hispanic (6.6%), and non-Hispanic other (5.1%). The logistic regression model demonstrated that having a valid BMI on record in 2011–2012 was significantly and independently associated (p<0.05) with the covariates of sex, race/ethnicity, payer, urbanicity, and EHI. Age was not significantly associated and was removed from the final model. Based on the final model, patients were significantly more likely to have a valid BMI in 2011–2012 if they were female compared to male, non-Hispanic black compared to non-Hispanic white, if they had insurance (commercial, Medicaid) compared to not having insurance, if they lived in a suburban versus urban census block group, and if they lived in an urban versus rural census block group (all p<0.001). Children living in areas with higher economic hardship were less likely to have a calculable BMI on record in 2011–2012 (p<0.001), independent of sex, race/ethnicity, insurance status, and urbanicity.
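The sample counts reported above can be checked with a few lines of Python. The variable names are illustrative; the figures are those stated in the text.

```python
# Counts reported in the text.
complete_covariates = 93_130   # aged 2-19 years with race/ethnicity and block group
missing_bmi = 34_852           # no valid BMI in 2011-2012

# Patients remaining for the weighted obesity estimates.
final_sample = complete_covariates - missing_bmi


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


missing_pct = pct(missing_bmi, complete_covariates)
final_pct = pct(final_sample, complete_covariates)
```

The subtraction gives the 58,278 patients in the final sample, and the two percentages round to the reported 37.4% and 62.6%.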

The weighted obesity prevalence estimate for children in PHINEX was 16.1% (95% CI=15.8, 16.4). Significant differences existed by sex, age category, and race/ethnicity (Figure 1). Female patients had lower obesity rates (15.2%, 95% CI=14.7, 15.6) compared to male patients (17.1%, 95% CI=16.7, 17.5). Each age category was significantly different (p<0.05), with younger ages having lower rates of obesity: 10.8%, 16.9%, and 18.1% for preschool, school-aged, and adolescent groups, respectively. Obesity rates in non-Hispanic white (11.8%, 95% CI=11.5, 12.1) and non-Hispanic other (12.2%, 95% CI=11.0, 13.3) patients were significantly lower (p<0.05) compared to non-Hispanic black (22.0%, 95% CI=20.7, 23.2) and Hispanic patients (23.8%, 95% CI=22.4, 25.1). Rates for non-Hispanic other patients were included in the overall rate, but, for simplicity, were not included in figures. Among preschoolers, significant differences (p<0.05) were measurable between non-Hispanic white patients (7.5%, 95% CI=6.8, 8.2) and non-Hispanic black (10.8%, 95% CI=8.3, 13.2) and Hispanic (14.9%, 95% CI=12.1, 17.8) patients (Figure 1). Non-Hispanic other preschoolers (9.7%, 95% CI=7.8, 11.5) had lower rates than Hispanic preschoolers. These disparities remained significant across all age categories, and there was a trend for disparities to widen with age in female non-Hispanic blacks.

Figure 1.


Weighted obesity rates (%)a,b (ages 2–19 years) by race and ethnicity, 2011–12.c

aObesity defined as BMI ≥ 95th percentile according to sex- and age-specific CDC 2000 growth charts.31

bRate by %. Error bars show 95% confidence interval.

cNon-Hispanic Other was analyzed and included in overall rates, but not shown for simplicity.

NHANES obesity rates were compared to crude PHINEX and weighted PHINEX obesity rates (Figure 2). Crude PHINEX rates (12.6%, 95% CI=12.3, 12.8) were significantly lower (p<0.05) compared to weighted PHINEX rates (16.1%, 95% CI=15.8, 16.4) and NHANES obesity rates (16.9%, 95% CI=14.9, 19.2).4 NHANES CIs were broader when compared to those of PHINEX, particularly for non-Hispanic whites. When rates were reported by race/ethnicity, crude PHINEX and weighted PHINEX obesity rates fell within NHANES CIs. In every racial/ethnic subgroup, crude PHINEX obesity rates tended to be lower compared to weighted PHINEX obesity rates, a trend that was significant (p<0.05) only in non-Hispanic whites.
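One rough way to read the overall comparison is to check whether the reported 95% CIs overlap. The Python sketch below does only that. It is a conservative heuristic and an assumption on our part, not the significance test used in the study, which the text does not specify. The interval values are the overall rates reported above.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Overall 95% CIs (percent) as reported in the text.
nhanes: Interval = (14.9, 19.2)
weighted_phinex: Interval = (15.8, 16.4)
crude_phinex: Interval = (12.3, 12.8)

weighted_vs_nhanes = intervals_overlap(weighted_phinex, nhanes)
crude_vs_nhanes = intervals_overlap(crude_phinex, nhanes)
crude_vs_weighted = intervals_overlap(crude_phinex, weighted_phinex)
```

The weighted PHINEX interval lies inside the NHANES interval, while the crude PHINEX interval overlaps neither, which is consistent with the pattern described in the paragraph.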

Figure 2.


Obesity Rates (%)a for NHANES, weighted PHINEX rates, and crude PHINEX rates, 2011–12b.

aObesity defined as BMI ≥ 95th percentile according to sex- and age-specific CDC 2000 growth charts.31

bNon-Hispanic other was analyzed and included in overall rates, but not shown graphically.

NHB, non-Hispanic black; NHW, non-Hispanic white; PHINEX, Public Health Information Exchange; NHANES, National Health and Nutrition Examination Survey

Discussion

The weighted obesity rate within PHINEX was 16.1%, which was comparable to the 16.9% reported in NHANES for 2011–2012.4 Estimates derived from the EHR had sufficient precision to detect racial/ethnic disparities from preschool onward. Data were available across all ages during this 2-year time interval. Missing data were notable in uninsured patients and in those living rurally or in areas of high economic hardship. BMI data are becoming more readily available within the EHR,36–42 making it increasingly feasible for EHR data to become an agile and cost-effective5 public health tool for local childhood obesity prevention efforts.8 EHR data are collected directly from the community during clinic visits and, unlike other data sets, do not rely upon imputed estimates from non-indigenous samples.43,44 Imputed data would not be responsive to local childhood obesity prevention efforts. In contrast, EHR data have the potential to inform, as well as be impacted by, local efforts. Because EHR data are a convenience sample, strategies must be explored for mitigating the biases of systematically missing data and non-representative samples. This requires rigorous statistical scrutiny before EHR data can be interpreted. This may include weighting procedures such as those outlined in this study.

EHR data are a potential emerging tool for public health practitioners, but carry intrinsic methodologic concerns. These concerns include missing data and sample bias.22–24 In this study, both issues were addressed via a two-step weighting procedure. Using this procedure, the authors compared rates of childhood obesity derived from PHINEX, a regional Wisconsin EHR database, to NHANES, a nationally representative sample.4 Both individual- and community-level variables were associated with missing data and, after correcting for missing BMI data, the population distribution was adjusted to be a nationally representative sample. Thus, the procedure enabled a comparison of disease burden between PHINEX and NHANES.

The first step of the weighting procedure examined the probability of having a BMI recorded in the EHR based on individual- and community-level covariates. Systematic biases in the availability of BMI data were observed. Specifically, there was bias by sex, race/ethnicity, and payer, but not by age. Previous studies have reported variations in missing BMI data by individual-level factors, such as age,5 race/ethnicity,8 and payer.8 Wen and colleagues8 examined longitudinal BMI data in children at a multisite clinic in Massachusetts. During the study, the health system began accepting Medicaid. There was a concordant shift in racial and payer trends within the EHR data. Other studies have noted age biases. Bailey et al.5 reported that patients had fewer BMI measurements recorded in the adolescent years when BMI is tracked continuously. Our study required only one BMI measurement within a 2-year window. All ages appeared to achieve this goal equally. Previous studies have reported variable childhood BMI data quality for individual-level factors, but, to the authors’ knowledge, no previous study has reported the role of community-level factors such as urbanicity and EHI. Adult studies measuring the likelihood of receiving preventive services have demonstrated lower rates of service provision for those living with high rates of poverty45 and in rural areas,46 which is consistent with the present findings in this pediatric population.

Obesity rates among non-Hispanic white patients significantly increased (p<0.05) after weighting and adjustment for both individual- and community-level variables. This increased rate between crude and weighted PHINEX obesity estimates was believed to be due to three contributing factors: (1) there was a high number of non-Hispanic white patients within the un-weighted PHINEX sample, allowing sufficient power to detect differences; (2) obesity rates in non-Hispanic white children significantly increase with increasing levels of hardship47; and (3) higher levels of hardship were associated with missing BMI in this study. Therefore, the bias of crude EHR data may be to underestimate obesity prevalence rates, because populations with higher rates also tend to have missing BMI data. Weighting procedures utilizing individual- and community-level factors may partially correct for these biases, provided that members of subgroups are not fully excluded from the EHR (e.g., a health system does not accept Medicaid). The quality of BMI data is anticipated to improve with universal insurance coverage and as health systems implement successful quality assurance and outreach efforts.36–42

The weighted obesity rate for PHINEX patients was 16.1% and was comparable to rates reported in NHANES (16.9%). Racial/ethnic trends were similar to NHANES, with lower rates reported in non-Hispanic white compared to non-Hispanic black and Hispanic children and adolescents. NHANES also reported disparities emerging during the preschool years with significantly higher rates in Hispanic compared to non-Hispanic white preschoolers.4 This study found similar results and additionally saw significantly higher rates in non-Hispanic black compared to non-Hispanic white preschoolers. Previous studies have shown that obesity disparities are first measurable in Hispanic populations as early as kindergarten, with disparities emerging in grade school among non-Hispanic black children.48 PHINEX data demonstrate emerging disparities much sooner in south central Wisconsin. Racial and ethnic differences in preschool obesity rates have been attributed to sedentary behavior, sleep, early infant feeding practices, maternal BMI, and SES.49 When controlling for these variables, racial/ethnic disparities reportedly diminish.49 This highlights the need for identification and intervention for at-risk Wisconsin children in the prenatal or newborn period, with intervention continuing throughout the preschool and school years.50

A notable strength of the study was its sample size, which had the power to detect significant differences between subgroups. In contrast, NHANES may be underpowered to detect such differences.4 The IOM recently advocated using NHANES to track obesity trends,51 and this study may suggest that EHR is a useful local adjuvant. Importantly, another strength of the study was that many subpopulations were represented in PHINEX,26 allowing for the weighting and adjustment of the data into a representative sample using both individual- and community-level covariates to weight for missing data. Other BMI surveillance systems may be limited by insurance type,5 age,52 or SES,53 and such intrinsic voids in the patient population would limit the number of strata that could be used during inverse probability weighting.

A potential limitation of the study is that there was no a priori standardization of measurement across sites owing to the retrospective nature of the analysis. There was an assumption that trained health professionals took the measurements, and, although errors in measurement may have occurred,24 studies have demonstrated that measurement biases by trained staff may be minimal.54 Another limitation is that other variables such as Tanner staging were not included. Additionally, there may be differences in obesity status between those missing BMI and those not missing BMI, even within population subgroups (i.e., white, rural populations). If this were the case, the weighting procedure may be overly simplified. Further studies are needed to test whether this bias exists.

Future directions include using the PHINEX data set to better understand how racial/ethnic factors interact with community-level covariates. The PHINEX data set is also capable of spatial and longitudinal analysis. Next steps include identifying the communities where childhood weight gain or loss occurs after controlling for other variables.55 This longitudinal and spatial approach could have implications for urban planning and community health needs assessments within Wisconsin. In sum, using statistically weighted and adjusted EHR data may provide a cost-effective solution for precise, local data that are actionable at the community level and comparable at a national scale.

Acknowledgments

The authors wish to express gratitude to Melissa Behrens, MS, and to Brian Jun. Hanrahan and Tandias were supported by Mission Aligned Management Allocation Funds from the University of Wisconsin School of Medicine and Public Health. Tomayko was funded via NIH grant No. 5T32DK007665-21. Zhao was supported by NIH grant No. UL1RR025011/DHHS/NCRR.

Footnotes

No other financial disclosures were reported by the authors of this paper.

References
