Journal of Public Health Management and Practice. 2018 Jan 12;24(6):E6–E14. doi: 10.1097/PHH.0000000000000751

Monitoring Depression Rates in an Urban Community: Use of Electronic Health Records

Arthur J Davidson 1, Stanley Xu 1, Carlos Irwin A Oronce 1, M Josh Durfee 1, Emily V McCormick 1, John F Steiner 1, Edward Havranek 1, Arne Beck 1
PMCID: PMC6170150  PMID: 29334514

Abstract

Objectives:

Depression is the most common mental health disorder and mediates outcomes for many chronic diseases. Ability to accurately identify and monitor this condition, at the local level, is often limited to estimates from national surveys. This study sought to compare and validate electronic health record (EHR)-based depression surveillance with multiple data sources for more granular demographic subgroup and subcounty measurements.

Design/Setting:

A survey compared data sources for the ability to provide subcounty (eg, census tract [CT]) depression prevalence estimates. Using 2011-2012 EHR data from 2 large health care providers, and American Community Survey data, depression rates were estimated by CT for Denver County, Colorado. Sociodemographic and geographic (residence) attributes were analyzed and described. Spatial analysis assessed for clusters of higher or lower depression prevalence.

Main Outcome Measure(s):

Depression prevalence estimates by CT.

Results:

National and local survey-based depression prevalence estimates ranged from 7% to 17% but were limited to county level. Electronic health record data provided subcounty depression prevalence estimates by sociodemographic and geographic groups (CT range: 5%-20%). Overall depression prevalence was 13%; rates were higher for women (16% vs 9% for men), whites (16%), and homeless patients (18%), and increased with age. Areas of higher and lower EHR-based depression prevalence were identified.

Conclusions:

Electronic health record–based depression prevalence varied by CT, gender, race/ethnicity, age, and living status. Electronic health record–based surveillance complements traditional methods with greater timeliness and granularity. Validation through subcounty-level qualitative or survey approaches should assess accuracy and address concerns about EHR selection bias. Public health agencies should consider the opportunity and evaluate EHR system data as a surveillance tool to estimate subcounty chronic disease prevalence.

Keywords: depression, electronic health records, monitoring, urban


Depression is the most common of the mental disorders, with a lifetime prevalence of nearly 21% among adults in the United States.1 The occurrence, treatment challenges, and progression2 of many chronic diseases (eg, diabetes, cancer, cardiovascular disease, asthma, and obesity) are worsened by concomitant depression, as are many health risk behaviors (eg, physical inactivity, smoking, excessive drinking, and insufficient sleep). Estimates suggest that depression will be the second leading cause of disability worldwide by 2020, trailing only ischemic heart disease.3 Stigma associated with mental illness4 often obscures our ability to identify this condition accurately as some patients may be hesitant to report symptoms during an encounter or even seek help.5 Personal and cultural overtones have delayed health-seeking behavior, reducing reach, quality, and cost-effectiveness of depression care and opportunity to achieve better outcomes for associated health conditions.6

Community members consistently identify depression and other mental health disorders as high priorities for public health interventions.7 Disparities by demographic group have been observed in national studies.8 In response, local public health agencies seek effective means to identify and address mental health disorder (especially depression) disparities in their jurisdictions. Targeted intervention efforts may be broadly implemented at a county level, but often smaller geographic areas (eg, communities or neighborhoods)9 are the real focus. These geographic regions often represent shared cultures and economic perspectives, which may permit more targeted and tailored intervention messages.10 However, few data exist to accurately estimate subcounty depression prevalence rates. As many public health agencies incorporate mental health initiatives in their community health improvement plans, they need more granular estimates of the prevalence of mental health disorders to frame the problem and effectively engage community partners around issues for their region. Accurate information would also permit local public health agencies to evaluate the effectiveness of targeted, evidence-based (both clinic-11,12 and community-based13) mental health interventions for community residents. While national, state, or local depression prevalence rates may be estimated from federally sponsored surveys,14,15 these rates are rarely current or granular enough to support targeted community-based interventions within a jurisdiction.

Electronic health records (EHR) have demonstrated utility in providing surveillance data on issues of public health importance16 (eg, adverse drug and device events), including specific diseases or conditions17–20 (eg, diabetes mellitus and hepatitis B). Some data-sharing technologies16,21 may enhance the ability of EHR-derived data to be harvested across health care providers to generate information that complements surveys. With EHR data and increased sample size, smaller demographic subgroups and geographic units are better represented within a jurisdiction, based on a patient's characteristics and residence.

This study was undertaken to better understand novel EHR-based surveillance opportunities and their capacity to complement existing survey data for depression. Our specific goals were to (1) compare the attributes (ie, diagnostic method, specificity, representativeness, and geographic granularity) of EHR-based depression surveillance versus previously published reports for a single urban community and (2) assess subcounty variation in EHR-generated depression prevalence estimates in an urban area. We sought to understand how a complementary surveillance source might inform a community seeking methods to address a common disease such as depression.

Methods

Setting

The City and County of Denver, Colorado's state capital, has a population of about 650 000 with a large Hispanic/Latino population (24%) and smaller African American population (10%).22 Kaiser Permanente Colorado (KPCO) provides care to more than 600 000 Coloradoans (including more than 100 000 in Denver County), and Denver Health (DH) cares for more than 150 000 Denver residents. Collectively, these 2 integrated delivery systems care for nearly 40% of Denver County's population in distinctly different population subgroups. KPCO offers services largely to employed individuals and their families, while DH, a safety-net organization, serves more economically challenged individuals and families.

Inventory of data sources and data source evaluation

We first conducted a PubMed search for published depression estimates to identify commonly used national and local sources of data that might provide information on depression prevalence in Denver County; results from these articles with a prevalence estimate were compared with prevalence estimates from KPCO and DH. The inventory yielded prevalence data from the Behavioral Risk Factor Surveillance System (BRFSS),14 National Comorbidity Survey,23 the National Survey on Drug Use and Health,24 and the National Health and Nutrition Examination Survey,15 as well as from 8 managed care organizations across the United States participating in the Mental Health Research Network.25 Those data sources varied by collection method (survey vs administrative data), cohort selection schema (random vs convenience sampling), population included (community-dwelling individuals vs individuals receiving health or mental health care services), measurement method (eg, structured interview questions, symptom severity questions, or diagnosis [International Classification of Diseases, Ninth Revision (ICD-9)] codes), cohort size, time frame, and geographic location. For each data source publication, a review abstracted the sample size, prevalence rate, timeliness (eg, most recent or survey frequency), granularity or geographic location (eg, lowest geo-spatial level of analysis for reporting), and method (eg, screening, related-questions, or diagnosis).

Electronic health record data

Both KPCO and DH have EHR systems with access to diagnostic data recorded by clinicians after each encounter. As part of a community initiative, the Colorado Health Observation Regional Data Service (CHORDS),26 both institutions have stored their EHR data in a common data model, the Virtual Data Warehouse originally developed by the Health Care Systems Research Network.27 This is a data model used by many health care institutions across the country that participate in the PCORnet initiative.28 The regional service uses a query technology21 implemented in several large federal initiatives,16,29 which has been used at the local level as well.17,19,30 The public health surveillance use of CHORDS was reviewed and deemed nonhuman subjects research by the Colorado Multiple Institutional Review Board.

Data analysis

We restricted the analysis to adults 18 years of age or older who received care in either system between January 1, 2011, and December 31, 2012. We retrieved demographic data (ie, age, gender, and residential address) from EHR at DH and KPCO, along with diagnostic codes (ICD-9) for all outpatient visits. Depression was a common diagnosis in both systems and was recorded by a clinician based on a clinical encounter.31 Any adult with at least 1 depression diagnostic code (ie, mood disorder = ICD-9: 296.x, depressive-type psychosis = 298.0, adjustment reaction = 309.x, major depressive disorder = 311) was considered to have a diagnosis of depression. To be included in this geo-spatial analysis, a geo-locatable residence address needed to be established, based on the address declared at the last visit during the time interval. Thus, all homeless individuals were excluded from mapping visualizations.
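The case definition above amounts to a simple code-matching rule. The following is a minimal sketch, assuming diagnosis codes arrive as dotted ICD-9 strings; the function names and the sample patient records are hypothetical illustrations and are not taken from the study's query code.

```python
# ICD-9 patterns from the case definition in the text.
# Prefix families ("296.x", "309.x") match any subcode; the others match exactly.
DEPRESSION_PREFIXES = ("296", "309")
DEPRESSION_EXACT = ("298.0", "311")


def is_depression_code(code: str) -> bool:
    """Return True if a dotted ICD-9 code falls within the case definition."""
    code = code.strip()
    if code in DEPRESSION_EXACT:
        return True
    # "296" alone or "296.<subcode>" counts; an undotted "2960" does not.
    return any(code == p or code.startswith(p + ".") for p in DEPRESSION_PREFIXES)


def has_depression(visit_codes) -> bool:
    """At least 1 qualifying outpatient code during the study window."""
    return any(is_depression_code(c) for c in visit_codes)


# Hypothetical patients: outpatient diagnosis codes recorded in 2011-2012.
patients = {
    "p1": ["250.00", "296.32"],  # diabetes plus a 296.x code
    "p2": ["401.9"],             # hypertension only
    "p3": ["311"],               # a single depression-coded visit
}
flags = {pid: has_depression(codes) for pid, codes in patients.items()}
print(flags)
```

A stricter definition (for example, 2 or more coded visits) could be expressed by counting matches instead of using `any`.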

Using 5-year (2008-2012) American Community Survey denominator estimates, we first calculated the proportion of residents in each census tract who met our diagnostic criterion for depression, based on the combined total patient population data from 2 health care data sources, divided by the American Community Survey estimated base population. Age-gender pyramids were generated to compare the clinical population with the general population. An age- and gender-adjusted depression prevalence rate was also calculated for the county as a whole. An unadjusted depression prevalence rate was calculated for each census tract in Denver County. Prevalence and standard error of the mean (SEM) were calculated for the jurisdiction and each subgroup. Age and gender adjustment were then performed to more closely approximate the general population distribution.30 A finite population correction32 was performed, given the nonrandomness of selection into the clinical population (eg, having a means to pay for care and care-seeking behavior). Once calculated and adjusted, the depression prevalence rates by census tract were represented geospatially using GeoDa software.
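The calculations in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the study's code: it assumes the common finite population correction factor sqrt((N - n) / (N - 1)) and direct standardization to the census distribution, neither of which is spelled out in the text. The function names are hypothetical; the inputs are the rounded values printed in Table 2, with the case count taken from the Results text.

```python
import math


def prevalence_with_se(cases, n_patients, population=None):
    """Proportion and its standard error; applies a finite population
    correction (FPC) when the base population size is supplied."""
    p = cases / n_patients
    se = math.sqrt(p * (1 - p) / n_patients)
    if population is not None:
        # Assumed FPC form: sqrt((N - n) / (N - 1)); requires n <= N.
        se *= math.sqrt((population - n_patients) / (population - 1))
    return p, se


def direct_standardize(stratum_rates, standard_pop):
    """Weight stratum-specific rates by the standard population's counts."""
    total = sum(standard_pop)
    return sum(r * w for r, w in zip(stratum_rates, standard_pop)) / total


# Overall: 21 578 cases among 169 906 patients, census base of 474 106.
p, se = prevalence_with_se(21_578, 169_906, population=474_106)
print(f"prevalence = {100 * p:.1f}%, SE = {100 * se:.2f}%")

# Age-only illustration using the rounded Table 2 rows (18-24 ... 65+).
age_rates = [6.8, 9.0, 12.2, 15.5, 17.1, 17.6]  # prevalence, %
age_census = [59_023, 124_883, 90_689, 72_669, 62_260, 63_061]
age_adjusted = direct_standardize(age_rates, age_census)
print(f"age-standardized prevalence = {age_adjusted:.1f}%")
```

Under these assumptions the overall standard error comes out near 0.06 percentage points, in line with the Overall row of Table 2. The age-only figure (about 12.6%) is not expected to reproduce the reported age- and gender-adjusted rate, because it omits gender and starts from rounded inputs.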

Spatial analysis

Summarized CT-level data were imported into GeoDa (Version 0.925) for a spatial analysis of depression prevalence. Box plots, box maps (Hinge = 1.5), and histograms identified lower and upper outliers' values and location as well as statistical measurements. An adjustment (ie, smoothing and weighting) of upper and lower outlying rates was used to reduce rate variability associated with population differences. To minimize variance instability of depression prevalence, we used spatial rate smoothing methods combined with Queen Contiguity spatial weighting.33,34 Rate estimations varied on the basis of whether a CT (1) shared a common border or common vertices with, or (2) had greater proximity to another CT. Weighting and smoothing methods were combined to optimally produce the fewest outliers and most dense neighborhood clusters; local autocorrelation was determined using the Local Indicators of Spatial Association.35
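To make the smoothing step concrete, the sketch below applies a spatial rate smoother with queen contiguity to a toy 3 x 3 grid. It is an assumption-laden stand-in: real census tracts are irregular polygons whose contiguity must be derived from their boundaries, the window here is assumed to include the tract itself plus its queen neighbors, and all counts are hypothetical.

```python
def queen_neighbors(n_rows, n_cols):
    """Queen contiguity on a regular grid: cells sharing an edge or a corner."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors[r * n_cols + c] = [
                rr * n_cols + cc
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbors


def spatial_rate_smooth(cases, population, neighbors):
    """Smoothed rate = cases in the window / population in the window,
    where the window is the unit itself plus its contiguous neighbors."""
    smoothed = []
    for i in range(len(cases)):
        window = [i] + neighbors[i]
        smoothed.append(
            sum(cases[j] for j in window) / sum(population[j] for j in window)
        )
    return smoothed


# Hypothetical grid: the center tract is small (50 residents, 10 cases),
# so its raw rate of 20% is unstable next to its 10% neighbors.
cases = [100, 100, 100, 100, 10, 100, 100, 100, 100]
population = [1000, 1000, 1000, 1000, 50, 1000, 1000, 1000, 1000]

nbrs = queen_neighbors(3, 3)
raw = [c / p for c, p in zip(cases, population)]
smoothed = spatial_rate_smooth(cases, population, nbrs)
print(f"center raw = {raw[4]:.3f}, smoothed = {smoothed[4]:.3f}")
```

Pooling counts across the window pulls the small tract's rate toward its neighborhood, which is the variance-stabilizing effect the paragraph describes.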

Census tracts were scored for weighted depression prevalence rates using a simple scoring system developed to identify clusters. High-high was defined as a high-value depression prevalence CT neighboring on at least 1 other high-value depression prevalence CT. The inverse, or low-low, indicates a low-value depression prevalence CT near another low-value prevalence CT. Each may indicate potential areas of interest.
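The high-high and low-low labels can be illustrated with a local Moran's I calculation. The sketch below assumes row-standardized weights and assigns Moran scatterplot quadrants only; it does not reproduce the permutation-based significance filter that a LISA cluster map applies. The six tracts, their chain-like adjacency, and their rates are hypothetical.

```python
import math


def local_moran(values, neighbors):
    """Local Moran's I and quadrant label for each unit.

    The spatial lag is the mean of the neighbors' standardized values
    (row-standardized weights)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    results = []
    for i in range(n):
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        if z[i] > 0 and lag > 0:
            label = "HH"  # high value surrounded by high values
        elif z[i] < 0 and lag < 0:
            label = "LL"  # low value surrounded by low values
        elif z[i] > 0:
            label = "HL"  # high value among low neighbors
        else:
            label = "LH"  # low value among high neighbors
        results.append((z[i] * lag, label))
    return results


# Hypothetical tracts arranged in a line, each adjacent to the next.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
rates = [18.0, 17.0, 16.0, 8.0, 7.0, 6.0]  # smoothed prevalence, %

results = local_moran(rates, neighbors)
labels = [label for _, label in results]
print(labels)
```

In this toy layout the first three tracts are labeled HH and the last three LL, mirroring the cluster types described above.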

Results

Our initial inventory identified 6 sources of information about estimated depression prevalence rates that produced 9 different estimates based on defined population, time frame, and geographic location. Results are summarized in ascending order in Table 1. Reflecting the diversity of methods used to assess depression, the overall rate varied from 7% to nearly 18%. The next to last line of the table used data calculated from the combined DH and KPCO EHR systems for patients who were residents within Denver County. The prevalence estimate of 12.7%, from DH and KPCO EHR data, was in the middle of the range generated by these data sources.

TABLE 1. Comparison of Information Sources: Data Source Attributes and Estimated Depression Prevalence Rate Among Adults Older Than 18 Years, United States, 2000-2013.

| Data Source | Method | Specificity | N | Time Frame | Geographic Location | Prevalence Estimate (%) |
|---|---|---|---|---|---|---|
| National Survey on Drug Use and Health (NSDUH)24 [a] | Survey | Questions | 67 500 | 2013 | National | 6.7 |
| National Comorbidity Survey (NCS)23 | Survey | Questions | 9282 | 2000-2003 | National | 6.8 |
| National Health and Nutrition Examination Survey (NHANES)15 | Survey | PHQ-9 ≥ 10 | 10 279 | 2005-2008 | National | 6.8 |
| Behavioral Risk Factor Surveillance System (BRFSS)14 [b] | Survey | PHQ-8 ≥ 10 | 5093 | 2008 | Colorado | 7.0 |
| Mental Health Research Network (MHRN), 8 sites nationally25 | EHR | ICD-9 | 1 723 550 | 2011 | National | 8.0 |
| Denver Health/Kaiser Permanente Colorado (current study) | EHR | ICD-9 | 21 961 | 2011-2012 | Denver census tracts | 12.7 |
| BRFSS36 [b] | Survey | Question | 5743 | 2011 | Denver | 17.9 |

Abbreviations: EHR, electronic health record; ICD-9, International Classification of Diseases, Ninth Revision; PHQ-9, Patient Health Questionnaire.

[a] NSDUH: multiple questions defined Major Depressive Episode consistent with the 4th edition of the Diagnostic and Statistical Manual of Mental Disorders, which specifies “a period of at least 2 weeks when a person experienced a depressed mood or loss of interest or pleasure in daily activities and had a majority of specified depression symptoms.”

[b] BRFSS question: “Were you ever told by a health professional that you have a depressive disorder?”

When DH and KPCO patients were pooled, 36% of the adult residents of Denver County were represented in the data (Table 2). Population coverage rates varied between 11% and 45% across census tracts. Denver resident coverage varied by demographic group with higher coverage among Hispanic (34%), African American (38%), and mixed race or unknown (55%) than for whites (19%) or Asian/Pacific Islanders (19%). Age-gender pyramids for Denver County and the EHR-observed subpopulation were aggregated and compared in Figure 1. In the EHR-based population, the groups between 20 and 49 years of age were underrepresented compared with Denver County as a whole. The proportion of men who received care in these institutions was lower than their proportion in the city as a whole.
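The coverage rates quoted here are simply the ratio of patients observed in the two systems to the census population for the same group. As a quick arithmetic check, the sketch below recomputes a few of them from the counts reported in Table 2; the dictionary name and layout are illustrative only.

```python
# (census population, clinical observations) as reported in Table 2.
table2_counts = {
    "Overall": (474_106, 169_906),
    "Male": (236_160, 74_731),
    "Female": (237_946, 95_175),
    "White": (361_526, 69_608),
    "Black": (53_520, 20_238),
    "Hispanic": (150_630, 51_178),
}

# Coverage rate = patients observed / census population, to 2 decimals.
coverage = {
    group: round(observed / census, 2)
    for group, (census, observed) in table2_counts.items()
}
for group, rate in coverage.items():
    print(f"{group}: {rate:.2f}")
```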

TABLE 2. Unadjusted Rates of Depressive Disorders [a] by Demographic Characteristics, Denver, Colorado, 2011-2012.

| Demographic Characteristic | Census Population [b] | Clinical Observations | Coverage Rate | Depression Prevalence (%) (Unadjusted) | SE |
|---|---|---|---|---|---|
| Overall | 474 106 | 169 906 | 0.36 | 12.7 | 0.06 |
| Gender | | | | | |
| Male | 236 160 | 74 731 | 0.32 | 8.8 | 0.09 |
| Female | 237 946 | 95 175 | 0.40 | 15.7 | 0.09 |
| Race/Ethnicity | | | | | |
| White | 361 526 | 69 608 | 0.19 | 16.3 | 0.13 |
| Black | 53 520 | 20 238 | 0.38 | 10.5 | 0.17 |
| Asian/Pacific Islander | 21 476 | 4035 | 0.19 | 6.4 | 0.35 |
| American Indian | 10 517 | 820 | 0.08 | 14.2 | 1.17 |
| Other/Multiple | 43 360 | 23 662 | 0.55 | 6.0 | 0.10 |
| Hispanic | 150 630 | 51 178 | 0.34 | 12.1 | 0.12 |
| Age, y | | | | | |
| 18-24 | 59 023 | 23 697 | 0.40 | 6.8 | 0.13 |
| 25-34 | 124 883 | 42 937 | 0.34 | 9.0 | 0.11 |
| 35-44 | 90 689 | 31 229 | 0.34 | 12.2 | 0.15 |
| 45-54 | 72 669 | 27 947 | 0.38 | 15.5 | 0.17 |
| 55-64 | 62 260 | 24 054 | 0.39 | 17.1 | 0.19 |
| 65+ | 63 061 | 12 419 | 0.20 | 17.6 | 0.31 |
| Living situation | | | | | |
| Residence | 474 106 | 163 139 | 0.36 | 12.5 | 0.08 [c] |
| Homeless | Unknown | 6767 | ... | 17.9 | 0.47 [c] |

[a] For patients with at least 1 electronic health record diagnosis.

[b] Census population from the American Community Survey, 2012.

[c] Adjusted by the finite population correction.

FIGURE 1. Comparison of Age and Gender Distribution for Adults Older Than 18 Years: Electronic Health Record (EHR) Data Sources (2011-2012) and American Community Survey (2012) Population Estimates, Denver, Colorado [a]

[a] Percent distribution of estimated population: American Community Survey versus patients represented through EHR.

Among 21 578 patients with a diagnosis of depression, 55% had at least 2 visits with the diagnosis while 45% had just 1 visit with a concordant diagnosis. The unadjusted prevalence of depression was 12.7%. Rates of depression differed by gender, race/ethnicity, and age (Table 2). Women had a higher rate than men (15.7% vs 8.8%, respectively). Whites had the highest rate (16.3%) and Asian/Pacific Islanders had the lowest rate (6.4%). Across the life span, increasing age was associated with higher rates of depression. Individuals aged 18 to 24 years had the lowest rates (6.8%) while those older than 75 years had the highest rates (20.7%). The average number of cases in a census tract was 143 (SEM ± 5), while the average number of patients per census tract was 1150 (SEM ± 142). The age- and gender-adjusted depression prevalence rate for Denver County was 12.3%, with census tract-specific rates ranging from 5% to 20%. While it is impossible to estimate coverage for the base population who are homeless, homeless patients had the highest depression rate of any demographic group (17.9%). Depression prevalence rate estimates by census tract are presented in Figure 2a.

FIGURE 2. Geographic Variation (a) and Clustering (b) of Depression Prevalence Rates by CT for Denver County, Colorado, 2011-2012 [a]

Abbreviation: CT, census tract.

[a] Depression prevalence: at least 1 diagnostic code from an electronic health record (numerator) and 5-year (2008-2012) American Community Survey population estimates (denominator).

Local autocorrelation of the spatially smoothed rates, with Queen Contiguity weighting, had a pseudo P value of ≤ .001 under the randomization test. The cluster map in Figure 2b shows 2 predominantly positive (high-high) areas in the southeast and southwest areas of the county and 2 predominantly negative (low-low) areas across the northern border of the county. Autocorrelation demonstrated clusters with 13 tracts in dark red (high prevalence) and 17 in dark blue (low prevalence).
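A pseudo P value of this kind is typically obtained by permutation. The sketch below shows one common form, a conditional permutation test for a single tract's local Moran statistic, with the pseudo P value computed as (number of permuted statistics at least as extreme + 1) / (number of permutations + 1). The number of permutations used in the study is not stated in the text; 999 is an assumption, as are the function name and the toy data.

```python
import random


def local_moran_pseudo_p(z, neighbors, i, n_perm=999, seed=0):
    """Conditional permutation test for unit i.

    Holds z[i] fixed, redraws its neighbors' values from the remaining
    units without replacement, and counts how often the permuted
    statistic is at least as extreme as the observed one."""
    rng = random.Random(seed)
    k = len(neighbors[i])
    observed = z[i] * sum(z[j] for j in neighbors[i]) / k
    others = [z[j] for j in range(len(z)) if j != i]
    extreme = 0
    for _ in range(n_perm):
        stat = z[i] * sum(rng.sample(others, k)) / k
        if (observed >= 0 and stat >= observed) or (observed < 0 and stat <= observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical deviations from the mean rate for 6 tracts in a line.
z = [6.0, 5.0, 4.0, -4.0, -5.0, -6.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

p_value = local_moran_pseudo_p(z, neighbors, i=0)
print(f"pseudo P for tract 0: {p_value:.3f}")
```

With 999 permutations the smallest attainable pseudo P value is 1/1000 = .001, so a reported value of ≤ .001 would be consistent with no permuted statistic reaching the observed one under that assumed setting.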

Discussion

Multiple published data sources have estimated depression prevalence at various jurisdictional levels, but none was sufficiently granular to offer subcounty depression prevalence estimates for Denver County. Electronic health record–based depression prevalence estimates permitted more granular depression prevalence monitoring. National surveys permit national- and state-level estimates, but local public health agencies seeking disparity measures would find it difficult to estimate subcounty (eg, zip code, neighborhood, or census tract) depression prevalence from these data. Prevalence of depression varied greatly across census tracts within the same county; the cause for variation may be multifactorial but may represent underdiagnosis for some groups or geographic regions. What community-based interventions might be applied? These alternative surveillance methods, with capacity for more granular estimates, may have value as assessment tools for public health interventions.

With the exception of the MHRN study, all compared data sources (Table 1) were survey-based. In this study, the EHR-derived prevalence estimate for depression across 2 systems was 12.7%, roughly in the midrange between the low estimate of 6.7% obtained from the National Survey on Drug Use and Health and the high estimate of 17.9% derived from the Colorado BRFSS for Denver County.36 Electronic health record estimates found higher rates among women than men (15.7% vs 8.8%, respectively), but BRFSS37 data showed less difference (7.8% vs 6.2%, respectively). The BRFSS-based rates of depression also varied by age, but in the opposite direction: younger individuals had higher rates in BRFSS, whereas older individuals had higher rates in the EHR-based estimates. Differences in questionnaire design and method of administration may lead to varying levels of certainty for case definitions by survey type. The National Survey on Drug Use and Health defined Major Depressive Episode consistent with the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders, which specifies “a period of at least 2 weeks when a person experienced a depressed mood or loss of interest or pleasure in daily activities and had a majority of specified depression symptoms.” For BRFSS, the question was: “[Were you] ever told you have a depressive disorder (including depression, major depression, dysthymia, or minor depression)?” Results could vary dramatically on the basis of question or method, as compared with EHR documentation by a clinician during the course of care. Methods for clinical documentation and assessment are fairly similar across institutions and time; thus, EHR data offer a complementary and consistent assessment tool with ease of repeated measures for populations over time.

The EHR data from 2 systems identified significant variation across Denver's neighborhoods and census tracts. Previous analyses have shown relatively stable estimates of depression across the 2 health care systems.31 Distribution of depression prevalence rates across census tracts permits aggregation to larger geographic units that are particularly meaningful to specific audiences, such as neighborhood residents or city council members, for targeted engagement with community-based organizations or city government. While challenging to develop, emerging query solutions19,38 for aggregated data across health care providers are initial tools for a learning health system28 that leverages EHR data. These emerging more granular sources of information have promise to fill localized measure gaps in communities across the country, while complementing national and regional survey measures.

Several limitations exist in this approach. Comparison of prevalence estimates was predicated on varying definitions of depression from the various data sources. Differing methods for establishing the outcome (eg, questionnaire, survey, or clinical observation) make comparisons problematic. Perhaps more important, however, is understanding how complementary definitions provide different perspectives. The BRFSS question focuses on lifetime prevalence, while the period of time used to capture depression diagnoses via EHR for this study was just 2 years. Estimates may not be comparable but point to the challenge for public health agencies trying to assess the problem, define a public message and scope, or target a response. No clear gold standard exists with which to compare these measures. While these inherent challenges emerge from using new tools, consistent repeated measures using this one tool may help monitor and evaluate community-based interventions.

Our study was unable to deduplicate patients who were seen in both systems over this 2-year period. Because we used deidentified data, these individuals would be double-counted. From prior local analyses, this number was estimated at 8.5% (A. J. Davidson, MD, MSPH, written communication, 2009). Although no national personal identifier exists to facilitate deduplication, a potential solution to this problem is to use the master patient index of a local health information exchange or a statewide initiative as currently funded by the Centers for Medicare & Medicaid Services39 during subsequent analyses. Efforts to use these approaches are ongoing in Denver County. This problem of duplicate counts may increase over longer observation periods as individuals change health insurance coverage or sites of care. Use of last address may also result in misclassification. If a person does not update the address (which typically happens at each visit), cases may be assigned to the wrong census tract.

Another small but important limitation of a geographic analysis is the exclusion of homeless individuals. While the homeless had the highest rate of depression in our sample, there is no method to represent them on a map. Specific outreach programs to those communities will need to employ alternative methods that target these individuals through places of congregation and social service delivery.

In addition, diagnostic codes for depression may lack sensitivity and specificity when compared with “gold standard” interviews. We selected at least 1 depression diagnosis for inclusion but would have generated more conservative estimates by using 2 or more depression diagnoses. During the 2-year period, many patients may not have repeat diagnosis-coded visits, if they are stable and controlled on medications. Even if collecting survey information on larger numbers of individuals at the subcounty level were feasible, the wide range in survey-based prevalence estimates (Table 1) emphasizes the problems with using even traditional data sources to support assessment of local public health efforts to combat depression.

Implications for Policy & Practice

  • Depression and mental health issues are highly prevalent diagnoses and frequently associated with poor health outcomes for those patients. Public health agencies should promote effective and targeted community-based interventions to complement clinical mental health treatment efforts.

  • Knowing where to focus limited public health resources requires that health departments establish subcounty depression prevalence measures. A sufficiently scaled, subcounty survey would be too costly.

  • In the absence of local level, population-based surveys, electronic health records (EHR) provide a novel way to estimate depression prevalence. This study observed differences in depression prevalence by region and demographic subgroups.

  • Presentation of these results permits more focused discussions during community and other stakeholder engagement. Cluster assessment identified both regions of higher and lower depression prevalence. Were lower rates truly areas of better mental health or areas where access or stigma interferes with clinical engagement? How might these observations be further understood or validated?

  • Public health agencies should consider the opportunity and evaluate EHR system data as a surveillance tool to estimate subcounty chronic disease prevalence. In the future, by harnessing routinely collected clinical information, depression monitoring may help gauge the effectiveness of any public health campaigns.

Similar to prior survey studies, this EHR-based study found depression prevalence varied by gender, race/ethnicity, age, and living status. Some of these findings were contrary to previously published reports. Were these differences based more on the method of defining disease or on the population being studied? Before adoption of this alternative EHR-based surveillance method, we must better understand how the opportunity for more granular depression prevalence estimates should be balanced with concerns about selection bias (eg, care-seeking individuals) in the measured population. Widespread EHR40 adoption makes nonsurvey-based methods of depression prevalence monitoring more viable. Some researchers and communities have begun work to validate these EHR estimates through neighborhood-level surveys to better assess accuracy of EHR-based estimates.38,41 This process of validation will be important to allay concerns about selection bias for those accessing and represented in an EHR-based estimate.

Most local health departments have few data to address this highly prevalent problem. Some may see opportunity to use EHR-based estimates to better describe a continuum of depression screening, diagnosis, and treatment control.42 This should be an area for active research as clinicians and public health officials seek tools to better describe mental health service gaps, assess program effectiveness, and drive public health or clinical service planning and resource allocation. This first look at EHR-based depression prevalence suggests the need for additional research, both to better establish EHRs as a complementary surveillance resource for public health to guide prevention, outreach, and treatment efforts and to clarify how EHR-based findings should be interpreted in light of other factors (eg, social determinants of health43). Working with clinicians, local public health agencies can encourage system-wide changes and feedback loops to ensure early identification and adequate treatment of a highly prevalent disease with high and serious associated morbidity and mortality.

Footnotes

This work was supported by AHRQ grant no. 5R24HS0122143. The authors are grateful to Moises Maravi for assistance with spatial analyses.

The authors declare no conflicts of interest.

Articles from Journal of Public Health Management and Practice are provided here courtesy of Wolters Kluwer Health
