AMIA Annual Symposium Proceedings. 2020 Mar 4;2019:1081–1090.

Trends and characteristics of protected health information breaches in the United States

Md Mahbub Hossain, Y Alicia Hong
PMCID: PMC7153056  PMID: 32308905

Abstract

Objectives: To evaluate data breach incidents in the U.S. between 2010 and 2018, identify the characteristics of breaches involving more than a million records, and compare changes before and after the wide adoption of EHR in 2015. Materials and methods: Data breach incidents from 2010 to 2018 were retrieved from the Office for Civil Rights breach portal. Descriptive statistical analyses were performed to assess trends and characteristics, and changes across states from 2015 to 2018 were assessed and mapped. Results: From 2010 to 2018, a total of 2,529 breaches affected 194.74 million individual records. Overall, 72.08% of incidents involved healthcare providers; theft (32.94%) and hacking (22.7%) were the major types of breaches. Large cases affecting more than a million records resulted from compromised internal structures and systems. After 2015, the magnitude of data breaches changed to varying degrees across U.S. states, necessitating further research and action.

Introduction

Use of electronic health records (EHR) offers several benefits, including enhanced communication and decision-making at the point of care,3 improved adherence to recommended preventive or therapeutic measures through timely reminders to patients or caregivers,4 fewer unnecessary laboratory tests through coordination of previous and prospective diagnostic plans,5 improved quality of care,6 and better management of payments and reimbursements.7 The adoption of EHR across health systems has followed diverse pathways.8 Critical challenges such as technological complexity, legal issues, economic costs, and concerns about the safety and security of records have influenced the development of EHR.9 In the United States, two major acts were introduced to address the legal and security-related challenges regarding protected health information.10 First, the Health Insurance Portability and Accountability Act (HIPAA), passed in 1996, emphasized physical, administrative, and technical safeguards.10 Second, the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 promoted the implementation and use of EHR.11 It required providers receiving payments from the Centers for Medicare and Medicaid Services (CMS) to adopt EHR by the beginning of 2015 to continue receiving full reimbursement and to avoid penalties.12 The HITECH Act also stresses the importance of reporting data breaches involving protected health information.11 Since 2009, the number of providers adopting EHR has increased alongside the number of data breach incidents.13 Such incidents erode individuals' trust in digital health technologies and jeopardize the potential benefits of technological advancement.7,14 In addition, the direct economic burden of data breaches can be as high as $6.2 billion, alongside other legal and administrative consequences.15,16

Several studies have reported the characteristics of these breaches,13,17 the types and media locations of breach events,18 the geographic locations and types of covered entities,19 and the causes of incidents in different time periods.20 However, there is a lack of evidence illustrating changes in the trends and characteristics of data breaches across all complete years since the HITECH Act was passed. Moreover, several incidents involved more than a million records each,13 but the detailed causes and characteristics of these large cases have not yet been explored. Furthermore, the number of breach incidents or affected records alone may not reflect the actual burden of data breaches at the state level, because population size varies from state to state. It is therefore critical to examine the population-adjusted number of individual records affected by data breaches since EHR adoption became mandatory across the nation in 2015. The objectives of this study are to 1) identify all reported incidents of data breaches in the U.S. between 2010 and 2018, 2) list the key characteristics of incidents that involved more than a million individual records, and 3) describe changes in population-adjusted affected records across U.S. states following the mandatory adoption of EHR in 2015.

Materials and methods

In this study, we used data on all breach incidents reported between 2010 and 2018 to the breach portal of the Office for Civil Rights (OCR) at the U.S. Department of Health and Human Services.21 Under section 13402(e)(4) of the HITECH Act, data breaches involving the protected health information of more than 500 individuals are reported to OCR.21 These publicly available data provide the name and type of the covered entity, whether a business associate was involved, the location and timing of the breach, and the number of affected individual records. The data also specify the type of breach, including hacking, improper disposal, theft, loss, and unknown types; in this analysis, entries listing more than one type were grouped as multiple types of breaches. Further, the data report the media location of the breached information, including network server, paper, desktop computer, laptop computer, portable devices, email, electronic medical record, and others; entries listing more than one media location were grouped as multiple locations. Using Microsoft Excel and Stata 15.0 (College Station, TX), we performed descriptive statistics and two-way tests of association using chi-squared tests.
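The grouping and testing steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Stata workflow: it assumes each breach is a dictionary with hypothetical `year` and `type` fields, and that multi-valued categories arrive as comma-separated strings such as "Theft, Loss". It collapses such entries into a single "Multiple types" label, cross-tabulates year by category, and computes Pearson's chi-squared statistic and its degrees of freedom.

```python
from collections import Counter, defaultdict

MULTIPLE = "Multiple types"  # label for entries listing more than one category


def collapse_categories(raw, multiple_label=MULTIPLE):
    """Collapse a raw category field into a single label.

    Assumption: multi-valued fields are comma-separated, e.g. "Theft, Loss".
    Empty fields are mapped to "Unknown" (also an assumption).
    """
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    if not parts:
        return "Unknown"
    return parts[0] if len(parts) == 1 else multiple_label


def crosstab(rows, row_key, col_key):
    """Count rows by (row_key value, col_key value)."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[row_key]][row[col_key]] += 1
    return table


def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    two-way table stored as {row: Counter({column: count})}."""
    rows = sorted(table)
    cols = sorted({c for counts in table.values() for c in counts})
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (table[r][c] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


# Hypothetical toy records; field names are illustrative only.
records = [
    {"year": 2010, "type": "Theft"},
    {"year": 2010, "type": "Theft, Loss"},
    {"year": 2018, "type": "Hacking/IT Incident"},
    {"year": 2018, "type": "Hacking/IT Incident"},
]
for rec in records:
    rec["type_grouped"] = collapse_categories(rec["type"])

stat, dof = pearson_chi2(crosstab(records, "year", "type_grouped"))
print(stat, dof)  # 4.0 2
```

The toy records are far too few for the chi-squared approximation to be valid; they only exercise the code. A p-value would come from the chi-squared distribution with `dof` degrees of freedom, for example via `scipy.stats.chi2.sf(stat, dof)`, and `scipy.stats.chi2_contingency` runs the whole test on a table. In Stata, `tabulate` with the `chi2` option reports the same Pearson statistic.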

Furthermore, to examine changes in data breaches following nationwide adoption of EHR, we calculated the total number of affected individual records in each state for each year from 2015 to 2018. We adjusted these annual totals per 100,000 population using annual state population estimates from the U.S. Census Bureau.22 Using ArcGIS software, we mapped the population-adjusted affected records for each state and year to illustrate changes in data breaches from 2015 to 2018. Moreover, the breach data include web descriptions of the cases, which provide qualitative information about the characteristics and consequences of the incidents. We used the descriptions of cases involving more than a million individual records to assess the characteristics of these large data breach incidents.
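The state-level adjustment amounts to summing each state's affected records for a year and scaling by population: rate = records × 100,000 / population, the unit used in the Results. The sketch below is a minimal illustration that assumes hypothetical field names (`state`, `year`, `records`), placeholder state codes, and invented population figures; none of it is real portal or Census data. It also shows the strict "more than a million records" filter used to select the large incidents. Assuming, as the portal's layout suggests, that each breach is assigned to the covered entity's state rather than to where affected individuals live, a state's rate can exceed 100,000 per 100,000 residents, as it does for Indiana in 2015.

```python
from collections import defaultdict


def records_by_state_year(breaches, years=range(2015, 2019)):
    """Sum affected records for each (state, year) within `years`."""
    totals = defaultdict(int)
    for b in breaches:
        if b["year"] in years:
            totals[(b["state"], b["year"])] += b["records"]
    return dict(totals)


def rate_per_100k(records, population):
    """Affected records per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return records * 100_000 / population


def large_breaches(breaches, threshold=1_000_000):
    """Incidents affecting strictly more than `threshold` records."""
    return [b for b in breaches if b["records"] > threshold]


# Hypothetical inputs: placeholder state codes and invented populations.
breaches = [
    {"state": "XX", "year": 2015, "records": 1_500_000},
    {"state": "XX", "year": 2015, "records": 500_000},
    {"state": "YY", "year": 2016, "records": 20_000},
    {"state": "YY", "year": 2012, "records": 9_000},  # outside 2015-2018
]
population = {("XX", 2015): 4_000_000, ("YY", 2016): 1_000_000}

totals = records_by_state_year(breaches)
rates = {key: rate_per_100k(n, population[key]) for key, n in totals.items()}
print(rates)  # {('XX', 2015): 50000.0, ('YY', 2016): 2000.0}
print(len(large_breaches(breaches)))  # 1
```

Multiplying before dividing keeps these toy results exact in floating point; with real data the ordering makes no practical difference.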

Results

Characteristics of data breaches from 2010 to 2018

A total of 194.74 million individual records were affected by 2,529 data breach incidents over the nine years from 2010 to 2018 (Table 1). The annual number of breaches increased every year except 2015.
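The year-over-year pattern can be checked directly against Table 1. The short Python sketch below copies the annual breach counts and record totals from Table 1, computes the change from each previous year, and confirms that 2015 is the only year with fewer breaches than the year before and also the year with the most records breached.

```python
# Annual figures copied from Table 1.
BREACHES = {2010: 199, 2011: 200, 2012: 218, 2013: 278, 2014: 314,
            2015: 268, 2016: 327, 2017: 359, 2018: 366}
RECORDS_M = {2010: 5.93, 2011: 13.16, 2012: 2.85, 2013: 7.02, 2014: 17.45,
             2015: 113.28, 2016: 16.66, 2017: 5.14, 2018: 13.24}


def year_over_year(series):
    """Change from the previous year, keyed by the later year."""
    years = sorted(series)
    return {y: series[y] - series[prev] for prev, y in zip(years, years[1:])}


changes = year_over_year(BREACHES)
declines = [year for year, delta in changes.items() if delta < 0]
peak_records_year = max(RECORDS_M, key=RECORDS_M.get)

print(sum(BREACHES.values()))  # 2529
print(declines)                # [2015]
print(peak_records_year)       # 2015
```

Summing the rounded annual record counts gives 194.73 million rather than 194.74 million; the difference comes from rounding each year to two decimals.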

Table 1:

A brief overview of the protected health data breaches in the United States (2010-18)

Characteristics Total 2010 2011 2012 2013 2014 2015 2016 2017 2018
Total number of individual records breached (in million) 194.74 5.93 13.16 2.85 7.02 17.45 113.28 16.66 5.14 13.24
Number of total breaches 2,529 199 200 218 278 314 268 327 359 366
Types of the covered entity* (percentage)
Business associate 14.23 22.11 22 18.35 23.02 24.52 4.48 6.12 5.57 10.66
Health plan 13.40 10.55 9.5 10.55 6.83 13.06 22.76 15.6 14.21 14.48
Healthcare clearinghouse 0.16 0 0.5 0.46 0.72 0 0 0 0 0
Healthcare provider 72.08 67.34 68.0 70.64 69.42 61.78 72.76 78.29 80.22 74.59
Types of breach* (percentage)
Hacking/IT Incident 22.7 4.02 7.5 4.59 10.07 11.15 21.27 34.56 41.5 43.44
Improper disposal 2.93 4.02 3 3.21 4.32 2.55 2.24 2.14 3.06 2.46
Loss 6.17 7.04 7.5 7.8 7.19 6.37 9.33 4.89 4.46 3.55
Multiple types reported 31.87 8.54 20.5 22.48 29.14 36.94 38.06 39.76 35.38 39.07
Other 2.97 10.55 1.5 5.96 5.76 7.01 0 0 0 0
Theft 32.94 65.33 56.5 55.96 42.81 35.67 29.10 18.65 15.6 11.48
Unknown 0.4 0 3.5 0 0.72 0.32 0 0 0 0
Media location of breached data* (percentage)
Desktop computer 6.33 10.55 10 10.55 11.51 5.1 4.48 4.28 2.51 3.55
Email 13.33 2.01 0.5 4.13 7.55 10.51 12.69 12.23 23.96 29.78
Laptop 12.65 24.12 17 22.94 23.02 12.1 11.57 6.73 4.46 4.64
Multiple locations 21.43 21.61 28 19.27 17.27 21.66 24.25 22.02 23.4 17.49
Network server 16.13 9.05 8.5 9.17 11.87 17.83 14.93 24.46 23.4 16.39
Other 8.54 9.55 13 11.01 8.27 10.83 5.97 7.95 5.57 7.65
Paper/films 21.59 23.12 23 22.94 20.5 21.97 26.12 22.32 16.71 20.49

* p < 0.001 (chi-squared test)

The highest number of individual records was breached in 2015 (n=113.28 million) (Figure 1). Moreover, data breaches varied significantly by type of covered entity. Most incidents (72.08%, n=1,832) occurred in covered entities that were healthcare providers, followed by business associates (14.23%, n=360) and health plans (13.4%, n=339). The distribution of breach types also varied significantly from year to year. Overall, theft (32.94%) and hacking or IT incidents (22.7%) were the major types of breach, and a substantial proportion (31.87%) of breaches involved more than one type. Hacking or IT incidents accounted for only 4.02% of breaches in 2010 but rose to 43.44% in 2018.

Figure 1:

Number of incidents and individual records affected due to data breaches in the U.S. between 2010 and 2018.

Similarly, breaches involving multiple types were much less common in 2010 (8.54%) than in 2018 (39.07%). Breaches also occurred through different media; paper or films (21.59%) and multiple media locations (21.43%) were the most frequently reported. The share of incidents involving laptops declined from 24.12% in 2010 to 4.64% in 2018. In contrast, incidents involving network servers increased from 9.05% in 2010 to 16.39% in 2018. Similarly, breaches through email accounted for only 2.01% of incidents in 2010 but rose to 29.78% of annual incidents in 2018.
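The Table 1 percentages are shares of each year's breach count, so a change between 2010 and 2018 is best read in percentage points, and a share can be converted back to an approximate number of incidents by multiplying by that year's total. The sketch below uses the Table 1 values for three media locations; the back-converted counts are approximations implied by the rounded percentages, not figures reported by OCR.

```python
# Share of each year's breaches (%) by media location, from Table 1.
SHARE_2010_2018 = {
    "Laptop": (24.12, 4.64),
    "Network server": (9.05, 16.39),
    "Email": (2.01, 29.78),
}
TOTAL_BREACHES = {2010: 199, 2018: 366}  # annual breach counts, Table 1


def point_change(shares):
    """Percentage-point change from 2010 to 2018, rounded to 2 decimals."""
    return {k: round(late - early, 2) for k, (early, late) in shares.items()}


def approx_count(percent, total):
    """Approximate incident count implied by a rounded percentage share."""
    return round(percent / 100 * total)


print(point_change(SHARE_2010_2018))
# {'Laptop': -19.48, 'Network server': 7.34, 'Email': 27.77}
for media, (p2010, p2018) in SHARE_2010_2018.items():
    print(media, approx_count(p2010, TOTAL_BREACHES[2010]),
          approx_count(p2018, TOTAL_BREACHES[2018]))
# Laptop 48 17
# Network server 18 60
# Email 4 109
```

Because the annual total also grew from 199 to 366, a rising share understates the growth in absolute numbers: email-related incidents went from roughly 4 to roughly 109.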

Breaches affecting more than a million records at a time

Of the 194.74 million records breached between 2010 and 2018, 151.81 million (about 78%) came from only 23 incidents that each involved more than a million records (Table 2). More than 105 million of these records were associated with health plans, representing 69% of the records breached in these large incidents. At the state level (Appendix 1), Florida, New York, and Tennessee each had three such large incidents, more than any other state.

Table 2:

Summary of breaches affecting more than a million records

Characteristic Category Number of breaches Number of records affected (in million) Share of records in breaches affecting more than a million records
Number of total breaches All types 23 151.81 100%
Covered entity Business associate 10 30.94 20%
Health plan 8 105.45 69%
Healthcare provider 5 15.42 10%
Type of breach Loss 2 5.95 4%
Multiple types 2 3.25 2%
Theft 5 12.47 8%
Hacking/IT incident 13 128.24 84%
Unknown 1 1.90 1%
Media location of breach Desktop 1 4.03 3%
Email 1 1.42 1%
Laptop 1 1.22 1%
Multiple locations 4 11.22 7%
Network server 11 123.79 82%
Others 5 10.13 7%
Status of the cases in January 2019 Under investigation 3 5.32 4%
No description reported 11 46.44 31%
Investigation completed and description reported 9 98.05 65%

Most of these large breaches (13 of 23) resulted from hacking or IT incidents, accounting for 128.24 million breached individual records. In addition, 11 incidents involved network servers, whereas 4 involved multiple media locations. Of the 23 breaches affecting more than a million records, only 9 had web descriptions on the OCR data breach portal. Five of these 9 incidents involved hacking or IT incidents as the type of breach and a network server as the medium. The largest incident, affecting 78.8 million individual records, occurred in 2015 through a series of cyberattacks on the covered entity's IT systems. Other cases affecting more than a million individuals involved unauthorized access (n=3.47 million), a breach of an SQL database (n=2.21 million), hacking of computer records (n=4.5 million), misuse of information systems through hacking (n=1.1 million), removal of a locked cabinet (n=1.05 million), theft of unencrypted backup tapes (n=1.7 million), stolen laptops (n=1.22 million), and multiple breaches due to unprotected systems and processes (n=4.02 million). Three cases reported settlements paid to OCR, ranging from $2.3 million to $16 million, while the cases reported that the covered entities implemented various protective measures to prevent further breaches.
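The percentages in Table 2, and the 69% and 54% figures in the Discussion, are shares of records rather than of incidents. The following Python sketch recomputes several of them from the record totals reported in Tables 1 and 2, rounding to whole percentages as Table 2 does.

```python
# Records (in millions) taken from Tables 1 and 2 of this paper.
ALL_RECORDS_M = 194.74        # all 2,529 breaches, 2010-2018
HEALTH_PLAN_LARGE_M = 105.45  # health-plan records in the 23 large breaches
LARGE_BY_TYPE_M = {
    "Loss": 5.95,
    "Multiple types": 3.25,
    "Theft": 12.47,
    "Hacking/IT incident": 128.24,
    "Unknown": 1.90,
}


def record_shares(by_category):
    """Return (total, {category: whole-percent share of total records})."""
    total = sum(by_category.values())
    return total, {k: round(100 * v / total) for k, v in by_category.items()}


large_total, type_pct = record_shares(LARGE_BY_TYPE_M)
large_share_of_all = round(100 * large_total / ALL_RECORDS_M)         # 78
plan_share_of_large = round(100 * HEALTH_PLAN_LARGE_M / large_total)  # 69
plan_share_of_all = round(100 * HEALTH_PLAN_LARGE_M / ALL_RECORDS_M)  # 54

print(type_pct)
# {'Loss': 4, 'Multiple types': 2, 'Theft': 8, 'Hacking/IT incident': 84, 'Unknown': 1}
print(large_share_of_all, plan_share_of_large, plan_share_of_all)  # 78 69 54
```

Weighting by records explains why 13 hacking incidents, just over half of the 23, account for 84% of the records in large breaches.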

Changes in health data breaches at the state level from 2015 to 2018

Population-adjusted rates of affected individual records revealed varying severity of data breaches across U.S. states (Figure 2). Overall, adjusted rates declined from 2015 to 2018, with different states showing different rates of affected records in different years.

Figure 2:

Changes in the population-adjusted breached records from 2015 to 2018 in the United States except Hawaii, Alaska, and Puerto Rico.

In 2015, Indiana had the highest rate of breached individual records (1.26 million affected records per 100,000 population), followed by Washington, New York, Maryland, and California. In subsequent years, Arizona (65,140 affected records per 100,000 population), Kentucky (16,438 per 100,000), and Arkansas (7,129 per 100,000) had the highest population-adjusted rates in 2016, 2017, and 2018, respectively. Moreover, Washington, New York, California, and Georgia had relatively high population-adjusted rates in all four years. In contrast, Idaho, Hawaii, North Dakota, and South Dakota had low population-adjusted rates throughout the study period.

Discussion

In this study, we evaluated all reported incidents of data breaches in the U.S. for every complete year since 2009, when the data breach registry was introduced under the HITECH Act. Our analysis shows a high burden of data breaches from 2010 to 2018, with the number of incidents and individual records affected varying from year to year. In addition, changes in the covered entities, types of breaches, and media locations provide a broader picture of data breaches in the U.S. Moreover, the population-adjusted rates in 2015, after the mandatory adoption of EHR, illustrate the severity of data breaches in most states; these rates decreased in subsequent years. However, many states continue to experience high rates of data breaches, which is a major public health concern. Furthermore, the large data breaches affecting more than a million records per incident offer several critical insights. First, although more than 72% of breach incidents involved healthcare providers, 8 large incidents involving health plans affected more than 105 million individual records, accounting for 69% of the records in large breaches and 54% of all breached records. Therefore, careful attention should be given to the types of entities covered under HIPAA when securing protected health information. Second, the increasing share of breaches due to hacking or IT incidents, which accounted for 84% of the records in the large breaches, highlights the necessity of safeguarding EHR systems. In addition, more than 123 million records were breached through network servers, which is consistent with the high number of IT-related incidents. Third, among the 23 large incidents, 11 resolved cases involving more than 46 million records had no description on the OCR portal. Such a substantial lack of information can hinder scientific, administrative, economic, and legal analyses of these cases.23

To mitigate the burden of data breaches in the U.S., it is critical to examine the reasons for such events and to explore intervention strategies to address them. First, at the institutional level, a lack of effective resources is a major challenge in preventing breaches.14,17 Such resources include physical infrastructure that can store and manage health records efficiently, electronic systems protected by adequate authorization processes, and human capital that can use health data management systems effectively without compromising the safety and security of data.23 For the physical and electronic components, standardizing inputs and processes may improve the safety outcomes of health information systems.24 Second, it is essential to incorporate data security competencies into health education and training programs, emphasize such competencies in hiring, promote a work culture that ensures data safety, and revoke staff credentials from information systems when personnel are replaced.23 Third, patients and their informal caregivers are increasingly using EHR and other digital health platforms. They should be empowered with adequate knowledge, safety practices, and access to secure patient data management systems, which can improve their EHR use and engage them in the preventive measures adopted by providers and institutions. Fourth, at the systems level, exploring the challenges and opportunities in securing health information is central to achieving broader impact. One critical challenge is the scope of HIPAA, which was introduced before the adoption of EHR,14 when instruments and technologies were far less complex than they are today. Therefore, revisiting and updating legal and regulatory measures can strengthen digital health systems against potential breaches. Fifth, data breaches affecting protected health information can involve federal agencies other than HHS.
Recent investigations and settlements by the FBI and FTC, and assessments of health information products and services by the FDA, are a few examples of how multiple agencies can become involved in health data breaches.25,26 It is essential to define the roles and scopes of these institutions and to develop meaningful partnerships to address health data breaches.27 Lastly, institutional and systems-level approaches to securing protected health information will require sustained financing. The HITECH Act provides up to $27 billion over 10 years in incentives encouraging the adoption of EHR systems.28 Leveraging such options and combining them with other health system financing strategies will be vital to preventing health data breaches in the United States.

This study has several limitations. First, we could not identify unique individuals among the breached records, so it cannot be concluded that the 194.74 million breached records belong to an equal number of people. Second, the reported cases each involve 500 or more individual records; breaches affecting fewer than 500 records are not listed on the OCR portal and are therefore not included in this study. This systematic exclusion means that our figures likely underestimate the actual extent of data breaches in the U.S. Third, we grouped the types and media locations of data breaches when more than one was listed. Itemized numbers of affected records for each component would allow a more accurate characterization of the types and media locations of breach incidents. Fourth, we could not determine the contents of the breached protected health information, which would indicate how sensitive the information was and how it could be misused against individuals and other stakeholders. Fifth, other administrative, financial, or institutional data not covered under HIPAA as protected health information may have been breached alongside the reported cases. Such information is not reported to OCR, yet implicit data about covered entities serving a specific population in a specific geographic location can be used to draw inferences about health behavior, access, utilization, billing, and a wide range of outcomes. Because such data are not available on the OCR portal, we could not examine them. Sixth, we could not evaluate whether the reported breaches, particularly those involving hacking or IT incidents, altered the data stored in the affected databases. Such manipulation of sensitive data could have more severe and long-term consequences.
Lastly, we evaluated only the OCR-reported measures under the HITECH Act and did not evaluate additional state-level legislative and regulatory measures that might have influenced the trends and characteristics of data breaches. Future research should explore these avenues, address the limitations of this study, and inform the development of multi-level interventions to address health data breaches in the U.S.

Conclusion

Advancements in health information technologies have improved access to health services; however, rising security concerns about these technologies may substantially harm individuals and providers rather than improving the efficiency of existing systems. The characteristics and trends of data breaches over the past nine years indicate the high magnitude of this serious public health problem in the United States. Also, repeated breaches across different locations and times, through diverse strategies and media types, indicate an overall weakness of the health system in ensuring the safety of protected health information. To address such a massive problem, which has affected millions of people, rigorous research is essential to explore the nature of and reasons for data breaches. The findings of empirical studies can facilitate the development of evidence-based policies and programs aimed at preventing future breaches. Finally, strong economic and political commitment is required to implement multipronged preventive measures at the institutional and systems levels, where key stakeholders can collaborate to protect the health data of the U.S. population.

Appendix 1: Characteristics of data breach incidents that affected more than a million individual records

Name of the entity Place and time Number of affected individual records (in million) Description of the incidents
AccuDoc Solutions, Inc. North Carolina, 2018 2.65 Under investigation
Employees Retirement System of Texas Texas, 2018 1.25 Under investigation
Iowa Health System d/b/a UnityPoint Health Iowa, 2018 1.42 Under investigation
Newkirk Products, Inc. New York, 2016 3.47 Unauthorized individuals accessed the electronic protected health information (ePHI) of 3,992,270 members of health plans and claims administrators before the company was acquired from its former parent company by a new one.
Banner Health Arizona, 2016 3.62 Not available
21st Century Oncology Florida, 2016 2.21 Two separate incidents breached an SQL database containing millions of individual records. The covered entity agreed to pay $2.3 million to the Office for Civil Rights (OCR) of the U.S. Department of Health and Human Services (HHS) in lieu of potential civil money penalties and to adopt a corrective action plan to settle probable violations.
Excellus Health Plan, Inc. New York, 2015 10 Not available
Medical Informatics Engineering Indiana, 2015 3.9 Not available
University of California, Los Angeles Health California, 2015 4.5 A hacking incident breached the entity's computer network, which held about 4.5 million individual records. The Office for Civil Rights (OCR) obtained assurances that corrective measures had been implemented following the incident.
CareFirst BlueCross BlueShield Maryland, 2015 1.1 Not available
Premera Blue Cross Washington, 2015 11 Not available
Anthem Inc. Indiana, 2015 78.8 A series of cyberattacks, initiated through phishing emails, breached the IT systems and affected a database containing the ePHI of about 79 million individuals. OCR's investigation revealed failures to conduct an appropriate risk analysis, review IT systems, respond to security issues, and protect ePHI. A settlement of $16 million was paid to OCR, and additional corrective measures were reported.
Xerox State Healthcare, LLC Texas, 2014 2 Not available
Community Health Systems Professional Services Corporations Tennessee, 2014 4.5 Not available
Community Health Systems Professional Services Corporation Tennessee, 2014 4.5 Not available
Montana Department of Public Health & Human Services Montana, 2014 1.1 A server hacking incident allowed misuse of the entity's information system resources for almost 9 months, affecting the personal and health information of more than 1 million individuals. Several safeguards and technical enhancements were subsequently adopted.
Advocate Health and Hospitals Corporation, d/b/a Advocate Medical Group Illinois, 2013 4.02 Multiple breaches were found. OCR identified critical issues at the covered entity, including failures to assess potential risks and vulnerabilities to all of its ePHI, to implement adequate policies and procedures limiting unauthorized access to its electronic information systems, to ensure that its business associate safeguarded all ePHI in its possession, and to safeguard an unencrypted laptop. The covered entity agreed to pay a settlement of $5.55 million and implement a corrective action plan.
Science Applications International Corporation Virginia, 2011 4.9 Not available
The Nemours Foundation Florida, 2011 1.05 A locked cabinet containing the electronic protected health information (ePHI) of 1.05 million individuals was removed from an IT service desk. The covered entity subsequently improved various safeguards, including advanced storage systems, encryption, and dual-factor authentication. OCR obtained assurances about the corrective measures.
IBM New York, 2011 1.9 Not available
GRM Information Management Services New Jersey, 2011 1.7 Unencrypted backup tapes of a clinical system containing the electronic protected health information (ePHI) of 1.7 million individuals were stolen. The business associate's contract was terminated, and a new contract and other corrective measures were adopted.
BlueCross BlueShield of Tennessee, Inc. Tennessee, 2010 1.02 Not available
AvMed, Inc. Florida, 2010 1.22 Two unsecured laptop computers containing ePHI (demographic and clinical information, diagnoses, lab results, treatment, and other data) were stolen from the covered entity's premises. The entity adopted new policies and procedures to safeguard records and prevent similar events.

References


