Abstract
A quantitative methodology is described that provides objective evaluation of the potential for health record system breaches. It assumes that breach risk increases with the number of potential records that could be exposed, while it decreases when more authentication steps are required for access. The breach risk magnitude (BRM) is the maximum value for any system user of the common logarithm of the number of accessible database records divided by the number of authentication steps needed to achieve such access. For a one million record relational database, the BRM varies from 5.52 to 6 depending on authentication protocols. For an alternative data architecture designed specifically to increase security by separately storing and encrypting each patient record, the BRM ranges from 1.3 to 2.6. While the BRM only provides a limited quantitative assessment of breach risk, it may be useful to objectively evaluate the security implications of alternative database organization approaches.
Introduction
Breaches involving unauthorized disclosure of substantial numbers of identified electronic medical records are occurring with increasing frequency1. These events damage the trust of the public in health information systems, may seriously harm patients whose data is exposed, and are very costly to the organizations responsible for holding the data. As a result, preventing data breaches has become a very high priority in the design and implementation of clinical information systems.
This paper describes a quantitative assessment methodology to evaluate the impact of database architecture on the risk of security breaches. By applying this quantitative methodology, it is shown that alternative architectures can reduce these risks. This suggests that use of quantitative breach risk measures may be useful in guiding the design of more secure clinical information systems.
Need for Quantitative Breach Risk Assessment
Databases have traditionally organized information to facilitate rapid search and retrieval operations, while the security of the stored information has in general been a secondary consideration. Although it is relatively easy to measure and/or calculate the response time for database search operations, there are no existing quantitative measures that can assist information architects in evaluating the potential for security breaches when considering alternative possibilities for organizing and storing data.
There have been a number of prior efforts to develop security metrics. Zhang et al described a semi-quantitative approach for assessing enterprise security that attempts to model the behavior of potential attackers using the variables of intent, objective, and consequence as input to a Markov decision process2. However, the specific parameters of the model are unknown and must be estimated in each case. Harel et al described a “misuseability weight” that evaluates the sensitivity level of exposed data3. This sensitivity is used to calculate an “M-score” that incorporates both the quantity and quality of the information at risk, but does not include any assessment of the difficulty of obtaining access. Aissa et al defined a value-based cybersecurity measure that attempts to quantify the potential dollar losses of intrusions for specific stakeholders4. Bhattacharjee et al proposed a more complex scheme for assessment of overall enterprise level information security risk that assigns a specific risk level to each enterprise information asset based on its “threat-vulnerability” pair, and suggests specific actions of decreasing stringency to ameliorate risks for high, medium, and low risk assets respectively5. Finally, Aime et al described the risks of security metrics in general, noting that the use of such metrics does not replicate the performance of experts and therefore they must not be used in isolation6. However, none of this prior work describes a metric that can provide specific, easily calculated results incorporating both the quantity and degree of difficulty of accessing information.
Consistent with prior recommendations to identify and establish measures that provide insight into potential risks7, the breach risk magnitude measure was developed. Its fundamental assumption is that breach risk increases with the number of potential records that could be exposed by a specific sequence of authentication processes (typically passwords), and decreases when more such authentication steps are required for access. In other words, additional authentication steps make it less likely that an unauthorized access will occur. This is consistent with an intuitive understanding of the incentives of a potential attacker: a target is more attractive when either fewer obstacles need to be overcome to gain access or more records can be obtained by circumventing those obstacles.
Desiderata of Breach Risk Measure
Before attempting to define any quantitative measure of a previously subjective characteristic, its desired properties should be considered and described. These properties, which represent the requirements that must be met, allow the characteristics of the resultant measure to be assessed to determine if the goals have been achieved. An alternative set of requirements might lead to a substantially different measurement approach.
To be useful, a breach risk measure should be:
Roughly proportional to our intuitive sense of the level of risk;
Higher when the level of risk increases and lower when it decreases;
Able to express a very wide range of risk levels;
Straightforward to calculate across a variety of systems; and
Easily understood and interpreted.
The first three properties represent accuracy, proportionality, and scalability respectively, while the last two relate to ease of use.
Methods
The development of the measure was guided by the intuitive notion that a system allowing access to a larger number of records with a smaller number of authentication steps is inherently more vulnerable to compromise. This is consistent with concepts of event likelihood and impact/severity described in the risk assessment guidelines from NIST8. In this case, the likelihood of an event is assumed to be inversely proportional to the number of authentication steps, while the impact/severity of an event relates to the number of records exposed.
Based on these ideas, the breach risk assessment is defined as the ratio of the number of accessible records to the number of authentication steps required to enable that access. The larger the value, the higher the risk. To accommodate a wide range of values, the breach risk magnitude (BRM) is expressed as the common logarithm of the breach risk assessment.
As an example, assume a system with one million records, all of which can be accessed by the system administrator with a single authentication step. In this case,
Number of accessible records = 1,000,000
Number of authentication steps = 1
Ratio of accessible records to authentication steps = 1,000,000/1 = 1,000,000
Breach Risk Magnitude (BRM) = log10 (1,000,000) = 6
By expressing the measure as a log value, much like the Richter scale for earthquake magnitude9, an extremely wide range of values can be communicated quickly and easily. For example, the above system with only 100,000 records would have a BRM of 5, while the same system with 10,000,000 records would have a BRM of 7. Thus, each increase of one in the BRM value represents 10 times the potential security vulnerability.
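To make the arithmetic concrete, the calculation can be written as a small function. The sketch below is illustrative only (Python, with hypothetical names; it is not part of the original methodology) and simply reproduces the magnitudes quoted above.

```python
import math

def breach_risk_magnitude(accessible_records: int, authentication_steps: int) -> float:
    """Common logarithm of accessible records per authentication step."""
    return math.log10(accessible_records / authentication_steps)

# Single-authentication access to databases of increasing size:
for n in (100_000, 1_000_000, 10_000_000):
    print(n, breach_risk_magnitude(n, 1))  # 5.0, 6.0, 7.0
```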
In a given system, there are typically several different classes of users that access records using different procedures. There may be individuals who only are permitted to access their own record, researchers who may access a large subset (or all) of the records, and a system administrator who typically has access to all the records. Recognizing that the security vulnerability of a system relates to its weakest link, the BRM value for a given system is calculated as the maximum value for any class of users with access to the stored records. In many such systems, the user class with the maximum BRM will be the system administrator.
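As an illustration of this weakest-link rule (the user classes, record counts, and helper function below are hypothetical examples, not prescribed by the measure), the system-level BRM is simply the maximum of the per-class values:

```python
import math

def brm(records: int, steps: int) -> float:
    return math.log10(records / steps)

# Hypothetical user classes: (number of accessible records, authentication steps)
user_classes = {
    "individual":           (1,         1),
    "searcher/researcher":  (1_000_000, 1),
    "system administrator": (1_000_000, 1),
}

# The system BRM is the maximum over all user classes with access to stored records.
system_brm = max(brm(records, steps) for records, steps in user_classes.values())
print(system_brm)  # 6.0 -- driven by the searcher and system administrator classes
```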
Calculation of the BRM measure does not require access to operational database systems. The input parameters of number of records and number of authentication steps for each user are evident from the organization of a database implementation. For example, any user given the capability to search the entire database by definition has access to all the records.
Results of Breach Risk Magnitude (BRM) Application
The BRM measure was evaluated for several sample database system configurations, each with one million records. Table 1 shows the calculation for a standard relational database management system. Three types of users are allowed in this example: 1) individuals with access to only their own record; 2) searchers who may access all the records for search purposes; and 3) system administrators who may access all data for any purpose. The BRM value for this system is 6.
Table 1.
Breach Risk Magnitude calculation for Relational Database Management System (RDMS)
| User Type | Authentications Required | # Records Accessed | Breach Risk | BRM |
|---|---|---|---|---|
| Individual | Login (1) | 1 | 1/1 | 0 |
| Searcher | Login (1) | 1 million | 1 million/1 | 6 |
| System Admin | Login (1) | 1 million | 1 million/1 | 6 |
| OVERALL | | | Max BRM | 6 |
Table 2 shows the effect of requiring a second authentication step for searching or for the system administrator to access the records. While the risk ratio drops by 50%, the BRM measure only decreases to 5.7.
Table 2.
Breach Risk Magnitude calculation for RDMS requiring double authentication for access to entire dataset
| User Type | Authentications Required | # Records Accessed | Breach Risk | BRM |
|---|---|---|---|---|
| Individual | Login (1) | 1 | 1/1 | 0 |
| Searcher | Login and Entire dataset access (2) | 1 million | 1 million/2 | 5.7 |
| System Admin | Login and Entire dataset access (2) | 1 million | 1 million/2 | 5.7 |
| OVERALL | | | Max BRM | 5.7 |
Table 3 shows the impact of separating the demographic data from the remainder of the data for each person’s record, a commonly proposed technique for increasing security. Separate passwords are required to access the demographic and the non-demographic data. While the searcher users no longer have access to complete records, the system administrator can still access all the data, albeit with three passwords. Therefore, the BRM for this approach only decreases to 5.52.
Table 3.
Breach Risk Magnitude calculation for RDMS requiring separate authentication for access to demographic and non-demographic records
| User Type | Authentications Required | # Records Accessed | Breach Risk | BRM |
|---|---|---|---|---|
| Individual | Login (1) | 1 | 1/1 | 0 |
| Searcher | Login and Access to de-identified nondemographic records (2) | No complete records | 0/2 | n/a |
| System Admin | Login, Access to de-identified nondemographic records, Access to demographic records (3) | 1 million | 1 million/3 | 5.52 |
| OVERALL | | | Max BRM | 5.52 |
Table 4 shows the results for a different data storage architecture known as the personal grid10. The personal grid is specifically designed to improve the security of personal information by storing each person’s data in a separate file with its own separate encryption. The encryption/decryption key for each file consists of two distinct and independent parts, one supplied by the user and the other by the system.
Table 4.
Breach Risk Magnitude calculation for personal grid architecture
| User Type | Authentications Required | # Records Accessed | Breach Risk | BRM |
|---|---|---|---|---|
| Individual | Login; System master password; System record password (3) | 1 | 1/3 | -0.48 |
| Search Server | Login (3 passwords); System master password; System record password (5) | 100 (10,000 servers) | 100/5 | 1.3 |
| | | 2,000 (500 servers) | 2,000/5 | 2.6 |
| System Admin | Login; Cannot access individual records | 0 | 0/1 | n/a |
| OVERALL | | | Max BRM | 2.6 |
Since each record is separately encrypted and there is no inverted index of the records (as in a relational database), searching with the personal grid must be done sequentially, one record at a time. To accelerate this otherwise prohibitively slow process, search operations are parallelized using cloud computing. When a search is needed, a large number of servers (e.g., between 500 and 10,000) are allocated temporarily, and each server simultaneously processes its share of the records. For example, with a database of 1,000,000 records, each search server would evaluate between 100 and 2,000 records (for 10,000 or 500 search servers respectively).
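As a rough sketch of how the Table 4 values arise (the five authentication steps for a search server are taken from Table 4; everything else below is illustrative), the records assigned to each search server and the resulting BRM can be computed as follows:

```python
import math

TOTAL_RECORDS = 1_000_000
SEARCH_AUTH_STEPS = 5  # per Table 4: login (3 passwords) + system master + system record password

for servers in (10_000, 500):
    records_per_server = TOTAL_RECORDS / servers  # share of records each server scans
    brm = math.log10(records_per_server / SEARCH_AUTH_STEPS)
    print(f"{servers} servers: {records_per_server:.0f} records/server, BRM = {brm:.1f}")
# 10000 servers: 100 records/server, BRM = 1.3
# 500 servers: 2000 records/server, BRM = 2.6
```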
Finally, in the personal grid, no system administrator has access to any of the records in the database. System administrators only can access the system portion of the key for each record; the user portion is supplied by each user and is unavailable except when the user is logged in. Two separate system administrators are required to initiate a search operation, and no users (even system administrators) have access to any of the search servers when they are operating.
Since this architecture is designed to improve security, it is not surprising that its BRM value is between 1.3 and 2.6 depending on how many search servers are available for allocation. Note that this is several orders of magnitude lower than the 5.52 to 6.0 range for the prior examples of standard relational systems.
Discussion
A major security issue for health information systems is the potential for large-scale loss of data from a single unauthorized intrusion. These concerns have been reinforced by multiple, large-scale, widely reported examples of huge health data breaches, such as Anthem11 and Premera12. Such incidents have contributed to the widespread belief that central repositories of health records will not be trusted by consumers13. However, methods to objectively quantify this perceived risk have not been available.
In assessing the breach potential of database systems, it is generally understood that the risk is greater when the number of records available increases or the number of authentication steps needed to access records decreases. This observation is the basis for the BRM measure, which is designed to quantify this notion in an easily computable and understandable form. The BRM measure can also be viewed as an assessment of the “hacking reward/risk ratio” that provides a numerical representation of the number of records that will be accessible (the “reward”) for each authentication step (the “risk”). This is consistent with the observation that larger databases are more attractive targets for unauthorized access, and are more likely to be breached if there are fewer obstacles (i.e., authentication steps) needed for entry.
The results for the examples above show the usefulness of the BRM. For a standard relational database system, where a single authentication of a system administrator can provide access to the entire one million record database, the BRM value is 6. Adding an additional authentication step for data access only reduces the value to 5.7, while separating the demographic from non-demographic data results in a value of 5.52. This is consistent with the modest improvement in security provided by these alternative access arrangements.
However, when a substantially different architecture designed to improve security (the personal grid) is evaluated, the BRM measure drops dramatically to between 1.3 and 2.6 (depending on the number of search servers used). This shows clearly the security advantages of the personal grid, which separately stores and encrypts each patient’s record, thereby eliminating any path to access of the entire dataset in unencrypted form. This quantification of the security improvements for such an architecture could be very helpful to any organization considering its use.
Note that the security improvements of the personal grid result in longer search times and higher costs than an equivalent relational database. While retrieval of individual records (e.g., for clinical care) remains immediate, searching across 1 million records in a personal grid with 500 parallel processors is estimated to require about 80 seconds and cost about $0.44 per search10. For a fixed number of processors, both search times and costs will vary in direct proportion to the total number of records. However, the cost does not change with the number of processors, since the overall CPU time required (which is the basis of the cost) is independent of how many search servers are allocated. Thus, personal grid search times may be reduced at no additional cost by utilizing the maximum number of parallel servers available.
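Assuming the linear scaling just described, and taking the reported baseline of roughly 80 seconds and $0.44 for 1 million records on 500 servers10 as the only data points, a hypothetical estimate for other configurations might look like the following sketch (the scaling constants are assumptions derived from that baseline, not measurements):

```python
# Reported baseline for the personal grid (assumption: scaling is linear from this point).
BASE_RECORDS = 1_000_000
BASE_SERVERS = 500
BASE_TIME_S = 80.0    # seconds for one full search at the baseline
BASE_COST_USD = 0.44  # cost of one full search at the baseline

def estimate_search(records, servers):
    """Estimate search time and cost by scaling linearly from the baseline.

    Time tracks the number of records each server must scan; cost tracks the
    total number of records only, since total CPU time does not depend on how
    many servers share the work.
    """
    time_s = BASE_TIME_S * (records / servers) / (BASE_RECORDS / BASE_SERVERS)
    cost_usd = BASE_COST_USD * (records / BASE_RECORDS)
    return time_s, cost_usd

print(estimate_search(1_000_000, 10_000))  # ~4 seconds, still ~$0.44
```

Under these assumptions, allocating twenty times as many servers would cut the search time from about 80 to about 4 seconds at the same cost, which is the point made above.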
In terms of the desiderata for such a measure described above, the BRM appears to meet all the criteria. It is consistent with the intuitive notion that risk is higher when more records can be exposed with fewer steps. Its quantitative nature inherently results in higher values when the risk is higher and vice versa. Since it is expressed as a logarithmic value, it can describe a very wide range of risks over many orders of magnitude. It is easy to calculate and does not depend on the specific characteristics of various system configurations. Finally, as has been shown by the examples described above, it is straightforward to interpret.
Limitations
The BRM measure for security assessment provides a specific, limited assessment of breach risk. It is not designed or intended to provide a comprehensive assessment of security risk, as it ignores several important factors that may contribute to security vulnerabilities including, but not limited to:
Strength of passwords/encryption. The BRM measure assumes that each authentication step functions as an effective barrier to system access and does not account for stronger or weaker passwords or encryption methods. It also does not consider the use of automatic account locking after multiple failed login attempts, or multi-factor authentication techniques. A possible variation on the BRM calculation would be to count each authentication factor separately, so two-factor authentication would be equivalent to two separate authentication steps.
Time factors. When data is distributed or stored inefficiently, it may take longer to retrieve. As a result, an attacker attempting to copy all the data may require an extended time period, making detection of the attack more likely. The BRM measure does not account for this possibility.
Other search risks. When data is distributed and must be fetched from other (outside) systems, the fetch process exposes the data to additional risks of interception or alteration. Such potential risks are not included in the BRM measure.
Personnel risk. Most data breaches involve some level of cooperation from insiders. The BRM measure does not assess the reliability of personnel, security training that may be provided, or the screening processes used for hiring.
Direct attack on unencrypted data. The BRM measure assumes that data are encrypted and cannot be accessed via an “out of channel” process that evades the normally required authentication procedures.
Number of users with maximum record access. In calculating the BRM, only categories of users are evaluated. The number of users having “maximum access” to records in a system is not considered. The BRM is based on the user category that provides the maximum access with the minimum number of authentication steps. Clearly, the larger the number of users in this “maximum access” category, the more likely it is that one such user’s credentials could be compromised.
Organizational, physical, and policy issues. The BRM measure does not include these potentially important factors.
In essence, the BRM measure only provides an estimate of the security risk of a specific data architecture and its associated access protocols. It will yield useful comparative results only if all other factors, including those described above, are held constant. Other methods should be used to assess security risk issues beyond the scope of the BRM measure.
Finally, in situations where the underlying database architecture of an application cannot be changed (e.g., when the database is a tightly integrated feature of a commercial EHR system), the BRM would only indicate the current level of risk, but would not be able to guide actions that can reduce that risk. In such cases, the BRM may be of limited value.
Conclusion
The breach risk magnitude (BRM) quantitatively measures the potential security vulnerability of databases. For a given system, it is calculated as the maximum value, over all user classes, of the common logarithm of the number of accessible database records divided by the number of authentication steps needed to achieve such access. The BRM measure can facilitate estimating the security risk of data organization and access arrangements and evaluating the impact of alternatives.
References
- 1. Doyle K. Health data breaches on the rise. Reuters, 14 Apr 2015. Available at: http://www.reuters.com/article/us-healthcare-security-data-breaches-idUSKBN0N51PT20150414 (Accessed 10 Mar 2016)
- 2. Zhang Z, Nait-Abdesselam F, Lin X, Ho P. A model-based semi-quantitative approach for evaluating security of enterprise networks. Proceedings of the ACM workshop on applied computing; 2008. pp. 1069–1074.
- 3. Harel A, Shabtai A, Rokach L, Elovici Y. M-score: Estimating the potential damage of data leakage incident by assigning misuseability weight. Proceedings of the ACM workshop on insider threats; 2010. pp. 13–20.
- 4. Aissa AB, Abercrombie RK, Sheldon FT, Milli A. Defining and computing a value based cyber-security measure. Proceedings of the Second Kuwait Conference on e-Services and e-Systems; 2011. Article 5 (9 pages).
- 5. Bhattacharjee J, Sengupta A, Mazumdar C, Barik MS. A two-phase quantitative methodology for enterprise information security risk analysis. Proceedings of the CUBE International Information Technology Conference; 2012. pp. 809–815.
- 6. Aime MD, Atzeni A, Pomi PC. The risks with security metrics. Proceedings of the 4th ACM workshop on Quality of Protection; 2008. pp. 65–70.
- 7. Layman L, Basili VR, Zelkowitz MV. A methodology for exposing risk in achieving emergent system properties. ACM Trans Software Eng and Methodology. 2014;23(3): Article 22 (28 pages).
- 8. National Institute of Standards and Technology. Guide for Conducting Risk Assessments. Computer Security Division, Special Publication 800-30 (revision 1), September 2012. Available at: http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-30r1.pdf (Accessed 23 Jun 2016)
- 9. Richter CF. An instrumental earthquake magnitude scale. Bulletin Seismological Soc of America. 1935;25(1):1–32.
- 10. Yasnoff WA. A secure and efficiently searchable health information architecture. J Biomed Inform. 2016;61:237–246. doi: 10.1016/j.jbi.2016.04.004. Available at: http://dx.doi.org/10.1016/j.jbi.2016.04.004 (Accessed 2 May 2016)
- 11. Abelson R, Goldstein M. Millions of Anthem customers targeted in cyberattack. New York Times, 5 February 2015. Available at: http://www.nytimes.com/2015/02/05/business/hackers-breached-data-of-millions-insurer-says.html (Accessed 10 Mar 2016)
- 12. Reuters. Premera Blue Cross says data breach exposed medical data. New York Times, 17 March 2015. Available at: http://www.nytimes.com/2015/03/18/business/premera-blue-cross-says-data-breach-exposed-medical-data.html (Accessed 10 Mar 2016)
- 13. Gibson R. Written testimony. Hearing on Standards for Health IT: Meaningful Use and Beyond. Subcommittee on Technology and Innovation, U.S. House of Representatives Committee on Science and Technology, Washington, DC; 2010. Available at: http://science.house.gov/sites/republicans.science.house.gov/files/documents/hearings/093010 Gibson.pdf (Accessed 10 Mar 2016)
