Abstract
Objective: Online data centers (ODCs) are becoming increasingly popular for making health-related data available for research. Such centers provide good privacy protection during analysis by trusted researchers, but privacy concerns may still remain if the system outputs are not sufficiently anonymized. In this article, we propose a method for anonymizing analysis outputs from ODCs for publication in academic literature.
Methods: We use as a model system the Secure Unified Research Environment, an online computing system that allows researchers to access and analyze linked health-related data for approved studies in Australia. This model system suggests realistic assumptions for an ODC that, together with literature and practice reviews, inform our solution design.
Results: We propose a two-step approach to anonymizing analysis outputs from an ODC. A data preparation stage requires data custodians to apply some basic treatments to the dataset before making it available. A subsequent output anonymization stage requires researchers to use a checklist at the point of downloading analysis output. The checklist assists researchers with highlighting potential privacy concerns, then applying appropriate anonymization treatments.
Conclusion: The checklist can be used more broadly in health care research, not just in ODCs. Ease of online publication as well as encouragement from journals to submit supplementary material are likely to increase both the volume and detail of analysis results publicly available, which in turn will increase the need for approaches such as the one suggested in this paper.
Keywords: data anonymization, confidentiality, privacy, biomedical research, health services research
BACKGROUND AND SIGNIFICANCE
Having access to data for population health and health services research is a powerful way to influence health policy and health promotion, hence enhancing the health outcomes of individuals.1,2 A number of studies have investigated privacy risks in deidentified health data across a range of scenarios. For example, Sweeney3 studied the privacy risks inherent in patient-level health data made publicly available by the State of Washington for a small fee. The data did not include patient names or street addresses, although it did include ZIP codes. Sweeney showed that the data could be matched against newspaper stories to uniquely reidentify 35 of the 81 cases in the 2011 data.
Privacy-related legislation and regulations applicable to population health and health services research contain requirements for privacy protection. However, there is a noticeable lack of detail on what such protection means in practice or how it might be achieved.
While legal and ethical governance, data use agreements, and information security measures provide significant privacy protection, additional protection has traditionally been provided by specialized data anonymization treatments; that is, technical approaches designed to reduce privacy risk. Unfortunately, such treatments also reduce data utility, hence they are generally applied by experienced practitioners within data custodian agencies, building on long traditions of useful and privacy-protecting data releases.
A brief summary of approaches to privacy protection in the research use of data follows.4 We classify approaches as either noninteractive, whereby an anonymized dataset is downloaded to the researcher’s computer, or interactive, whereby a dataset remains in the custodian’s secure environment while being accessed by the user. In an interactive system, generally the dataset is more detailed, so anonymity in the output may need to be protected with statistical mechanisms.
In the noninteractive approach, users register with a custodian agency, and possibly sign a user agreement, before receiving anonymized data to be analyzed offsite. Typical anonymization treatments include removing identifying information, releasing samples of the original data, reporting values of variables in ranges, removing records or values from the data, rounding, swapping record values, and adding random noise. Agencies can also provide synthetic data, in which observed values are replaced by values generated from one or more population models or by another method.
Interactive approaches include the use of secure onsite data or online data centers. In almost all cases, analysis results are checked for privacy risk by an expert of the hosting agency; however, it is recognized that this solution may not be feasible in the long term as demand rises. Therefore, some agencies are starting to experiment with relying on users to anonymize their own analysis results.5 Remote analysis is also interactive, but more restrictive; users submit a restricted range of statistical queries through an online interface, analyses are carried out in a secure environment, and users receive results from the analyses only after automatic modification to protect privacy.6–10
Differential privacy has been proposed as a privacy risk standard for biomedical data anonymization applicable across both noninteractive and interactive approaches.11 An algorithm is differentially private if it satisfies a formal condition, essentially that when applied to any two datasets that differ in a single element, the algorithm will give very similar answers. This is often achieved with the addition of noise to the data or the analysis outputs. Differential privacy has a number of important strengths but also faces a number of empirical and practical barriers to its deployment in health care settings.12
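For concreteness, the formal condition for ε-differential privacy can be written as follows (this is the standard definition, stated here for reference rather than as part of our proposal), where $\mathcal{A}$ is the randomized algorithm, $D_1$ and $D_2$ are any two datasets differing in a single element, $S$ is any set of possible outputs, and $\varepsilon$ is a small privacy parameter:

$$\Pr[\mathcal{A}(D_1) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{A}(D_2) \in S].$$

Smaller values of $\varepsilon$ correspond to stronger privacy guarantees and typically require more noise.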
The majority of published data anonymization guidelines have been developed for the noninteractive setting13,14 and are not relevant to the subject of this paper. In contrast, there are just a few published output anonymization guidelines. The European Centre of Excellence on Statistical Disclosure Control guidelines15 are designed for use by privacy risk experts within national statistical agencies, to determine whether outputs generated by researchers can be released for use in publications or presentations. In contrast, the Statistics New Zealand Microdata Output Guide16 describes methods and rules researchers must use for anonymizing output produced from Statistics New Zealand's microdata. Even when these are applied, most outputs are still subject to manual checks by privacy risk experts within Statistics New Zealand before release. Since both these sets of guidelines were designed to cover all possible statistical outputs and/or were written for statistical experts, they are naturally rather long, detailed, and technical. In their present form, they may not be immediately applicable for use by health researchers who are not experts in statistics.
This paper seeks to address the need for practical output anonymization guidelines designed for health researchers accessing data via an ODC. The guidelines we propose are presented as a checklist designed for application by researchers who are not necessarily experts in statistics or statistical disclosure control. With the growing popularity of ODCs, such approaches may become increasingly common. Our checklist can be used for any statistical analysis output, not just output obtained through an ODC.
The paper is structured as follows: In the Methods section we describe an ODC setting and factors influencing the design of our solution to protecting privacy in analysis outputs. The Results and Discussion section provides a description of the solution. The Conclusions section provides a brief summary and potential implications of the solution.
METHODS
The model system used for this work is the Secure Unified Research Environment (SURE),17 an online computing environment that allows researchers to access and analyze linked health-related data for approved studies in Australia. Files enter or leave SURE only via the Curated Gateway, a secure application specifically developed for the facility17 (Section 5.3). Under current protocols, outbound files uploaded to the Gateway for use outside SURE are reviewed for privacy risk by a senior investigator on the research team. Other parties may be involved in the review of files passing through the Gateway as required on a case-by-case basis for individual studies. All such files are logged and are subject to audit by SURE staff.
We assume that data custodians prepare datasets to be compliant with all applicable legislation, regulations, and assurances given to data providers, and that researchers comply with researcher agreements, are authorized to view dataset records, and are unrestricted in the transformations and analyses they can perform on the data. Hence, we do not need to protect the privacy of individuals in the dataset against disclosure to researchers; in particular, we need not consider malicious privacy attacks by researchers, such as massively repeated regressions or regressions designed to reveal response variable values. We only need to consider privacy risks from release or publication of outputs from genuine statistical queries.
In an ODC, first the hosting custodian prepares a dataset and makes it available in the ODC. The researcher selects an available dataset and then can select a subset of it and/or apply transformations to tailor it for the desired analysis. The researcher then submits an analysis request and receives the output. Once satisfied, the researcher prepares the output for inclusion in academic publications, presentations, reports, or other dissemination channels.
In order to understand the range and nature of analysis outputs that are generated in health research, we reviewed a sample of published research papers and statistical reports. We also reviewed approaches to privacy protection in ODCs that hold administrative data more generally, not only health data.18–20 Based on these analyses and reviews, we propose a two-step approach to the challenge of highlighting potential privacy risks associated with the release of analysis outputs generated within an ODC such as SURE, and treating the outputs to reduce privacy risk.
RESULTS AND DISCUSSION
Since researchers have unrestricted access to data in an ODC, anonymization treatments can be applied only to the dataset or to the analysis outputs.
Dataset preparation
Although we aim to provide researchers with the most detailed and unmodified datasets possible, it is still necessary to apply some basic anonymization treatments to datasets before making them available through an ODC.
Our literature review found a good degree of consistency about privacy protection in dataset preparation, as described below.
All identifiers should be removed from datasets. Obvious identifiers include names, addresses, dates, e-mail addresses, and license numbers; less obvious ones include biometric identifiers and Internet Protocol address numbers. A useful guide is given in the US Health Insurance Portability and Accountability Act of 1996.21
Privacy risk can be high for small datasets or small samples of larger datasets, so it is useful to require that the number of records, people, or organizations in a released dataset, or in each component of a combined released dataset, be greater than a minimum threshold. In practice, the value of the threshold can vary depending on other factors, such as the number and sensitivity of the variables available in the data. These requirements are not normally very restrictive, since datasets that fail these tests will generally be very small and hence have low utility.
A common method for reducing privacy risk is to reduce dataset detail by aggregating data categories; for example, reporting ages in 5- or 10-year groupings or aggregating dates to weekly, monthly, or yearly. However, such generalizations can damage correlations. If researchers need more detail, then they would need to seek access to more detailed data through, for example, an onsite data center or a more trusted user status.
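As a minimal sketch of this kind of pre-release aggregation, assuming a pandas-based preparation workflow and hypothetical column names (age in years and an admission date), a custodian might do something like the following:

```python
import pandas as pd

def coarsen_quasi_identifiers(df: pd.DataFrame) -> pd.DataFrame:
    """Reduce detail in selected quasi-identifiers by aggregation."""
    out = df.copy()
    # Report age in 5-year bands (eg, "40-44") rather than in single years.
    out["age_group"] = pd.cut(
        out["age"],
        bins=range(0, 125, 5),
        right=False,
        labels=[f"{lo}-{lo + 4}" for lo in range(0, 120, 5)],
    )
    # Report admission dates at monthly rather than daily resolution
    # (assumes admission_date is already a datetime column).
    out["admission_month"] = out["admission_date"].dt.to_period("M")
    return out.drop(columns=["age", "admission_date"])
```

The choice of bandwidths (here 5-year bands and calendar months) is a judgment balancing privacy risk against the analytic need for detail.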
Variables with a small number of non-zero values or combinations of variables that occur infrequently may carry high privacy risk across a number of analyses. One protective measure is to ensure that there are sufficiently large numbers of data values in both cases. Where such variables are needed in the dataset, detail can be reduced by data aggregation. Sparse variables and cross-classifications with few records generally have low utility, so this aggregation would rarely have an undue impact on data utility.
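One way a custodian might screen for such sparse cross-classifications is sketched below, again assuming a pandas workflow; the list of quasi-identifiers and the minimum count of 10 are illustrative choices:

```python
import pandas as pd

def sparse_combinations(
    df: pd.DataFrame, quasi_identifiers: list[str], min_count: int = 10
) -> pd.DataFrame:
    """Return combinations of quasi-identifier values occurring in fewer than
    min_count records; these are candidates for further aggregation before
    the dataset is released."""
    counts = (
        df.groupby(quasi_identifiers, observed=True).size().reset_index(name="n")
    )
    return counts[counts["n"] < min_count]
```

For example, sparse_combinations(df, ["sex", "age_group", "postcode"]) would list any sex-by-age-by-postcode cells with fewer than 10 records.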
Published results of the same analysis run on two datasets differing in only a small number of records can present a privacy risk through analysis of differences between the analysis results. One protective measure is to ensure that all such datasets differ by at least a minimum threshold number of observations. While there is a risk that different users will request different samples that do not satisfy such a minimum threshold difference, the risk of a disclosure arising this way is probably low in the health research environment.
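A sketch of how the minimum-difference requirement might be checked for two extracts of the same source data, assuming each record carries a stable internal key (hypothetically named record_key) that never leaves the custodian's environment:

```python
import pandas as pd

def differ_by_enough(
    df_a: pd.DataFrame,
    df_b: pd.DataFrame,
    key: str = "record_key",
    min_difference: int = 10,
) -> bool:
    """Check that two dataset extracts differ in at least min_difference
    records, reducing the risk of disclosure by differencing published
    results computed on the two extracts."""
    keys_a, keys_b = set(df_a[key]), set(df_b[key])
    return len(keys_a.symmetric_difference(keys_b)) >= min_difference
```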
In some cases it may be necessary for data custodians to apply stronger statistical disclosure control methods, such as swapping data or adding noise, or to apply a differentially private process, as discussed in the Background and Significance section.
With health-related data, applying these treatments to all variables may not be necessary for privacy protection. We call a variable a quasi-identifier if it can be used, alone or in combination with other quasi-identifiers, to reidentify a record with high probability.22 The most common way this can occur is by matching quasi-identifiers to an external database containing identifying variables such as names and addresses. Examples of quasi-identifiers include sex, date of birth or age, location (such as post code or census geography), ethnic origin, total years of education, marital status, criminal history, income, profession, event dates (such as admission, discharge, procedure, or visit), codes, and country of birth. Examples of variables that are normally not quasi-identifiers include blood pressure, weight, height, and other physical observations.
In practice, only quasi-identifiers and combinations of them are likely to lead to identification and disclosure of sensitive information, thus the above privacy protection treatments only need to be applied to them. On the other hand, it is possible that attributes that are not quasi-identifiers today might become quasi-identifiers in the future, so it could be argued that this practical simplification should be implemented with care. More generally, the actual privacy risk of any research output will depend on the study context and other information available, particularly external datasets facilitating matching of quasi-identifiers, that may change over time.
In summary, data custodians should ensure that:
Identifiers are removed;
Each dataset has a sufficient number of records;
Datasets differ by a sufficient number of records;
Each variable and combination of variables has a sufficient number of records;
Where necessary, detail is reduced in some variables (such as dates and locations) using data aggregation; and
Where necessary, additional statistical disclosure control measures such as data swapping or addition of noise are applied.
Since only quasi-identifiers and combinations of them are likely to lead to identification and disclosure of sensitive information, these treatments need only be applied to them.
Output anonymization
To develop a tailored approach, we first restricted attention to the range of analysis outputs that occur most commonly in the population health and health services literature. We then augmented the Statistics New Zealand and European Centre of Excellence on Statistical Disclosure Control guidelines with additional privacy risk assessment methods and output anonymization treatments from the literature.6,7,14 We found that when we specialized the augmented guidelines from general statistical outputs to health research, we could simplify them sufficiently to prepare a checklist for use by health researchers.
Our summary of the main ways in which disclosure can occur in statistical analysis outputs is as follows:
Individual data: Individual data values are always a potential privacy risk. These can be quoted directly or implied by other outputs, for example, the jump points in an empirical cumulative distribution function plot. This can include identifiers, which are sometimes revealed in the text of publications.
Threshold: A statistic computed on a small number of records is always a potential privacy risk, particularly when several analyses have been performed on a single dataset and there are a number of such statistics. This includes the familiar case of small cell counts in tables, but also statistics associated with other summaries such as means, modes, regression coefficients for interaction terms, or residual degrees of freedom.
Dominance: A statistic computed on a number of records of which one is dominant is always a potential privacy risk. This includes the familiar case of a table cell dominated by a single large contribution, but also statistics associated with other tabulations or summaries such as means, modes, regression coefficients for interaction terms, or residual degrees of freedom.
Differencing: Comparing values of the same statistic computed on two samples that differ in a very small number of records is a potential privacy risk (a worked example follows this list).
Linear or other algebraic relationships: Relationships among published outputs (for example, between table cells and their margins) can be exploited algebraically to increase potential privacy risk.
Precision: The more significant figures/decimal places provided in the output, the higher the potential privacy risk.
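As a simple illustration of the differencing risk above, suppose the mean of a variable is published for a group defined by quasi-identifiers, first for all $n$ members and then for the same group with one individual removed. Writing $\bar{y}_{n}$ and $\bar{y}_{n-1}$ for the two published means, the excluded individual's value can be recovered exactly:

$$x = n\,\bar{y}_{n} - (n-1)\,\bar{y}_{n-1}.$$

Rounding the published means (the precision treatment) limits the accuracy of this reconstruction, which is one reason precision appears in the list above.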
This summary suggests the following list of the most common anonymity tests:
Individual data: The value of an identifier or quasi-identifier is directly revealed.
Threshold n: A cell or statistic, determined by one or more quasi-identifiers, is calculated on fewer than n data values. Where a cell or statistic is determined by a combination of quasi-identifiers and other variables, the test is applied to the subcell or restricted statistic determined by only the quasi-identifiers.
Threshold p%: A cell determined by one or more quasi-identifiers contains more than p% of the values in a table margin corresponding to a quasi-identifier. Where a cell is determined by a combination of quasi-identifiers and other variables, the test is applied to the subcell determined by only the quasi-identifiers.
Dominance (n,k): Among the records used to calculate a cell value or statistic determined by one or more quasi-identifiers, the n largest account for at least k% of the value. Where a cell is determined by a combination of quasi-identifiers and other variables, the test is applied to the subcell determined by only the quasi-identifiers.
Dominance p%: Among the records used to calculate a cell value or statistic determined by one or more quasi-identifiers, the total minus the two largest values is less than p% of the largest value. Where a cell is determined by a combination of quasi-identifiers and other variables, the test is applied to the subcell determined by only the quasi-identifiers.
Differencing: Two cells or statistics determined by one or more quasi-identifiers are calculated in the same way but on populations differing in fewer than n records. Where a cell is determined by a combination of quasi-identifiers and other variables, the test is applied to the subcell determined by only the quasi-identifiers.
Relationships: The statistic involves linear or other algebraic relationships.
Precision: The output has a high level of precision in terms of significant figures and/or decimal places.
Degrees of freedom: Model output has less than a threshold number of degrees of freedom.
For rules with associated parameters (eg, n for threshold rules), the parameters loosely correspond to different levels of potential privacy risk. A range of values for the various parameters is given in the literature. We suggest the following as starting points, which can always be adjusted for particular scenarios (a sketch implementing the main tests with these defaults follows the list):
n = 10 for threshold, differencing, and degrees of freedom
p = 90 for threshold
n = 2, k = 90 for dominance
p = 10 for dominance
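A minimal sketch of the threshold and dominance tests with these starting-point parameters is given below. The function names and the representation of a cell as a plain list of contributing values are illustrative assumptions; production checks would be tailored to the analysis software in use.

```python
def fails_threshold_n(contributions: list[float], n: int = 10) -> bool:
    """Threshold n: the cell or statistic is calculated on fewer than n values."""
    return len(contributions) < n

def fails_threshold_p(cell_total: float, margin_total: float, p: float = 90.0) -> bool:
    """Threshold p%: the cell contains more than p% of the corresponding margin."""
    return margin_total > 0 and 100.0 * cell_total / margin_total > p

def fails_dominance_nk(contributions: list[float], n: int = 2, k: float = 90.0) -> bool:
    """Dominance (n,k): the n largest contributions account for at least k% of the value."""
    total = sum(contributions)
    largest_n = sum(sorted(contributions, reverse=True)[:n])
    return total > 0 and 100.0 * largest_n / total >= k

def fails_dominance_p(contributions: list[float], p: float = 10.0) -> bool:
    """Dominance p%: the total minus the two largest values is less than p% of the largest."""
    if len(contributions) < 2:
        return True
    ordered = sorted(contributions, reverse=True)
    return sum(ordered) - ordered[0] - ordered[1] < (p / 100.0) * ordered[0]
```

With these defaults, for example, a cell built from contributions of 120, 3, and 2 would fail the threshold n test and both dominance tests.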
We developed a checklist for researchers to use when downloading outputs from ODCs, particularly when preparing them for inclusion in academic publications. The checklist is first used to highlight potential privacy risks in outputs through application of a number of anonymity tests. We stress that if an output fails a test, this does not mean it is necessarily a privacy risk. The most important example of this is that in practice, the tests only need to be applied to variables that are quasi-identifiers or combinations of quasi-identifiers. However, the privacy risk can also depend on the study context and other available information.
The checklist is presented according to types of statistical analysis output that commonly appear in public health research results: statistics such as means, graphical outputs such as Kaplan-Meier plots, modeling output such as relative risk, and tables. For each output type appearing in the first column of the table, say the p-value of a test, the anonymity test column presents the applicable anonymity tests in separate rows. If a particular output fails an anonymity test, say if the p-value is given to extremely high precision, the anonymization treatment column suggests how to reduce the potential privacy risk; in this case, by reducing the precision of the p-value. These treatments are designed to reduce the highlighted potential privacy risk if the researcher considers this necessary or desirable. A final column provides some notes, which could be deleted in a production version of the checklist. An important benefit of this approach is that researchers can ensure that the anonymization treatments applied do not adversely affect the statistical inferences and conclusions drawn.
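For instance, the precision treatment mentioned above amounts to nothing more than rounding a reported statistic before publication; a minimal sketch (the three-decimal default is an illustrative choice):

```python
def reduce_precision(value: float, decimals: int = 3) -> float:
    """Round a reported statistic (eg, a p-value or a percentage) before publication."""
    return round(value, decimals)

# reduce_precision(0.0374218765) returns 0.037, which supports the same
# inference at conventional significance levels.
```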
The full checklist and the above summary of the tests are presented as a Supplementary Appendix. As an example, the section for ratios and percentages is reproduced in Table 1. A general observation is that if a particular test is not relevant for an output class, it is not listed in the anonymity test column for that class. For example, linear relationships are not considered in the context of tables, since tables automatically involve linear relationships in their margins. That said, deciding which tests to list has often been a matter of judgment; it is usually possible to think of circumstances in which an omitted test might be applicable, even if it would not be among the primary options. For example, for a mean with known sample size, the sum is also known, and this sum might have low privacy risk if sufficiently rounded, so the precision test could be applied to the mean. We believe that the tests suggested are the most immediately relevant and widely applicable.
Table 1.
Checklist for assessing potential privacy concerns and anonymizing ratios and percentages in population health and health services research outputs
| Statistic | Anonymity test | Anonymization treatment | Notes |
|---|---|---|---|
| Ratios and percentages | Individual value | Do not report individual values | |
| | Threshold n | | |
| | Threshold p% | | |
| | Dominance (n,k) | | |
| | Dominance p% | | |
| | Differencing | Redefine one or both populations | |
| | Relationships | Reduce precision of relationships by rounding | |
| | Precision | Round values | |
Other considerations
We have assumed that data custodians prepare datasets to be compliant with all applicable legislation, regulations, and assurances given to data provider organizations. One consequence is that no concern for the privacy of organizations or communities is required; we assume that this is considered in the ethical review and project approval phase.
We have focused on developing practical and useful anonymity tests for a single analysis output. However, privacy risk generally increases as the number of outputs from analyses conducted on the same dataset increases, and our tests were not designed to address this increased risk.
Data Environment Analysis has been developed by researchers at the University of Manchester in response to the recognition that assessment of privacy risk needs to consider the data environment into which a proposed dataset is to be released.23
CONCLUSION
We have considered the challenge of protecting the privacy of individuals represented in data made available for public health and health services research. We focused on the increasingly important ODCs, where data are kept in a secure environment and researchers are provided access over secure links. We have assumed that researchers using ODCs are fully authorized to view data records, and that they comply with applicable researcher agreements required by the center. Consequently, we do not need to protect dataset records from researchers or prevent malicious privacy attacks by researchers. The main concern is to protect privacy in released or published outputs from genuine queries.
We have recommended a two-stage approach:
A data preparation stage, in which data custodians apply basic anonymization treatments to datasets before making them available to researchers through a secure interface.
An output anonymization stage, involving the highlighting of potential privacy risks and the application of anonymization treatments to reduce the risks to acceptable levels, if necessary.
While there are general statements in applicable legislation, regulations, and guidelines referring to protecting privacy in research outputs and publications, there is a noticeable lack of practical advice or guidelines assisting researchers in achieving this protection. The purpose of this paper is to address this gap, in the form of a checklist for highlighting and treating potential privacy risks in analysis outputs. Future research could address refining the checklist and expanding it to better address the issue of cumulative queries, or indeed replacing this approach with a new and improved one.
Our approach enables researchers to check and treat their outputs by hand when removing them from ODCs for inclusion in research publications. In the future, it may be possible to automate this step.
We believe it is essential to train researchers before they access an ODC. Such training should cover the regulatory environment and privacy expectations and requirements, privacy risk and how disclosures can occur, privacy risk assessment and application of anonymization treatments as needed, and, finally, how to ensure that anonymization treatments do not adversely impact statistical inferences and conclusions.
Although our method was developed in the context of an ODC, in fact it is applicable more broadly in other population health and health services research settings. For example, consider a data custodian making data available for offsite use. The custodian needs to conduct the actions in the data preparation stage before providing the data to the researcher. The researcher then conducts the actions in the output anonymization stage when preparing research results for publication.
Finally, we remark that public health and medical journals are encouraging authors to submit supplementary materials, including datasets and additional analysis results, for online publication. The ease of online publication is likely to increase both the volume and detail of analysis results publicly available, which in turn is likely to increase privacy risk.
ACKNOWLEDGMENTS
The authors thank Joseph Chien, Daniel Elazar, and Joanna Khoo for valuable contributions to the project. The first author thanks the Isaac Newton Institute for Mathematical Sciences, University of Cambridge, for support and hospitality during its Data Linkage and Anonymisation program, where work on this paper was undertaken.
SUPPLEMENTARY MATERIAL
Supplementary material is available at Journal of the American Medical Informatics Association online.
FUNDING
This work was supported by the Population Health Research Network, an initiative of the Australian government conducted as part of the Super Science Initiative and financed by the Education Investment Fund. This work was partially supported by a grant from the Simons Foundation. This work was supported by the UK Engineering and Physical Sciences Research Council, grant no. EP/K032208/1.
COMPETING INTERESTS
The authors have no competing interests to declare.
CONTRIBUTORS
CO'K led all aspects of the work, providing background context and information, leading the development of the confidentiality protection measures for online data centers, and drafting the manuscript. MW participated in all aspects of the work, particularly in creating the first version of the checklist and providing comments on the manuscript. MO'S participated in all aspects of the work, including writing and editing sections of a report that informed this manuscript. AI participated in all aspects of the work. TC posed the challenge of developing a confidentiality assessment and treatment checklist for researchers, provided background information on the SURE system, and gave guidance and comments on all aspects of the work.
REFERENCES
- 1. Safran C, Bloomrosen M, Hammond WE, et al. Toward a National Framework for the Secondary Use of Health Data: An American Medical Informatics Association White Paper. J Am Med Inform Assoc 2007;14:1–9.
- 2. Weiner MG, Embi PJ. Toward reuse of clinical data for research and quality improvement: the end of the beginning? Ann Intern Med 2009;151:359–60.
- 3. Sweeney L. Matching Known Patients to Health Records in Washington State Data. Harvard University: Data Privacy Lab; 2013.
- 4. O'Keefe CM, Rubin DB. Individual privacy versus public good: protecting confidentiality in health research. Stat Med 2015;34:3081–103.
- 5. O'Keefe CM, Westcott M, Ickowicz A, et al. Protecting confidentiality in statistical analysis outputs from a virtual data centre. Working Paper, Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Ottawa, Canada; 2013. http://www.unece.org/stats/documents/2013.10.confidentiality.html. Accessed April 16, 2016.
- 6. Gomatam S, Karr AF, Reiter JP, Sanil AP. Data dissemination and disclosure limitation in a world without microdata: a risk-utility framework for remote access analysis servers. Stat Sci 2005;1:163–77.
- 7. Sparks R, Carter C, Donnelly JB, et al. Remote access methods for exploratory data analysis and statistical modelling: Privacy-Preserving Analytics®. Comput Methods Programs Biomed 2008;91(3):208–22.
- 8. Sparks R, Carter C, Donnelly J, Duncan J, O'Keefe C, Ryan L. A framework for performing statistical analyses of unit record health data without violating either privacy or confidentiality of individuals. Proc 55th Session of the International Statistical Institute; 2005.
- 9. Lucero J, Zayatz L, Singh L, You J, DePersio M, Freiman M. The current stage of the microdata analysis system at the US Census Bureau. Proc 58th Congress of the International Statistical Institute 2011:3115–33.
- 10. Thompson G, Broadfoot S, Elazar D. Methodology for the Automatic Anonymization of Statistical Outputs from Remote Servers at the Australian Bureau of Statistics. UNECE Work Session on Statistical Data Confidentiality; 2013.
- 11. Dwork C, Pottenger R. Toward practicing privacy. J Am Med Inform Assoc 2013;20:102–07.
- 12. Dankar F, El Emam K. The application of differential privacy to health data. Proc 5th International Workshop on Privacy and Anonymity in the Information Society 2012:158–66.
- 13. Elliot M, Mackey E, O'Hara K, Tudor C. The Anonymisation Decision-Making Framework. UK Anonymisation Network. http://ukanon.net/wp-content/uploads/2015/05/The-Anonymisation-Decision-making-Framework.pdf. Accessed August 20, 2016.
- 14. US Department of Health and Human Services. Guidance Regarding Methods for De-Identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. http://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/. Accessed August 20, 2016.
- 15. Hundepool A, Domingo-Ferrer J, Franconi L, et al. Statistical Disclosure Control. United Kingdom: John Wiley and Sons; 2012.
- 16. Statistics New Zealand. Data Lab Output Guide. Wellington: Statistics New Zealand; 2011. http://www.stats.govt.nz/tools_and_services/microdata-access/data-lab.aspx. Accessed April 6, 2016.
- 17. Sax Institute. Secure Unified Research Environment (SURE) Guide, Version 1.3; 2012.
- 18. UK Data Archive. Secure Data Service Website. http://ukdataservice.ac.uk/get-data/how-to-access/accesssecurelab.aspx. Accessed April 6, 2016.
- 19. Ford DV, Jones KH, Verplancke JP, et al. The SAIL Databank: building a national architecture for e-health research and evaluation. BMC Health Serv Res 2009;9:157.
- 20. University of Chicago, NORC Website. http://www.norc.org. Accessed April 6, 2016.
- 21. US Government. Health Insurance Portability and Accountability Act (HIPAA), 1996. http://www.legalarchiver.org/hipaa.htm. Accessed April 6, 2016.
- 22. El Emam K. Guide to the De-identification of Personal Health Information. Boca Raton: CRC Press; 2013.
- 23. Elliot M, Lomax S, Mackey E, et al. Data environment analysis and the key variable mapping system. In: Domingo-Ferrer J, Magkos E, eds. Privacy in Statistical Databases. Berlin Heidelberg: Springer; 2010:138–47.