Proceedings of the National Academy of Sciences of the United States of America
Letter
2024 Mar 5; 121(11):e2321882121. doi: 10.1073/pnas.2321882121

The key role of absolute risk in the disclosure risk assessment of public data releases

V Joseph Hotz a,1, Christopher R Bollinger b, Tatiana Komarova c, Charles F Manski d, Robert A Moffitt e, Denis Nekipelov f, Aaron Sojourner g, Bruce D Spencer h
PMCID: PMC10945743  PMID: 38442168

We welcome Jarmin et al.’s (1) desire to engage with our article (2). We argued that absolute disclosure risk should be used to assess privacy risk. Jarmin et al. disagree and argue that differential privacy (DP) is superior. We find their arguments unpersuasive; our views are unaltered by their article. In addition, Jarmin et al. often muddled our article’s arguments. Interested parties should read our article to understand our arguments.

A major point that Jarmin et al. leave unmentioned is that individuals care about the absolute disclosure risk posed by the public release of data, which is what is needed to assess the benefits and costs of such a release. We illustrated this with a public health example. Jarmin et al. mention this example but do not make its point clear. In the context of data privacy, individuals may care about how much their privacy is reduced if they participate in a publicly released data set, in the sense of relative disclosure risk. However, the level of absolute disclosure risk should matter a great deal. That is, does the absolute disclosure risk change from 1 to 2 in a million or from 1 to 2 in 100? Jarmin et al. fail to acknowledge what we noted ref. 3 established, namely that the DP criterion bounds relative disclosure risk, which is simply the ratio of absolute disclosure risk to an intruder’s prior. As we note in our article, this result complicates the way the DP criterion controls absolute disclosure risk.
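The distinction can be made concrete with a stylized numerical sketch (our own illustration, not taken from the letter or the cited works). It assumes a simplified Bayesian intruder whose posterior-to-prior ratio of disclosure risk is bounded by exp(epsilon), and shows that the same relative-risk bound is consistent with very different absolute risks:

```python
import math

def max_posterior(prior, epsilon):
    """Largest absolute disclosure risk (intruder's posterior) consistent
    with a DP-style bound on relative risk: posterior/prior <= exp(epsilon).
    Stylized illustration only; the result is capped at 1, a probability."""
    return min(1.0, prior * math.exp(epsilon))

epsilon = math.log(2)  # bounds the relative disclosure risk at a factor of 2

# Same relative-risk bound, very different absolute risks:
rare = max_posterior(1e-6, epsilon)    # 1 in a million -> at most 2 in a million
common = max_posterior(1e-2, epsilon)  # 1 in 100       -> at most 2 in 100

print(rare, common)
```

Both cases satisfy the identical DP-style bound, yet one caps the absolute risk at 2 in a million and the other at 2 in 100 — the difference the letter argues individuals actually care about.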

To bolster this point, we referenced an important article (4) that demonstrated it dramatically with an illustrative simulation exercise. The article showed that absolute disclosure risk “can be small even when [DP bounding] is relatively large” (p. 536) and provided an example demonstrating that “decreasing [DP’s] epsilon from ten to one did not noticeably reduce [absolute disclosure risk]” (p. 536). Our article concurred with the conclusion of ref. 4 that data stewards need to consider absolute disclosure risk directly when deciding on a data release.

Jarmin et al. argue that DP’s management of relative disclosure risk is superior to controlling absolute risk directly because DP has convenient mathematical properties. One is composability (additivity of risk bounds across multiple queries); using absolute risk would require more difficult calculations. Another is that DP is independent of intruder knowledge, whereas absolute disclosure risk requires an assessment of that knowledge, which can be difficult. These technical points are correct. However, one should not choose a risk measure for mathematical convenience or ease of computation if that measure ignores the absolute risk of people’s identities being disclosed. Thus, we argue for a risk measure that is calibrated to how much intruder knowledge exists and how it may change over time.
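The composability property mentioned above can be sketched as follows (our own stylized illustration, not from the letter): under basic sequential composition for pure epsilon-DP, the epsilons of independent releases simply add, which in turn tightens or loosens the implied bound on relative disclosure risk:

```python
import math

def compose(epsilons):
    """Basic sequential composition for pure epsilon-DP: the privacy-loss
    parameters of independent releases add. A minimal sketch, not a full
    privacy accountant."""
    return sum(epsilons)

def relative_risk_bound(epsilon):
    """Worst-case bound on the intruder's posterior-to-prior ratio of
    disclosure risk implied by epsilon (stylized Bayesian reading of DP)."""
    return math.exp(epsilon)

# Three hypothetical query releases against the same data set:
eps_total = compose([0.5, 0.5, 1.0])
print(eps_total, relative_risk_bound(eps_total))
```

Note what the bound does and does not control: the combined epsilon caps the *ratio* of posterior to prior risk, but the resulting *absolute* risk still depends entirely on the intruder’s prior, which is the letter’s central objection.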

Jarmin et al. list 13 criteria against which risk measures should be assessed. Many of these criteria seem vague and marginal to the issues at the heart of these decisions, and they appear chosen to highlight advantages of DP rather than on the basis of substantive importance.

We welcome continued discussion of these important issues for American society.

Acknowledgments

Author contributions

V.J.H., C.R.B., T.K., C.F.M., R.A.M., D.N., A.S., and B.D.S. wrote the paper.

Competing interests

The authors declare no competing interest.

References

1. Jarmin R. S., et al., An in-depth examination of requirements for disclosure risk assessment. Proc. Natl. Acad. Sci. U.S.A. 120, e2220558120 (2023).
2. Hotz V. J., et al., Balancing data privacy and usability in the federal statistical system. Proc. Natl. Acad. Sci. U.S.A. 119, e2104906119 (2022).
3. Gong R., Meng X.-L., “Congenial differential privacy under mandated disclosure” in Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference (Association for Computing Machinery, Virtual Event, USA, 2020), pp. 59–70.
4. McClure D., Reiter J. P., Differential privacy and statistical disclosure risk measures: An investigation with binary synthetic data. Trans. Data Privacy 5, 535–552 (2012).

