2021 Jun 17;136(5):554–561. doi: 10.1177/00333549211026817

Table.

Privacy characteristics used to create datasets and how they differ between the public-use and scientific-use datasets, developed in 2020 by the Centers for Disease Control and Prevention for design of 2 public datasets for COVID-19 case surveillance

Variable: No. of fields
Definition: Total fields
Public-use dataset: 11
Scientific-use dataset: 31

Variable: Privacy threshold
Definition: Minimum acceptable value for privacy calculations. K is the threshold for k-anonymity; l is the threshold for l-diversity.
Public-use dataset:
  • k = 5 [minimum allowed records sharing quasi-identifier values]
  • l = 2 [minimum allowed values of confidential fields by records sharing quasi-identifier values]
Scientific-use dataset:
  • k = 5 [minimum allowed records sharing quasi-identifier values]
  • l = 2 [minimum allowed values of confidential fields by records sharing quasi-identifier values]

Variable: Quasi-identifier fields
Definition: Dataset fields that may identify individuals
Public-use dataset:
  • sex [sex]
  • age_group [age group]
  • race_ethnicity_combined [race and ethnicity]
Scientific-use dataset:
  • sex [sex]
  • age_group [age group]
  • race_ethnicity_combined [race and ethnicity]
  • res_county [county of residence]
  • res_state [state of residence]
  • hc_work_yn [health care worker status]

Variable: Confidential fields
Definition: Dataset fields that do not identify individuals but contain confidential information
Public-use dataset: pos_spec_dt [date of first positive specimen]
Scientific-use dataset: pos_spec_dt [date of first positive specimen]
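
To make the thresholds in the table concrete, the following is a minimal sketch of how the two checks could be applied to the public-use quasi-identifiers. It is not the authors' implementation. The field names and the values k = 5 and l = 2 come from the table; the function name `find_violations`, the helper `make_records`, the sample category values, and the reading of l-diversity as a count of distinct values per group are assumptions made for illustration.

```python
from collections import defaultdict

# Thresholds and field names as listed in the table (public-use dataset).
K = 5
L = 2
QUASI_IDENTIFIERS = ("sex", "age_group", "race_ethnicity_combined")
CONFIDENTIAL_FIELDS = ("pos_spec_dt",)


def find_violations(records, quasi_identifiers=QUASI_IDENTIFIERS,
                    confidential_fields=CONFIDENTIAL_FIELDS, k=K, l=L):
    """Return (k_violations, l_violations) as lists of quasi-identifier tuples.

    Assumption: l-diversity is checked as "at least l distinct values of each
    confidential field within a group of records sharing quasi-identifiers".
    """
    # Group records by their combination of quasi-identifier values.
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in quasi_identifiers)
        groups[key].append(record)

    k_violations = []
    l_violations = []
    for key, members in groups.items():
        # k-anonymity: fewer than k records share these quasi-identifier values.
        if len(members) < k:
            k_violations.append(key)
        # l-diversity: fewer than l distinct values of a confidential field.
        for field in confidential_fields:
            if len({member[field] for member in members}) < l:
                l_violations.append(key)
                break
    return k_violations, l_violations


def make_records(sex, age_group, race_ethnicity, dates):
    """Hypothetical helper: build one record per specimen date."""
    return [
        {"sex": sex, "age_group": age_group,
         "race_ethnicity_combined": race_ethnicity, "pos_spec_dt": date}
        for date in dates
    ]


# Hypothetical sample data; the category labels are illustrative only.
sample_records = (
    # 5 records, 2 distinct dates: satisfies both thresholds.
    make_records("Female", "18-29", "White, Non-Hispanic",
                 ["2020-04-01"] * 3 + ["2020-04-02"] * 2)
    # 2 records, 1 distinct date: fails both thresholds.
    + make_records("Male", "30-39", "Asian, Non-Hispanic", ["2020-04-03"] * 2)
    # 5 records, 1 distinct date: satisfies k but fails l.
    + make_records("Male", "50-59", "Hispanic/Latino", ["2020-04-05"] * 5)
)

k_bad, l_bad = find_violations(sample_records)
print("Groups below k:", k_bad)
print("Groups below l:", l_bad)
```

For the scientific-use dataset, the same function could be called with the six quasi-identifier fields listed in the table passed as `quasi_identifiers`. What is done with a violating group (for example, suppressing values) is not specified in the table and is left out of this sketch.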