Privacy-Preserving Publication of Diagnosis Codes for Effective Biomedical Analysis

Author manuscript; available in PMC: 2015 Jan 7.

Published in final edited form as: ITAB Corfu Greece (2010). 2010 Nov;2010:1–6. doi: 10.1109/ITAB.2010.5687720


Algorithm 1 Clustering-Based Anonymizer (CBA)

	input: Dataset D, utility policy U, privacy policy P comprised of potentially identifying sets of codes, and k
	output: Anonymized dataset D̃
1.	D̃ ← D
2.	Populate a priority queue PQ with all sets of codes in P
3.	while (PQ is not empty)
4.		Retrieve the top-most set of codes p from PQ
5.		foreach (i_m ∈ p)
6.			if (i_m is a generalized term)
7.				i_m ← the set of ICD codes mapped to i_m
8.		if (sup(p, D̃) ≥ k)
9.			remove p from PQ
10.		else
11.			while (sup(p, D̃) < k)
12.				find a pair {i_m, i_s} such that i_m is contained in p,
				i_m and i_s are contained in the same utility
				constraint u ∈ U, and ILM((i_m, i_s)) is minimal
13.				ĩ ← anonymize({i_m, i_s}, p)
14.				update p by replacing {i_m, i_s} with ĩ
15.				store the mapping of ĩ to the set of all ICD codes
				contained in {i_m, i_s}
16.			remove p from PQ
17.	return D̃
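
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the name cba_sketch and its parameter names are hypothetical, the priority order of PQ (larger code sets first) is assumed, the ILM measure is replaced by a simple stand-in cost (size of the merged term times the number of transactions it touches), anonymize is reduced to merging two terms, and an item with no merge partner inside a utility constraint is suppressed. Steps 5 to 7 are interpreted here as viewing p through the mapping that step 15 stores. The diagnosis codes in the example are made-up sample data.

```python
import heapq
from itertools import count


def support(p, dataset):
    """Count the transactions that contain every item of p."""
    return sum(1 for t in dataset if p <= t)


def cba_sketch(dataset, utility_policy, privacy_policy, k):
    # An item is a frozenset of ICD codes; an original code is a singleton.
    # `current` maps each code to the item that represents it right now
    # (None once suppressed); it plays the role of the mapping in step 15.
    codes = set().union(*dataset, *utility_policy, *privacy_policy)
    current = {c: frozenset([c]) for c in codes}
    anon = [{current[c] for c in t} for t in dataset]  # step 1

    # Step 2. ASSUMPTION: larger code sets are retrieved first.
    tie = count()
    pq = [(-len(p), next(tie), frozenset(p)) for p in privacy_policy]
    heapq.heapify(pq)

    while pq:  # step 3
        _, _, raw = heapq.heappop(pq)  # step 4; the pop also covers steps 9, 16
        # Steps 5-7 (as interpreted here): view p through the current mapping.
        p = {current[c] for c in raw if current[c] is not None}
        while p and support(p, anon) < k:  # steps 8 and 11
            best = None
            for i_m in p:  # step 12: cheapest pair inside one constraint
                for u in utility_policy:
                    if not i_m <= u:
                        continue
                    for c in u - i_m:
                        i_s = current[c]
                        if i_s is None or not i_s <= u:
                            continue
                        merged = i_m | i_s
                        touched = sum(1 for t in anon if i_m in t or i_s in t)
                        # ASSUMPTION: stand-in cost, not the paper's ILM.
                        key = (len(merged) * touched, sorted(merged))
                        if best is None or key < best[0]:
                            best = (key, i_m, i_s, merged)
            if best is None:
                # ASSUMPTION: with no partner, suppress the rarest item of p.
                victim = min(p, key=lambda i: (support({i}, anon), sorted(i)))
                for t in anon:
                    t.discard(victim)
                for c in victim:
                    current[c] = None
                p.discard(victim)
                continue
            _, i_m, i_s, merged = best  # steps 13-15: merge and record
            for t in anon:
                if i_m in t or i_s in t:
                    t.difference_update({i_m, i_s})
                    t.add(merged)
            for c in merged:
                current[c] = merged
            p = (p - {i_m, i_s}) | {merged}  # step 14
    return anon, current  # step 17


# Made-up sample data: four patient records, one set of codes to protect.
records = [{"250.00", "401.9"}, {"250.01", "401.9"}, {"250.00"}, {"250.01", "272.4"}]
utility = [{"250.00", "250.01"}, {"401.9"}, {"272.4"}]
privacy = [{"250.00", "401.9"}]
anon, mapping = cba_sketch(records, utility, privacy, k=2)
```

In this example the protected combination {250.00, 401.9} appears in only one record, so the sketch merges 250.00 with 250.01 (the only partner allowed by the utility constraints); the generalized combination is then supported by two records, which satisfies k = 2.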