Clinical Medicine & Research. 2014 Sep;12(1-2):103. doi: 10.3121/cmr.2014.1250.d1-2

D1-2: A Novel Method for Calculating Aggregate Counts While Censoring Small Cells in Multi-site Studies

Eric Baldwin
PMCID: PMC4453403

Abstract

Background/Aims

Healthcare research often focuses on small subgroups of a population, where the anonymity usually provided by aggregation breaks down because of small cell counts. Previous HMORN studies have handled this problem by censoring counts below five, six, or ten, depending on the governing IRB or funding agency. Censoring small cells at each site has the unintended effect of preventing calculation of multi-site counts that are large enough to pose no reidentification risk. The proposed method censors small cells on a site-by-site basis while allowing the coordinating center to calculate aggregates unaffected by censoring.
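The problem can be illustrated with a minimal sketch. The threshold of five and the three site counts below are hypothetical values chosen for illustration, not data from any HMORN study.

```python
THRESHOLD = 5  # assumed censoring threshold; the abstract mentions five, six, or ten

def censor(count, threshold=THRESHOLD):
    """Return the count, or None when it is a small cell that must be censored."""
    return count if count >= threshold else None

# Hypothetical per-site counts for one subgroup.
site_counts = [3, 2, 4]

# What the coordinating center receives after per-site censoring.
censored = [censor(c) for c in site_counts]

# The total recoverable from the censored files alone.
recoverable_total = sum(c for c in censored if c is not None)

# The multi-site total, which is itself above the threshold.
true_total = sum(site_counts)
```

Here every site's cell is censored, so nothing can be recovered from the censored files, even though the multi-site total is large enough to be reported.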

Methods

The proposed method splits data delivery into two steps. In the first step, a small-cell-censored file is sent to the coordinating center as usual. Each site also creates a file containing the non-censored datum along with four or more dummy values. In the second step, each site sends a file with the row numbers of its true data to the coordinating center, and all sites send their files with dummy values to a third party for addition. The third party sums every combination of records across the five-row dummy blocks received, taking one row from each site. The sum file is then sent to the coordinating center along with a column indicating which rows from each site contributed to each total. Using the address files received from each data site, the coordinating center can identify the row total made up entirely of real values, as opposed to totals that include dummy values. This is the true aggregate value.
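The data flow described above can be sketched as a small simulation. This is an illustrative reading of the abstract, not the author's implementation: the function names, the fixed block size of five, the dummy-value range, and the site counts are all assumptions.

```python
import itertools
import random

BLOCK_SIZE = 5  # assumed: one real value plus four dummies per block

def make_site_files(true_count, rng, block_size=BLOCK_SIZE, max_dummy=50):
    """Site side: hide the true count among dummies.

    Returns (block, address). The block goes to the third party; the
    address (row number of the real value) goes to the coordinating center.
    """
    address = rng.randrange(block_size)
    block = [rng.randint(0, max_dummy) for _ in range(block_size)]
    block[address] = true_count
    return block, address

def third_party_sums(blocks):
    """Third party: sum one row from each site's block, for every combination.

    Returns a dict mapping the tuple of contributing row numbers to the total.
    The third party cannot tell which total is the real one.
    """
    sums = {}
    row_choices = (range(len(block)) for block in blocks)
    for rows in itertools.product(*row_choices):
        sums[rows] = sum(block[r] for block, r in zip(blocks, rows))
    return sums

def true_aggregate(sums, addresses):
    """Coordinating center: pick the total built only from real rows."""
    return sums[tuple(addresses)]

rng = random.Random(0)       # seeded only to make the sketch repeatable
site_counts = [3, 2, 4]      # hypothetical small cells at three sites

files = [make_site_files(c, rng) for c in site_counts]
blocks = [block for block, _ in files]
addresses = [address for _, address in files]

sums = third_party_sums(blocks)
total = true_aggregate(sums, addresses)
```

With three sites and five-row blocks the third party produces 5^3 = 125 candidate totals, so the sum file grows exponentially with the number of sites. In this sketch the coordinating center sees only addresses and sums, and the third party sees only blocks without addresses.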

Results

The proposed method allows discovery of true aggregate values while censoring small cell counts, preserving the confidentiality of the members represented. However, handling by an anonymous third party introduces logistical and privacy concerns that need to be managed carefully.

Conclusions

This method increases investigators’ ability to comply with the principle of releasing the minimum data necessary for medical research while not censoring more than needed. The approach also speaks to a growing concern about reidentification in nominally deidentified data.

Keywords: Informatics, Data collection


Articles from Clinical Medicine & Research are provided here courtesy of Marshfield Clinic
