Table 2.
Characteristics of privacy protection. Our categorization of privacy protection methods follows the terminology reported by the original studies. However, the definitions of “de-identification” and “anonymization” vary across contexts, so the risk implications should be interpreted with caution.
| Characteristics | Values, n (%) |
| --- | --- |
| **Ethical review** | |
| Yes | 419 (90.3) |
| No | 45 (9.7) |
| **Patient consent** | |
| Waiver of informed consent | 224 (48.3) |
| Not reported | 148 (31.9) |
| Informed consent obtained | 92 (19.8) |
| **Data availability declaration** | |
| Not reported | 203 (43.8) |
| Available from the corresponding author on reasonable request | 160 (34.5) |
| Not open | 66 (14.2) |
| Public | 35 (7.5) |
| **Privacy protection technology** | |
| Not reported | 178 (38.4) |
| Deidentification | 158 (34.1) |
| &nbsp;&nbsp;Cannot judge from report | 116 (73.4) |
| &nbsp;&nbsp;Based on manual review | 17 (10.8) |
| &nbsp;&nbsp;Based on rule matching | 13 (8.2) |
| &nbsp;&nbsp;Others<sup>a</sup> | 12 (7.6) |
| Anonymization | 91 (19.6) |
| Deidentification + anonymization | 23 (5.0) |
| Others<sup>b</sup> | 14 (3.0) |
| **Is there a statement that personally identifiable information was removed?** | |
| No | 363 (78.2) |
| Yes | 101 (21.8) |
| **Were direct or indirect identifiers removed?** | |
| Direct identifiers | 166 (35.8) |
| Indirect identifiers | 9 (1.9) |
| Cannot judge | 107 (23.1) |
| Not reported | 182 (39.2) |
| **Was the degree of deidentification assessed?** | |
| No | 458 (98.7) |
| Yes | 6 (1.3) |
| **Was reidentification protection technology used?** | |
| No | 455 (98.1) |
| Yes | 9 (1.9) |
| **Declaration of compliance with safety standards** | |
| Health Insurance Portability and Accountability Act | 44 (9.5) |
| General Data Protection Regulation | 6 (1.3) |
| Both | 2 (0.4) |
| Not reported | 412 (88.8) |
<sup>a</sup>Based on rule matching + machine learning + deep learning (n=3), based on LLMs (n=2), based on rule matching + manual review (n=2), based on rule matching + machine learning (n=1), based on synthetic data (n=1), based on postprocessing (n=1), based on machine learning (n=1), and based on deep learning + postprocessing (n=1).

<sup>b</sup>Data hosting (n=5), anonymization + data hosting (n=3), federated learning (n=1), anonymization + data hosting + homomorphic encryption (n=1), anonymization + homomorphic encryption (n=1), deidentification + data hosting (n=1), data augmentation (likely referring to synthetic data generation; n=1), and homomorphic encryption (n=1).