2024 Mar 8;26:e53008. doi: 10.2196/53008

Table 2.

Generative AI[a] in health care: categories, data sources, and security or privacy threats in the data collection and processing phase.

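Columns: AI category; Data source; Security and privacy threats, unintentional (integrity and privacy threats); Security and privacy threats, intentional (availability and integrity attacks)

Within each AI category below, the numbered data sources are listed first, followed by the threats. The number prefix on each threat (eg, 1-4) refers to the data sources it applies to.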
Medical diagnostics
  1. Medical images (eg, x-rays, CT[b] scans, MRI[c] scans, PET scans, and microscopy images)

  2. Patient reports and EHRs[d] (eg, laboratory results, comorbidities, and symptoms)

  3. Clinical measurements (vital signs, tumor measurements, and fluid output)

  4. Patient metadata (demographics and family history)

  5. Expert annotations to train models

Unintentional (integrity and privacy threats):
1-4: Incorrect, missing, or incomplete patient data or images owing to hardware or software errors, measurement and label errors, and human errors (eg, distorted images, partial images, and mismatched data, laboratory results, or images)
1-4: Data integration errors when integrating data from various sources (eg, mislabeled data attributes and patient information mismatched with images and laboratory results)
1-4: Organic biases arise from the nature of the disease and the demographics of patients, and selection biases arise from human biases
5: Annotation errors and biases occur in all sources of data because of expert mistakes and human biases
1-4: Errors and bias in synthetic data or images
1-7: Privacy breaches (eg, reidentification of patients)
Intentional (availability and integrity attacks):
1-3: Software tampering, medical sensor spoofing, medical equipment tampering or poisoning (eg, CT and MRI scanning equipment tampering), medical image tampering (eg, image scaling, copy-move tampering, sharpening, blurring, and resampling), generative fake data and images (eg, generative fake CT and MRI images undetectable by both human experts and generative AI), and medical data tampering or poisoning (eg, noise injection and maliciously synthesized data)
5: Intentional annotation errors
Drug discovery
  1. Genomic databases (DNA or RNA sequencing data)

  2. Chemical compound or protein structure databases

  3. Bioactivity assay data (in vivo and in vitro)

  4. Disease or treatment knowledge bases (peer-reviewed findings)

  5. Patient clinical trial data

  6. Toxicity predictions from pharmacokinetic models

Unintentional (integrity and privacy threats):
1-2: Duplication issues (eg, sequence redundancies or sequence duplications with minor variations), structural errors, and assembly or carried-over errors owing to poor source data quality
1-6: Data integration errors when integrating data from various sources
4-5: Incorrect findings and errors in trials
1-6: Missing and incomplete data, missing or incorrect annotations, and human errors
1-6: Errors and bias in synthetic data
6: Incorrect or inaccurate models
1-7: Privacy breaches (eg, reidentification of patients)
Intentional (availability and integrity attacks):
1-5: Genomic data tampering or poisoning (eg, maliciously forging and injecting structures or sequences, analyses, and findings)
1-5: Intentional annotation errors
6: Model tampering
Virtual health assistants
  1. EHRs

  2. Insurance claims data

  3. Patient symptom reports

  4. Mobile health data: data collected from mobile apps

  5. Speech and text inputs: data from patient interactions, including spoken dialogue and written communication

  6. Digitized medical reference information (guides and protocols)

  7. Custom health care knowledge bases

Unintentional (integrity and privacy threats):
1-5: Incorrect, missing, or incomplete patient data
1-7: Data integration errors when integrating data from various sources
1-7: Organic biases arise from the nature of the disease and the demographics of patients, and selection biases arise from human biases
2: Errors owing to undetected fraudulent claims
6: Incorrect or inaccurate models
5-7: Errors and bias in synthetic data and AI hallucination
1-7: Privacy breaches (eg, reidentification of patients)
Intentional (availability and integrity attacks):
1-7: Data or records tampering or poisoning (eg, noise injection using maliciously synthesized data, analyses, and findings)
1-7: Intentional annotation errors
1-7: AI hallucination
Medical research
  1. Clinical trial and study data sets

  2. Epidemiological data from public health departments

  3. Biomedical publications and preprint archives

  4. Physician’s notes and patient diagnosis histories

  5. Genomics databases

  6. NIH[e] open-source data repositories

  7. Biobanks: collections of biological samples

Unintentional (integrity and privacy threats):
1-7: All the errors and biases mentioned in the above cells could be applicable
Intentional (availability and integrity attacks):
1-7: All the attacks mentioned in the above cells could be applicable
Clinical decision support
  1. Real-time patient data feeds (vitals, laboratory results, etc)

  2. EHRs

  3. Population health data

  4. Hospital medical reference or treatment protocol guides

  5. Custom evidence-based clinical rules or guidelines

  6. Medical insurance claims data

  7. Pharmaceutical reference database

Unintentional (integrity and privacy threats):
1-7: All the errors and biases mentioned in the above cells could be applicable
Intentional (availability and integrity attacks):
1-7: All the attacks mentioned in the above cells could be applicable

[a] AI: artificial intelligence.

[b] CT: computed tomography.

[c] MRI: magnetic resonance imaging.

[d] EHR: electronic health record.

[e] NIH: National Institutes of Health.