2022 Jun 13;27(8):3129–3137. doi: 10.1038/s41380-022-01635-2

Table 1.

Sources of dirty data.

Category | Problem | Examples
Phenotypic measures | Measures are subjective | • Poor inter-rater reliability and high variability in gold-standard diagnostic tools and behavioral measures [33–35, 81]
Phenotypic measures | Measures are nonspecific | • High false-positive rate on ADOS in adults with schizophrenia [36]
Phenotypic measures | Measures focus on the tails of behavior | • Questionnaire data from healthy controls will be zero-inflated [37, 38]
Participants | Comorbidity | • Symptoms of psychiatric disorders often overlap across diagnoses, while the majority of predictive models in psychiatry rely on binary classification
Participants | Medication | • Psychiatric medications can alter BOLD signal patterns, which confounds the signal and makes the psychiatric phenomena of interest difficult to study
Participants | Episodic symptoms | • Symptoms change as a function of disease state. Scans acquired on one day may reflect very different brain states from scans acquired on another day
Data collection | Multi-site | • Inter-scanner differences can induce significant variability [82, 83], and the complexity of the data analysis workflows could affect reproducibility [84]
Data collection | Missing data | • Subjects not completing questionnaires • Inability to complete behavioral testing or scan sessions in clinical populations [85, 86]
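
Two of the problems listed in the table, zero-inflated questionnaire scores in healthy controls and missing questionnaire data, can be screened for before any modeling. The following is a minimal sketch, not a method from the article: the function names and the example scores are hypothetical, and `None` is assumed to mark a questionnaire that was not completed.

```python
from typing import Optional, Sequence


def zero_fraction(scores: Sequence[Optional[float]]) -> float:
    """Share of observed (non-missing) scores that are exactly zero."""
    observed = [s for s in scores if s is not None]
    if not observed:
        return 0.0
    return sum(1 for s in observed if s == 0) / len(observed)


def missing_fraction(scores: Sequence[Optional[float]]) -> float:
    """Share of entries that are missing (None)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s is None) / len(scores)


# Hypothetical questionnaire totals, invented for illustration only.
controls = [0, 0, 1, 0, None, 0, 2, 0]
patients = [5, 7, None, 3, None, 9, 4, 6]

for name, scores in (("controls", controls), ("patients", patients)):
    print(
        f"{name}: zero fraction = {zero_fraction(scores):.2f}, "
        f"missing fraction = {missing_fraction(scores):.2f}"
    )
```

A high zero fraction in one group suggests the measure captures only the tail of the behavior, and a missing fraction that differs between groups suggests the missingness is related to clinical status.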