| Challenge | Description | Potential solutions |
| --- | --- | --- |
| Social bias | Worse, or systematically different, performance for marginalized groups; reflects bias in dataset composition, annotation, or algorithm construction | Perform bias audits (see the first sketch after the table); retrain the model with less problematic data or a less problematic algorithm; critically consider the goals and (mis)uses of algorithms |
| Causal inference | Model performance may reflect either causal or confounding relationships in the data, and the model cannot distinguish between them | Continue using standard causal identification strategies (e.g., experiments, instrumental variables) |
| Interpretability | The large number of parameters and nonlinear relationships render models opaque and difficult to explain in human terms | Visualize units’ “receptive fields”; “lesion” parts of the model or augment data to reveal function (see the lesioning sketch after the table) |
| Costly to train | Large, high-quality training datasets can be expensive to collect or create; training large models can require expensive hardware and incur large electricity costs | Use pretrained models; use smaller “distilled” models that offer similar performance with fewer parameters (see the distilled-model sketch after the table); share costs with other researchers |
| Performance | Most existing models still perform worse than the human gold standard; the types of errors made by models may be very different from those made by humans | Wait for the state of the art to improve; tolerate the scale vs. accuracy tradeoff; examine error patterns (see the confusion-matrix sketch after the table) |
| Generalization | Model performance generally degrades under “distribution shift”: models can interpolate within the examples they have been trained on but often fail to extrapolate to new regions of the feature/task space; versions of the same model trained on the same data with different random seeds can generalize very differently | Audit performance on your own data; fine-tune pretrained models to improve generalization to the specific use case (see the fine-tuning sketch after the table); avoid deploying models to cases far beyond their training set; stress test different versions of the same model (see the seed sketch after the table) |
| Symbolic reasoning | Models cannot generically solve non-differentiable or symbolic problems (e.g., unsupervised clustering); large models can memorize specific symbol patterns but cannot generalize the underlying rules | Use symbolic AI; use hybrid systems that combine deep learning with symbolic AI; avoid non-differentiable problems; audit for memorization (see the extrapolation sketch after the table) |
| Feedback | Standard models are feed-forward only, so they cannot capture the feedback processes that occur in the brain; this limits their ability to model temporal dynamics | Use non-feed-forward ANNs (e.g., spiking networks); model longer timescales (e.g., the time course of learning) |
| Technical skills | Requires a relatively high level of programming proficiency and the acquisition of many skills specific to deep learning | Create and use open learning resources (e.g., Jupyter Books); amend graduate curricula |
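
The sketches below illustrate several of the solutions listed in the table. First, a minimal bias audit in Python: compare a model's held-out performance across demographic groups. The data, labels, and group memberships here are synthetic placeholders, and accuracy stands in for whatever metric matters in the actual application.

```python
# Minimal bias-audit sketch: compare accuracy across demographic groups.
# All data, labels, and group memberships are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                          # hypothetical features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # hypothetical labels
group = rng.choice(["A", "B"], size=1000)               # hypothetical groups

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0
)
model = LogisticRegression().fit(X_tr, y_tr)
preds = model.predict(X_te)

# Report accuracy separately per group; large gaps flag potential bias.
for g in np.unique(g_te):
    mask = g_te == g
    print(f"group {g}: accuracy = {accuracy_score(y_te[mask], preds[mask]):.3f}")
```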
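Next, a “lesioning” sketch for interpretability, assuming PyTorch: a forward hook zeroes out one hidden unit so that its contribution to the output can be measured. The tiny two-layer network and the choice of unit are hypothetical.

```python
# "Lesion" sketch: zero out one hidden unit via a forward hook and compare
# the network's outputs before and after. The network here is hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(4, 10)

def lesion_unit(unit):
    # Returned hook replaces the module's output with unit `unit` zeroed.
    def hook(module, inputs, output):
        output = output.clone()
        output[:, unit] = 0.0
        return output
    return hook

baseline = net(x)
handle = net[1].register_forward_hook(lesion_unit(3))  # lesion hidden unit 3
lesioned = net(x)
handle.remove()

# A large change in outputs suggests the unit matters for these inputs.
print("mean output change:", (baseline - lesioned).abs().mean().item())
```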
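For training costs, a sketch of reusing a pretrained, distilled model rather than training from scratch, assuming the Hugging Face transformers library. The DistilBERT checkpoint named below is a real public model, but the task and model choice are purely illustrative.

```python
# Sketch: reuse a pretrained, distilled model instead of training one.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Deep learning is useful but costly to train from scratch."))
# -> a list like [{'label': ..., 'score': ...}]
```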
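To examine error patterns, a confusion matrix makes a model's characteristic mistakes visible: off-diagonal cells show which classes get confused with which. This sketch uses scikit-learn's digits dataset as a stand-in for real study data.

```python
# Confusion-matrix sketch: inspect which classes a model confuses.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)

# Rows are true classes, columns are predictions; scan the off-diagonal.
print(confusion_matrix(y_te, model.predict(X_te)))
```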
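A fine-tuning sketch, assuming PyTorch and torchvision (0.13 or later for the `weights` argument): load a pretrained ResNet-18, freeze the backbone, and train only a new task head. The two-class task and the random batch standing in for real data are hypothetical.

```python
# Fine-tuning sketch: adapt a pretrained ResNet-18 to a new 2-class task.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="DEFAULT")      # pretrained ImageNet weights

for p in model.parameters():
    p.requires_grad = False                     # freeze the backbone

model.fc = nn.Linear(model.fc.in_features, 2)   # new, trainable task head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a random batch standing in for real data.
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print("loss:", loss.item())
```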
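A seed stress-test sketch: train the same architecture on the same data with several random seeds and compare held-out accuracy; a wide spread across seeds signals unstable generalization. Uses scikit-learn with synthetic data.

```python
# Seed stress test: same architecture, same data, different random seeds.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for seed in range(5):
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                        random_state=seed)
    clf.fit(X_tr, y_tr)
    print(f"seed {seed}: test accuracy = {clf.score(X_te, y_te):.3f}")
# A wide spread across seeds signals unstable generalization.
```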
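Finally, an extrapolation sketch that doubles as a memorization audit for the symbolic-reasoning row: fit a network on the rule a + b using inputs from one range, then test on inputs far outside it. The ranges and architecture are arbitrary choices; with saturating tanh units the out-of-range error is expected to grow sharply, showing the model learned a range-specific mapping rather than the general rule.

```python
# Interpolation-vs-extrapolation audit on the rule a + b.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(2000, 2))    # pairs in the training range
y_train = X_train.sum(axis=1)                  # target rule: a + b

X_inside = rng.uniform(0, 1, size=(200, 2))    # same range as training
X_outside = rng.uniform(2, 3, size=(200, 2))   # far outside the training range

net = MLPRegressor(hidden_layer_sizes=(64,), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(X_train, y_train)

def mae(X):
    """Mean absolute error against the true rule a + b."""
    return np.abs(net.predict(X) - X.sum(axis=1)).mean()

# In-range error is typically small; out-of-range error grows sharply
# because the saturating units encode a range-specific mapping.
print(f"in-range MAE:     {mae(X_inside):.3f}")
print(f"out-of-range MAE: {mae(X_outside):.3f}")
```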