Abstract
Preemptive recognition of the ethical implications of study design and algorithm choices in artificial intelligence (AI) research is an important but challenging process. AI applications have begun to transition from a promising future to clinical reality in neurology. As the clinical management of neurology is often concerned with discrete, often unpredictable, and highly consequential events linked to multimodal data streams over long timescales, forthcoming advances in AI have great potential to transform care for patients. However, critical ethical questions have been raised with implementation of the first AI applications in clinical practice. Clearly, AI will have far-reaching potential to promote, but also to endanger, ethical clinical practice. This article employs an anticipatory ethics approach to scrutinize how researchers in neurology can methodically identify ethical ramifications of design choices early in the research and development process, with a goal of preempting unintended consequences that may violate principles of ethical clinical care. First, we discuss the use of a systematic framework for researchers to identify ethical ramifications of various study design and algorithm choices. Second, using epilepsy as a paradigmatic example, anticipatory clinical scenarios that illustrate unintended ethical consequences are discussed, and failure points in each scenario evaluated. Third, we provide practical recommendations for understanding and addressing ethical ramifications early in methods development stages. Awareness of the ethical implications of study design and algorithm choices that may unintentionally enter AI is crucial to ensuring that incorporation of AI into neurology care leads to patient benefit rather than harm.
Artificial intelligence (AI) applications have begun to transition from a promising future to clinical reality in neurology. AI can transform neurologic clinical practice, with effects on quality, cost, and access to care.1 Epilepsy serves as a paradigmatic case of AI's potential in neurology, with emerging AI applications spanning a rapidly expanding range of diagnostic, therapeutic, and prognostic uses.2-10
Critical ethical concerns have begun to arise with AI incorporation into clinical practice. These include maximization of patient benefit while avoiding harm, risks to patient privacy, perpetuation of bias, and tradeoffs between competing ethical goals. A fundamental question that the epilepsy and broader neurology community must answer in coming years is the degree of responsibility each party (researchers, industry, clinicians, regulatory agencies) carries in the AI pipeline, in order to facilitate a common goal of ensuring that AI promotes rather than endangers ethical clinical practice.11
With the rapid proliferation of AI in health care, a number of broad initiatives12-14 provide general guidance on ethical values that AI should promote in health care. There is little consensus in the neurologic community on where responsibility lies in the AI pipeline for ensuring that benefit outweighs harm. The Food and Drug Administration (FDA)'s action plan to increase oversight of AI-based medical software15 is anticipated to help guide safe usage of AI in later market development stages. However, consideration of the ethical implications of AI research is important starting from early development/validation stages. First, from the viewpoint of promoting safe AI, many fundamental study design and algorithm choices with downstream ethical implications are made during early development/validation stages. Second, from the viewpoint of direct benefit to early-stage researchers, early adoption of appropriate practices will decrease workload later when the system goes to market. It may be in researchers' interests to consider these factors early rather than engage in post hoc remediation that may require algorithm retraining or data recollection.
Whereas literature exists on good statistical practices to improve rigor and reproducibility from a technical standpoint,16 as well as a number of documents on guiding values for AI,12-14 there is limited practical guidance to researchers/clinicians on how to systematically identify ethical implications of design choices in emerging AI research. Given the domain-specificity of AI data streams, patient vulnerabilities, and outpatient/inpatient differences in neurologic subspecialties, it may be helpful to consider distinct neurologic subfields individually. The purpose of the present work is to discuss a practical framework that researchers, peer reviewers, and clinicians may find helpful when evaluating the potential ethical ramifications of emerging AI research in the field of neurology, focusing on epilepsy as a paradigmatic case.
Five Key Ethical Principles for AI and Systematic Framework
The 4 core principles of bioethics—respect for patient autonomy, beneficence, nonmaleficence, and justice17—are pertinent also in AI. In addition, there is consensus that a fifth essential principle arises when evaluating AI: explicability, or transparency of process14,18 (eTable 1, available at Dryad, doi.org/10.5061/dryad.9zw3r22f8). Although the community has established that an “ethical AI” should enhance these 5 principles,1 there are limited guidelines on how to implement these principles in practice when conducting or evaluating AI research. Consideration of the effect on these 5 ethical principles by developers is generally ad hoc.
It may be useful to contextualize ethical concerns in neurologic AI research by breaking down the AI development pipeline into 5 stages (eFigure 1, available at Dryad, doi.org/10.5061/dryad.9zw3r22f8)19: conceptualization; development, during which data collection, algorithm development, training, and testing take place; calibration, during which performance is evaluated; implementation in clinical practice; and monitoring, or maintenance in the clinical environment. Below, we focus on the initial 3 stages during which many fundamental AI choices are made: conceptualization, development, and calibration. We decompose each stage into the various design/algorithm choices made in that stage and discuss the implications of each choice for the 5 key ethical principles of AI.
Recommendations for Ethical Considerations by Stage
We followed published guidelines for development of health research reporting guidelines20 for recommendations development. Anticipatory case analysis was first conducted with a focus on examples from the field of epilepsy. eAppendix 1 (available at Dryad, doi.org/10.5061/dryad.9zw3r22f8) shows case scenarios in epilepsy, some hypothetical and some based on real cases, that illustrate unintended consequences of AI applications that may endanger rather than promote ethical values. Each of these cases motivates the recommendations developed in this article, illustrates potential failure points, and raises discussion of checkpoints that the neurologic community can take in AI development. Anticipatory case analysis was then combined with systematic literature review and modified Delphi methodology (Figure 1 and eAppendix 2 [available at Dryad, doi.org/10.5061/dryad.9zw3r22f8]) to develop a set of 15 operational recommendations for conducting AI research in neurology (Tables 1–3).
Figure 1. Flowchart Demonstrating Steps Used to Generate Recommendations.
Details available in eAppendix 2 (available at Dryad, doi.org/10.5061/dryad.9zw3r22f8). AI = artificial intelligence.
Table 1.
Checklist of Ethical Considerations When Conducting or Evaluating Artificial Intelligence (AI) Research in Epilepsy During Algorithm Conceptualization
Table 2.
Checklist of Ethical Considerations When Conducting or Evaluating Artificial Intelligence (AI) Research in Epilepsy During Algorithm Development
Table 3.
Checklist of Ethical Considerations When Conducting or Evaluating Artificial Intelligence (AI) Research in Epilepsy During Algorithm Calibration
Considerations in Stage 1 (Algorithm Conceptualization)
Q1. To What Extent Were Key Stakeholders Directly or Indirectly Involved in the Conceptualization/Design Phase of the AI Application?
Stakeholders who may benefit from or be affected by the AI application should be defined at conceptualization. Most commonly, stakeholders of AI research in neurology are the users of AI applications, and may include patients, health care providers, patient families, or caregivers. It is helpful to include key stakeholders early in AI development to understand how likely a specific data type is to be accepted for collection or reliably acquired by the patient community. Patient concerns about privacy and data security are widespread and may limit a system's implementation.21,22 For example, specific data streams (such as video cameras, motion detectors, or chronic outpatient EEG) may affect patient privacy or fundamental rights23; others, such as mobile phones or smart glasses, may be turned off or not used constantly. Understanding use cases early will help researchers anticipate limitations of potential data streams when deciding which variables to incorporate into algorithms. It is important to acknowledge when citing use cases that stakeholders may not be able to imagine all possible use cases of a technology that does not yet exist. Focus groups are often time-intensive to conduct, and it may not be feasible to incorporate key stakeholders directly in the conceptualization/design phase. If prior research has described the needs and concerns of key stakeholders with regard to the AI application, references to this literature should be provided. This step promotes transparency about intentions, provides a means for fundamental rights and privacy assessment early in development, and helps ensure that patient perspectives are included early in the design phase.
Q2. Is the Explainability of Methods Justified Against Potential Harm in the Case of Erroneous Predictions or Unreliable Human Supervision?
Methods with technical explainability can help to improve AI safety through understanding of key assumptions, unintended biases, and cases where performance may be low.14,15 However, explainability of AI decisions is not always possible (or adequate), particularly in the case of “black box” approaches, such as deep neural networks. Although some advocate for avoiding “black box” approaches,21,24 technical solutions such as post hoc and hybrid approaches can increase explainability for different machine learning models (eTable 2, available at Dryad, doi.org/10.5061/dryad.9zw3r22f8). Tradeoffs may also sometimes be necessary between increased accuracy (at the cost of explainability) and enhanced explainability (at the cost of accuracy). The level of expected technical explainability should also be balanced against the degree of explainability in the corresponding gold standard non-AI process; for example, if an AI process attempts to reproduce a human decision that is not explainable, the degree of technical explainability reasonably expected from the AI may not be as high. In all cases, but particularly in cases where explainability is reduced, including a discussion of other measures (e.g., limitations and generalizability of testing data, traceability, auditability, and transparent communication about system capabilities) is needed for internal verification, which also allows external reviewers to understand how unintended biases and model assumptions may affect performance.14 When determining whether the provided level of explainability is appropriate for an emerging AI, the potential severity of consequences in case of inaccurate predictions or unreliable human supervision should be considered. For example, certain outcomes, such as identification of surgical candidates, seizure forecasting, and sudden unexpected death in epilepsy prediction, may have more severe consequences than others in case of inaccurate predictions. 
Because it is often tacitly assumed that AI predictions are closer to the truth than patient report, clinicians should remain aware of this latent assumption, especially for AI with low explainability, and evaluate cases in which it may be incorrect.
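As one example of a post hoc approach to explainability, permutation importance probes a model purely through its predictions, without access to its internals. The sketch below is illustrative only: the "black box" classifier, its hidden rule, and the data are all invented, and real models would be probed the same way through their prediction interface.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "black box": only predict() is available, not the internals.
# The hidden rule depends on feature 0 alone (an invented example).
def black_box_predict(X):
    return (X[:, 0] > 0).astype(int)

X = rng.normal(size=(500, 3))
y = black_box_predict(X)  # labels consistent with the hidden rule

def permutation_importance(predict, X, y, n_repeats=10):
    """Mean drop in accuracy when each feature column is shuffled."""
    base_acc = (predict(X) == y).mean()
    importances = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(base_acc - (predict(X_perm) == y).mean())
        importances.append(float(np.mean(drops)))
    return importances

imp = permutation_importance(black_box_predict, X, y)
assert imp[0] > 0.3                 # the decisive feature matters
assert max(imp[1], imp[2]) < 0.05   # irrelevant features do not
```

Reports of this kind give reviewers a model-agnostic view of which inputs drive predictions, even when the model itself cannot be made technically explainable.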
Q3. Is the AI Algorithm Intended for Locked or Continuous Learning?
Continuously learning applications automatically update using inputs acquired during use, whereas locked applications do not change after initial training. Evaluating safety, efficiency, and equity involves distinct challenges for continuous and locked learning. “Distributional drift” occurs when training data no longer match ongoing testing data. Various types of distributional drift can occur, including covariate drift (where the input distribution changes), prior probability drift (where the outcome distribution changes), and concept drift (where the relationship between covariates and predicted outcome changes). Locked learning AI algorithms are particularly susceptible to distributional drift, which may lead to inaccurate conclusions. To ensure that AI operating in dynamic environments does not degrade over time, drift detection and performance re-evaluation when drift is suspected or detected are needed, particularly in locked learning. The rate of distributional drift will vary on a case-by-case basis: if original test data are diverse and large, distributional shift may be gradual, whereas if original test data are nonrepresentative or small, or if a major event occurs, distributional drift may occur quickly. Drift detection methods25 can identify when distributional drift occurs. If distributional drift is detected, performance estimates may no longer be up to date and should be re-evaluated. Benefits of performance re-evaluation must be reasonably weighed against financial costs and computational time. Continuous learning AI mitigates distributional shift but, unless performance is re-evaluated and published in real time, can itself leave performance estimates outdated and inaccurate. The FDA does not currently have defined guidelines for monitoring changes in performance in continuously learning systems.
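As a minimal sketch of one covariate drift detection method, the population stability index compares the distribution of an input variable in current data against its distribution in the original training data. The thresholds and the age covariate below are illustrative assumptions, not part of the recommendations.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10):
    """Population stability index (PSI) for one covariate.

    Compares the binned distribution of `actual` (current data) against
    `expected` (training data). A common rule of thumb (an assumption,
    not a universal standard): <0.1 stable, 0.1-0.25 moderate drift,
    >0.25 major drift.
    """
    expected, actual = np.asarray(expected), np.asarray(actual)
    # Bin edges from the training deciles, widened to cover both samples
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0] = min(expected.min(), actual.min()) - 1e-9
    edges[-1] = max(expected.max(), actual.max()) + 1e-9
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor avoids log(0) for empty bins
    e_frac, a_frac = np.clip(e_frac, 1e-6, None), np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
train_ages = rng.normal(35, 12, 5000)    # training cohort (invented)
stable_ages = rng.normal(35, 12, 5000)   # new data, no drift
drifted_ages = rng.normal(55, 12, 5000)  # covariate drift: older population

assert population_stability_index(train_ages, stable_ages) < 0.1
assert population_stability_index(train_ages, drifted_ages) > 0.25
```

Run periodically over each model input, a check of this kind can trigger the performance re-evaluation described above before a drifting input silently degrades predictions.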
Q4. Is the AI Algorithm Assistive or Autonomous?
Assistive AI algorithms provide recommendations to a human user, whereas autonomous AI algorithms operate without human supervision. Some devices and algorithms, such as responsive neurostimulation (RNS) or physiology-based smartwatches,3,26 can operate as either. For example, both FDA-approved/cleared versions of these systems run autonomously in their seizure detection capacities; however, the clinician (in the case of RNS) or patient (in the case of a smartwatch) assists the AI in confirming detected events as true seizures or false alarms. If an algorithm intended solely for assistive capacity is misused in an autonomous capacity, harm may result from lack of appropriate human supervision. AI proposed for solely autonomous usage, such as closed-loop systems that operate without human supervision, should be held to higher performance standards. There may be different implications for assumption of responsibility and liability in assistive vs autonomous AI systems, with some advocating for liability to be imposed on AI developers for autonomous systems.21,27
Considerations in Stage 2 (Algorithm Development)
Q5. How Well Are Latent Biases in Training Data and Sources of Missingness Assessed and Mitigated?
Training datasets often include demographic inequalities, historical bias, and incompleteness. There are at least 2 major types of bias that can be present in training data. First, training data may not accurately reflect the epidemiology within a given demographic; for example, underdiagnosis of dissociative seizures in areas without access to tertiary epilepsy centers, underrepresentation of low socioeconomic status or rural populations in top research centers, or off-label use of medications/devices. Second, training data may undersample specific subgroups. For example, patients with rare epilepsies, children, and elderly patients are often underrepresented in training and testing data. Missing data can similarly lead to unrecognized systematic undersampling; for example, members of groups that have historically faced discrimination or other disadvantages may be more reluctant to provide personal information that could be used against them. These sources of bias may result in decreased accuracy in undersampled subgroups, perpetuate discrimination/marginalization, or lead to over- or underestimation of risk in specific populations.28 Identifiable sources of bias should be acknowledged and removed in the data collection phase when possible. Strategies can include recruiting from diverse backgrounds, training the algorithm on the target cohort alone, or training on data evenly distributed across cohorts. To promote transparency about potential latent biases, demographic characteristics should be reported in training/testing data for all AI algorithms. Several key demographics (age, sex, socioeconomic status, intellectual/developmental disability) are useful to report in epilepsy because of common demographic inequities. Depending on the application, other characteristics may be relevant as well, such as race, seizure frequency, height/weight, comorbidities, medications, and epilepsy etiology.
Whenever demographic subpopulations are underrepresented, latent bias should be acknowledged as a limitation.
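An audit of subgroup representation can be run before training. The sketch below is illustrative: the group labels, reference population fractions, and the 50% tolerance are all invented assumptions, and real reference fractions would come from epidemiologic data for the intended-use population.

```python
from collections import Counter

def representation_flags(train_groups, reference_fracs, tol=0.5):
    """Flag groups whose share of the training data falls below `tol`
    times their assumed share of the intended-use population."""
    n = len(train_groups)
    counts = Counter(train_groups)
    return {g: counts.get(g, 0) / n < tol * ref
            for g, ref in reference_fracs.items()}

# Invented cohort: children and elderly patients undersampled
train_groups = ["adult"] * 870 + ["child"] * 80 + ["elderly"] * 50
reference = {"adult": 0.60, "child": 0.20, "elderly": 0.20}  # assumed mix

flags = representation_flags(train_groups, reference)
assert flags == {"adult": False, "child": True, "elderly": True}
```

Flagged groups can then either be targeted for additional recruitment or named explicitly as a limitation when reporting results.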
Q6. Are Proxy Outcomes Used and What Are Sources of Measurement Error?
Use of proxy outcomes and measurement error may lead to algorithmic bias against patient groups. If proxy outcomes are used, a careful evaluation is warranted of cases where the proxy outcome will not reflect the desired outcome and consideration of how differences may result in algorithmic bias against patient subgroups. Examples of proxy outcomes include the use of hospital visits as a proxy for illness,29 heart rate escalations as a proxy for seizures,30 and sustained detection of epileptiform activity as a proxy for electrographic seizures.23 Measurement error can independently be present in variables themselves, which can also result in algorithmic bias. For example, measurement error in counting self-reported seizures may be higher in seizures with loss of consciousness, which may result in algorithmic bias against patients with focal dyscognitive or generalized seizures.31
Q7. Could the AI Lead to Self-fulfilling Prophecy and Perpetuate Disparities Present in Training Data?
Algorithms trained on real-world data will reflect disparities present in those data and may perpetuate AI bias and de-escalation of care, violating nonmaleficence through self-fulfilling prophecy. For example, if clinicians de-escalate antiseizure medications (ASMs) early for patients predicted to be at high likelihood of failure from a particular ASM, then further training the AI on data reflecting these clinical decisions will likely classify these patients as likely to fail the ASM, resulting in even higher likelihood of early ASM de-escalation. To mitigate this, sources of training bias should be acknowledged and attempts made to decrease their effects. Algorithms trained on real-world data, which are at greater risk for this bias, can also be trained on randomized clinical trial data to reduce bias. However, if algorithms are trained only on randomized trial data, clinically important sources of knowledge present in real-world data are omitted. Samples studied in randomized clinical trials may also be outliers relative to the broader epilepsy community, including generally higher seizure frequencies than typical patients.32 Therefore, for data subject to bias from self-fulfilling prophecy, training on both real-world and randomized clinical trial data is ultimately needed, along with acknowledgment of known bias sources and attempts at mitigation.
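The feedback loop described above can be made concrete with a small simulation. Every number here is invented for illustration: an assumed 30% true ASM failure rate, a slightly pessimistic initial model, and a naive retraining rule in which the next model simply learns the observed failure rate, where early de-escalations triggered by the model's own predictions are recorded in the chart as failures.

```python
import numpy as np

rng = np.random.default_rng(2)

TRUE_FAIL = 0.3        # assumed true ASM failure rate (illustrative)
predicted_fail = 0.4   # model starts slightly pessimistic for a subgroup
n_patients = 1000

for generation in range(5):
    # Clinicians de-escalate early in proportion to the predicted risk...
    deescalated_early = rng.random(n_patients) < predicted_fail
    # ...and early de-escalation is recorded in the chart as an ASM
    # "failure", alongside genuine failures at the true rate.
    observed_fail = deescalated_early | (rng.random(n_patients) < TRUE_FAIL)
    # Naive retraining: next model learns the observed failure rate
    predicted_fail = observed_fail.mean()

# After a few retraining cycles, predicted risk has drifted far above
# the true 30% failure rate: the prediction fulfilled itself.
assert predicted_fail > 0.5
```

In this toy setup the observed failure rate converges toward 100% regardless of the true rate, which is precisely why training data reflecting prediction-driven clinical decisions need bias acknowledgment and mitigation.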
Q8. How Is Data Ownership/Access Defined?
It is important to define data ownership and patients' and researchers' rights to access data up front. There is an open debate about these choices.33 Stakeholders claiming ownership may include patients, researchers, institutions, industry, and funding agencies. Data are often owned by the entity collecting the data, such as industry, funding agencies, or institutions. Patients may also seek access to their own research data.23 Although considerations of autonomy suggest that patients should be provided direct access to research data, unregulated access may also lead to harm when there are no guidelines available for interpretation of raw data. Allowing open access is beneficial to the scientific community; however, doing so may decrease the competitive advantage of entities that have invested significant resources into data collection, who will likely incur additional costs to ensure data privacy protection and appropriate sharing. Establishing how these issues will be handled early on can help avoid downstream issues, as each choice has different implications for autonomy, nonmaleficence, and beneficence.
Considerations in Stage 3 (Algorithm Calibration)
Q9. How Comprehensive Is the Performance Testing?
Due to the context-specific nature of AI applications in neurology, traditional testing on simulated data and a single real-world case example carries limited generalizability. Models tested on one clinical dataset may generalize poorly to datasets with other patient groups. Testing conducted by both internal and independent external parties increases auditability. Multi-institutional datasets, adversarial testing to “break” the system, incentive competitions for external developers, and AI self-play can be considered (eTable 2, available at Dryad, doi.org/10.5061/dryad.9zw3r22f8).
Q10. Are Practices Employed That May Lead to Overly Optimistic Performance Estimates?
Practices leading to overly optimistic performance estimates of AI systems include failure to compare with null models or the gold standard, comparison only to subpar competitors, improper separation of training and testing data, overfitting, and poor-quality or biased labeling practices.34,35
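One of these pitfalls, improper separation of training and testing data, can be demonstrated with a toy memorizing classifier. The setup is invented for illustration: a 1-nearest-neighbor model fit to labels that are pure noise, so any honest accuracy estimate should hover near 50%.

```python
import numpy as np

rng = np.random.default_rng(1)

# Labels are pure noise relative to the features,
# so honest achievable accuracy is ~50%.
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

def one_nn_predict(X_train, y_train, X_query):
    """1-nearest-neighbor classifier: it memorizes the training set."""
    dists = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[dists.argmin(axis=1)]

# Improper evaluation: testing on the training data itself
train_acc = (one_nn_predict(X, y, X) == y).mean()

# Proper evaluation: hold out half the data before "training"
X_tr, y_tr, X_te, y_te = X[:100], y[:100], X[100:], y[100:]
held_out_acc = (one_nn_predict(X_tr, y_tr, X_te) == y_te).mean()

assert train_acc == 1.0    # leakage makes pure noise look learnable
assert held_out_acc < 0.7  # honest estimate collapses toward chance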
Q11. When Optimizing Performance Testing Metrics, Was Optimization Tailored Toward Metrics Most Valued by the Target Population?
At minimum, all AI should report performance metrics that are standard in statistical practice. Appropriate performance testing practices are not within the scope of this article and are discussed elsewhere.34 Accuracy, sensitivity, and specificity should not be reported in isolation. For example, when predicting whether an event will occur (such as seizure forecasting), one can achieve 100% sensitivity by always predicting that the event will occur. Similarly, because accuracy is the weighted average of sensitivity and specificity, it should never be reported in isolation; one can achieve near-perfect accuracy even with low specificity if the prevalence of events is high. As different patient populations may weight false-positives and false-negatives differently, it is helpful, before algorithm development, to analyze or review the literature on which populations are most likely to use the algorithm, and to understand the relative importance of false-positives and false-negatives to those populations. eTable 2 (available at Dryad, doi.org/10.5061/dryad.9zw3r22f8) highlights several examples.
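The arithmetic behind these pitfalls is easy to verify. The sketch below uses an invented seizure-forecasting dataset in which 95 of 100 nights contain an event: an "always alarm" forecaster achieves perfect sensitivity and 95% accuracy despite zero specificity.

```python
import numpy as np

def sens_spec_acc(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(y_true)

# Invented example: 95 of 100 nights contain a seizure (high prevalence)
y_true = np.array([1] * 95 + [0] * 5)
always_alarm = np.ones(100, dtype=int)  # forecaster that always alarms

sens, spec, acc = sens_spec_acc(y_true, always_alarm)
assert sens == 1.0   # perfect sensitivity, achieved trivially
assert spec == 0.0   # at the cost of zero specificity
assert acc == 0.95   # accuracy = 0.95 * sens + 0.05 * spec
```

Reporting sensitivity, specificity, and prevalence together, rather than accuracy alone, makes degenerate strategies like this immediately visible.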
Q12. Is There Equity in Performance Testing?
Ideally, estimates of performance should be provided for multiple patient subgroups in the intended use population. Subgroups evaluated should be of sufficient size for valid performance estimation. As additional data collection incurs cost to research/development, benefits must be balanced against cost practicalities.
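As a rough sketch of why subgroup size matters (the counts here are hypothetical), a Wilson score confidence interval around each subgroup's accuracy makes explicit how little an estimate from an undersampled subgroup can support a claim of equitable performance.

```python
import math

def wilson_ci(correct, n, z=1.96):
    """95% Wilson score interval for a proportion such as subgroup accuracy."""
    if n == 0:
        return (0.0, 1.0)
    p = correct / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (center - half, center + half)

# Hypothetical results: identical observed accuracy, different certainty
lo_big, hi_big = wilson_ci(correct=800, n=1000)  # well-sampled subgroup
lo_small, hi_small = wilson_ci(correct=8, n=10)  # undersampled subgroup

assert hi_big - lo_big < 0.06     # narrow interval: estimate is informative
assert hi_small - lo_small > 0.4  # wide interval: cannot support equity claims
```

Publishing subgroup intervals of this kind, rather than point estimates alone, lets reviewers judge whether additional data collection is warranted for specific populations.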
Q13. Is There a Reasonable Plan for Periodic Reevaluation of AI Performance?
The safety, efficiency, and equity distribution of AI performance will change over time as clinical contexts change. A reasonable plan for reevaluation is needed in both locked and continuous learning AI systems.
Q14. Is the AI's Performance Level Justified Against the Potential Cost to Patients in Case of AI Error?
AI errors can include incorrect predictions, induced human complacency, and data/device failure modes (e.g., unreliable data collection or supervision). Minimum standards for reasonable performance and reliability should be weighed against the potential for patient, stakeholder, and societal harm and realistic worst case scenarios for harm caused by AI errors.
Q15. What Is the Ecological Impact?
AI systems with large computational costs may have an ecological effect in terms of carbon footprint. For example, natural language processing models generally incur high computational costs and may leave a greater carbon footprint than, for example, models built on seizure counts, which have far fewer events and categories to classify than natural language.36 However, computational cost and ecological effect must be balanced against performance and reproducibility, as the benefit of better performance or greater reproducibility may or may not outweigh the ecological cost of a greater carbon footprint. Strategies such as transfer learning and variational inference can help reduce resource consumption. In cases where equal performance and high reproducibility can be attained, more efficient approaches are preferred.
Discussion
Awareness of the potential ethical implications of study design and algorithm choices that may unintentionally enter AI research is crucial to ensuring that the effect of AI in neurology leads to patient benefit rather than harm. Concrete steps in the early stages of research can help preempt inherent structural issues contributing to later biases and unintended consequences.
This work is intended to provide AI developers and researchers with an operational set of guidelines for conducting ethical AI research in neurology and to provide clinicians and peer reviewers with a systematic approach to evaluating the potential ethical consequences of emerging AI research. We focus on epilepsy as a paradigmatic case, but similar approaches may be followed in other subfields of neurology: for example, considerations of assistive/autonomous usage, locked/continuous learning, and potential for self-fulfilling prophecy in automated detection of stroke and intracerebral hemorrhage37; explicability and patient privacy in deep learning to predict Alzheimer disease38; and patient privacy and latent bias in AI-based systems to predict diabetic neuropathy using facial recognition from home cameras.39 Adopting a systematic approach to considering the ethical ramifications of emerging research on the principles of beneficence, nonmaleficence, patient autonomy, justice, and explicability can help ensure that the patient's benefit remains at the forefront of the neurologic community's efforts.
Limitations
The field of AI is fluid, and there are several caveats to these recommendations. (1) The proposed recommendations are intended for AI research in development/validation stages in neurology. There are various other ethical issues and questions that arise in later stages of AI development (e.g., implementation and maintenance in the clinical environment), and by other stakeholders, including end users and deployers, which have been addressed by other experts.40 Regulatory guidance from the FDA is needed at later stages.15 (2) These recommendations are intended only to address ethical considerations related to AI use in neurology, and not its technical quality, which is addressed in other resources. (3) Issues already covered by institutional review board or FDA requirements, such as data privacy and protection, usability testing, regulations for off-label indications, informed consent, and liability/redress are not addressed here.
Glossary
- AI
artificial intelligence
- ASM
antiseizure medication
- FDA
Food and Drug Administration
- RNS
responsive neurostimulation
Appendix. Authors

Study Funding
The authors report no targeted funding. D.M.G. was funded in part by NIH KL2 5KL2TR002542. W.C. was funded in part by R01MH126997 and by R01MH114860.
Disclosure
R.W.P. is the cofounder of and a shareholder in Empatica, Inc., a digital health company that makes wearables and AI biomarkers, including the Embrace smartwatch used for monitoring and alerting to generalized tonic-clonic seizures; her research is funded at the MIT Media Lab by a consortium of companies including Google, Samsung, Merck KGaA, NEC, Takeda, and NTT Data and institutions including Massachusetts General Hospital and the Abdul Latif Jameel Clinic for Machine Learning in Health; and she receives speaker fees from Stern Strategy. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. R.M. is the cofounder/owner of Seizure Tracker, LLC, which has received funding from Cyberonics, Courtagen, Engage Therapeutics, Greenwich Biosciences, Neurelis, UCB, Brain Sentinel, Xenon Pharmaceuticals, and grants from Tuberous Sclerosis Alliance. G.A.W. has licensed intellectual property developed at Mayo Clinic to Cadence Neuroscience, Inc., and NeuroOne, Inc.; and is an investigator with Medtronic, Inc., for the Deep Brain Stimulation Therapy for Epilepsy Post-Approval Study (EPAS) and NIH Public Private Partnership UH3-NS95495. V.R.R. has served as a consultant for NeuroPace, Inc., manufacturer of the RNS System. D.M.G. serves as an advisor for and owns stock options in Magic Leap, is an advisor for EpilepsyAI, and is supported by the National Institutes of Health award number KL2 5KL2TR002542. S.C. is the founder of EpilepsyAI, LLC, a statistical consulting company. The remainder of the authors report no disclosures relevant to the manuscript. Go to Neurology.org/N for full disclosures.
References
- 1. Jobin A, Ienca M, Vayena E. The global landscape of AI ethics guidelines. Nat Mach Intell. 2019;1:389-399.
- 2. Beniczky S, Conradsen I, Henning O, Fabricius M, Wolf P. Automated real-time detection of tonic-clonic seizures using a wearable EMG device. Neurology. 2018;90(5):e428-e434.
- 3. Regalia G, Onorati F, Lai M, Caborni C, Picard RW. Multimodal wrist-worn devices for seizure detection and advancing research: focus on the Empatica wristbands. Epilepsy Res. 2019;153:79-82.
- 4. Cunha JP, Choupina HM, Rocha AP, et al. NeuroKinect: a novel low-cost 3D video-EEG system for epileptic seizure motion quantification. PLoS One. 2016;11(1):e0145669.
- 5. Varatharajah Y, Berry B, Cimbalnik J, et al. Integrating artificial intelligence with real-time intracranial EEG monitoring to automate interictal identification of seizure onset zones in focal epilepsy. J Neural Eng. 2018;15(4):046035.
- 6. Kleen JK, Speidel BA, Baud MO, et al. Accuracy of omni-planar and surface casting of epileptiform activity for intracranial seizure localization. Epilepsia. 2021;62(4):947-959.
- 7. Brinkmann BH, Wagenaar J, Abbot D, et al. Crowdsourcing reproducible seizure forecasting in human and canine epilepsy. Brain. 2016;139(pt 6):1713-1722.
- 8. Goldenholz DM, Goldenholz SR, Romero J, Moss R, Sun H, Westover B. Development and validation of forecasting next reported seizure using e-diaries. Ann Neurol. 2020;88(3):588-595.
- 9. Proix T, Truccolo W, Leguia MG, et al. Forecasting seizure risk in adults with focal epilepsy: a development and validation study. Lancet Neurol. 2021;20(2):127-135.
- 10. Gleichgerrcht E, Munsell B, Bhatia S, et al. Deep learning applied to whole-brain connectome to determine seizure control after epilepsy surgery. Epilepsia. 2018;59(9):1643-1654.
- 11. Pedersen M, Verspoor K, Jenkinson M, Law M, Abbott DF, Jackson GD. Artificial intelligence for clinical decision support in neurology. Brain Commun. 2020;2(2):fcaa096.
- 12. Université de Montréal. Montréal Declaration for a Responsible Development of Artificial Intelligence. Université de Montréal; 2018:4-12. 5dcfa4bd-f73a-4de5-94d8-c010ee777609.filesusr.com/ugd/ebc3a3_c5c1c196fc164756afb92466c081d7ae.pdf.
- 13. The Ethics and Governance of Artificial Intelligence Initiative. Accessed February 1, 2021. aiethicsinitiative.org/.
- 14. Independent High-Level Expert Group on AI. Ethics Guidelines for Trustworthy AI. April 8, 2019. ai.bsa.org/wp-content/uploads/2019/09/AIHLEG_EthicsGuidelinesforTrustworthyAI-ENpdf.pdf.
- 15. US Food and Drug Administration. Artificial Intelligence/Machine Learning (AI/ML)–Based Software as a Medical Device (SaMD) Action Plan. US Food and Drug Administration; 2021.
- 16. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BJOG. 2015;122:434-443.
- 17. Childress JF, Beauchamp TL. Principles of Biomedical Ethics. Oxford University Press; 2001.
- 18. Floridi L, Cowls J. A unified framework of five principles for AI in society. Harv Data Sci Rev. 2019;1.
- 19. Char DS, Abràmoff MD, Feudtner C. Identifying ethical considerations for machine learning healthcare applications. Am J Bioeth. 2020;20(11):7-17.
- 20. Moher D, Schulz KF, Simera I, Altman DG. Guidance for developers of health research reporting guidelines. PLoS Med. 2010;7(2):e1000217.
- 21. Murphy K, Di Ruggiero E, Upshur R, et al. Artificial intelligence for good health: a scoping review of the ethics literature. BMC Med Ethics. 2021;22(1):14.
- 22. McCradden MD, Baba A, Saha A, et al. Ethical concerns around use of artificial intelligence in health care research from the perspective of patients with meningioma, caregivers and health care providers: a qualitative study. CMAJ Open. 2020;8(1):e90-e95.
- 23. Hegde M, Chiong W, Rao VR. New ethical and clinical challenges in "closed loop" neuromodulation. Neurology. 2021;96(17):799-804.
- 24. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. 2019;1:206-215.
- 25. Moreno-Torres JG, Raeder T, Alaiz-Rodríguez R, et al. A unifying view on dataset shift in classification. Pattern Recognit. 2012;45:521-530.
- 26. Nair DR, Laxer KD, Weber PB, et al. Nine-year prospective efficacy and safety of brain-responsive neurostimulation for focal epilepsy. Neurology. 2020;95(9):e1244-e1256.
- 27. Pasquale F. When Medical Robots Fail: Malpractice Principles for an Era of Automation. 2020. Accessed January 25, 2021. brookings.edu/techstream/when-medical-robots-fail-malpractice-principles-for-an-era-of-automation/.
- 28. Diao JA, Inker LA, Levey AS, Tighiouart H, Powe NR, Manrai AK. In search of a better equation: performance and equity in estimates of kidney function. N Engl J Med. 2021;384(5):396-399.
- 29. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453.
- 30. Osorio I, Manly BF. Probability of detection of clinical seizures using heart rate changes. Seizure. 2015;30:120-123.
- 31. Mielke H, Meissner S, Wagner K, Joos A, Schulze-Bonhage A. Which seizure elements do patients memorize? A comparison of history and seizure documentation. Epilepsia. 2020;61(7):1365-1375.
- 32. Romero J, Larimer P, Chang B, Goldenholz SR, Goldenholz DM. Natural variability in seizure frequency: implications for trials and placebo. Epilepsy Res. 2020;162:106306.
- 33. Martens B. The Impact of Data Access Regimes on Artificial Intelligence and Machine Learning. JRC Digital Economy Working Paper; 2018.
- 34. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. Vol 1. Springer Series in Statistics; 2001.
- 35. Smialowski P, Frishman D, Kramer S. Pitfalls of supervised feature selection. Bioinformatics. 2010;26(3):440-443.
- 36. Strubell E, Ganesh A, McCallum A. Energy and policy considerations for deep learning in NLP. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; 2019.
- 37. McLouth J, Elstrott S, Chaibi Y, et al. Validation of a deep learning tool in the detection of intracranial hemorrhage and large vessel occlusion. Front Neurol. 2021;12:655.
- 38. Park C, Ha J, Park S. Prediction of Alzheimer's disease based on deep neural network by integrating gene expression and DNA methylation dataset. Expert Syst Appl. 2020;140:112873.
- 39. Nadimi ES, Majtner T, Yderstraede KB, Blanes-Vidal V. Facial erythema detects diabetic neuropathy using the fusion of machine learning, random matrix theory and self organized criticality. Sci Rep. 2020;10(1):16785-16814.
- 40. Mirsky Y, Mahler T, Shelef I, Elovici Y. Proceedings of the 28th USENIX Conference on Security Symposium. USENIX Association; 2019:461-478.