Author manuscript; available in PMC: 2025 Sep 3.
Published in final edited form as: Am J Bioeth. 2024 Sep 3;24(9):84–86. doi: 10.1080/15265161.2024.2377139

From Human-in-the-loop to Human-in-power

Elise Li Zheng a, Weina Jin b, Ghassan Hamarneh b, Sandra Soo-Jin Lee a
PMCID: PMC11384285  NIHMSID: NIHMS2008203  PMID: 39226019

When using Artificial Intelligence (AI) in medical settings, the critical role of humans in decision-making and accountability is often emphasized using terms such as “human-in-the-loop (HITL).” Salloch and Eriksen (2024) compellingly argue for enlisting patients as the “humans” in the loop and as epistemic partners in clinical decision support systems (CDSS). However, the choice of the term HITL carries significant implications and reflects how power dynamics in AI development and implementation can frame understandings of human-machine relationships.

The term HITL originated in the machine learning (ML) community, where it assumes that the role of humans is to aid ML models’ autonomous decisions when the ML models cannot fully handle edge cases or unexpected scenarios, such as overfitting. In this role, instead of actively engaging in clinical decisions, humans are relegated to merely checking or validating AI models’ decisions despite bearing the responsibility and accountability for those decisions. Therefore, the assumption of HITL downplays the role of humans as primary epistemic subjects. Such assumptions, prevalent in the technical sector, are inappropriate and unjust in medical settings. If this concept is uncritically adopted in clinical contexts, there is a risk of implicitly prioritizing AI’s role and marginalizing the role and power of humans or failing to align human objectives with AI’s development. The ethical debate centers on balancing human capability with AI technologies that may erode autonomy, skills, and independence of humans.

The adoption of HITL in clinical settings reflects an overarching power imbalance between the technical and clinical sectors, which differ in values, goals, and sociotechnical practices. The expanding influence of the technical sector is embedded in the framing of ethical concerns and solutions, including what problems need to be addressed with AI and where humans fit into the process. The power imbalance affects the key assumptions, methodologies, and objectives of clinical AI, often contradicting medical value systems and bioethical principles. To illustrate this issue, we present two aspects of empirical evidence corresponding to Salloch and Eriksen’s promises on co-reasoning and the integration of ethical principles.

First, as Salloch and Eriksen noted, co-reasoning involves contributions from patients and doctors as epistemic subjects. However, the current state of CDSS technical development does not reflect an equal power distribution. For example, Explainable AI (XAI) techniques are meant to explain AI decisions in ways humans can understand, a prerequisite for clinical co-reasoning. However, these techniques are often developed with a focus on superficial solutions for AI explanations in so-called “toy applications,” such as explaining why a natural image is recognized as containing everyday objects (cat, flower, bike, etc.). This approach assumes that AI explanations are domain-agnostic and “surreptitiously substitutes” (Frank, Gleiser, and Thompson 2024) technical progress in toy applications for true progress in the real world, ignoring the specific needs of medical users. Clinical users, on the other hand, reported such techniques as out of context and even, at times, useless (Jin et al. 2024). The lack of clinical domain input is also evident in XAI evaluation metrics. A commonly used metric, plausibility, measures how well AI explanations align with human explanations. This metric is flawed because it misleads users into “trusting” AI explanations when they overlap with human explanations, even if the AI predictions are incorrect (Jin, Li, and Hamarneh 2024). This metric primarily serves the technical sector’s objective of gaining trust but goes against users’ interests in using informative explanations to verify AI predictions (Jin et al. 2023), which is critical in clinical contexts.
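The concern about plausibility can be made concrete with a minimal sketch. It assumes, purely for illustration, that plausibility is scored as the intersection-over-union (IoU) between the region an AI explanation highlights and the region a clinician would highlight; the function name, the toy masks, and the labels below are hypothetical and do not come from the cited studies.

```python
def plausibility_iou(ai_mask, human_mask):
    """Intersection-over-union of two equally sized binary masks (flat lists of 0/1).

    Assumption: IoU stands in for "alignment with human explanations";
    other overlap scores could be used instead.
    """
    if len(ai_mask) != len(human_mask):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, h in zip(ai_mask, human_mask) if a and h)
    union = sum(1 for a, h in zip(ai_mask, human_mask) if a or h)
    return intersection / union if union else 0.0


# Hypothetical toy case: the AI highlights exactly the region a clinician
# would annotate, yet its prediction is wrong.
human_mask = [0, 1, 1, 1, 0, 0]
ai_mask = [0, 1, 1, 1, 0, 0]
ai_prediction = "high grade"
true_label = "low grade"

score = plausibility_iou(ai_mask, human_mask)
prediction_correct = ai_prediction == true_label

print(f"plausibility = {score:.2f}, prediction correct = {prediction_correct}")
```

In this toy case the explanation receives the highest possible plausibility score while the prediction is incorrect, which is the situation in which a plausibility-driven evaluation may encourage misplaced trust.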

Second, the power imbalance between the technical and medical sectors also explains the pervasive “tradeoff” narrative between ethics principles, which Salloch and Eriksen argue against. There is a performance culture prevalent in the technical sector, whose influence is evident in the technological development priorities and objectives of AI tools. For example, profit drives and performance/quality indicators might be in conflict with quality of care (Char, Shah, and Magnus 2018). Performance metrics often prioritize probabilistic approximation over causal reasoning, with post hoc explanations preferred over pre-placing reasoning with clinical insights (Rudin 2019). The tension between performance and explainability mentioned by Salloch and Eriksen is usually framed as an inevitable tradeoff. However, is it truly a necessary tradeoff, or does it reflect the technical sector’s power to establish AI’s epistemic superiority? Our ongoing work on the Theorem of Conditions for XAI Complementarity tells another story (Jin, Li, and Hamarneh 2024). The theorem theoretically proves that as long as AI can accurately indicate its decision reliability using explanations, AI can help humans achieve better performance than either humans or AI alone, regardless of the performance of the AI. This new technical objective, in which AI serves as a reliable human collaborator, integrates the ethical principles of beneficence/efficiency and human control/autonomy, rather than framing them as performance tradeoffs.
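The intuition behind complementarity can be illustrated with a toy calculation. This is not the theorem or its proof; it is a sketch under an idealized assumption that the AI’s reliability signal is perfectly accurate, with hypothetical outcome data and a hypothetical decision rule (follow the AI when it is flagged reliable, otherwise rely on the human’s own judgment).

```python
def accuracy(outcomes):
    """Fraction of cases decided correctly."""
    return sum(outcomes) / len(outcomes)


def team_outcomes(human_ok, ai_ok, flagged_reliable):
    """Follow the AI when it is flagged reliable; otherwise use the human's judgment."""
    return [
        ai if flag else human
        for human, ai, flag in zip(human_ok, ai_ok, flagged_reliable)
    ]


# Hypothetical outcomes for ten cases (True = correct decision).
human_ok = [True, True, True, True, True, True, False, False, False, False]
ai_ok = [True, True, False, False, False, True, True, True, False, False]

# Idealized assumption: the reliability signal is right on every case,
# so the AI is flagged reliable exactly when its decision is correct.
flagged_reliable = list(ai_ok)

team_ok = team_outcomes(human_ok, ai_ok, flagged_reliable)

print(f"human alone: {accuracy(human_ok):.1f}")
print(f"AI alone:    {accuracy(ai_ok):.1f}")
print(f"human + AI:  {accuracy(team_ok):.1f}")
```

Here the AI alone is less accurate than the human alone, yet the combined decisions are more accurate than either, because the human is only ever led by the AI on cases where it is right. With a less accurate reliability signal the gain would shrink, which is why the condition on reliability matters.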

These cases demonstrate that the power imbalance, manifested in unreflected ideologies, common practices, and technical standards in the technical sector, normalizes and prioritizes certain objectives and benefits while trading off others. If left unchecked, this power imbalance can cause epistemic injustice in ML-based clinical decision support systems (ML_CDSS). This means clinical staff and patient perspectives may be ignored or underrepresented. It can also erode human autonomy by incorrectly assuming that automated solutions are appropriate for non-technical problems. Not all clinical problems can, or should, be solved by technological solutions. For example, during periods of high patient volume, increasing staffing, ensuring fair pay, and reducing administrative burden might be more ethical and effective than relying on automated systems to offload medical tasks. Therefore, Salloch and Eriksen’s call for recognizing the epistemic position of patients also demands a change in objectives and evaluation, which would fundamentally alter what type of information the system produces and how efficacy and effectiveness are measured with human involvement.

To ensure epistemic partnership, we need to operationalize ethical principles and approaches that acknowledge the power imbalance during the development stage and the capacity of various stakeholders to influence the objectives, measurements, and metrics of technological design. We need to make power visible by incorporating sociotechnical analysis of technical decisions, and to understand how dominant forces in the technical sector create languages, narratives, and value systems that might contradict clinical contexts and ethics. These forces influence decision-making by setting objectives, formulating problems, and deciding measurements and metrics, all of which should be made explicable and visible. Systems should be designed not to augment or legitimize AI’s epistemic capabilities but to show its limitations. Patients, as well as healthcare professionals, can benefit from such disclosure. There should also be stronger incentives for, and greater awareness among, clinicians, so that they not only agree to be involved but actively seek involvement. Most importantly, we need to change the overall narrative of AI in the clinical context to reposition ML_CDSS within networks of human autonomy, complementing human decision-making capabilities and obligations. We call for a power shift to redefine problems and a different narrative in AI development to fully support the partnership model, reevaluating the overall pursuit of technical growth, efficiency, and automation that often leads to human displacement.

References

  1. Char Danton S., Shah Nigam H., and Magnus David. 2018. “Implementing Machine Learning in Health Care — Addressing Ethical Challenges.” New England Journal of Medicine 378 (11): 981–83. 10.1056/NEJMp1714229.
  2. Frank Adam, Gleiser Marcelo, and Thompson Evan. 2024. The Blind Spot: Why Science Cannot Ignore Human Experience. MIT Press.
  3. Jin Weina, Fatehi Mostafa, Guo Ru, and Hamarneh Ghassan. 2024. “Evaluating the Clinical Utility of Artificial Intelligence Assistance and Its Explanation on the Glioma Grading Task.” Artificial Intelligence in Medicine 148 (February): 102751. 10.1016/j.artmed.2023.102751.
  4. Jin Weina, Li Xiaoxiao, Fatehi Mostafa, and Hamarneh Ghassan. 2023. “Guidelines and Evaluation of Clinical Explainable AI in Medical Image Analysis.” Medical Image Analysis 84 (February): 102684. 10.1016/j.media.2022.102684.
  5. Jin Weina, Li Xiaoxiao, and Hamarneh Ghassan. 2024. “Why Is Plausibility Surprisingly Problematic as an XAI Criterion?” arXiv. 10.48550/arXiv.2303.17707.
  6. Rudin Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15. 10.1038/s42256-019-0048-x.
  7. Salloch Sabine, and Eriksen Andreas. 2024. “What Are Humans Doing in the Loop? Co-Reasoning and Practical Judgment When Using Machine Learning-Driven Decision Aids.” The American Journal of Bioethics 0 (0): 1–12. 10.1080/15265161.2024.2353800.
