Singapore Medical Journal. 2024 Mar 26;65(3):150–158. doi: 10.4103/singaporemedj.SMJ-2023-279

Ethics of artificial intelligence in medicine

Julian Savulescu 1, Alberto Giubilini 2,, Robert Vandersluis 2, Abhishek Mishra 2
PMCID: PMC7615805  EMSID: EMS194524  PMID: 38527299

Abstract

This article reviews the main ethical issues that arise from the use of artificial intelligence (AI) technologies in medicine. Issues around trust, responsibility, risks of discrimination, privacy, autonomy, and potential benefits and harms are assessed. For better or worse, AI is a promising technology that can revolutionise healthcare delivery. It is up to us to make AI a tool for the good by ensuring that ethical oversight accompanies the design, development and implementation of AI technology in clinical practice.

Keywords: Artificial intelligence, ethics, medical artificial intelligence, responsibility, trust

INTRODUCTION

Artificial intelligence (AI), including generative AI and large language models (LLMs), is revolutionising medicine. Artificial intelligence assembles and connects vast amounts of information extremely rapidly to provide more effective means to achieve medical goals, such as diagnosis or treatment. In a way, ethics is like physics. In physics, the direction in which an object moves is determined by the sum of the vectors of force. Each vector has a direction and strength. In ethics, the vectors are reasons for acting in certain ways. To determine what we should do, we should weigh the relevant reasons for different possible actions. In this sense, whether and how to employ AI is an ethical issue.

ETHICAL RELATIVISM VERSUS CONTEXT SPECIFICITY

Ethical relativism is the view that ethics is relative to culture, time, people’s attitudes or other factors related to particular societal, group or personal norms. Ethical relativism is arguably false. The Nazis were wrong in doing what they did, even if their culture or group endorsed those practices. The idea of universal human rights involves the rejection of ethical relativism. However, ethics is context specific. A type of act that is wrong in one context (killing an innocent person) may be right in another (killing an innocent person who is dying and suffering and who desires to die). The particular facts matter to ethical judgement. Therefore, it is not possible to decide in the abstract whether AI is good or bad, or whether it should or should not be employed. The answer will depend on the particular facts and on the relevance of different values, or reasons for action, in each case.

CASE EXAMPLES

Breast cancer

Researchers at the University of Cambridge recently developed a prognostic and treatment algorithm for breast cancer based on data from almost 1 million women.[1] Male patients were excluded from the modelling process because breast cancer is rare in men (as are clinical trials involving male patients), and because the disease presents and behaves differently in men. The algorithm was selectively deployed on female patients, for whom it therefore produces the best treatment options.[1]

DermAssist

Google developed a skin disease algorithm in 2020 to diagnose skin conditions, including melanoma. It was as good as dermatologists and better than general practitioners and nurses. In 2021, Google launched the DermAssist smartphone app. It was released to patients of all skin tones in Europe and granted a CE mark as a Class I medical device in the European Union, which is a form of self-certification.[2] The DermAssist app has not received regulatory approval in the USA.[3]

Google marketed the DermAssist app as merely a search tool (a ‘journey’), rather than a medical diagnostic device, but various press releases and website materials may have led patients to believe they could use it to make a medical diagnosis.[3] The model underlying DermAssist was also criticised for being trained and validated on data heavily skewed towards light-skinned patients. For Type VI skin on the Fitzpatrick scale (the darkest), only 46 of 16,530 samples (roughly 0.3%) were used for training, and only one of 4146 samples was used for validation.[4,5] Consequently, there are concerns about its reliability in dark-skinned patients.

Algorithmic high-risk care management

Some healthcare systems use AI to generate patient risk scores, which are then used to select patients for ‘high-risk care management’ programmes. Concerns have been raised about race-based discrimination, as Black patients were underrepresented compared to White patients in such programmes across all levels of sickness.[6] The reason for this outcome is that the algorithm used past medical expenditure by patients with private insurance as a proxy for healthcare need. However, because Black patients, even with insurance, utilised fewer services, this measure underestimated their true healthcare need. It was postulated that this lower utilisation was due to obstacles to access or distrust of the healthcare system based on past injustices and exploitation, such as the Tuskegee syphilis experiments, in which Black patients with syphilis were deliberately left untreated so that the course of the disease could be observed.
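The mechanism behind this kind of proxy-label problem can be shown with a small simulation. The sketch below is purely illustrative (it is not the algorithm studied in the case example; the 40% utilisation gap and the 10% enrolment cut-off are invented figures): two groups have identical distributions of true healthcare need, but one utilises less care at the same level of need, so ranking patients by past expenditure under-selects that group.

```python
# Illustrative sketch only: simulates how using past expenditure as a proxy
# label for healthcare need can under-select a group that utilises less care
# at the same level of sickness (the utilisation gap and cut-off are invented).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

group = rng.integers(0, 2, n)            # 0 = group A, 1 = group B
need = rng.gamma(2.0, 1.0, n)            # true healthcare need, identical across groups

# Group B spends ~40% less at the same level of need (a stand-in for access
# barriers or distrust of the healthcare system).
spend = need * np.where(group == 1, 0.6, 1.0)

# Proxy-label selection: enrol the top 10% of patients by past expenditure.
enrolled = spend >= np.quantile(spend, 0.90)

for g, name in [(0, "group A"), (1, "group B")]:
    print(f"{name}: {enrolled[group == g].mean():.1%} enrolled, "
          f"mean true need of enrolled = {need[(group == g) & enrolled].mean():.2f}")
# Group B is enrolled less often, and only at higher levels of true need,
# even though both groups are equally sick on average.
```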

In summary, the aforementioned case examples highlight many of the risks and potential harms of AI in medicine: the perpetuation of injustice and inequality, distortion and subversion of the doctor–patient relationship, undermining of autonomy and consent, and harm to patients. Importantly, we see the risks and potential harms from the use of AI in medicine appear across the therapeutic process, from preconsultation to diagnosis, treatment selection and recommendation, and further disease management and prevention. Whether and how AI should be pursued in a particular context in medicine will involve weighing and managing the risks and benefits.

RISKS AND REASONS TO REGULATE ARTIFICIAL INTELLIGENCE IN MEDICINE

In this section, we outline ten ethical risks that can (and do) arise when using AI in medicine. As mentioned, these risks span the use of AI across the therapeutic chain — from prediagnosis to diagnosis, treatment and disease management. These risks are not meant to be exhaustive, nor do they (or some subset of them) collectively form any hierarchical structure. They reflect themes and concerns about ethics that arise when using AI.

While some of these risks might be interrelated (e.g. the reliability of AI systems and their ability to provide explanations can affect the extent to which we trust such systems), they have been specified so as to capture the key elements of the literature on the ethical concerns arising from the use of AI in medicine. These concerns have been identified for the purposes of this article not through a systematic review or meta-analysis, but through a more subjective selection of relevant discussions, as referenced throughout this paper.

Effectiveness, reliability and evaluation

In medicine, pharmaceuticals undergo strict regulation and require evidence from clinical trials before they are licensed. However, AI does not require such strict evaluation. For example, the DermAssist app is self-certified only in Europe, though this may no longer be permitted in 2025 when new European legislation is introduced. Nor does AI require randomised controlled trials. For example, AI based on time-lapse videography of embryo development has been introduced to select embryos in in vitro fertilisation without any randomised controlled trials comparing it to selection by human embryologists. While performance data are impressive, there are a number of problems with such non-ecological evaluation, especially of non-interpretable black box models, and a number of recommendations have been made pertaining to AI in general [Box 1].[7]

Box 1.

Summary of recommendations for use of artificial intelligence (AI) in embryo selection.

1. Use of replicable, interpretable machine learning tools and data
2. Well-designed and conducted randomised controlled trials
3. Postimplementation surveillance
4. Regulatory oversight requiring interpretable AI whenever possible
5. Funding for public institutions to transparently develop and evaluate machine learning models, and open access to code used in models
6. Procedures for maintaining security of patient/embryo data while permitting ethical data sharing
7. Fully informed consent to use AI
8. Inclusion of patient values into AI programmes where possible
9. Training for clinicians to understand AI models and explain them to patients

Randomised controlled trials have been the vanguard of the evidence-based medicine revolution. When it comes to AI, they should be performed on the combined system of the AI and the humans who use it, since outcomes are determined both by the AI and by how it is used. This is obviously complicated, not least because it is difficult to predict the contribution of the human factor in the use of AI in clinical settings. Moreover, AI evolves quickly, so the randomised controlled trial is not as well suited to it as it is to drug trials. New forms of evaluation of AI will need to be developed to assess it in ecological environments, that is, in the context in which it is actually being deployed. This will need to be richer, deeper and more comprehensive than typical postmarketing surveillance.

Justice, inequality, bias, discrimination and fairness

As the case examples show, AI will typically have different (unequal) performance for different groups. Because AI uses big data, there is potential for grouping individuals into many different categories. Performance of medical AI can be measured for such specific groups. The main problems arise when the criterion for such grouping is one of the ‘protected categories or characteristics’. These typically single out groups that have been subjected to discrimination in the past and for whom differential treatment must be explicitly and convincingly justified. In Singapore, protected characteristics include: (a) age; (b) nationality; (c) sex, marital status, pregnancy status and caregiving responsibilities; (d) race, religion and language; and (e) disability and mental health conditions. In two of the case examples discussed (DermAssist and algorithmic high-risk care management), questions have been raised about whether differential treatment based on skin colour and race (which correspond to protected characteristics) constitutes discrimination and injustice.

Discrimination occurs when like cases are treated differently without there being a significant enough, morally relevant difference. What counts as a significant enough difference to justify differential treatment is a value judgement and is up for debate. In the case example of breast cancer, there are putatively morally relevant differences between male and female cancer patients. The reason why they are treated differently by the algorithm is that breast cancer is more prevalent in women, and therefore, the algorithm is more effective when it is trained with more data about women. Effectiveness is a morally relevant factor as it means more cases of cancer can be treated and lives saved.

Whether unequal outcomes represent injustice and how to address them if they do are important and difficult questions. The answers depend largely on what theory of justice one employs [Box 2]. There are at least four non-mutually exclusive responses to unjust outcomes:

1. Correct training sets, so that all groups have equal or fairer outcomes. This is often called inclusion and diversity (see algorithmic high-risk care management case example).

2. Inform relevant groups of relevant performance data (sensitivity, specificity, calibration, etc.) and allow members to choose whether to employ AI. This could be required in DermAssist.

3. Exclude groups for which AI is less effective and reliable (see breast cancer case example).

4. Employ AI anyway if the utility for all the groups is sufficiently high.

Box 2.

Methods of ethics for artificial intelligence (AI).[14]

Conceptual analysis
• Discrimination: treat like cases alike unless there is a morally relevant difference
• Responsibility: requires knowledge of consequences and control
• Praise, blame: dependent on responsibility
• Cocreation
• Moral status
• Consciousness/sentience
Theories
• Consequentialism, including utilitarianism
• Deontology — rights, correcting injustice, equality
• Virtue ethics — ‘good doctor’
Mid-level (four) principles
• Respect for autonomy
• Beneficence (benefitting)
• Non-maleficence (not harming)
• Justice (utilitarianism [maximise utility for all], egalitarianism [equal treatment for equal need], prioritarianism [priority to the worst off], maximin [worst off are as well off as possible])
Consistency
• Actual practice: AI versus other technology (telemedicine, Internet, etc.)
• Thought experiments: Turing test

The goal of precision medicine is to make predictions and apply treatments that are specific to individuals. As AI uses groups of smaller and smaller size, which are more and more relevant to the individual (e.g. women aged 20–25 years with a body mass index of 20 kg/m2 and a family history of diabetes mellitus), performance will become less reliable, as there are fewer and fewer members of that group whose data can be used to train the algorithm. Membership of large protected categories may be medically relevant (e.g. sex in breast cancer and skin pigmentation in melanoma are relevant because they are associated with significant variation in risk profile for these conditions). Hence, the questions of inequality of outcome and sufficient performance involve value judgements, including which theory of justice to employ, for example, whether suboptimal results in a particular protected category are justified by overall higher efficacy on the population considered as a whole.
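A simple statistical sketch shows why performance estimates degrade as groups shrink. Assuming a normal-approximation confidence interval and an illustrative sensitivity of 0.9 (both our own assumptions, not figures from the text), the uncertainty around the estimate widens sharply as the number of cases in a subgroup falls.

```python
# Sketch: the 95% confidence interval around an estimated sensitivity of 0.9
# (illustrative figure) widens as the subgroup supplying the cases shrinks.
import math

sensitivity = 0.9
for n_cases in (10_000, 1_000, 100, 25):
    half_width = 1.96 * math.sqrt(sensitivity * (1 - sensitivity) / n_cases)
    print(f"cases = {n_cases:>6}: 95% CI roughly {sensitivity - half_width:.3f}"
          f" to {sensitivity + half_width:.3f}")
```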

There are various kinds of bias that can occur in AI design. The algorithmic high-risk care management case example is an instance of ‘label bias’, in that the data label used (‘medical expenditure’) meant different things for different patients. This is “because it is an imperfect proxy [for future healthcare need] that is subject to health care disparities rather than an adjudicated truth”.[8] Other biases include ‘minority bias’, where minority groups are underrepresented in the training data (as in the case of DermAssist); ‘missing data bias’, where data are missing for protected groups in a non-random fashion; and ‘informativeness bias’, where the chosen features are less informative for prediction in a protected group.[9]

Privacy and confidentiality

Artificial intelligence requires big data, and this risks breaching the privacy and confidentiality of patients. There are a number of responses to these concerns. Firstly, valid consent can be obtained for data usage. Because it is difficult to predict the different ways in which data will be used and to what extent individuals will consent to unknown future uses, different models of consent have been introduced alongside the traditional one, to better account for the possibilities opened up by big data. These include ‘broad consent’, which enables individuals to consent to a wide range of more or less specified future uses[9]; meta-consent, where individuals retain control over which type of consent they will want to give to future uses (e.g. blanket consent on certain types of uses, but specific consent on other types)[10]; and dynamic consent, where consent is personalised and based on interactive online platforms in which participants can constantly engage as they see fit, in real time.[11] Secondly, data can be de-identified or anonymised to protect privacy, though there are risks of reidentification, which is linked to the level of ‘uniqueness’ of each individual.[12] Thirdly, new authorities can be created to manage data and data linkage, which protect confidentiality, such as the TRUST platform in Singapore. Other models have been suggested, including the use of novel privacy protection technologies, such as ‘block chain’ technologies.[13]

One method of ethics is consistency across relevantly similar cases [Box 2]. This could imply that the standards of data protection and disclosure employed in other areas of life (such as social media, purchasing, browsing, etc.) should also apply to the use of medical data. Whatever level of risk is acceptable in these other areas should be considered acceptable in the case of medical data. However, one possible objection to this claim is that medical data are more sensitive, though this may not always be the case.

Machine paternalism and respect for autonomy

Paternalism involves doing what is in someone’s best interests against that person’s expressed preference. Soft or weak paternalism involves acting in the best interests of someone whose decision-making capacity or information base is reduced, i.e. the individual is not in a position to adequately assess what is in their best interests. Hard or strong paternalism is acting in the best interests of a competent patient without his/her consent, i.e. acting against a sufficiently competent and informed patient’s own judgement about their best interests.[15] In the last 20 years, there has been a move away from paternalistic medicine towards models that give greater importance to patient autonomy. Alternative models include the informative model (‘shared decision-making’, where the doctor provides information and the patient provides values), the interpretative model (where the doctor helps the patient to identify values) and the deliberative model (where the doctor deliberates with the patient).[16]

Artificial intelligence threatens to reintroduce paternalism by failing to consider the patient’s values. Instead, it relies on the values of those who programme AI. This is particularly the case when we consider AI systems used for treatment recommendation (rather than, for instance, diagnostics). For example, IBM’s Watson for Oncology system ranks treatments according to improvements in length (not quality) of life and does not ‘encourage doctors and patients to recognize treatment decision making as value-laden at all’.[17] This argument can be extended to any AI system that, like Watson for Oncology, prioritises certain treatment recommendations over others, as doing so requires a metric (like longevity) to make that prioritisation. By picking a value or values to be maximised, AI excludes other values that may be important to the patient. This approach can be called ‘machine paternalism’.

McDougall argues for a value-flexible design, which involves creating AI capable of accommodating the patient’s values.[17] However, we need not go this far. As long as doctors understand the goals of AI and its performance according to a range of relevant values, this information can be explicitly communicated to the patient. As long as doctors involve patients in dialogue about what matters to them, and relate this to available AI, the utility and limitations of AI in assisting diagnosis or treatment for the patient can be identified. For these reasons, doctors must be involved in the translation of AI recommendations to patient-centred care.

Accommodating value pluralism and disagreement

Artificial intelligence is a tool. It must be designed to perform a task or achieve a goal. Setting goals involves aiming at something that is of value and worth achieving. In medicine, these values are often widely agreed: prolongation of life or improvement of its quality. But, as we have seen, these can be in tension, and there can be value disagreements about what quality of life is and what constitutes a good life or well-being. Designers of AI (especially those designing systems that go beyond diagnostics into treatment recommendation, either directly or indirectly through patient selection, as in the algorithmic high-risk care management case example) must set these values, though generative AI might create new values. But there is widespread disagreement about values, for example over the use of enhancement technologies, gender-affirming care, or love drugs to alter sexual orientation. The necessity of setting values for the programming or evaluation of AI invites the possibility of machine paternalism: the acceptance of a narrow, disputed or implicit set of values. Later in this article, we discuss how AI can address value pluralism and disagreement.

Responsibility

Responsibility for an outcome is usually, ethically and legally, dependent on the degree of control an agent has over the outcomes and the foreseeability of the consequences. Culpable ignorance exists when an agent should have known better. One of the great conundrums facing today’s clinicians who choose to use AI is the question of who is responsible if harm occurs.[18] Legally, this is a complex question which, in some ways, is yet to be decided. Equally, in ethical terms, moral responsibility depends on whether a clinician has exercised sufficient care in the application of the AI system. The buck generally stops with the treating clinician who takes on responsibility for the patient, unless there is fault in the technology (i.e. AI does not do what it is supposed to do), which could not have been foreseen or discovered by the treating doctor.

Responsibility mandates that clinicians evaluate evidence of performance and use AI appropriately, explaining its benefits and risks, and their level of confidence in it to a particular patient. In this way, using AI in medical contexts is little different from using any other medical tool, such as drugs and devices. But it will require upskilling in education about AI. It will also require doctors to understand the values driving AI, its likely outcomes and the patient’s own values. Doctors will be as important and responsible as ever.

Blame is a function of responsibility for harm caused. Doctors will be blameworthy if they fail to satisfactorily evaluate the performance of AI; communicate its risks, benefits and alternatives, and their own rational confidence in its performance; or apply AI appropriately. The rules for designers are somewhat different. They will be responsible not only if they fail to make clear the values that AI seeks to maximise and the limits of confidence in performance for relevant, specific patient groups, but also if it is insufficiently safe, reliable and effective.

Trust

Trust is a relationship that normally occurs among humans. In professional contexts, for instance, we trust doctors, lawyers and teachers. It is a form of reliance on someone considered to have an adequate level of knowledge and skill (epistemic trust) and appropriate moral features (moral trust), such as good intentions and commitment to professional values. It is questionable whether we can meaningfully trust tools like AI. On some views, the appropriate relationship with tools is not one of trust, but simply one of reliance.[19] Trust seems to require a level of accountability: to trust someone, we need to be ready to feel betrayed by that individual.[20] We may say that we trust our car, our laptop or an AI tool, but that is metaphorical language. To ‘trust’ AI is really to trust those who design or use it, in the same way we trust the makers and users of any other tool. As mentioned in the previous sections, they are the ones who are morally responsible and accountable.

Need for explanation and justification

Bjerring and Busch[21] note that if clinicians relying on opaque deep learning systems cannot understand why certain decisions are made, they are unable to communicate this information to the patient, which in turn prevents the patient from making an informed decision. On the basis of such arguments, some have argued that patients have a right to refuse diagnostics and treatment planning by AI systems in favour of a human-only alternative.[22] Others have argued in favour of interpretable AI,[23] that is, AI constrained so that a human can readily understand it, though this seems impossible with generative AI and LLMs.

Generally, an explanation of how something works is valuable, particularly for predicting the conditions under which it will work. However, it is not always necessary to understand how something works to justify its use. Many medical treatments, such as aspirin and statins, have been employed pragmatically, without a full understanding of how they benefit patients. The justification for their use lies in evidence of beneficial effect from clinical trials. We do not ask how a hammer works; rather, we ask whether the human using the hammer has done a good job.

Indeed, what matters most is justification, not explanation. What matters to patients is how something will affect their well-being and autonomy, not how it does that. In ethics, there is a deep distinction between explanatory or motivating reasons (why someone acted) and normative or justificatory reasons (whether the act was right). Justification in medicine is provided by values that patients endorse (e.g. prolongation of life or improvement of its quality) and validity conferred by adequate scientific research.

Obsolescence, dehumanisation and deskilling

One of the great fears around medical AI is that it will make humans obsolete, that is, replace their role in medicine so that their jobs evaporate. A related concern is that humans will become slaves to AI: students will simply use ChatGPT to answer their essay questions, or doctors will use ChatGPT to write reports. This would result in the deskilling of the medical profession. These concerns might materialise. But if humans can stay ‘in the loop’ by being educated in the science of AI and the relevant ethics, and by engaging patients in both the science and the ethics, they will play a pivotal role in the future of medicine. One reason to think humans will be kept in the loop is the need for accountability and trust in medicine, which, as mentioned before, cannot be fulfilled by AI. As long as accountability is in demand, so will human doctors be.

Catastrophic dual use

Artificial intelligence has been used to design novel lethal compounds[24] and represents an existential threat. It could be used by a psychopathic or rogue actor to develop a super-lethal biological agent. Or, in a less plausible but not impossible scenario, AI itself could develop to a stage at which it becomes self-aware and possesses intelligence greater than that of human beings. These existential risks warrant regulation in many forms. Humans are unfit for the future, including a future of AI.[25] This is a broader problem related to the potential for harm that new technologies allow and is not specific to the use of AI in medicine.

ARTIFICIAL INTELLIGENCE: USE, BENEFITS AND ETHICS

We can now discuss the key ethical benefits of the use of AI in medicine. As with the risks discussed above, these benefits are not meant to be exhaustive or mutually exclusive. They reflect our distillation of the key benefits identified in the literature and through our own research.

Big data and better performance

Big data refers to the ability of AI to utilise vast amounts of relevant data. For example, the Cambridge breast cancer algorithm is based on data from nearly 1 million women.[26] In the case of LLMs, clinicians and patients have access to the best global expertise. LLMs can potentially bring together knowledge from all of humanity, and specifically trained LLMs can produce expert advice rivalling that of the best human doctors.[27] This reduces risks to patients and provides better medical performance in diagnosis and treatment. In this way, it fulfils two core principles of biomedical ethics, namely, non-maleficence and beneficence.

Efficiency, productivity and reduction in human resources

Artificial intelligence increases speed and power and is able to take over many human tasks. This increases human productivity, reduces the human performance of mundane tasks (e.g. LLMs could be used to write letters or patient summaries) and can provide a greater level of expertise in low-resource settings such as low- and middle-income countries. For example, medical imaging AI can assist in the remote interpretation of imaging investigations in low-income countries.[28]

Making values explicit and weighing of values

Every action requires a goal (or a value to be maximised) and a means to achieve that goal. Artificial intelligence is a tool (like a hammer) to achieve human goals. Medicine is a value-laden practice. For example, the allocation of limited life-saving resources, such as ventilators, vaccines and expensive novel therapies, requires balancing or weighing different values such as probability of survival, length of survival, quality of life, responsibility for illness, social utility, dependents, past injustice, age, etc.[29] Medicine is based on a principle of providing interventions that are in the best interests of patients, but this involves evaluation of the patient’s well-being, which is an inherently ethical and philosophical concept surrounded by much disagreement. Such disagreements are most marked in decisions about withholding or withdrawing life-saving treatment from young children.[30]

The performance of AI must be evaluated, and this requires the articulation of values against which to judge performance. Artificial intelligence provides an opportunity to make the values that guide decisions explicit in this assessment, rather than leaving them implicit, unspoken or within the sole purview of medical professionals. For example, once values are made explicit (e.g. those relating to coercion in public health, such as mandatory vaccination), it is possible to construct algorithms or decision trees that utilise these values in decision-making.[31] Such algorithms are open to interrogation and revision, as the sketch below illustrates. Similarly, in organ transplantation, AI requires that values be made more explicit than under current practice, where committees meet in secret and keep their processes opaque to the public.
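As an illustration of what making values explicit might look like in code, the sketch below encodes a toy allocation rule with explicit, adjustable value weightings. It is our own illustrative example, not the published ethical algorithms cited in this article; the weights and the 10-year cap are arbitrary assumptions.

```python
# Toy sketch of an explicit, value-weighted allocation rule (not the published
# algorithms in the references): the value judgements live in visible parameters
# that can be interrogated, debated and revised.
def triage_priority(prob_survival: float,
                    expected_life_years: float,
                    weight_survival: float = 0.7,
                    weight_years: float = 0.3) -> float:
    """Return a priority score in [0, 1]; higher means higher priority.

    The weights encode an explicit (and contestable) judgement about how
    probability of survival trades off against expected life-years, which
    are capped at 10 years here purely for illustration.
    """
    capped_years = min(expected_life_years, 10) / 10
    return weight_survival * prob_survival + weight_years * capped_years

# Changing the weights changes the ethics of the allocation, and that change
# is visible in the code rather than implicit in a committee's deliberations.
print(triage_priority(0.8, 5))    # good prognosis, fewer expected life-years
print(triage_priority(0.4, 30))   # poorer prognosis, more expected life-years
```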

Medicine requires that thresholds be set for statistical significance, effectiveness and safety; these inherently involve value judgements.[32] Thresholds must similarly be set in AI. One example pertains to the acceptable levels of true and false positives and negatives; such decisions involve value judgements, and since these figures need to be made more explicit when AI is assessed, they are open to scrutiny. In the case of DermAssist, for instance, it has been claimed that Google accepted high rates of false-positive results, particularly in dark-skinned patients for whom there was little data, to protect against false-negative results, such as the failure to diagnose a melanoma.[33] More false-positive results lead to worry in patients, increased use of limited health resources and concerns about a ‘tsunami of overdiagnosis’.[33] Just as the maximum speed limit set in different countries results from weighing the vectors of economic efficiency, pleasure, convenience, safety, carbon emissions, etc., so too the setting of thresholds for sensitivity and specificity involves complex value judgements that warrant public ethical deliberation.
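A minimal sketch can make this trade-off concrete. The prevalence, score distributions and thresholds below are invented for illustration (this is not DermAssist’s model or data); the point is simply that lowering the decision threshold trades false negatives for false positives, and where to set it is a value judgement.

```python
# Illustrative sketch only (synthetic scores, not DermAssist): lowering the
# decision threshold catches more true melanomas (fewer false negatives) at
# the cost of flagging more benign lesions (more false positives).
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
is_melanoma = rng.random(n) < 0.05                  # assumed 5% prevalence
score = np.where(is_melanoma,
                 rng.beta(5, 2, n),                 # melanomas tend to score high
                 rng.beta(2, 5, n))                 # benign lesions tend to score low

for threshold in (0.7, 0.5, 0.3):
    flagged = score >= threshold
    sensitivity = flagged[is_melanoma].mean()       # melanomas correctly flagged
    specificity = (~flagged)[~is_melanoma].mean()   # benign lesions correctly cleared
    print(f"threshold {threshold}: sensitivity {sensitivity:.2f}, "
          f"specificity {specificity:.2f}")
# Which threshold is 'right' weighs the harm of a missed cancer against the
# worry, resource use and overdiagnosis generated by false alarms.
```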

Precision medicine and confidence limits

Before AI, patients were often presented with large population-level data, which may or may not be relevant to them. Artificial intelligence has an extraordinary power to provide information about groups, down to smaller and smaller groups, potentially enabling different groups to gain more precise information, such as risk data. This is a move towards precision medicine. In the case of DermAssist, as the risk predictions were less reliable for dark-skinned groups, the confidence level was lower. But as AI moves to express more explicit estimates of confidence in prediction for specific groups, this will enable more precise and informed decision-making and consent.

Improvement in doctor–patient relationship

While many worry that AI will undermine the doctor–patient relationship, and that direct-to-consumer AI such as DermAssist will cause patient worry and confusion, AI also has potential to improve the doctor–patient relationship. If doctors are properly educated in AI, they can use these tools to improve performance, reduce mundane tasks and participate with the patient in more personalised medicine. Doctors could also play a vital role in communicating the reliability of different AI tools. It will require in-depth knowledge of AI evaluation and performance, and the ability to communicate this meaningfully to patients. Because the programming and evaluation of AI require making values more explicit, this opens the possibility of more value-based dialogue with patients, for example, over whether they prioritise length or quality of life in cancer care.

Autonomy, empowerment and personalised LLMs

Further training a pretrained LLM on a specialised body of text, often by updating only the last few layers of its neural network, is usually referred to as ‘fine-tuning’. As a result, the model retains its access to global knowledge, but its output is influenced by the more specialised training set.[34]
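To make the idea concrete, here is a minimal sketch of this kind of ‘last few layers’ fine-tuning, using the small open GPT-2 model from the Hugging Face transformers library as a stand-in for a clinical LLM. The model choice, the two-sentence ‘specialised text’ and the hyperparameters are our own illustrative assumptions, not a method described in this article or its references.

```python
# Minimal sketch: fine-tune only the final transformer block of a small
# pretrained language model on a toy specialised corpus. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                                     # stand-in for a larger LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Freeze everything except the last transformer block ('the last few layers').
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("transformer.h.11")

# Toy 'specialised text', e.g. patient information about breast surgery.
corpus = [
    "Breast-conserving surgery removes the tumour while preserving the breast.",
    "Common risks of breast surgery include infection, bleeding and seroma.",
]

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5)

model.train()
for epoch in range(3):
    for text in corpus:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
# The model keeps its general knowledge, but its outputs are now nudged
# towards the specialised corpus.
```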

Fine-tuning could be used to produce specialised medical services. For example, LLMs could be trained on information relevant to a particular procedure, such as breast surgery. Patients could have the opportunity to interact and ‘converse’ with such models before surgery (‘ConsentGPT’). This may produce superior understanding and facilitate better informed consent than that obtained from time-poor, less knowledgeable junior doctors.[35]

Fine-tuning could also be used to produce personalised models of either doctors or patients. Doctors engaged in research could train LLMs such as ChatGPT on their existing research, so that the model would more accurately reflect their own values, knowledge, contribution and narrative. Such personalised models have been shown to outperform models that have not been fine-tuned in this way and could be used as a form of co-creation to generate novel papers, grant applications or presentations.[36] This will require new norms of attribution, as well as ensuring sufficient human effort, commitment, originality and contribution to warrant praise. Personalised models could be used by practising doctors to generate personalised letters, reports and diagnosis or treatment plans.

Personalised models of patients could be produced in a variety of ways: data stored in electronic health records and biobanks; responses to medical questionnaires; value-eliciting choice experiments undertaken by the individual while competent; other text produced by individuals, such as blog posts or other published writings,[36] social media activity (e.g. Facebook ‘likes’)[37] and emails.

These models could be used to create ‘ethical avatars’ of the patient, which represent the patient’s values and, to some extent, their voice. They could be used as a surrogate of the patient in cases where it is not possible to obtain a patient’s explicit consent, for instance, when the patient is unconscious and there is no advance directive.[38] Since such personalised models would also be fused with the general corpus of human thought and knowledge, the patient could consult them in normative dialogue about which course of action is best for him or her.

Personalised models can also be produced of famous, influential or important people. One model of the famous philosopher Daniel Dennett has been trained on his work. Experts on Dennett’s work found it difficult to reliably distinguish its responses to novel questions from Dennett’s own responses.[39] Models of Aristotle, Confucius, Kant, Buddha, Jesus and Peter Singer could be produced, and patients could consult them about prudential (including medical) or moral dilemmas (what we might call a ‘Moral Guru Model’).

In all these ways, AI offers the prospect of enhancing autonomy, enabling richer, deeper and more informed decision-making and patient empowerment.

Promoting justice

Artificial intelligence can be used to promote equality, fairness and justice. It could be used in poorer countries to advance medical decision-making and empower patients. More generally, AI requires that explicit goals be set and that performance be evaluated. Both these steps require explicit values, and justice can be one of them. While humans, including doctors, are ‘trusted’ to make ethical decisions, they are often subject to bias and prejudice and use their own idiosyncratic conceptions of fairness. Although AI cannot be trusted in this way, it has an ethical advantage: as argued above, the values it applies would need to be made explicit.

Artificial intelligence and explicit ethics

Consider the behaviour of cars. Humans decide how to drive them and how to respond to threats or unexpected events. In contrast, AI can respond much faster and to a much wider range of input variables, but it must be programmed, or at least evaluated, on its performance. This requires explicit ethics.

Awad et al.,[40] in their Moral Machines experiment, evaluated over 40 million preferences from 233 different regions on how autonomous vehicles should distribute risks and harms. They found, in order of strength, the following public preferences: (a) human lives over animal lives; (b) more lives rather than fewer; (c) the young over the elderly; (d) law abiding over law breakers; (e) high social status over low social status; (f) healthy weight over those who are overweight; (g) females over males; (h) pedestrians over passengers; and (i) preference for the vehicle to continue in its motion over a vehicle taking evasive action. Awad et al. argue the three strongest preferences provide a basis for public policy (and programming AI).

Similarly, public health involves distribution of benefits and burdens, for example, through policies related to restriction of liberty and coercion, such as quarantine, isolation, lockdown and mandatory vaccinations. However, questions of justice are not restricted to public health in medicine. Studies have been conducted on the allocation of intensive care unit beds, ventilators and vaccines in the pandemic, involving evaluation of chance of survival, length of survival, quality of life, age, social utility, etc.[41] Ethical algorithms have been constructed for vaccination and allocation of ventilators.[42] Some widespread dilemmas involving distribution of harms and benefits are: (a) distribution of medical resources/limitation to treatment, (b) organ allocation for transplantation, (c) infectious disease (quarantine/lockdown/mandatory vaccination), (d) antimicrobial resistance, (e) road, food, environmental and other safety concerns, (f) autonomous vehicles, (g) carbon emissions and climate change, and (h) wars/drones.

How should such ethical decisions be made about the programming or evaluation of AI? Given the observed diversity of ethical risks that arise from the use of AI in medicine, a general approach to addressing them requires fluidity and adaptability. Figure 1 and Box 2 outline the basic concepts and approaches. We argue that an effective approach needs to draw upon a diversity of conceptual and ethical tools for tackling the varied risks [Box 2], and the application of such tools needs to be grounded in an overarching methodology such as collective reflective equilibrium, as discussed below and in Figure 1. The following suggestions are a starting point for assembling such a flexible toolkit.

Figure 1. Collective reflective equilibrium.

Collective reflective equilibrium is an approach that aims to bring public preferences and ethical theories, concepts and principles into maximum coherence[43] [Figure 1]. It would be a mistake for such decisions to blindly follow people’s preferences. Public views on moral questions can be prejudiced, biased or deeply mistaken; for example, low support for organ donation despite a shortage of organs is rightly seen as a problem to be overcome [Figure 1].[44]

CONCLUSION

In a way, AI is just another human tool. It requires humans to use it well. But it also makes ethics unavoidable — we must now face the values which drive our choices and which ground the evaluation of our actions. Arguably, AI is the greatest experiment humans have ever undertaken. It will radically transform medicine. Whether we master AI and reap its enormous potential, or become slaves to it, undermining our very humanity and that of our patients, is up to us.

Financial support and sponsorship

This research was funded in whole or in part by the Wellcome Trust [203132].

Conflicts of interest

Savulescu J is a partner investigator on Australian Research Council (grant LP190100841), which involves industry partnership from Illumina. He does not personally receive any funds from Illumina. He is also a consultant in the Bioethics Committee of Bayer and an advisory panel member for the Hevolution Foundation (2022).

REFERENCES

  • 1.Alaa AM, Gurdasani D, Harris AL, Rashbass J, van der Schaar M. Machine learning to guide the use of adjuvant therapies for breast cancer. Nat Mach Intell. 2021;3:716–26. [Google Scholar]
  • 2.Murgia M. Google launches AI health tool for skin conditions. Financial Times. 2021 [Google Scholar]
  • 3.Simonite T. Google launches a new medical app – Outside the US. Wired, June 23. 2021 [Google Scholar]
  • 4.Liu Y, Jain A, Eng C, Way DH, Lee K, Bui P, et al. A deep learning system for differential diagnosis of skin diseases. Nat Med. 2020;26:900–8. doi: 10.1038/s41591-020-0842-3. [DOI] [PubMed] [Google Scholar]
  • 5.Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: A systematic review and meta-analysis. Lancet Digital Health. 2019;1:e271–97. doi: 10.1016/S2589-7500(19)30123-2. [DOI] [PubMed] [Google Scholar]
  • 6.Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366:447–53. doi: 10.1126/science.aax2342. [DOI] [PubMed] [Google Scholar]
  • 7.Afnan MAM, Liu Y, Conitzer V, Rudin C, Mishra A, Savulescu J, et al. Interpretable, not black-box, artificial intelligence should be used for embryo selection. Hum Reprod Open. 2021:hoab040. doi: 10.1093/hropen/hoab040. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring fairness in machine learning to advance health equity. Ann Intern Med. 2018;169:866–72. doi: 10.7326/M18-1990. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Hofmann B. Broadening consent--and diluting ethics? J Med Ethics. 2009;35:125–9. doi: 10.1136/jme.2008.024851. [DOI] [PubMed] [Google Scholar]
  • 10.Ploug T, Holm S. Going beyond the false dichotomy of broad or specific consent: A meta-perspective on participant choice in research using human tissue. Am J Bioeth. 2015;15:44–6. doi: 10.1080/15265161.2015.1062178. [DOI] [PubMed] [Google Scholar]
  • 11.Kaye J, Whitley E, Lund D, Morrison M, Teare H, Melham K. Dynamic consent: A patient interface for twenty-first century research networks. Eur J Hum Genet. 2015;23:141–6. doi: 10.1038/ejhg.2014.71. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Dankar FK, El Emam K, Neisa A, Roffey T. Estimating the re-identification risk of clinical data sets. BMC Med Inform Decis Mak. 2012;12:66. doi: 10.1186/1472-6947-12-66. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Porsdam Mann S, Savulescu J, Ravaud P, Benchoufi M. Blockchain, consent and prosent for medical research. J Med Ethics. 2021;47:244–50. doi: 10.1136/medethics-2019-105963. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Wilkinson D, Herring J, Savulescu J. Medical Ethics and Law: A Curriculum for the 21st Century. Elsevier Health Sciences. 2019 [Google Scholar]
  • 15.Dworkin G. Paternalism. In: Zalta EN, editor. The Stanford Encyclopedia of Philosophy (Fall 2020 Edition). Available from: https://plato.stanford.edu/archives/fall2020/entries/paternalism/
  • 16.Emanuel EJ, Emanuel LL. Four models of the physician-patient relationship. JAMA. 1992;267:2221–6. [PubMed] [Google Scholar]
  • 17.McDougall RJ. Computer knows best? The need for value-flexibility in medical AI. J Med Ethics. 2019;45:156–60. doi: 10.1136/medethics-2018-105118. [DOI] [PubMed] [Google Scholar]
  • 18.Wieringa M. What to account for when accounting for algorithms: A systematic literature review on algorithmic accountability. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT*’20). New York: Association for Computing Machinery. 2020:1–18. [Google Scholar]
  • 19.Kerasidou C, Kerasidou A, Buscher M, Wilkinson S. Before and beyond trust: Reliance in medical AI. J Med Ethics. 2022;48:852–6. doi: 10.1136/medethics-2020-107095. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Holton R. Deciding to trust, coming to believe. Australas J Philos. 1994;72:63–76. [Google Scholar]
  • 21.Bjerring JC, Busch J. Artificial intelligence and patient-centred decision-making. Philos Technol. 2020;34:349–71. [Google Scholar]
  • 22.Ploug T, Holm S. The right to refuse diagnostics and treatment planning by artificial intelligence. Med Health Care Philos. 2020;23:107–14. doi: 10.1007/s11019-019-09912-8. [DOI] [PubMed] [Google Scholar]
  • 23.Afnan MAM, Liu Y, Conitzer V, Rudin C, Mishra A, Savulescu J, et al. Interpretable, not black-box, artificial intelligence should be used for embryo selection. Hum Reprod Open. 2021:hoab040. doi: 10.1093/hropen/hoab040. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Urbina F, Lentzos F, Invernizzi C, Ekins S. Dual use of artificial intelligence-powered drug discovery. Nat Mach Intell. 2022;4:189–91. doi: 10.1038/s42256-022-00465-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Persson I, Savulescu J. Unfit for the Future: The Need for Moral Enhancement. OUP Oxford. 2012 [Google Scholar]
  • 26.Candido dos Reis FJ, Wishart GC, Dicks EM, Greenberg D, Rashbass J, Schmidt MK, et al. An updated PREDICT breast cancer prognostication and treatment benefit prediction model with independent validation. Breast Cancer Res. 2017;19:1–13. doi: 10.1186/s13058-017-0852-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Thirunavukarasu AJ, Mahmood S, Malem A, Foster WP, Sanghera R, Hassan R, et al. Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study. medRxiv. doi: 10.1371/journal.pdig.0000341. 2023.07.30.23293474. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Mollura DJ, Culp MP, Pollack E, Battino G, Scheel JR, Mango VL, et al. Artificial intelligence in low-and middle-income countries: Innovating global health radiology. Radiology. 2020;297:513–20. doi: 10.1148/radiol.2020201434. [DOI] [PubMed] [Google Scholar]
  • 29.Wilkinson D, Zohny H, Kappes A, Sinnott-Armstrong W, Savulescu J. Which factors should be included in triage? An online survey of the attitudes of the UK general public to pandemic triage dilemmas. BMJ Open. 2020;10:e045593. doi: 10.1136/bmjopen-2020-045593. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Wilkinson D, Savulescu J. Ethics, Conflict and Medical Treatment for Children: From Disagreement to Dissensus. Elsevier. 2018:180. [PubMed] [Google Scholar]
  • 31.Savulescu J. Good Reasons to Vaccinate: Mandatory or Payment for Risk? J Med Ethics. 2021;47:78–85. doi: 10.1136/medethics-2020-106821. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Savulescu J, Hope T. Ethics of research. In: Skorupski J, editor. The Routledge Companion to Ethics. Abingdon: Routledge; 2010. pp. 781–95. [Google Scholar]
  • 33.Davey M. Doctors fear Google skin check app will lead to “tsunami of overdiagnosis”. The Guardian. 2021 [Google Scholar]
  • 34.Church KW, Chen Z, Ma Y. Emerging trends: A gentle introduction to fine-tuning. Nat Lang Eng. 2021;27:763–78. [Google Scholar]
  • 35.Allen JW, Earp BD, Koplin J, Wilkinson D. Consent-GPT: Is it ethical to delegate procedural consent to conversational AI? J Med Ethics. 2024;50:77–83. doi: 10.1136/jme-2023-109347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Porsdam Mann S, Earp BD, Møller N, Vynn S, Savulescu J. AUTOGEN: A personalized large language model for academic enhancement —ethics and proof of principle. Am J Bioeth. 2023;23:28. doi: 10.1080/15265161.2023.2233356. [DOI] [PubMed] [Google Scholar]
  • 37.Lamanna C, Byrne L. Should artificial intelligence augment medical decision making? The case for an autonomy algorithm. AMA J Ethics. 2018;20:902–10. doi: 10.1001/amajethics.2018.902. [DOI] [PubMed] [Google Scholar]
  • 38.Earp B, Porsdam Mann S, Allen J, Salloch S, Suren V, et al. A personalized patient preference predictor for substituted judgments in healthcare: Technically feasible and ethically desirable. Am J Bioeth. 2024:1–14. doi: 10.1080/15265161.2023.2296402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Schwitzgebel E, Schwitzgebel D, Strasser A. Creating a large language model of a philosopher. Mind Language. 2023:1–23. [Google Scholar]
  • 40.Awad E, Dsouza S, Kim R, Schulz J, Henrich J, Shariff A, et al. The moral machine experiment. Nature. 2018;563:59–64. doi: 10.1038/s41586-018-0637-6. [DOI] [PubMed] [Google Scholar]
  • 41.Wilkinson D. Pluralism and Allocation of Limited Resources: Vaccines and Ventilators. In: Savulescu J, Wilkinson D, editors. Pandemic Ethics: From COVID-19 to Disease X [Internet] Oxford (UK): Oxford University Press; 2023 Apr. [PubMed] [Google Scholar]
  • 42.Savulescu J, Vergano M, Craxì L, Wilkinson D. An ethical algorithm for rationing life-sustaining treatment during the COVID-19 pandemic. Br J Anaesth. 2020;125:253–8. doi: 10.1016/j.bja.2020.05.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Savulescu J, Gyngell C, Kahane G. Collective reflective equilibrium in practice (CREP) and controversial novel technologies. Bioethics. 2021;35:652–63. doi: 10.1111/bioe.12869. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Minerva F, Savulescu J, Singer P. The ethics of the global kidney exchange programme. Lancet. 2019;394:1775–8. doi: 10.1016/S0140-6736(19)32474-2. [DOI] [PubMed] [Google Scholar]
