Abstract
Background
Breast cancer screening is considered an effective early detection strategy. Artificial intelligence (AI) may both offer benefits and create risks for breast screening programmes. To use AI in health screening services, the views and expectations of consumers are critical. This study examined the preferences of Australian women regarding AI use in breast cancer screening and the impact of information on preferences using discrete choice experiments.
Methods
The experiment presented two alternative screening services based on seven attributes (reading method, screening sensitivity, screening specificity, time between screening and receiving results, supporting evidence, fair representation, and who should be held accountable) to 2063 women aged between 40 and 74 years recruited from an online panel. Participants were randomised into two arms. Both received standard information on AI use in breast screening, but one arm received additional information on its potential benefits. Preferences for hypothetical breast cancer screening services were modelled using a random parameter logit model. Relative attribute importance and uptake rates were estimated.
Results
Participants preferred mixed reading (radiologist + AI system) over the other two reading methods. They showed a strong preference for fewer missed cases, the attribute with the highest relative importance. Fewer false positives and a shorter waiting time for results were also preferred. The strength of preference for mixed reading over two radiologists was significantly higher when additional information on AI was provided, highlighting the impact of information.
Conclusions
This study revealed the preferences among Australian women for the use of AI-driven breast cancer screening services. Results generally suggest women are open to their mammograms being read by both a radiologist and an AI-based system under certain conditions.
Supplementary Information
The online version contains supplementary material available at 10.1007/s40271-025-00742-w.
Key Points for Decision Makers
- Women are accepting of artificial intelligence (AI) use for breast screening if a radiologist is also involved in the process.
- Provision of specific information on the potential benefits of AI use in healthcare reduced the likelihood of opting out of breast cancer screening.
- Results highlight the impact of information on the benefits of AI on women’s preferences for the use of AI in breast cancer screening and reinforce findings regarding the importance of human involvement in the use of AI in healthcare.
Introduction
Breast cancer remains a significant global health concern. In Australia, breast cancer is recorded as the most common cancer affecting women [1]. One strategy intended to reduce breast cancer-specific mortality is a biennial offer of breast screening to all eligible Australian women. Eligibility for the Australian BreastScreen programme requires one to be a woman or trans or gender diverse person with no breast symptoms, aged 50–74 years [2]. Artificial intelligence (AI) has emerged as a promising tool in the provision of breast cancer screening. Early prospective trial evidence [3] suggests that in breast screening services that double-read each mammogram, the use of AI as the second reader in combination with a radiologist produces results that are non-inferior to services that use two radiologists. While still not routinely used for health screening, a recent UK decision analytic modelling study of the cost-effectiveness of using AI for risk-stratified breast cancer screening found a net monetary benefit of US$77.3 million–US$109.2 million in introducing an optimal regimen [4]. Outside the field of breast cancer screening, other modelling studies have found the use of AI in health screening to be relatively cost-effective depending on the comparator group [5–8]. As this technology moves towards widespread implementation, the preferences of consumers regarding the implementation of AI in breast cancer screening become critically important in fostering a patient-centred and efficient approach to this form of care.
A growing number of studies explore public and patient preferences for the use of AI as a medical screening or diagnostic tool, with most studies based on survey data [9–17]. Studies show a general readiness to accept the use of AI for breast screening among women [9], and a general preference for human and AI mixed reading of mammograms [18–23]. No published studies so far have used structured choice experiments such as discrete choice experiments (DCEs) to gauge women’s preferences for AI-driven breast cancer screening services. Unlike standard surveys, DCEs allow for the estimation of the relative importance of attributes, the trade-offs made between these attributes and the total utility consumers derive from a screening service with specific attribute levels.
The main objective of this study is to gauge the preferences of Australian women towards the incorporation of screen-reading AI systems into breast cancer screening services using a DCE. As a quantitative method, DCEs make it possible to establish the value individuals place on the attributes of a service. A typical DCE requires participants to indicate their preferred choice of a service, among alternatives presented to them, each with specific attributes [24]. In addition to the main objective, the study also explores the impact of information on preferences by incorporating two parallel study arms. One arm received additional information highlighting the use and benefits of AI in health screening services; this information was not included in the other study arm and was the only difference between the two arms. This design tests the theory that general attitudes to the use of AI in healthcare vary depending on patients’ access to information [21]. By providing specific information on the benefits of AI, we ensure that all participants within this arm have access to information that should, in theory, lead to a more favourable perception of AI.
Methods
DCE Development
Identification and Selection of Attributes and Levels
An initial list of attributes and their levels were derived from a systematic literature review of research on the use of AI as a screening and diagnostic tool in healthcare [21]. This was further streamlined using information from a dialogue group discussion with 50 women aged between 50 and 74 years carried out prior to this study [12]. The list of attributes and their respective levels were then presented to seven of the project’s expert reference group members at a meeting during which issues such as the type of experiment (labelled vs unlabelled), the number of alternatives to use if labelled, whether to use an opt-out and, if so, the type of opt-out to use and which attributes to keep were discussed. This group consisted of academics from varying fields (cardiology, cancer, computer science, government, radiology and law) and a patient safety advocate. The group also discussed the definitions of the respective attributes and whether the levels presented were realistic.
The initial list of attributes included outpatient waiting time, sensitivity, specificity, accuracy, follow-up after diagnosis, time between screening and receiving results, radiologist involvement, diagnostic methods, understandability of AI decision-making, supporting evidence, diversity of training and validation data, privacy, accountability and out-of-pocket costs. Accuracy was dropped in favour of sensitivity and specificity as participants were likely to consider both separately in deciding whether to screen or not for breast cancer. Outpatient waiting time was dropped in favour of the time between screening and receiving the results as this was considered more relevant for participants. Follow-up after diagnosis was moved to the preamble and subsequently dropped as an attribute. Radiologist involvement and diagnostic method were combined into a new attribute, the reading method. Supporting evidence was kept, as was the diversity of training and validation data. Out-of-pocket costs were dropped as the BreastScreen programme in Australia is provided at no cost to the patient.
A semi-structured interview was then conducted with three individuals with varying cancer experience during the initial pretest phase of the questionnaire development to verify the clarity of the definitions and understanding of the experiment and to find out whether any important attributes were missed. One participant was a breast cancer survivor in their seventies, a second was aged below 40 years with no breast cancer experience, while the third was in their early forties with a family member who had experienced cancer. This pretest revealed that the introduction to the survey was quite long and complicated; it was simplified to ensure that participants could readily understand what was being asked of them. Diversity of training and validation data was redefined as fair representation, with the levels changed to reflect the new definition. The final list of attributes included reading method, sensitivity (cases missed), specificity (false positives), time to results, supporting evidence, fair representation in AI training data and responsibility. A second pretest took the form of an internal seminar at which participants (both male and female) with varied cancer screening experiences completed the survey, followed by a focus group discussion. During this discussion, it was highlighted that the information provided on the benefits of AI use in breast screening was likely to influence results. At the end of this pretest, the study team decided to create a second information sheet containing all the information in the main sheet except the information on the benefits of AI use in breast screening, to examine the impact of providing such information on preferences. This led to a two-arm DCE, discussed further below.
As stated previously, the screen-reading method, which specifies the method used to read mammograms, was included as an attribute within the DCE. In Australia, like many other countries, breast cancer screening uses a double-reading method, where two radiologists read each mammogram independently. If those two clinicians disagree, an arbitrator makes the final decision [2, 25]. We took this into account when considering the levels for the reading method attribute. We therefore used two potential AI reading methods – mixed reading (one radiologist and one AI system read each woman’s mammograms) and AI-only reading (two AI systems independently read each woman’s mammograms) – and a traditional reading method (two radiologists) as the three levels of the reading method attribute. The final list of the seven attributes, their definition and levels are presented in Table 1.
Table 1.
Attributes and levels in DCE
| Attribute | Definition | Levels if both radiologists | Levels if radiologist + AI | Levels if both AI |
|---|---|---|---|---|
| Reading method | The method used to read mammograms to determine the likelihood of cancer | 2 radiologists | 1 radiologist + 1 AI system | 2 AI systems |
| Screening sensitivity | Ability of the screening test to accurately identify cancer in people who have breast cancer (expressed as the percentage of cancer cases missed) | Misses 5%, 15% or 25% of cases | Misses 5%, 15% or 25% of cases | Misses 5%, 15% or 25% of cases |
| Screening specificity | Ability of the screening test to accurately identify people who do not have breast cancer (expressed as the percentage of patients without cancer who receive a result suggesting they may have cancer, also known as a “false positive”) | 5%, 10% or 15% of patients without the disease receive a false positive | 5%, 10% or 15% of patients without the disease receive a false positive | 5%, 10% or 15% of patients without the disease receive a false positive |
| Time between screening and receiving results (in days) | The amount of time it will take before you receive the results after screening | 5, 10 or 14 days | 2, 5 or 10 days | 2, 5 or 10 days |
| Supporting evidence | The type of evidence used to validate the AI results | N/A | Supported by both clinical trials and real-world performance; clinical trials only; or real-world performance only | Supported by both clinical trials and real-world performance; clinical trials only; or real-world performance only |
| Fair representation | Representation of minority population groups within the data used to train the AI system | N/A | Minimal, some or all minority population groups represented in AI training data | Minimal, some or all minority population groups represented in AI training data |
| Who should be held accountable? | Who or what institution could be held medically or legally accountable for any misdiagnosis? | Radiologist providing the result; healthcare facility providing the screening; or government or regulatory agencies (as they regulate and approve both the mammography machines and AI tools) | Radiologist providing the result; healthcare facility providing the screening; government or regulatory agencies; or institutions/companies who design the AI algorithms | Healthcare facility providing the screening; government or regulatory agencies; or institutions/companies who design the AI algorithms |
AI artificial intelligence, DCE discrete choice experiment, N/A Not applicable
Experimental Design
The experiment was designed as an unlabelled DCE with three alternatives for screening programmes (“screening service A”, “screening service B” and an opt-out [“neither”]). In cases where participants chose the opt-out option, a follow-up forced-choice question was presented so they could still indicate a preference between the two hypothetical screening options.
Given the possible interaction between the reading method and the other attributes, restrictions were imposed to exclude implausible combinations of attribute levels, see supplementary “Appendix A1” (see the electronic supplementary material). A balanced overlap design was adopted to create a total of 108 choice tasks using Sawtooth Software [26]. The choice tasks were blocked into 12 versions such that each participant answered a more manageable nine choice tasks. A sample choice task can be found in supplementary Figure A1.
Design of Scenario Information
A recent systematic review revealed that general attitudes to the use of AI in healthcare may vary depending on access to information [21]. With a lot of information already available on the use of AI in several areas of life, we tested whether the provision of specific information on the benefits of the use of AI in breast cancer screening would influence participants’ preferences. As previously mentioned, the decision to test the impact of information on preferences by incorporating two arms within the DCE was made after an internal seminar where participants were fellow health economists with varying levels of experience in conducting DCEs and in analysing breast cancer screening survey data.
Two different information sheets were created for participants, one for each arm of the experiment. The first information sheet included information on the importance of breast cancer screening and the BreastScreen programme in Australia. It also included a link to a video explaining the Australian BreastScreen programme [27]. In addition to this, general information on the potential use of AI in breast cancer screening was provided. This included information on how the current breast screen data are analysed.
The second information sheet included all information provided in the first information sheet plus additional information on the potential benefits of AI use in breast cancer screening. This included information on the number of mammograms a radiologist reads on average, the speed with which AI could perform these tasks, and the potential for AI to improve reading accuracy. The full information sheets are presented in supplementary “Appendix B” (see the electronic supplementary material). Participants were randomised across the two study arms; within each arm, the 108 choice tasks were presented in 12 blocks of nine tasks, with each participant completing only one block.
Questionnaire
In addition to the blocks of DCE choice sets, we also included questions collecting information on socio-demographic characteristics such as age, geographical location (state or territory), residential postcode, education, employment and socio-economic status. We also collected information on general health and history related to cancer and cancer screening, including but not limited to breast cancer.
Study Population
The online survey was hosted on the Qualtrics survey platform and administered by the online panel company, Dynata. Eligible participants for the study were women aged between 40 and 74 years who were able to easily understand written English and had lived in Australia for more than 1 year. Our age group eligibility was based on the eligibility requirement for the Australian BreastScreen programme [2]. Furthermore, women aged 40–49 years were also included in our sample to gauge their preferences given that, while eligible to screen, they are not actively invited for routine screening as part of the Australian BreastScreen programme. We also focused on individuals who had been in Australia for at least 1 year as they were more likely to be familiar with the Australian healthcare system.
Several methods and rules of thumb have been used in the calculation of sample sizes for DCEs [28]. For this study, we relied on two commonly adopted rules of thumb, Lancsar and Louviere (2008) [29] and Orme (1998) [30], to calculate the sample size. Based on Orme (1998) [30], a sample size of approximately 111 was needed for each block. Lancsar and Louviere (2008) [29], on the other hand, estimated that a minimum of 20 respondents were needed for each block, with studies often using values ranging from 50 to 100 per block. We settled on a minimum sample size of 80 participants per block (that is 960 participants per study arm).
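As an illustration, Orme’s rule of thumb can be reproduced numerically. The figure of approximately 111 per block is consistent with nine choice tasks per participant, two non-opt-out alternatives per task and a maximum of four levels per attribute (the accountability attribute); the paper does not state which parameter values were used, so these inputs are an assumption.

```python
def orme_min_sample(n_tasks: int, n_alternatives: int, max_levels: int) -> float:
    """Orme (1998) rule of thumb: N >= 500 * c / (t * a),
    where c is the largest number of levels of any attribute,
    t the number of choice tasks and a the number of alternatives."""
    return 500 * max_levels / (n_tasks * n_alternatives)

# Plausible inputs for this DCE (assumed, not stated in the text):
# 9 tasks per block, 2 screening alternatives, 4 levels max (accountability).
n = orme_min_sample(n_tasks=9, n_alternatives=2, max_levels=4)
print(round(n))  # -> 111, matching the per-block figure quoted above
```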
Accounting for the potential of having invalid responses, we aimed to recruit a total of 2500 participants (1250 per information arm). Quota sampling based on age, gender and state was used to ensure the sampled population was reflective of the Australian female population in the age group of interest. The participants were recruited between June and September 2022. The Qualtrics survey was set up to flag bots with responses flagged as bots excluded from the study. This study received ethics approval from the Monash University Human Research Ethics Committee (approval date: 24/03/2022, approval number: 31413).
Statistical Analysis
Differences in summary statistics across the two study arms were examined. Two models were considered for analysis, the multinomial logit (MNL) and the random parameter logit (RPL) models [31]. Relative model performance was evaluated using the Bayesian Information Criterion (BIC) and the Consistent Akaike Information Criterion (CAIC) (supplementary “Appendix C”; see the electronic supplementary material). All attributes were dummy coded as they were categorical, and an alternative-specific constant for the opt-out was included to reflect that participants could opt out of breast cancer screening.
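For reference, the information criteria used for model selection are simple functions of the maximised log-likelihood. As a quick sanity check, plugging in the pooled RPL figures reported later in Table 3 (log-likelihood −12,552.5, K = 34 parameters, 18,567 choice observations) reproduces that table’s AIC, BIC and CAIC to within rounding:

```python
import math

def info_criteria(ll: float, k: int, n: int):
    """Return (AIC, BIC, CAIC) for a fitted model.

    ll: maximised log-likelihood, k: number of estimated parameters,
    n: number of observations (here, choice observations)."""
    aic = 2 * k - 2 * ll
    bic = k * math.log(n) - 2 * ll
    caic = k * (math.log(n) + 1) - 2 * ll
    return aic, bic, caic

# Figures reported for the pooled RPL model in Table 3.
aic, bic, caic = info_criteria(ll=-12552.5, k=34, n=18567)
print(round(aic, 1), round(bic, 1), round(caic, 1))
# Matches Table 3 (25,172.9 / 25,439.1 / 25,473.1) up to rounding of ll.
```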
To determine whether the data from the two information arms could be pooled together, a homogeneity test, the maximum likelihood equivalent of a Chow test, was carried out. We estimated relative attribute importance based on the range between the most preferred level and the least preferred level of an attribute, normalised over all attributes by dividing each attribute’s range by the total sum of the ranges of attribute preferences.
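The relative attribute importance calculation can be sketched directly from the point estimates later reported in Table 3 (the official figures are in supplementary Figure D1; this sketch ignores the RPL standard deviations and uses mean coefficients only, with each attribute’s reference level fixed at zero):

```python
# Mean coefficients from Table 3; the omitted reference level is 0.
attribute_levels = {
    "reading_method":      [0.0, 0.056, 0.510],
    "sensitivity":         [0.0, -1.567, -3.445],
    "specificity":         [0.0, -0.409, -0.835],
    "time_to_results":     [0.0, 0.386, 0.299, 0.192],
    "supporting_evidence": [0.0, -0.121, -0.035],
    "fair_representation": [0.0, -0.014, 0.051],
    "responsibility":      [0.0, -0.075, 0.067, -0.304],
}

# Range = most preferred level minus least preferred level, per attribute.
ranges = {a: max(c) - min(c) for a, c in attribute_levels.items()}
total = sum(ranges.values())
rai = {a: r / total for a, r in ranges.items()}  # normalised shares

for attr, share in sorted(rai.items(), key=lambda kv: -kv[1]):
    print(f"{attr:>20}: {share:.1%}")
```

This reproduces the ordering described in the Results: sensitivity dominates (roughly 60% of total importance), followed by specificity, with the reading method third.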
Goodness of fit results and the results of the homogeneity tests are presented in supplementary Table C1. Homogeneity test results across the basic MNL and RPL models indicate the data could be pooled together. Based on the goodness of fit measures (BIC and CAIC), the RPL was chosen for our analysis. We report results from the pooled data RPL model. Results for the MNL model are presented in supplementary Table C3.
To estimate the uptake rates of breast cancer screening in different scenarios, we established a baseline set of attribute levels using the current research data [32–34] for the cases where two radiologists are involved in breast cancer screening. We used similar levels for the comparator of either an AI system and a radiologist or two AI systems (see supplementary “Appendix D” for an explanation of the estimation method and supplementary Table D1 for the base-case scenario’s attribute levels).
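The estimation method itself is in supplementary Appendix D, but the basic logic of predicting uptake from fitted coefficients can be sketched with simple logit choice probabilities. The scenario below is illustrative, not the Table D1 base case, and it uses Table 3 mean coefficients rather than simulating over the RPL’s random-coefficient distributions:

```python
import math

# Illustrative service: mixed reading (0.510), 5% missed cases (ref, 0),
# 5% false positives (ref, 0), results in 5 days (0.299), evidence from
# both sources (ref, 0), all minorities represented (0.051),
# radiologist accountable (-0.075). Coefficients from Table 3.
v_service = 0.510 + 0.299 + 0.051 - 0.075
v_opt_out = -4.121  # opt-out alternative-specific constant, Table 3

def logit_shares(utilities):
    """Standard logit choice probabilities: exp(V_j) / sum_k exp(V_k)."""
    exp_v = [math.exp(v) for v in utilities]
    s = sum(exp_v)
    return [e / s for e in exp_v]

uptake, opt_out = logit_shares([v_service, v_opt_out])
print(f"predicted uptake: {uptake:.3f}")  # close to 1: very few opt out
```

The large negative opt-out constant is what drives the very high predicted uptake in this sketch.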
We discuss possible differences across study arms in the preference heterogeneity section. To test whether preferences for reading methods vary with respect to sensitivity and specificity, we interacted the reading method attribute with both the sensitivity and specificity attributes. To understand the role played by individual characteristics including experience with cancer and cancer screening experience, we interacted these characteristics with the attributes related to the method of reading mammograms.
Descriptive statistics were conducted in Stata 15. All non-descriptive analyses were carried out in NLOGIT 6. We followed the DIRECT checklist for reporting DCE results (supplementary “Appendix A3”) [35].
A number of methods were adopted in order to maintain the quality of the responses. To test for rational decision making, a dominance test was conducted where a DCE task with one clearly better alternative option (with all attributes taking the more favourable value) was included. Based on rationality, participants would have to select the better alternative. Participants who failed the dominance test by selecting the worse alternative were again presented with the same test with alternatives switched around. Those failing the second time were excluded from the analysis.
Additionally, the time taken to complete the survey was recorded based on which speeders were excluded from the analysis. A speeder was defined as a participant completing the survey in less than a third of the median survey completion time of all valid participants.
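The speeder rule can be expressed as a one-pass filter over completion times; the timings below are invented for illustration only.

```python
from statistics import median

def drop_speeders(completion_times):
    """Exclude respondents finishing in under one-third of the median time.

    Mirrors the rule described in the text: the median is taken over all
    valid completions, then anyone below median/3 is removed."""
    cutoff = median(completion_times) / 3
    return [t for t in completion_times if t >= cutoff]

times = [12.0, 15.5, 3.0, 18.2, 14.1, 4.9, 16.0]  # minutes, illustrative
kept = drop_speeders(times)
print(kept)  # the 3.0-minute respondent falls below the cutoff (~4.7)
```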
Results
Participant Characteristics
A total of 2166 participants consented to participate and completed the survey. They were randomly assigned to two arms. Among those who submitted their responses, 1106 participants received information sheet 1 (arm 1) and 1060 received information sheet 2 (arm 2). Of the total participants, 34 were classified as speeders and an additional 69 failed the dominance tests a second time and were therefore excluded from the analysis, leaving a total of 2063 participants.
Of the 2063 participants, 1055 were in the first study arm and 1008 in the second study arm (Table 2 and supplementary Table C2—see the electronic supplementary material). There were no statistically significant differences between the two arms. The participants in each arm were nationally representative based on age and state or territory of residence. A little over 30% of participants were in the 40- to 49-year-old group, which has not yet received an invitation from the Australian BreastScreen programme, while 230 (11.2%) were aged 70–74 years. Approximately 42% had a university degree or other tertiary education, 1437 (69.7%) had been invited to screen for breast cancer at least once, 1495 (72.5%) had been screened for breast cancer at least once, and 122 (5.9%) had previously been diagnosed with breast cancer. Over 71% of participants lived in major cities.
Table 2.
Characteristics of participants
| Pooled sample (n = 2063) n (%) | Info sheet 1 (n = 1055) n (%) | Info sheet 2 (n = 1008) n (%) | Pearson χ2 | |
|---|---|---|---|---|
| Age group | ||||
| 40–44 | 324 (15.71%) | 161 (15.26%) | 163 (16.17%) | 7.949 |
| 45–49 | 323 (15.66%) | 168 (15.92%) | 155 (15.38%) | |
| 50–54 | 319 (15.46%) | 151 (14.31%) | 168 (16.67%) | |
| 55–59 | 306 (14.83%) | 156 (14.79%) | 150 (14.88%) | |
| 60–64 | 297 (14.40%) | 170 (16.11%) | 127 (12.60%) | |
| 65–69 | 264 (12.80%) | 127 (12.04%) | 137 (13.59%) | |
| 70–74 | 230 (11.15%) | 122 (11.56%) | 108 (10.71%) | |
| Highest educationa | ||||
| At most high school | 697 (33.79%) | 352 (33.36%) | 345 (34.23%) | 0.585 |
| Trade/trade certificate | 496 (24.04%) | 261 (24.74%) | 235 (23.31%) | |
| University/tertiary institute degree | 858 (41.59%) | 436 (41.33%) | 422 (41.87%) | |
| Financial situation | ||||
| Find it a strain to get by every week | 302 (14.64%) | 158 (14.98%) | 144 (14.29%) | 0.417 |
| Have to be careful with money | 837 (40.57%) | 431 (40.85%) | 406 (40.28%) | |
| Able to manage without much difficulty | 553 (26.81%) | 280 (26.54%) | 273 (27.08%) | |
| Quite/very comfortably off | 371 (17.98%) | 186 (17.63%) | 185 (18.35%) | |
| General health—self-reported | ||||
| Excellent | 126 (6.11%) | 64 (6.07%) | 62 (6.15%) | 0.627 |
| Very good | 579 (28.07%) | 296 (28.06%) | 283 (28.08%) | |
| Good | 815 (39.51%) | 410 (38.86%) | 405 (40.18%) | |
| Fair | 461 (22.35%) | 242 (22.94%) | 219 (21.73%) | |
| Very poor | 82 (3.97%) | 43 (4.08%) | 39 (3.87%) | |
| Long-term health condition | ||||
| No | 1287 (62.38%) | 646 (61.23%) | 641 (63.59%) | 0.644 |
| Yes | 697 (33.79%) | 363 (34.41%) | 334 (33.13%) | |
| Remotenessb | ||||
| Major cities | 1485 (71.98%) | 777 (73.65%) | 708 (70.24%) | 3.111 |
| Inner regional | 393 (19.05%) | 191 (18.10%) | 202 (20.04%) | |
| Outer regional/(very) remote | 161 (7.80%) | 85 (8.06%) | 96 (9.52%) | |
| States and territories | ||||
| Australian Capital Territory (ACT) | 47 (2.28%) | 26 (2.46%) | 21 (2.08%) | 9.819 |
| Northern Territory (NT) | 25 (1.21%) | 9 (0.85%) | 16 (1.59%) | |
| New South Wales (NSW) | 643 (31.17%) | 334 (31.66%) | 309 (30.65%) | |
| Queensland (QLD) | 416 (20.16%) | 215 (20.38%) | 201 (19.94%) | |
| South Australia (SA) | 146 (7.08%) | 80 (7.58%) | 66 (6.55%) | |
| Tasmania (TAS) | 52 (2.52%) | 23 (2.18%) | 29 (2.88%) | |
| Victoria (VIC) | 511 (24.77%) | 270 (25.59%) | 241 (23.91%) | |
| Western Australia (WA) | 223 (10.81%) | 98 (9.29%) | 125 (12.40%) | |
aNot everyone answered this question, so percentages will not sum to 100%
In terms of general health, the majority of participants rated their health as either good (39.5%), very good (28.1%) or excellent (6.1%). One-third (33.8%) of participants indicated having a long-term health condition.
Preferences and Uptake Rates
Preferences for AI Use in Breast Cancer Screening
From the pooled data, we found that participants generally preferred a service in which one radiologist and one AI system read mammograms (0.51; 95% CI 0.40–0.62) relative to one that used two radiologists. No statistically significant difference was observed between the preference for two AI systems and that for two radiologists (see Fig. 1; Table 3).
Fig. 1.
Attribute coefficients (constructed from Table 3). AI artificial intelligence, ASC Alternative Specific Constant
Table 3.
Preferences for AI use in screening for breast cancer*
| Variable | Estimate (95% CI) | Std. Err. | Std. Dev. (95% CI) | Std. Err. |
|---|---|---|---|---|
| Reading method (ref. 2 radiologists) | ||||
| 2 AI systems | 0.056 (− 0.084 to 0.196) | 0.071 | 0.776a (0.666 to 0.885) | 0.056 |
| 1 radiologist and 1 AI system | 0.510a (0.403 to 0.616) | 0.054 | 0.352a (0.155 to 0.548) | 0.100 |
| Percentage of cases missed (sensitivity) (ref. 5% missed) | ||||
| 15% cases missed | − 1.567a (− 1.643 to − 1.490) | 0.039 | 0.165 (−0.054 to 0.384) | 0.112 |
| 25% cases missed | − 3.445a (− 3.601 to − 3.288) | 0.080 | 1.655a (1.519 to 1.791) | 0.069 |
| Percentage of false positives (specificity) (ref. 5% false positives) | ||||
| 10% false positives | − 0.409a (− 0.481 to − 0.336) | 0.037 | 0.206c (−0.006 to 0.418) | 0.108 |
| 15% false positives | − 0.835a (− 0.913 to − 0.756) | 0.040 | 0.551a (0.422 to 0.680) | 0.066 |
| Time from screening to receiving results (ref. 14 days) | ||||
| 2 days | 0.386a (0.250 to 0.527) | 0.071 | 0.212 (−0.140 to 0.565) | 0.180 |
| 5 days | 0.299a (0.206 to 0.391) | 0.047 | 0.159 (−0.098 to 0.415) | 0.131 |
| 10 days | 0.192a (0.102 to 0.281) | 0.046 | 0.033 (−0.465 to 0.532) | 0.254 |
| Supporting evidence (ref. both clinical trials and real-world performance data) | ||||
| Clinical trials only | − 0.121a (− 0.207 to − 0.035) | 0.044 | 0.047 (−0.413 to 0.506) | 0.234 |
| Real-world performance only | − 0.035 (− 0.120 to 0.049) | 0.043 | 0.041 (−0.235 to 0.317) | 0.141 |
| Fair representation (ref. no minority represented) | ||||
| Some minority populations represented | − 0.014 (− 0.104 to 0.076) | 0.046 | 0.026 (−0.291 to 0.344) | 0.162 |
| All minority populations represented | 0.051 (− 0.046 to 0.148) | 0.049 | 0.438a (0.264 to 0.611) | 0.089 |
| Responsibility (ref. government/regulatory agencies) | ||||
| Radiologist | − 0.075 (− 0.164 to 0.015) | 0.046 | 0.365a (0.151 to 0.579) | 0.109 |
| Health facility | 0.067 (− 0.017 to 0.151) | 0.043 | 0.093 (−0.195 to 0.381) | 0.147 |
| AI development agency | − 0.304a (− 0.392 to − 0.215) | 0.045 | 0.282b (0.009 to 0.554) | 0.139 |
| Opt-out ASC | − 4.121a (− 4.337 to − 3.905) | 0.110 | 2.629a (2.449 to 2.809) | 0.092 |
| Number of observations | 18,567 | |||
| Number of participants | 2063 | |||
| AIC | 25,172.9 | |||
| BIC | 25,439.1 | |||
| CAIC | 25,473.1 | |||
| Log likelihood | − 12,552.5 | |||
| K | 34 | |||
AI artificial intelligence, AIC Akaike Information Criterion, ASC Alternative Specific Constant, BIC Bayesian Information Criterion, CAIC Consistent Akaike Information Criterion, CI confidence interval, Err. error, Dev. deviation, normal distribution used for all random coefficients, K degrees of freedom, Std. standard
*Results from pooled data based on homogeneity test results (supplementary Table C1; see the electronic supplementary material)
aP value < 0.01
bP value < 0.05
cP value < 0.1
On average, participants preferred fewer missed cases (i.e. higher sensitivity), fewer false positives (higher specificity) and a shorter waiting time between screening and receipt of results. They also preferred AI systems validated with both clinical trial and real-world evidence over those validated with clinical trial data alone (− 0.12; 95% CI − 0.21 to − 0.04); there was no significant difference between validation with both sources and with real-world evidence alone. Fair representation was not statistically significant. Participants had a significantly lower preference for holding the AI development agency/institution, rather than government or regulatory agencies, accountable for diagnostic errors (− 0.30; 95% CI − 0.39 to − 0.22). The large negative opt-out constant (− 4.12; 95% CI − 4.34 to − 3.91) indicates participants were strongly disinclined to opt out of screening.
Forced-choice results, in which the opt-out option was not available, are presented in supplementary Table C4 (see the electronic supplementary material). The results are generally very similar to those of the opt-out model, although the sign of the coefficient for two AI systems becomes negative while remaining insignificant (− 0.01; 95% CI − 0.15 to 0.14). We also find that, at the 10% significance level, participants generally prefer to have all minority populations represented as opposed to none (0.10; 95% CI − 0.002 to 0.21).
Results of the relative attribute importance are presented in supplementary Figure D1, with uptake rate results presented in supplementary Figure D2 of supplementary “Appendix D”. Put briefly, we found that the reading method was the third most important attribute, after sensitivity and specificity. AI uptake rates were highest when the service being offered was the use of AI together with a radiologist rather than two AI systems.
Investigating Heterogeneity in Preferences for AI Use
How do Preferences for Reading Methods Vary Based on Service Sensitivity and Specificity?
For cases where the reading method was two AI systems, participants preferred better specificity, that is, a lower chance of false positives (− 0.29 [95% CI − 0.50 to − 0.08] for 15% false positives compared with 5% false positives). Where the reading method was one radiologist and one AI system, participants preferred high levels of both sensitivity (lower percentages of cases missed; − 0.33 [95% CI − 0.59 to − 0.08] for 25% missed cases compared with 5% missed cases) and specificity (− 0.57 [95% CI − 0.80 to − 0.33] for 15% false positives compared with 5% false positives). The results are presented in Table 4.
Table 4.
Heterogeneity in preferences–attribute interactions
| Variable | Estimate (95% CI) | Std. Err. | Std. Dev. (95% CI) | Std. Err. |
|---|---|---|---|---|
| Reading method (ref. 2 radiologists) | ||||
| 2 AI systems | 0.345a (0.113 to 0.594) | 0.123 | 0.807a (0.697 to 0.916) | 0.056 |
| 1 radiologist and 1 AI system | 0.675a (0.464 to 0.886) | 0.108 | 0.255c (−0.005 to 0.516) | 0.133 |
| Percentage of cases missed (sensitivity) (ref. 5% missed) | ||||
| 15% cases missed | − 1.630a (− 1.785 to − 1.475) | 0.079 | 0.145 (−0.084 to 0.375) | 0.117 |
| 25% cases missed | − 3.339a (− 3.551 to − 3.126) | 0.108 | 1.682a (1.542 to 1.823) | 0.072 |
| Percentage of false positives (specificity) (ref. 5% false positives) | ||||
| 10% false positives | − 0.337a (− 0.496 to − 0.178) | 0.043 | 0.182c (−0.024 to 0.388) | 0.105 |
| 15% false positives | − 0.524a (− 0.688 to − 0.360) | 0.044 | 0.511a (0.359 to 0.663) | 0.078 |
| Time from screening to receiving results (ref. 14 days) | ||||
| 2 days | 0.319a (0.172 to 0.465) | 0.075 | 0.123 (−0.433 to 0.687) | 0.283 |
| 5 days | 0.277a (0.179 to 0.376) | 0.050 | 0.172 (−0.081 to 0.424) | 0.129 |
| 10 days | 0.163a (0.065 to 0.262) | 0.050 | 0.106 (−0.218 to 0.431) | 0.166 |
| Supporting evidence (ref. both clinical trials and real-world performance data) | ||||
| Clinical trials only | − 0.130a (− 0.221 to − 0.040) | 0.050 | 0.241c (−0.015 to 0.497) | 0.131 |
| Real-world performance only | − 0.025 (− 0.115 to 0.066) | 0.050 | 0.050 (−0.271 to 0.370) | 0.163 |
| Fair representation (ref. no minority represented) | ||||
| Some minority populations represented | − 0.041 (− 0.136 to 0.055) | 0.049 | 0.082 (−0.220 to 0.384) | 0.154 |
| All minority populations represented | 0.052 (− 0.048 to 0.152) | 0.051 | 0.443a (0.265 to 0.621) | 0.091 |
| Responsibility (ref. government/regulatory agencies) | ||||
| Radiologist | − 0.033 (− 0.129 to 0.062) | 0.049 | 0.345a (0.124 to 0.566) | 0.113 |
| Health facility | 0.100b (0.013 to 0.187) | 0.044 | 0.024 (−0.274 to 0.322) | 0.152 |
| AI development agency | − 0.309a (− 0.401 to − 0.216) | 0.047 | 0.329a (0.090 to 0.568) | 0.122 |
| Opt-out ASC | − 3.998a (− 4.255 to − 3.741) | 0.131 | 2.641a (2.465 to 2.817) | 0.090 |
| Interaction with 2 AI systems | ||||
| Percentage of cases missed (sensitivity) (ref. 5% missed) | ||||
| 15% cases missed | − 0.018 (− 0.235 to 0.199) | 0.111 | 0.151 (−0.127 to 0.428) | 0.141 |
| 25% cases missed | − 0.126 (− 0.363 to 0.111) | 0.121 | 0.197 (−0.255 to 0.649) | 0.230 |
| Percentage of false positives (specificity) (ref. 5% false positives) | ||||
| 10% false positives | − 0.059 (− 0.282 to 0.164) | 0.114 | 0.001 (−0.307 to 0.309) | 0.157 |
| 15% false positives | − 0.291a (− 0.503 to − 0.080) | 0.108 | 0.519a (0.256 to 0.781) | 0.134 |
| Interaction with AI systems and radiologist | ||||
| Percentage of cases missed (sensitivity) (ref. 5% missed) | ||||
| 15% cases missed | 0.213c (− 0.007 to 0.433) | 0.112 | 0.057 (−0.269 to 0.383) | 0.166 |
| 25% cases missed | − 0.333a (− 0.588 to − 0.078) | 0.130 | 0.801a (0.390 to 1.211) | 0.209 |
| Percentage of false positives (specificity) (ref. 5% false positives) | ||||
| 10% false positives | − 0.243b (− 0.465 to − 0.022) | 0.113 | 0.063 (−0.423 to 0.549) | 0.248 |
| 15% false positives | − 0.567a (− 0.802 to − 0.332) | 0.120 | 0.439a (0.086 to 0.791) | 0.180 |
| Number of observations | 18,567 | |||
| Number of participants | 2063 | |||
| AIC | 25,138.9 | |||
| BIC | 25,530.3 | |||
| CAIC | 25,580.3 | |||
| Log likelihood | − 12,519.4 | |||
| K | 50 | |||
AI artificial intelligence, AIC Akaike Information Criterion, ASC Alternative Specific Constant, BIC Bayesian Information Criterion, CAIC Consistent Akaike Information Criterion, CI confidence interval, Err. error, Dev. deviation, normal distribution used for all random coefficients, K degrees of freedom, Std. standard
aP value < 0.01
bP value < 0.05
cP value < 0.1
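The information criteria reported with Table 4 can be checked directly from the log likelihood, the degrees of freedom and the number of observations using the standard formulas. A quick verification (small discrepancies reflect rounding of the reported log likelihood):

```python
import math

# Fit statistics reported with Table 4.
log_lik = -12_519.4   # log likelihood
k = 50                # degrees of freedom (estimated parameters)
n = 18_567            # number of observations

aic = -2 * log_lik + 2 * k                   # Akaike Information Criterion
bic = -2 * log_lik + k * math.log(n)         # Bayesian Information Criterion
caic = -2 * log_lik + k * (math.log(n) + 1)  # Consistent AIC

print(round(aic, 1), round(bic, 1), round(caic, 1))
# Reported values: AIC 25,138.9; BIC 25,530.3; CAIC 25,580.3
```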
Does Information on AI Use in Breast Cancer Screening Alter Preferences?
The overall direction of the preference results does not change when we include interactions with the information presented, except for the coefficient for clinical-trial-only supporting evidence, which becomes insignificant. The results are presented in Table 5.
Table 5.
Impact of information on preferences for AI use in screening for breast cancer
| Variable | Estimate (95% CI) | Std. Err. | Standard deviation (95% CI) | Std. Err. |
|---|---|---|---|---|
| Reading method (ref. 2 radiologists) | ||||
| 2 AI systems | 0.011 (− 0.181 to 0.204) | 0.098 | 0.808a (0.703 to 0.914) | 0.054 |
| 1 radiologist and 1 AI system | 0.381a (0.237 to 0.525) | 0.073 | 0.286b (0.057 to 0.516) | 0.117 |
| Percentage of cases missed (sensitivity) (ref. 5% missed) | ||||
| 15% cases missed | − 1.589a (− 1.691 to − 1.487) | 0.052 | 0.096 (−0.151 to 0.343) | 0.126 |
| 25% cases missed | − 3.305a (− 3.503 to − 3.107) | 0.101 | 1.493a (1.331 to 1.656) | 0.083 |
| Percentage of false positives (specificity) (ref. 5% false positives) | ||||
| 10% false positives | − 0.389a (− 0.488 to − 0.290) | 0.050 | 0.023 (−0.211 to 0.257) | 0.119 |
| 15% false positives | − 0.911a (− 1.020 to − 0.803) | 0.055 | 0.498a (0.343 to 0.653) | 0.079 |
| Time from screening to receiving results (ref. 14 days) | ||||
| 2 days | 0.326a (0.138 to 0.513) | 0.096 | 0.166 (−0.177 to 0.449) | 0.144 |
| 5 days | 0.310a (0.183 to 0.436) | 0.065 | 0.096 (−0.142 to 0.333) | 0.121 |
| 10 days | 0.136b (0.013 to 0.258) | 0.062 | 0.073 (−0.241 to 0.388) | 0.161 |
| Supporting evidence (ref. both clinical trials and real-world performance data) | ||||
| Clinical trials only | − 0.085 (− 0.202 to 0.032) | 0.060 | 0.058 (−0.272 to 0.387) | 0.168 |
| Real-world performance only | − 0.071 (− 0.186 to 0.044) | 0.059 | 0.153 (−0.058 to 0.364) | 0.108 |
| Fair representation (ref. no minority represented) | ||||
| Some minority populations represented | 0.040 (− 0.083 to 0.164) | 0.063 | 0.151 (−0.096 to 0.129) | 0.126 |
| All minority populations represented | 0.143b (0.011 to 0.276) | 0.067 | 0.346a (0.091 to 0.600) | 0.130 |
| Responsibility (ref. government/regulatory agencies) | ||||
| Radiologist | − 0.076 (− 0.197 to 0.045) | 0.062 | 0.290b (0.026 to 0.555) | 0.135 |
| Health facility | 0.024 (− 0.091 to 0.139) | 0.059 | 0.104 (−0.145 to 0.353) | 0.127 |
| AI development agency | − 0.336a (− 0.454 to − 0.217) | 0.060 | 0.220 (−0.095 to 0.535) | 0.161 |
| Opt-out ASC | − 3.993a (− 4.270 to − 3.716) | 0.141 | 2.626a (2.431 to 2.822) | 0.100 |
| Interaction with info sheet 2 | ||||
| Reading method (ref. 2 radiologists) | ||||
| 2 AI systems | 0.108 (− 0.178 to 0.394) | 0.146 | 0.006 (−0.333 to 0.344) | 0.173 |
| 1 radiologist and 1 AI system | 0.315a (0.097 to 0.532) | 0.111 | 0.416a (0.133 to 0.698) | 0.144 |
| Percentage of cases missed (sensitivity) (ref. 5% missed) | ||||
| 15% cases missed | − 0.025 (− 0.173 to 0.122) | 0.075 | 0.003 (−0.441 to 0.447) | 0.236 |
| 25% cases missed | − 0.481a (− 0.782 to − 0.179) | 0.154 | 1.208a (0.834 to 1.582) | 0.197 |
| Percentage of false positives (specificity) (ref. 5% false positives) | ||||
| 10% false positives | − 0.060 (− 0.210 to 0.089) | 0.076 | 0.364b (0.064 to 0.665) | 0.153 |
| 15% false positives | 0.116 (− 0.041 to 0.274) | 0.080 | 0.388a (0.117 to 0.659) | 0.138 |
| Time from screening to receiving results (ref. 14 days) | ||||
| 2 days | 0.132 (− 0.151 to 0.416) | 0.144 | 0.034 (−0.359 to 0.426) | 0.200 |
| 5 days | − 0.031 (− 0.220 to 0.157) | 0.096 | 0.272 (−0.055 to 0.600) | 0.167 |
| 10 days | 0.124 (− 0.059 to 0.308) | 0.094 | 0.277c (−0.006 to 0.560) | 0.144 |
| Supporting evidence (ref. both clinical trials and real-world performance data) | ||||
| Clinical trials only | − 0.087 (− 0.264 to 0.090) | 0.090 | 0.410a (0.107 to 0.713) | 0.155 |
| Real-world performance only | 0.054 (− 0.119 to 0.226) | 0.088 | 0.065 (−0.339 to 0.469) | 0.206 |
| Fair representation (ref. no minority represented) | ||||
| Some minority populations represented | − 0.123 (− 0.308 to 0.062) | 0.094 | 0.158 (−0.277 to 0.593) | 0.222 |
| All minority populations represented | − 0.208b (− 0.407 to − 0.009) | 0.102 | 0.479a (0.157 to 0.802) | 0.164 |
| Responsibility (ref. government/regulatory agencies) | ||||
| Radiologist | − 0.005 (− 0.186 to 0.176) | 0.092 | 0.272 (−0.122 to 0.667) | 0.201 |
| Health facility | 0.105 (− 0.068 to 0.278) | 0.088 | 0.141 (−0.272 to 0.553) | 0.210 |
| AI development agency | 0.061 (− 0.119 to 0.241) | 0.092 | 0.482a (0.209 to 0.755) | 0.139 |
| Opt-out ASC | − 0.266 (− 0.649 to 0.177) | 0.195 | 0.463 (−0.380 to 1.305) | 0.430 |
| Number of observations | 18,567 | |||
| Number of participants | 2063 | |||
| AIC | 29,077.7 | |||
| BIC | 29,343.9 | |||
| CAIC | 29,377.9 | |||
| Log likelihood | − 14,504.9 | |||
| K | 34 | |||
AI artificial intelligence, AIC Akaike Information Criterion, ASC Alternative Specific Constant, BIC Bayesian Information Criterion, CAIC Consistent Akaike Information Criterion, CI confidence interval, Err. error, Dev. deviation, normal distribution used for all random coefficients, K degrees of freedom, Std. standard
aP value < 0.01
bP value < 0.05
cP value < 0.1
We found that the strength of preference for the combination of AI and a radiologist, as compared with two radiologists, was significantly higher amongst participants receiving additional information on the use and benefits of AI for screening (i.e. study arm 2). No significant effect was found for two AI systems as opposed to two radiologists for the reading of mammograms. Beyond the reading method, receiving additional information (study arm 2) also had a significant impact on preferences for the percentage of cases missed and for fair representation. Participants in study arm 2 had a significantly lower preference for screening services with 25% cases missed as opposed to 5% cases missed (− 0.48; 95% CI − 0.78 to − 0.18) and a lower preference for screening services that used AI systems with all minority populations represented (− 0.21; 95% CI − 0.41 to − 0.01).
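Arm-specific effects in Table 5 are recovered by adding the interaction term to the base coefficient. As a sketch using the reported point estimates only (a confidence interval for the sum would require the full covariance matrix, which is not reported here):

```python
# Table 5 point estimates (ref.: 2 radiologists).
base_mixed = 0.381         # 1 radiologist + 1 AI system, study arm 1
info2_interaction = 0.315  # additional shift for study arm 2 (extra AI info)

arm1_effect = base_mixed
arm2_effect = base_mixed + info2_interaction  # combined arm-2 coefficient

print(f"arm 1: {arm1_effect:.3f}  arm 2: {arm2_effect:.3f}")
```

The same additivity applies to the other interactions; for example, the implied arm-2 coefficient for 25% cases missed is − 3.305 + (− 0.481) = − 3.786.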
Does Experience with Cancer and Cancer Screening Impact Reading Method and Screening Preferences?
While interactions with ever having been diagnosed with cancer were not significant, we found that those who had ever been screened for cancer were less likely to opt out of screening irrespective of the reading method used (− 0.46; 95% CI − 0.81 to − 0.12). The results are presented in Table 6.
Table 6.
Impact of individual characteristics on preferences
| Variable | Estimate (95% CI) | Std. Err. | Std. Dev. (95% CI) | Std. Err. |
|---|---|---|---|---|
| Reading method (ref. 2 radiologists) | ||||
| 2 AI systems | 0.160c (− 0.021 to 0.341) | 0.092 | 0.477a (0.250 to 0.704) | 0.116 |
| 1 radiologist and 1 AI system | 0.448a (0.291 to 0.606) | 0.080 | 0.367a (0.193 to 0.541) | 0.089 |
| Percentage of cases missed (sensitivity) (ref. 5% missed) | ||||
| 15% cases missed | − 1.570a (− 1.646 to − 1.494) | 0.039 | 0.032 (− 0.295 to 0.359) | 0.167 |
| 25% cases missed | − 3.443a (− 3.595 to − 3.291) | 0.078 | 1.673a (1.544 to 1.802) | 0.066 |
| Percentage of false positives (specificity) (ref. 5% false positives) | ||||
| 10% false positives | − 0.410a (− 0.482 to − 0.337) | 0.037 | 0.094 (− 0.229 to 0.416) | 0.165 |
| 15% false positives | − 0.842a (− 0.920 to − 0.763) | 0.040 | 0.569a (0.445 to 0.693) | 0.063 |
| Time from screening to receiving results (ref. 14 days) | ||||
| 2 days | 0.390a (0.252 to 0.529) | 0.071 | 0.368a (0.141 to 0.595) | 0.116 |
| 5 days | 0.306a (0.214 to 0.399) | 0.047 | 0.155 (− 0.128 to 0.439) | 0.145 |
| 10 days | 0.194a (0.104 to 0.283) | 0.046 | 0.075 (− 0.322 to 0.472) | 0.203 |
| Supporting evidence (ref. both clinical trials and real-world performance data) | ||||
| Clinical trials only | − 0.121a (− 0.206 to − 0.035) | 0.044 | 0.140 (− 0.209 to 0.489) | 0.178 |
| Real-world performance only | − 0.034 (− 0.118 to 0.050) | 0.043 | 0.001 (− 0.265 to 0.267) | 0.136 |
| Fair representation (ref. no minority represented) | ||||
| Some minority populations represented | − 0.017 (− 0.107 to 0.073) | 0.046 | 0.126 (− 0.139 to 0.392) | 0.135 |
| All minority populations represented | 0.054 (− 0.043 to 0.150) | 0.049 | 0.413a (0.222 to 0.604) | 0.097 |
| Responsibility (ref. government/regulatory agencies) | ||||
| Radiologist | − 0.079c (− 0.169 to 0.010) | 0.046 | 0.400a (0.197 to 0.604) | 0.104 |
| Health facility | 0.067 (− 0.017 to 0.152) | 0.043 | 0.017 (− 0.254 to 0.288) | 0.138 |
| AI development agency | − 0.302a (− 0.390 to − 0.214) | 0.045 | 0.278c (− 0.029 to 0.586) | 0.157 |
| Opt-out ASC | − 3.785a (− 4.097 to − 3.472) | 0.160 | 2.493a (2.298 to 2.689) | 0.100 |
| Interaction with ever diagnosed with breast cancer | ||||
| Reading method (ref. 2 radiologists) | ||||
| 2 AI systems | 0.082 (− 0.257 to 0.420) | 0.439 | 0.075 (− 0.904 to 1.055) | 0.500 |
| 1 radiologist and 1 AI system | − 0.136 (− 0.448 to 0.176) | 0.173 | 0.239 (− 0.513 to 0.991) | 0.384 |
| Opt-out ASC | − 0.412 (− 1.272 to 0.448) | 0.159 | 0.452 (− 1.235 to 2.139) | 0.861 |
| Interaction with ever screened for breast cancer | ||||
| Reading method (ref. 2 radiologists) | ||||
| 2 AI systems | − 0.133 (− 0.302 to 0.036) | 0.086 | 0.677a (0.495 to 0.859) | 0.093 |
| 1 radiologist and 1 AI system | 0.091 (− 0.072 to 0.254) | 0.083 | 0.089 (− 0.213 to 0.391) | 0.154 |
| Opt-out ASC | − 0.464a (− 0.808 to − 0.120) | 0.175 | 0.983a (0.564 to 1.402) | 0.214 |
| Number of observations | 18,567 | |||
| Number of participants | 2063 | |||
| AIC | 25,184.5 | |||
| BIC | 25,544.6 | |||
| CAIC | 25,590.6 | |||
| Log likelihood | − 12,546.2 | |||
| K | 46 | |||
AI artificial intelligence, AIC Akaike Information Criterion, BIC Bayesian Information Criterion, CAIC Consistent Akaike Information Criterion, CI confidence interval, Err. error, Dev. deviation, normal distribution used for all random coefficients, K degrees of freedom, Std. standard
aP value < 0.01
cP value < 0.1
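The practical size of the screening-experience effect on opting out can be gauged by combining the opt-out constant with the interaction term from Table 6 and converting to a choice probability. A minimal sketch against a single reference-level service, using point estimates only and ignoring the estimated heterogeneity:

```python
import math

# Table 6 point estimates.
asc_opt_out = -3.785          # opt-out ASC
ever_screened_shift = -0.464  # interaction with prior cancer screening

def p_opt_out(v_opt, v_service=0.0):
    """Binary logit probability of opting out versus one reference-level service."""
    return math.exp(v_opt) / (math.exp(v_opt) + math.exp(v_service))

p_no_experience = p_opt_out(asc_opt_out)
p_experience = p_opt_out(asc_opt_out + ever_screened_shift)
print(f"no screening experience:    {p_no_experience:.4f}")
print(f"prior screening experience: {p_experience:.4f}")
```

At these values the already small implied opt-out probability falls further for women with prior screening experience, in line with the significant negative interaction.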
Discussion
This paper presents the findings of one of the first DCE surveys in Australia to explore the preferences and attitudes of women towards the use of AI in breast cancer screening and to explore key factors that influence decision-making in the use or not of AI-driven technologies for breast screening services. By investigating the intersection of information and lived experience, our research aims to provide critical insights for policymakers, healthcare professionals, screening programme managers and technology developers to shape the future of breast cancer screening in Australia.
We found a general preference for the use of AI in screening for breast cancer when this service also involves a radiologist (one radiologist plus one AI system). This finding aligns with the literature on the acceptance of AI as an emerging technology, which finds that human involvement in the use of AI for mammography reading is preferred, and AI use is acceptable, but only when certain conditions are met [11, 13, 36–38]. The study also provides insights into how this potentially impacts breast cancer screening uptake.
We also found a general preference for not holding AI development companies responsible for errors compared with holding regulatory agencies responsible. Survey studies have generally found a disinclination to hold AI-developing companies solely responsible for errors. Ongena et al. [11], in their Dutch population survey, found that a large proportion of participants neither agreed nor disagreed with holding either the radiologists or the AI-developing companies responsible, contradicting the results of Pesapane et al. [13], who, in their Italian study, found that participants preferred to hold both groups responsible. It should be noted, however, that these studies focused only on the radiologist, the AI-developing company or both as responsible parties for any errors. No consideration was given to regulatory agencies that approve the use of the AI system or to the health facilities implementing its use. While Pesapane et al. [13] investigated holding others responsible, this group of “others” is not clearly defined.
The results also highlight the important role played by information (transparent communication with stakeholders) as a medium for potentially improving the acceptance and uptake of AI-driven health services. The effect of information as a decision aid is complex and can vary based on the scenario and quality of information provided. Previous studies have shown that information is a crucial factor in the adoption of AI [39–41]. In particular, the acceptance of AI technologies has been shown to be based on factors such as perceived usefulness, social norms and social influence, as well as the perceived benefits and failures of AI systems, which positively influence user acceptance [39, 40]. Literature on the impact of information on cancer screening uptake, however, shows mixed results, with studies showing marginally positive, negative and no impact on screening uptake [42–45]. The information sheets used in these studies often included balanced information on defined benefits and risks of breast screening, including information on over-detection. Our study critically adds to both these strands of literature.
As noted, our study consisted of two arms, where participants in one arm were presented with additional information on the potential benefits of using AI for screen reading services. Our results suggest stronger preferences for AI–radiologist mixed reading and higher sensitivity amongst participants receiving this additional positive information. This strongly reflects existing findings on the impact of information on potential benefits and failures of AI systems on user acceptance [39, 40]. Additionally, the insignificance of the preference for screening services that use AI systems trained on all minority populations, whilst unexpected, could indicate participant trade-off behaviours in the presence of additional information. For example, participants with additional information on the benefits of AI use may think more about the benefit of AI (e.g. savings on labour and improved efficiency) above and beyond potential limitations such as lack of fair representation in training AI systems. This will be particularly important in framing suitable implementation strategies for AI-driven healthcare services. However, the results do not imply a lower need to improve equitability in AI systems through maintaining representativeness in training algorithms. In fact, the lack of significance for fair representation, given the other attributes, is in line with some existing literature that appears to indicate that people care about fairness in AI, but not quite as much as they do about other attributes [12]. In a citizen jury where participants were able to fully discuss and understand the nature of algorithmic bias, participants appeared to place a strong emphasis on fairness [17]. This seems to reflect the difference between attitudinal questions for each attribute versus the DCE for the whole programme.
Experience with cancer screening significantly reduced the probability of opting out of screening. This could reflect learned behaviours such as increased awareness of recommendations for regular screening for the early detection of cancer, and positive reinforcement from previous screening experiences that were straightforward and without significant issues.
Another key consideration raised is the gap in AI regulation, especially in healthcare where there is potential for patient harm due to system errors, algorithmic bias and privacy issues [46]. This study provides insights into potential factors that may be considered in forming policies and regulations for oversight of the application of AI in healthcare. From a policy perspective, the preference for a mixed human–AI screen-reading system highlights the importance of direct human involvement in the medical decision-making process. This has been a key component of AI health regulation in countries such as Canada, where a legislative requirement for safeguards including human intervention points in the deployment of AI in healthcare exists [47, 48]. Overall, results highlight the importance of promoting a balanced incorporation of AI systems into health workflows, with suitable human oversight [25].
A limitation of this study concerns the period within which it was conducted. The research was designed during the coronavirus disease 2019 (COVID-19) pandemic, when several states in Australia implemented different levels of lockdowns. During this time, participants were more likely to have been exposed to the use of technology for healthcare, including telehealth and some AI-powered applications, as well as for non-healthcare purposes. Whilst the sample was broadly representative of the Australian public in terms of education, participants’ experience with health technology could have influenced their perceptions of the use of technology in healthcare screening, especially if they used AI-powered screening applications during lockdowns.
The absence of hover-over definitions, which would have allowed participants to check the meaning of attributes when needed, could be considered a limitation. Efforts were, however, made to simplify the attribute names where confusion was anticipated.
It should be noted that although our second information sheet provided the positives of the use of AI for breast screening, there are potential negative effects that, if highlighted, could shift participant views from screening in general to opting out of screening, especially for those currently reluctant to get screened. Future studies should explore the impact of these potentially negative outcomes on preferences. Within this study, negative outcomes were presented in the form of the attribute levels, but an inclusion within the information sheet will likely lead to further interesting results.
Finally, unlike an efficient design, the balanced overlap design does not require the specification of interaction effects in the utility function, and it ensures a modest amount of level overlap in the design, which is beneficial for estimating interaction terms. The interaction effects investigated within this study are, however, generally exploratory. We recruited a relatively large sample to increase the power for estimating interaction terms.
Conclusion
This study is based on a DCE for assessing the preferences of Australian women for the use of AI for breast cancer screening services in Australia. Results broadly suggest an openness for screening services that use paired radiologist–AI screen reading, as opposed to the two-radiologist system generally used in public population-based programmes. However, heterogeneity is observed in this preference, based on factors such as the participant’s previous screening experience and the amount of information provided on the benefits of AI in screening. The results hold important implications for how policies pertaining to both the implementation and regulation of AI health services in Australia should be designed in the future.
Supplementary Information
Below is the link to the electronic supplementary material.
Acknowledgements
We would like to thank the faculty and higher degree candidates of the Centre for Health Economics, Monash University for their valuable insights and for participating in our focus group discussions around the attribute selection and questionnaire design. We would also like to acknowledge our anonymous respondents who pre-tested the survey and provided feedback through a semi-structured interview. Finally, we would like to thank the participants of the 2022 Australian Health Economics Society conference who provided feedback on an initial draft of this work.
Declarations
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. This study was funded by the National Health and Medical Research Council (NHMRC) grant 1181960 “The algorithm will see you now: ethical, legal and social implications of adopting machine learning systems for diagnosis and screening”.
Conflict of interest
MEW has no conflicts of interest to declare. UDSP has no conflicts of interest to declare. GC has no conflicts of interest to declare. CD has no conflicts of interest to declare. YSJA has no conflicts of interest to declare. NH declares membership of Australia’s TGA Advisory Committee on Medical Devices. NH declares funding from a National Breast Cancer Foundation (NBCF) Chair in Cancer Prevention grant (EC-21-001) and a National Health and Medical Research Council (NHMRC) Investigator grant (1194410). SC declares research funding from NHMRC, NBCF and ACSQHC for work on AI in healthcare, membership of the NSW Health AI Taskforce, and support for the costs of travel for conference presentations on AI in healthcare from medical indemnity insurers and non-government organisations. SC is on the editorial board of the journal (The Patient – Patient-Centered Outcomes Research). SC was not involved in the selection of peer reviewers for the manuscript or any of the subsequent editorial decisions.
Data sharing
Aggregate data are available on request.
Ethics
This study received ethics approval from the Monash University Human Research Ethics Committee (approval date: 24/03/2022, approval number: 31413).
Author contributions
MEW: conceptualisation, DCE design, questionnaire design, data collection, data analysis, drafting of manuscript and revisions. UDSP: questionnaire design, data collection, drafting of manuscript and revision. CD: conceptualisation, DCE design, revision of manuscript. YSJA: conceptualisation, DCE design, revision of manuscript. NH: conceptualisation, DCE design, revision of manuscript. SC: conceptualisation, DCE design, revision of manuscript. GC: conceptualisation, DCE design, questionnaire design, data analysis, revision of manuscript.
Footnotes
Out of over 8.5 million options in a full factorial design.
26.4% national average based on 2021 census (https://www.abs.gov.au/statistics/people/population/historical-population/2021).
9.4% national average based on 2021 census (https://www.abs.gov.au/statistics/people/population/historical-population/2021).
35.2% national average based on ABS (2022). Education and work, Australia, May 2022: https://www.abs.gov.au/statistics/people/education/education-and-work-australia/latest-release.
Estimated age-standardised participation rate of females aged 50–74 in the BreastScreen Australia programme in 2021–2022 was 49.6% (breast screening rates, Cancer Australia).
Homogeneity test indicated the data could be pooled. Interaction effects used to test the impact of information on preferences.
We also ran a latent class analysis and found that the random parameter logit (BIC: 25,439.1) performs better than the optimum latent class model (BIC: 26,764.5). The two-class model was selected as the optimum latent class model based on the AIC, BIC and class membership percentages. For those interested, the results are presented in supplementary “Appendix E”.
Stacy M. Carter and Gang Chen: Joint Senior authors.
References
- 1.AIHW. Cancer data in Australia. Australian Government-Australian Institute of Health and Welfare; 2023.
- 2.DoHAC. BreastScreen Australia Program. Australian Government: Department of Health and Aged Care; 2023.
- 3.Dembrower K, Crippa A, Colon E, Eklund M, Strand F, Consortium tST. Artificial intelligence for breast cancer detection in screening mammography in Sweden: a prospective, population-based, paired-reader, non-inferiority study. Lancet Digit Health. 2023;5:E703–11. [DOI] [PubMed] [Google Scholar]
- 4.Hill H, Roadevin C, Duffy S, Mandrik O, Brentnall A. Cost-effectiveness of AI for risk-stratified breast cancer screening. JAMA Netw Open. 2024;7: e2431715. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Areia M, Mori Y, Correale L, Repici A, Bretthauer M, Sharma P, et al. Cost-effectiveness of artificial intelligence for screening colonoscopy: a modelling study. Lancet Digit Health. 2022;4:E436–44. [DOI] [PubMed] [Google Scholar]
- 6.Rossi JG, Rojas-Perilla N, Krois J, Schwendicke F. Cost-effectiveness of artificial intelligence as a decision-support system applied to the detection and grading of melanoma, dental caries, and diabetic retinopathy. JAMA Netw Open. 2022;5: e220269. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Schwendicke F, Rossi JG, Göstemeyer G, Elhennawy K, Cantu AG, Gaudin R, et al. Cost-effectiveness of artificial intelligence for proximal caries detection. J Dent Res. 2021;100:369–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Liu H, Li R, Zhang Y, Zhang K, Yusufu M, Liu Y, et al. Economic evaluation of combined population-based screening for multiple blindness-causing eye diseases in China: a cost-effectiveness analysis. Lancet Glob Health. 2023;11:E456–65. [DOI] [PubMed] [Google Scholar]
- 9.Lennox-Chhugani N, Chen Y, Pearson V, Trzcinski B, James J. Women's attitudes to the use of AI image readers: a case study from a national breast screening programme. BMJ Health Care Inform. 2021;28(1):e100293. 10.1136/bmjhci-2020-100293 [DOI] [PMC free article] [PubMed]
- 10.Jutzi TB, Krieghoff-Henning EI, Holland-Letz T, Utikal JS, Hauschild A, Schadendorf D, Sondermann W, Fröhling S, Hekler A, Schmitt M, Maron RC, Brinker TJ. Artificial intelligence in skin cancer diagnostics: the patients' perspective. Front Med (Lausanne). 2020;7:233. 10.3389/fmed.2020.00233. [DOI] [PMC free article] [PubMed]
- 11.Ongena YP, Yakar D, Haan M, Kwee TC. Artificial intelligence in screening mammography: a population survey of women's preferences. J Am Coll Radiol. 2021;18(1 Pt A):79–86. 10.1016/j.jacr.2020.09.042. [DOI] [PubMed] [Google Scholar]
- 12.Carter SM, Carolan L, Aquino YSJ, Frazer H, Rogers WA, Hall J, et al. Australian women’s judgements about using artificial intelligence to read mammograms in breast cancer screening. Digital Health. 2023;7:20552076231191056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Pesapane F, Rotili A, Valconi E, Agazzi GM, Montesano M, Penco S, Nicosia L, Bozzini A, Meneghetti L, Latronico A, Pizzamiglio M, Rossero E, Gaeta A, Raimondi S, Pizzoli SFM, Grasso R, Carrafiello G, Pravettoni G, Cassano E. Women's perceptions and attitudes to the use of AI in breast cancer screening: a survey in a cancer referral centre. Br J Radiol. 2023;96(1141):20220569. 10.1259/bjr.20220569. [DOI] [PMC free article] [PubMed]
- 14. Pesapane F, Giambersio E, Capetti B, Monzani D, Grasso R, Nicosia L, et al. Patients' perceptions and attitudes to the use of artificial intelligence in breast cancer diagnosis: a narrative review. Life. 2024;14:454.
- 15. de Vries CF, Colosimo SJ, Boyle M, Lip G, Anderson LA, Staff RT, et al. AI in breast screening mammography: breast screening readers' perspectives. Insights Imaging. 2022;13:186.
- 16. de Vries CF, Morrissey BE, Duggan D, Staff RT, Lip G. Screening participants' attitudes to the introduction of artificial intelligence in breast screening. J Med Screen. 2021;28:221–2.
- 17. Carter SM, Popic D, Marinovich ML, Carolan L, Houssami N. Women's views on using artificial intelligence in breast cancer screening: a review and qualitative study to guide breast screening services. Breast. 2024;77:103783.
- 18. Fritsch SJ, Blankenheim A, Wahl A, Hetfeld P, Maassen O, Deffge S, et al. Attitudes and perceptions of artificial intelligence in healthcare: a cross-sectional survey among patients. Digit Health. 2022;8.
- 19. Beets B, Newman TP, Howell EL, Bao L, Yang S. Surveying public perceptions of artificial intelligence in health care in the United States: systematic review. J Med Internet Res. 2023;25:e40337. 10.2196/40337.
- 20. Wu C, Xu H, Bai D, Chen X, Gao J, Jiang X. Public perceptions on the application of artificial intelligence in healthcare: a qualitative meta-synthesis. BMJ Open. 2023;13(1):e066322. 10.1136/bmjopen-2022-066322.
- 21. Vo V, Chen G, Aquino YSJ, Carter SM, Do Q, Woode ME. Multi-stakeholder preferences for the use of artificial intelligence in healthcare: a systematic review and thematic analysis. Soc Sci Med. 2023;338:116357.
- 22. Juravle G, Boudouraki A, Terziyska M, Rezlescu C. Trust in artificial intelligence for medical diagnoses. Prog Brain Res. 2020;253:263–82. 10.1016/bs.pbr.2020.06.006.
- 23. Formosa P, Rogers WA, Griep Y, Bankins S, Richards D. Medical AI and human dignity: contrasting perceptions of human and artificially intelligent (AI) decision making in diagnostic and medical resource allocation contexts. Comput Hum Behav. 2022;133:107296. 10.1016/j.chb.2022.107296.
- 24. Mangham LJ, Hanson K, McPake B. How to do (or not to do)… designing a discrete choice experiment for application in a low-income country. Health Policy Plan. 2009;24:7.
- 25. Frazer HML, Peña-Solorzano CA, Kwok CF, Elliott MS, Chen Y, Wang C, et al. Comparison of AI-integrated pathways with human–AI interaction in population mammographic screening for breast cancer. Nat Commun. 2024;15:7525.
- 26. Chrzan K, Orme B. An overview and comparison of design strategies for choice-based conjoint analysis. Sawtooth Software research paper series. 2000. p. 19.
- 27. DoHAC. Getting tested for breast cancer can save your life. In: Australian Department of Health and Aged Care, editor. YouTube; 2015. https://www.youtube.com/watch?v=zAkMo37Qy31k.
- 28. de Bekker-Grob EW, Donkers B, Jonker MF, Stolk EA. Sample size requirements for discrete-choice experiments in healthcare: a practical guide. Patient. 2015;8:373–84.
- 29. Lancsar E, Louviere J. Conducting discrete choice experiments to inform healthcare decision making: a user guide. Pharmacoeconomics. 2008;26:661–77.
- 30. Orme B. Sample size issues for conjoint analysis studies. Sequim (WA): Sawtooth Software technical paper; 1998.
- 31. Hensher DA, Rose JM, Greene WH. Applied choice analysis. Cambridge: Cambridge University Press; 2015.
- 32. Nickson C, Velentzis LS, Brennan P, Mann B, Houssami N. Improving breast cancer screening in Australia: a public health perspective. Public Health Res Pract. 2019;29:e2921911.
- 33. BreastScreen Victoria. What happens after your breast screen [Internet]. Carlton South (AU) [cited 2025 Feb]. Available from: https://www.breastscreen.org.au/the-breastscreen-experience/after-screening/
- 34. AIHW. BreastScreen Australia monitoring report, 2022. Canberra: Australian Institute of Health and Welfare; 2022.
- 35. Ride J, Goranitis I, Meng Y, LaBond C, Lancsar E. A reporting checklist for discrete choice experiments in health: the DIRECT checklist. Pharmacoeconomics. 2024;42:1161–75.
- 36. Viberg Johansson J, Dembrower K, Strand F, Grauman Å. Women's perceptions and attitudes towards the use of AI in mammography in Sweden: a qualitative interview study. BMJ Open. 2024;14(2):e084014. 10.1136/bmjopen-2024-084014.
- 37. Gatting L, Ahmed S, Meccheri P, Newlands R, Kehagia AA, Waller J. Acceptability of artificial intelligence in breast screening: focus groups with screening-eligible population in England. BMJ Public Health. 2024;2:e000892.
- 38. Sezgin E. Artificial intelligence in healthcare: complementing, not replacing, doctors and healthcare providers. Digit Health. 2023;9:1–5.
- 39. Kelly S, Kaye S-A, Oviedo-Trespalacios O. What factors contribute to the acceptance of artificial intelligence? A systematic review. Telemat Inform. 2023;77:101925.
- 40. Theis S, Jentzsch S, Deligiannaki F, Berro C, Raulf AP, Bruder C. Requirements for explainability and acceptance of artificial intelligence in collaborative work. In: Degen H, Ntoa S, editors. Artificial intelligence in HCI. HCII 2023. Lecture Notes in Computer Science. 2023. p. 25.
- 41. Ben-Gal HC. Artificial intelligence (AI) acceptance in primary care during the coronavirus pandemic: what is the role of patients' gender, age and health awareness? A two-phase pilot study. Front Public Health. 2023;10:931225.
- 42. Pérez-Lacasta MJ, Martínez-Alonso M, Garcia M, Sala M, Perestelo-Pérez L, Vidal C, et al.; InforMa Group. Effect of information about the benefits and harms of mammography on women's decision making: the InforMa randomised controlled trial. PLoS ONE. 2019;14(3):e0214057. 10.1371/journal.pone.0214057.
- 43. Montero-Moraga JM, Posso M, Román M, Burón A, Sala M, Castells X, Macià F. Effect of an information leaflet on breast cancer screening participation: a cluster randomized controlled trial. BMC Public Health. 2021;21(1):1301. 10.1186/s12889-021-11360-0.
- 44. Young B, Bedford L, Kendrick D, Vedhara K, Robertson JFR, das Nair R. Factors influencing the decision to attend screening for cancer in the UK: a meta-ethnography of qualitative research. J Public Health. 2018;40:24.
- 45. Hersch J, Barratt A, Jansen J, Irwig L, McGeechan K, Jacklyn G, et al. Use of a decision aid including information on overdetection to support informed choice about breast cancer screening: a randomised controlled trial. Lancet. 2015;385:1642–52.
- 46. AMA. Safe and responsible AI in Australia. AMA submission to the Department of Industry, Science and Resources. Australian Medical Association; 2023. p. 5.
- 47. AMA. Automated decision making and AI regulation. AMA submission to the Prime Minister and Cabinet consultation on positioning Australia as the leader in digital economy regulation. Australian Medical Association; 2022. p. 5.
- 48. Government of Canada. Directive on automated decision-making. 2023.