Author manuscript; available in PMC 2023 Oct 1.
Published in final edited form as: J Am Coll Radiol. 2022 Aug 13;19(10):1098–1110. doi: 10.1016/j.jacr.2022.06.019

Radiologist preferences for artificial intelligence-based decision support during screening mammography interpretation

Nathaniel Hendrix 1,*, Kathryn P Lowry 2,*, Joann G Elmore 3, William Lotter 4, A Gregory Sorensen 4, William Hsu 5, Geraldine J Liao 2,6, Sana Parsian 2,7, Suzanne Kolb 2, Arash Naeim 3, Christoph I Lee 2,8
PMCID: PMC9840464  NIHMSID: NIHMS1847815  PMID: 35970474

Abstract

Background:

Artificial intelligence (AI) may improve cancer detection and risk prediction during mammography screening, but radiologists’ preferences regarding its characteristics and implementation are unknown.

Purpose:

To quantify how different attributes of AI-based cancer detection and risk prediction tools affect radiologists’ intentions to use AI during screening mammography interpretation.

Materials and Methods:

Through qualitative interviews with radiologists, we identified five primary attributes for AI-based breast cancer detection and four for breast cancer risk prediction. We developed a discrete choice experiment (DCE) based on these attributes and invited 150 U.S.-based radiologists to participate. Each respondent made 8 choices for each tool between 3 alternatives: 2 hypothetical AI-based tools versus screening without AI. We analyzed sample-wide preferences using random parameters logit models and identified subgroups with latent class models.

Results:

Respondents (N=66; 44% response rate) were from six diverse practice settings across eight states. Radiologists were more interested in AI for cancer detection when sensitivity and specificity were balanced (94% sensitivity with <25% of examinations marked) and AI mark-up appeared at the end of the hanging protocol after radiologists complete their independent review. For AI-based risk prediction, radiologists preferred AI models using both mammography images and clinical data. Overall, 46–60% intended to adopt any of the AI tools presented in the study; 26–33% approached AI enthusiastically but were deterred if the features did not align with their preferences.

Conclusion:

Although most radiologists want to use AI-based decision support, short-term uptake may be maximized by implementing tools that meet the preferences of dissuadable users.

Keywords: artificial intelligence, breast cancer, cancer detection, cancer screening, risk prediction, preferences, discrete choice experiment

Introduction

The development of artificial intelligence (AI)-based technology for mammography interpretation has accelerated in recent years, with numerous studies showing promise for AI-based decision support for cancer detection on screening mammography1–6 and future breast cancer risk prediction.7–10 Several AI-based tools for mammography interpretation have recently been approved for clinical use by the U.S. Food and Drug Administration (FDA) and are now commercially available.11 Despite the pace of production of AI-based mammography decision support tools, it is not yet clear which technologies will be embraced by practicing radiologists.

Radiologists’ uptake of AI-based mammography tools will likely depend on multiple factors, such as radiologists’ perceptions of how AI might affect performance and existing workflow, as well as costs and reimbursement.12 However, there is a paucity of data describing radiologists’ overall attitudes toward AI tools. While some breast imagers may be open to using AI-based decision support given their experience with conventional computer-aided detection (CAD), some may be wary of such technology, particularly since the benefits of CAD seen in early reader studies ultimately were not achieved in large, population-based studies.13–15 The lack of utility of conventional CAD has been largely attributed to its poor specificity, with as many as 2000 false positive CAD markings for every detected cancer in screening settings.13,29 Given this experience, radiologists may have preconceived notions about the utility of AI-based mammography decision support, the integration of these tools with the task of image interpretation, and how this information should impact patient management.

To better understand radiologists’ attitudes and preferences for AI-based screening mammography decision support, we conducted qualitative interviews to ascertain the attributes of AI-based mammography decision support most important to clinical radiologists. Based on these findings, we then developed a discrete choice experiment (DCE) to examine how these attributes may impact adoption of AI-based tools among radiologists who regularly interpret mammography. DCEs are a methodologic tool for quantifying the relative importance of different attributes and the strength of stakeholders’ preferences.16,17 In our DCEs, we sought to identify the key attributes driving radiologists’ preferences for AI-based mammography decision support tools designed for breast cancer detection and future breast cancer risk prediction.

Methods

We used DCEs to quantify radiologist preferences for two types of AI-based decision support tools: breast cancer detection and future breast cancer risk prediction. The DCEs were designed to assess the impact of different attributes on radiologists’ intentions to use AI during mammography interpretation. We invited a sample of radiologists from diverse, U.S.-based practice settings to participate and analyzed their responses to estimate mean preferences and to identify subgroups with distinct preferences. The study was approved by the University of Washington Human Subjects Division.

Development of experimental instruments

We developed the experimental instruments in accordance with guidance on best practices from the International Society for Pharmacoeconomics and Outcomes Research.18 One radiologist [K.P.L.] and a health economist [N.H.] conducted interviews via teleconference software, analyzed the recordings, and identified the themes for inclusion in the study. We first created a list of candidate attributes from a literature review and a priori hypotheses. We included these attributes in an interview guide used to inform semi-structured interviews with five radiologists (Appendix 1).19 Interview participants were selected from the authors’ professional networks and included a mix of radiologists with fellowship training in breast imaging practicing in academic and community settings. None of the participants had previously used AI-based mammography decision support systems. Interview participants were compensated with $100 gift cards. Themes were coded from audio recordings of each interview by one analyst using Excel, following the general inductive approach.20

For the final instrument, we selected five attributes for assessing preferences for AI cancer detection and four attributes for AI-based future breast cancer risk prediction based on our structured interviews (Table 1). The themes were selected for their stated importance to interviewees and their relevance to our scientific questions. We also required that themes be independent of one another. Based on our qualitative data synthesis, we also defined two to three levels of performance for each attribute. We then used these levels to distinguish the two AI-based alternatives in each forced choice. That is, all alternatives had the same attributes (e.g., validation), but differed in how they performed on each attribute (e.g., retrospective versus prospective validation). Most levels were derived from interviews or content in scholarly publications. The levels of the sensitivity-specificity trade-off in the cancer detection experiment were a special case in which an AI researcher [W.L.] indicated hypothetical values for this attribute based on research algorithms.

Table 1:

Attributes and levels included in the two discrete choice instruments

AI-Based Cancer Detection

Attribute: How suspicion of malignancy is communicated
 • Binary indicator (“Suspicious” / “Not suspicious”)
 • Categorical (“High” / “Intermediate” / “Low”)
 • Numerical probability (e.g., “2%”)

Attribute: How AI performs
 • AI detects 97% of cancers and 50% of exams have at least one marking
 • AI detects 94% of cancers and 25% of exams have at least one marking
 • AI detects 85% of cancers and 10% of exams have at least one marking

Attribute: How workflow can be customized
 • Does not customize workflow by AI assessment
 • Defaults to 2D image-only read for exams with low suspicion of cancer being present
 • Flags examinations with high suspicion of cancer for review by two radiologists

Attribute: How the algorithm has been validated
 • Reader studies of AI with enriched case sets
 • Retrospective validation in target screening population
 • Prospective trial in target screening population

Attribute: How AI is applied to hanging protocol
 • AI automatically shown for all images in hanging protocol (with option to toggle off)
 • AI marked-up views appear at end of hanging protocol

AI-Based Cancer Risk Prediction

Attribute: What timeframe AI predicts cancer risk for
 • AI calculates risk of cancer over next two years
 • AI calculates risk of cancer over next five years
 • AI calculates lifetime cancer risk

Attribute: How AI communicates risk score
 • Risk categories (“High” / “Intermediate” / “Low”)
 • Risk of cancer relative to average woman (e.g., “1.5x”)
 • Absolute probability of cancer (e.g., “2%”)

Attribute: How AI performs
 • Risk model based on images only with similar accuracy to the Tyrer-Cuzick model
 • Risk model based on images only that performs 20% better (20% higher AUC) than Tyrer-Cuzick
 • Risk model based on images and clinical risk factors that performs 20% better (20% higher AUC) than Tyrer-Cuzick

Attribute: What screening recommendations AI makes and automatically places into radiology report
 • Risk assessment without a specific recommendation (information provided for referring physicians to interpret and address)
 • Risk assessment with recommendation for supplemental screening (e.g., ultrasound or MRI) for high-risk women
 • Risk assessment with recommendation for mammography screening interval (e.g., biennial versus annual)
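To make the attribute-and-level structure concrete, the following minimal Python sketch (our illustration; variable names and encodings are hypothetical, not the study’s design files) represents the cancer detection attributes from Table 1 as a data structure and enumerates the full factorial of candidate tool profiles from which an efficient design would draw its comparisons.

```python
from itertools import product

# Hypothetical encoding of the AI-based cancer detection attributes and levels in
# Table 1 (labels abbreviated; illustrative only, not the study's actual design file).
detection_attributes = {
    "suspicion_communication": ["binary", "categorical", "numeric_probability"],
    "performance": ["sens97_mark50", "sens94_mark25", "sens85_mark10"],
    "workflow_customization": ["none", "default_2d_low_suspicion", "double_read_high_suspicion"],
    "validation": ["enriched_reader_study", "retrospective_screening", "prospective_trial"],
    "hanging_protocol": ["all_images_toggle_off", "end_of_protocol"],
}

# Enumerate every possible tool profile (3 x 3 x 3 x 3 x 2 = 162 combinations); an
# efficient design selects a small, balanced subset of pairings from this space.
names = list(detection_attributes)
profiles = [dict(zip(names, combo)) for combo in product(*detection_attributes.values())]
print(len(profiles))  # 162 candidate profiles
```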

We also collected basic demographic and descriptive information: years in practice, training in breast imaging fellowship, attitude towards conventional CAD, opinion on the top priority for AI in breast imaging, number of mammograms read per week, gender, practice setting, and self-reported cancer detection and recall rates.

Experimental design

In each choice task, participants were presented with a forced choice among three alternatives: two hypothetical AI-based mammography decision support tools or interpretation without AI (Figure 1). The two hypothetical tools in each task had identical attributes but differed in their level of performance on each attribute; for example, one tool might offer superior sensitivity, while the other had been validated through more rigorous means. This design assesses the relative importance of different performance levels both in the decision to use one AI-based tool versus another and in the decision to use AI at all (Table 1).

Figure 1:

A sample forced discrete choice task from the study instrument for AI-based cancer detection tools.

We used Ngene 1.2.1 (ChoiceMetrics, Inc., New South Wales, Australia) to arrange the comparisons between AI-based tools in a manner that optimizes the information derived from each DCE choice task. Specifically, we used D-efficiency criteria to create an experimental design that ensured each performance level appeared equally often and that performance levels were compared against one another an equal number of times.18 To minimize response burden, we created two versions of the instrument, so that each respondent was assigned 16 DCE choice tasks: 8 for cancer detection and 8 for risk prediction. Based on power calculations, the recommended minimum sample size for an analysis without interactions between attributes in this experimental design was 62 respondents.21 This method assumes that the probability of preferences for each choice task is normally distributed.
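As a rough illustration of the level-balance property that the D-efficient design targets, the hypothetical sketch below (our own illustration, not generated by Ngene) counts how often each level of an attribute appears across a set of choice tasks; in a balanced design these counts are equal, or as close to equal as the design permits.

```python
from collections import Counter

def level_counts(tasks, attribute):
    """Count how often each level of `attribute` appears across a set of choice tasks.

    Each task is a pair of tool profiles (dicts mapping attribute -> level), as in the
    enumeration sketch above. Illustrative only.
    """
    counts = Counter()
    for profile_a, profile_b in tasks:
        counts[profile_a[attribute]] += 1
        counts[profile_b[attribute]] += 1
    return counts

# Toy example with two tasks; a real check would cover all 8 detection tasks per block.
tasks = [
    ({"performance": "sens94_mark25"}, {"performance": "sens85_mark10"}),
    ({"performance": "sens97_mark50"}, {"performance": "sens94_mark25"}),
]
print(level_counts(tasks, "performance"))
```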

Administration

We sampled 6 radiology practices from the authors’ professional networks. We purposefully selected one academic center, one integrated health system with a radiology training program, one integrated health system without a radiology training program, and three private practice sites. We also selected for geographic diversity, with both West and East Coast practices represented. We contacted a practice manager at each facility, who provided us with a list of email addresses for all radiologists at the practice who regularly interpret screening mammograms.

We sent invitation emails to 150 radiologists at 6 practices between May and August 2021 with a follow-up reminder email approximately 2 weeks after the initial email. Invitation emails were composed by the coauthors and sent by the practice managers. Emails contained a link to a website where the study was conducted. All participants were compensated with a $100 Amazon gift card for completing the DCE.

Participants were first presented with a brief clinical narrative describing the hypothetical uses of AI in screening mammography, followed by study questions. Study data were collected and managed using REDCap electronic data capture tools hosted at University of Washington.22,23

Analysis

Our main outcome of interest was the impact of each attribute on the probability that radiologists would use AI tools. We estimated sample-wide preferences with random parameters logit (RPL) models and identified subgroups using latent class models (LCMs).24,25
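To illustrate what the random parameters specification captures (a sketch under our own assumptions, not the authors’ estimation code, which was run in Stata): each respondent’s coefficient is treated as a draw from a population distribution, and choice probabilities are averaged over those draws.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_prob_choose_ai(beta_mean, beta_sd, utility_no_ai=0.0, n_draws=10_000):
    """Average probability of choosing an AI tool over screening without AI when the
    AI utility coefficient varies across respondents (normal mixing distribution),
    as in a random parameters logit model. Values are illustrative only."""
    draws = rng.normal(beta_mean, beta_sd, n_draws)          # respondent-level coefficients
    probs = 1.0 / (1.0 + np.exp(-(draws - utility_no_ai)))   # binary logit per draw
    return probs.mean()

# Invented values: a positive mean with wide heterogeneity gives a high average uptake
# probability but a broad spread of individual-level probabilities.
print(round(mean_prob_choose_ai(beta_mean=2.0, beta_sd=1.5), 2))
```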

The LCM assumes that there are unobserved groups of respondents with similar utility functions. It estimates the utility function for each group separately, then calculates the probability of group membership for each respondent based on their choices and characteristics.26 Individual-level characteristics were evaluated for inclusion in the final model using backward stepwise variable selection based on the Bayesian information criterion.27 We first specified two latent classes in our sample, then increased the number of classes until the model failed to converge.
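A minimal sketch of the class-membership step, assuming a simple Bayes’ rule formulation (our illustration; the class shares and likelihoods below are invented): a respondent’s posterior probability of belonging to a class combines the class share with the likelihood of that respondent’s observed choices under the class-specific utility function.

```python
import numpy as np

def posterior_class_probs(class_shares, choice_likelihoods):
    """Posterior probability of latent class membership for one respondent.

    class_shares:       prior (population) share of each latent class; sums to 1.
    choice_likelihoods: likelihood of the respondent's observed choices under each
                        class's utility coefficients.
    """
    weighted = np.asarray(class_shares) * np.asarray(choice_likelihoods)
    return weighted / weighted.sum()

# Hypothetical three-class example (shares 0.60, 0.26, 0.14):
print(posterior_class_probs([0.60, 0.26, 0.14], [0.002, 0.030, 0.001]))
```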

We used effects-coded data in both models. We also added an alternative-specific constant (ASC) for the choice to screen without AI. This allowed us to estimate the baseline preference for screening with the average of our hypothetical AI tools versus screening without AI. We used the coefficient of the ASC to calculate the baseline preference using the formula Pr(AI) = 1 / (1 + e^(−ASC)). We next calculated the impact of each performance dimension on the probability of choosing to screen with AI. For the coefficient of a given performance dimension, β_i, we calculated the probability of using the AI with that attribute level via the formula Pr(AI | β_i) = 1 / (1 + e^(−(β_i − ASC))).
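The two formulas above are inverse-logit transforms of the estimated coefficients. A small numeric sketch follows (the coefficient values are invented for illustration and are not the study’s estimates).

```python
import math

def inv_logit(x):
    """Logistic transform converting log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative only: log-odds of about 2.2 correspond to a choice probability of ~0.90;
# shifting the log-odds by an attribute-level coefficient moves the probability on the
# same scale (here, a level with coefficient -1.0 lowers it to ~0.77).
print(round(inv_logit(2.2), 2))
print(round(inv_logit(2.2 - 1.0), 2))
```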

We calculated the relative importance of each attribute by first calculating the difference in coefficients between the most- and least-desirable level of that attribute. We then summed these differences across attributes and divided each attribute-specific difference by this sum to estimate the maximum percentage of each choice attributable to that attribute.
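A hypothetical sketch of this relative-importance calculation, using invented coefficients (not the study’s estimates):

```python
# Invented attribute-level coefficients (effects coded) for three detection attributes.
coefficients = {
    "performance": {"sens97_mark50": -0.20, "sens94_mark25": 0.55, "sens85_mark10": -0.35},
    "workflow":    {"none": 0.30, "default_2d": -0.45, "double_read": 0.15},
    "validation":  {"reader_study": -0.25, "retrospective": 0.05, "prospective": 0.20},
}

# Difference between the best and worst level of each attribute, then normalize by the
# summed differences to get each attribute's maximum share of the choice.
ranges = {attr: max(lv.values()) - min(lv.values()) for attr, lv in coefficients.items()}
total = sum(ranges.values())
relative_importance = {attr: round(r / total, 2) for attr, r in ranges.items()}
print(relative_importance)  # shares sum to ~1; larger share = more influential attribute
```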

Data cleaning, formatting, and visualization of results were conducted in R 4.0.2. Regressions were conducted in Stata 16.1 (StataCorp LLC, College Station, TX).

Results

A total of 66 of 150 invited radiologists participated (44% participation rate) (Table 2). Respondents were from six diverse practice settings across eight states. Most were from non-academic practice settings and were fellowship-trained in breast imaging. Site-specific response rates ranged from 24% to 77%. All participants were radiologists who regularly interpreted mammograms; 85% reported interpreting between 101 and 500 mammograms per week. Of the 66 respondents, 41 (62%) had completed a breast imaging fellowship. Respondents had a mean of 15.8 years in practice (range, 1 to 48 years). Conventional computer-aided detection was available at all practices, though none used contemporary deep learning-based detection support.

Table 2:

Respondent characteristics

Overall (N=66)
Years in practice
 Mean (SD) 15.8 (11.6)
 Median [Min, Max] 12.0 [1.00, 48.0]
Breast imaging fellowship
 No 25 (37.9%)
 Yes 41 (62.1%)
Attitude towards conventional computer-aided detection
 Very helpful 1 (1.5%)
 Somewhat helpful 27 (40.9%)
 Neither helpful nor unhelpful 21 (31.8%)
 Somewhat unhelpful 7 (10.6%)
 Very unhelpful 10 (15.2%)
Top priority for AI in breast imaging
 Detect more cancers 42 (63.6%)
 Improve efficiency 13 (19.7%)
 Reduce false positives 11 (16.7%)
Screening mammograms read per week
 Fewer than 10 0 (0%)
 10 to 100 7 (10.6%)
 101 to 500 56 (84.8%)
 More than 500 3 (4.5%)
Self-reported cancer detection rate (per thousand)
 Less than 2.5 1 (1.5%)
 2.5 to 4.9 12 (18.2%)
 5 to 7.4 39 (59.1%)
 7.5 to 9.9 11 (16.7%)
 More than 10 3 (4.5%)
Self-reported recall rate
 Less than 5% 4 (6.1%)
 5% to 9.9% 23 (34.8%)
 10% to 14.9% 33 (50.0%)
 15% to 19.9% 6 (9.1%)
 More than 20% 0 (0%)
Gender
 Female 40 (60.6%)
 Male 26 (39.4%)
Practice setting
 Academic 10 (15.2%)
 Community (integrated health system or private practice) 56 (84.8%)

Preferences for AI-based breast cancer detection

Participants showed a strong intention to use an AI-based breast cancer detection tool versus not: there was a 0.90 probability of respondents choosing to use a randomly selected AI-based cancer detection tool. However, there was substantial heterogeneity: the 95% confidence interval for intention to use AI in a randomly selected sample of this size was 0.39 to 0.99.

Of the options presented for algorithm sensitivity and mark-up, respondents were most interested in using tools with a balance of 94% sensitivity and 25% of examinations marked (Figure 2). They were also most likely to opt for AI tools without workflow customization or tools that prompt double reading of high-suspicion examinations, versus tools defaulting to 2-dimensional images only for low-suspicion examinations. Our sample largely preferred viewing image mark-up at the end of the hanging protocol versus at the beginning.

Figure 2:

Mean baseline sample-wide probabilities (with 95% confidence intervals) of intending to use AI-based tools for cancer detection given specified attributes

We identified 3 classes in the latent class model (Figure 3). The largest class (39 of 66, or 60%) indicated that they were likely to use any AI option in the experiment; for this reason, we called this group “steadfast AI detection users.” Sensitivity was their most important attribute (Appendix Figure A1). Compared to AI non-users, they were more likely to practice in an academic setting, have a lower self-reported recall rate, and view conventional CAD more positively (Appendix Figure A2).

Figure 3:

Mean baseline probabilities (with 95% confidence intervals) for using AI detection with different attributes for the three types of decision-makers identified in the latent class analysis. Figures in parentheses represent the probability of a member of our sample being in that latent class.

The second class (18 of 66, or 26%) approached AI enthusiastically but was quick to change their mind if the AI tool had any characteristics perceived as undesirable, including defaulting to 2-dimensional images or having low sensitivity. Workflow customization was the most important attribute for this class of respondents. We called this group “dissuadable AI detection users.” Demographically, this group was characterized by fewer years in practice than the other groups.

The third group, which we named “AI detection non-users,” represented 14% of the sample (9 of 66) and was characterized by a general unwillingness to adopt AI-based breast cancer detection. For this group, an AI-based cancer detection tool would need to have several characteristics perceived as desirable before being adopted.

Preferences for AI-based future breast cancer risk prediction

Participants also generally reported intending to use AI-based future breast cancer risk prediction versus not. The sample had a probability of 0.82 of choosing to use a randomly selected AI-based risk prediction tool presented to them in the experiment (Figure 4).

Figure 4:

Mean baseline sample-wide probabilities (with 95% confidence intervals) of intending to use AI-based tools for future breast cancer risk prediction given specified attributes

Three attributes of AI risk prediction tools were viewed as undesirable and reduced the probability of using AI risk prediction by as much as 30 percentage points: breast cancer risk estimates for a two-year (versus five-year or lifetime) period, equal accuracy to the Tyrer-Cuzick model (versus 20% improvement), and automated recommendations for different mammography screening intervals based on risk (versus no recommendation or recommendations of supplemental screening). Three attributes significantly increased the probability of using AI-based risk prediction: lifetime (versus two-year) cancer risk estimates, improved (versus not improved) performance over the Tyrer-Cuzick model (by 20%), and risk estimates based on both images and clinical data (versus using images alone).

We again identified 3 classes of decision makers (Figure 5). The largest (29 of 66, or 46%) showed a propensity to use any of the AI-based risk prediction tools; we named this group “steadfast AI risk prediction users.” Performance of the AI model versus the Tyrer-Cuzick model was their most important attribute (Appendix Figure A3). Unlike the other two classes of participants, this group preferred to have risk communicated in categories (e.g., low, medium, and high) rather than numerical estimates. They were characterized by more positive attitudes towards conventional CAD and by a lower likelihood of listing specificity as the top priority for AI.

Figure 5:

Mean baseline probabilities (with 95% confidence intervals) for using AI risk prediction with different attributes for the three types of decision-makers identified in the latent class analysis. Figures in parentheses represent the probability of a member of our sample being in that latent class.

Another class that generally approached AI with enthusiasm was a group we called “dissuadable AI risk prediction users” (23 of 66, or 33%). While eager to use AI, this group was inclined to change their mind when the tool had undesirable qualities, including using discrete categories to communicate risk, accuracy equal to the Tyrer-Cuzick model, and not providing an automated screening recommendation. Compared with steadfast AI risk prediction users, dissuadable users were less likely to have received breast imaging fellowship training, more likely to have a negative attitude towards conventional CAD, and less likely to endorse efficiency as the most important application of AI in breast cancer screening (Appendix Figure A4).

The final class, representing 21% (14 of 66), was characterized by their reluctance to use any AI-based risk prediction tool. Unlike the other groups, which had a baseline probability of using AI near 90%, the “AI risk prediction non-users” had a baseline probability of just over 30%. They were more likely to use AI if it provided a lifetime prediction of cancer risk and if it did not provide an automated screening recommendation.

Predicted class membership for specific individuals varied across the two experiments (Table 3). Respondents generally displayed more cautious attitudes towards AI-based risk prediction than detection, and there was relatively low correlation between individuals’ predicted class membership for the two use cases of AI.

Table 3:

Count of radiologists predicted to belong in each latent class in the experiments estimating preferences for artificial intelligence (AI)-based detection and risk prediction.

                                    AI-based Risk Prediction
AI-based Detection                  Steadfast Users   Dissuadable Users   Non-Users
Steadfast AI Detection Users              21                15                3
Dissuadable AI Detection Users             8                 8                2
AI Detection Non-Users                     0                 0                9

Discussion

In our DCEs involving radiologists from six diverse practice settings across eight states, there was overall strong interest in the use of AI-based mammography decision support: 90% of respondents intended to use AI-based tools for cancer detection, and 82% intended to use AI-based tools for future breast cancer risk prediction. Overall, the most desired AI cancer detection tools balanced sensitivity and specificity, while the desired AI cancer risk prediction tools improved upon the Tyrer-Cuzick risk model using information from both images and clinical data. Despite the overall enthusiasm for these AI tools, many respondents were easily dissuaded from adopting AI-based decision support based on specific features (26% for cancer detection and 33% for risk prediction). A smaller minority of radiologists would not consider using AI for cancer detection (14%) or future cancer risk prediction (21%) regardless of the choices of attributes.

When presented with different options for an AI tool for cancer detection, respondents preferred a tool with a balance of sensitivity and specificity (i.e., a tool with 94% sensitivity and mark-up of 25% of examinations) over one that prioritizes specificity (85% sensitivity with 10% mark-up) or sensitivity (97% sensitivity with 50% mark-up) alone. This preference may reflect radiologists’ experience with conventional CAD, which inserts up to four markings per examination to ensure high sensitivity. In contrast, AI-based cancer detection algorithms currently deployed in clinics mark approximately 38% of exams.28 Of note, respondents did prefer a tool with a hanging protocol similar to conventional CAD, with mark-up displayed at the end of the hanging protocol. Radiologists’ preferences based on prior experience with CAD should therefore be taken into account by AI developers.

Approximately a quarter of radiologists were generally eager to use AI for cancer detection but changed their mind when specific detection tools did not align with their clinical preferences. Members of this class of “dissuadable AI detection users” were less likely to use AI if it customized workflow for examinations deemed as low-risk (i.e., defaulting to 2-D views for examinations interpreted by the tool as normal). These findings are similar to a previously published DCE of primary care physicians’ attitudes and preferences toward AI for screening mammography participation, in which 24% of respondents belonged to a latent class defined by their opposition to any unattended review of images based on AI assessment.30 Together, these findings suggest that many physicians remain wary of the use of AI for unsupervised or partially supervised image interpretation.

Similarly, although most respondents were likely to adopt AI-based tools for risk prediction, one-third were easily dissuaded. These “dissuadable AI risk prediction users” were less likely to adopt an AI risk prediction tool if it did not demonstrate superior performance to the Tyrer-Cuzick model. They were also less likely to adopt an AI model that provided no screening recommendations or provided recommendations for mammography screening intervals versus one that provided recommendations for supplemental screening. This preference may reflect radiologists’ desire for AI-based risk prediction tools to align with current screening guidelines which encourage supplemental MRI screening in women with high predicted breast cancer risk based on clinical risk models.31 In contrast, the “AI risk prediction non-users” were more likely to use AI for risk prediction if the tool did not provide risk-tailored screening recommendations.

It was noteworthy that predicted class membership was not identical for AI cancer detection and AI risk prediction, and that respondents overall had a slightly more critical attitude toward risk prediction. One possible reason for this is that extensive experience with CAD has allowed radiologists to feel more comfortable using computerized decision support for detection. Because of these shifts in class membership, the latent classes we identified are best interpreted as situation-dependent attitudes that are likely to change as providers accumulate experience working with AI and as new use cases for AI are found.

This work is among the first quantitative experimental studies of radiologist preferences for AI. Our findings may be useful in aiding decisions about prioritization of resources to develop new features for AI in screening mammography, implementation in real world settings, and regulation of AI tools. Future work is needed to elicit preferences of other stakeholders such as patients and payers and to ensure that AI improves decision-making in real world settings, including how AI-based outputs can be presented to minimize the risk of potential harms.32,33 Qualitative work has already begun among patients, where scientific validation of AI has emerged as an important attribute and little support has been found for unattended decision-making by AI.34,35

Our study has limitations. Although our participation rate was relatively high for a study of invited physicians (44%)36, our sample is small and thus the generalizability of our findings is unclear. However, we purposefully sampled across diverse practice types and geographic regions and met our minimum threshold of participants based on power calculations. Our study participants practiced in predominantly urban settings located on the East and West coasts, and therefore our findings may not generalize to all radiologists or across all settings. We also did not collect respondent age, which is imperfectly correlated with years in practice but may affect preferences differently. Additionally, there may be attributes that we did not include that would significantly affect radiologists’ attitudes towards AI-based mammography decision support. Finally, our findings reflect the attitudes of radiologists prior to widespread use of AI, and most of our respondents likely have little or no firsthand experience with AI-based decision support tools.

In summary, our findings from this DCE suggest that most radiologists are highly likely to express the intention of adopting AI-based screening mammography decision support tools for breast cancer detection and future breast cancer risk prediction. Most radiologists prefer AI cancer detection tools that provide a balance of sensitivity and specificity and AI risk prediction tools that improve upon existing clinical risk models using both imaging and clinical data. Despite the overall enthusiasm, many radiologists’ support of AI-based decision support is contingent on its specific features and functionality.

The short-term uptake of AI technology for mammography interpretation will likely depend on how well its functionality aligns with radiologists’ preferences. In particular, radiology practices can improve adoption of AI by selecting decision support tools that meet the needs of dissuadable users. While intention to use AI is largely defined by preexisting attitudes in the other groups, dissuadable users’ intentions vary widely based on a tool’s attributes. As such, the support of this group is essential to securing a substantial majority of users within a practice who will intend to use AI during breast cancer screening.

Supplementary Material

Appendix

Funding information:

This study was supported by the National Cancer Institute (R37 CA240403) and the American Cancer Society (CSDG-21–078-01-CPSH). Study data were collected and managed using REDCap electronic data capture tools hosted at the Institute of Translational Health Sciences (ITHS) at the University of Washington. REDCap at ITHS is supported by the National Center for Advancing Translational Science of the National Institutes of Health (UL1 TR002319, KL2 TR002317, and TL1 TR002318).

Footnotes

Data statement: The authors declare that they had full access to all of the data in this study and the authors take complete responsibility for the integrity of the data and the accuracy of the data analysis.

Conflicts of interest: NH, KPL, JGE, WH, GJL, SP, SK, and AN report no conflicts of interest. WL and AGS are employees of DeepHealth, Inc., subsidiary of RadNet, Inc. CIL receives consulting fees from GRAIL, Inc. for service on a data safety board, personal fees from the American College of Radiology for journal editorial board work, and textbook royalties from McGraw Hill, Inc., Wolters Kluwer, and Oxford University Press.

References

1. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst. 2019;111(9):916–922.
2. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577(7788):89–94.
3. Salim M, Wåhlin E, Dembrower K, et al. External evaluation of 3 commercial artificial intelligence algorithms for independent assessment of screening mammograms. JAMA Oncol. Published online August 27, 2020.
4. Schaffter T, Buist DSM, Lee CI, et al. Evaluation of combined artificial intelligence and radiologist assessment to interpret screening mammograms. JAMA Netw Open. 2020;3(3):e200265.
5. Lotter W, Diab AR, Haslam B, et al. Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nat Med. 2021;27(2):244–249.
6. Zhu X, Wolfgruber TK, Leong L, et al. Deep learning predicts interval and screening-detected cancer from screening mammograms: a case-case-control study in 6369 women. Radiology. Published online September 7, 2021:203758.
7. Yala A, Lehman C, Schuster T, Portnoi T, Barzilay R. A deep learning mammography-based model for improved breast cancer risk prediction. Radiology. 2019;292(1):60–66.
8. Arefan D, Mohamed AA, Berg WA, Zuley ML, Sumkin JH, Wu S. Deep learning modeling using normal mammograms for predicting breast cancer risk. Med Phys. 2020;47(1):110–118.
9. Dembrower K, Liu Y, Azizpour H, et al. Comparison of a deep learning risk score and standard mammographic density score for breast cancer risk prediction. Radiology. 2020;294(2):265–272.
10. Yala A, Mikhael PG, Strand F, et al. Toward robust mammography-based models for breast cancer risk. Sci Transl Med. 2021;13(578):eaba4373.
11. Sechopoulos I, Teuwen J, Mann R. Artificial intelligence for breast cancer detection in mammography and digital breast tomosynthesis: state of the art. Semin Cancer Biol. 2021;72:214–225.
12. Ammenwerth E. Technology acceptance models in health informatics: TAM and UTAUT. Stud Health Technol Inform. 2019;263:64–71.
13. Fenton JJ, Taplin SH, Carney PA, et al. Influence of computer-aided detection on performance of screening mammography. N Engl J Med. 2007;356(14):1399–1409.
14. Lehman CD, Wellman RD, Buist DSM, et al. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern Med. 2015;175(11):1828–1837.
15. Gao Y, Geras KJ, Lewin AA, Moy L. New frontiers: an update on computer-aided diagnosis for breast imaging in the age of artificial intelligence. AJR Am J Roentgenol. 2019;212(2):300–307.
16. Brenner RJ, Ulissey MJ, Wilt RM. Computer-aided detection as evidence in the courtroom: potential implications of an appellate court’s ruling. AJR Am J Roentgenol. 2006;186(1):48–51.
17. Ryan M. Discrete choice experiments in health care. BMJ. 2004;328(7436):360–361.
18. de Bekker-Grob EW, Ryan M, Gerard K. Discrete choice experiments in health economics: a review of the literature. Health Econ. 2012;21(2):145–172.
19. Johnson FR, Lancsar E, Marshall D, et al. Constructing experimental designs for discrete-choice experiments: report of the ISPOR Conjoint Analysis Experimental Design Good Research Practices Task Force. Value Health. 2013;16(1):3–13.
20. Coast J, Horrocks S. Developing attributes and levels for discrete choice experiments using qualitative methods. J Health Serv Res Policy. 2007;12(1):25–30.
21. Thomas DR. A general inductive approach for analyzing qualitative evaluation data. Am J Eval. 2006;27(2):237–246.
22. de Bekker-Grob EW, Donkers B, Jonker MF, Stolk EA. Sample size requirements for discrete-choice experiments in healthcare: a practical guide. Patient. 2015;8(5):373–384.
23. Harris PA, Taylor R, Thielke R, et al. A metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–381.
24. Harris PA, Taylor R, Minor BL, et al. The REDCap consortium: building an international community of software platform partners. J Biomed Inform. 2019;95:103208.
25. Hauber AB, González JM, Groothuis-Oudshoorn CGM, et al. Statistical methods for the analysis of discrete choice experiments: a report of the ISPOR Conjoint Analysis Good Research Practices Task Force. Value Health. 2016;19(4):300–315.
26. Hess S, Rose JM. Can scale and coefficient heterogeneity be separated in random coefficients models? Transportation. 2012;39(6):1225–1239.
27. Carroll JD, Green PE. Psychometric methods in marketing research: part I, conjoint analysis. J Mark Res. 1995;32(4):385–391.
28. Heinze G, Wallisch C, Dunkler D. Variable selection - a review and recommendations for the practicing statistician. Biom J. 2018;60(3):431–449.
29. Tartar M, Le L, Watanabe AT, Enomoto A. Artificial intelligence support for mammography: in-practice clinical experience. J Am Coll Radiol. Published online October 6, 2021.
30. Hendrix N, Hauber B, Lee CI, Bansal A, Veenstra DL. Artificial intelligence in breast cancer screening: primary care provider preferences. J Am Med Inform Assoc. 2021;28(6):1117–1124.
31. Saslow D, Boetes C, Burke W, et al. American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA Cancer J Clin. 2007;57(2):75–89.
32. Green B, Chen Y. The principles and limits of algorithm-in-the-loop decision making. Proc ACM Hum-Comput Interact. 2019;3(CSCW):1–24.
33. Jacobs M, Pradier MF, McCoy TH Jr, Perlis RH, Doshi-Velez F, Gajos KZ. How machine-learning recommendations influence clinician treatment selections: the example of antidepressant selection. Transl Psychiatry. 2021;11(1):108.
34. Haan M, Ongena YP, Hommes S, Kwee TC, Yakar D. A qualitative study to understand patient perspective on the use of artificial intelligence in radiology. J Am Coll Radiol. 2019;16(10):1416–1419.
35. Ongena YP, Yakar D, Haan M, Kwee TC. Artificial intelligence in screening mammography: a population survey of women’s preferences. J Am Coll Radiol. 2021;18(1 Pt A):79–86.
36. Beebe TJ, Jacobson RM, Jenkins SM, Lackore KA, Rutten LJF. Testing the impact of mixed-mode designs (mail and web) and multiple contact attempts within mode (mail or web) on clinician survey response. Health Serv Res. 2018;53 Suppl 1:3070–3083.
