Journal of the American Medical Informatics Association (JAMIA). 2020 Dec 23;28(6):1117–1124. doi: 10.1093/jamia/ocaa292

Artificial intelligence in breast cancer screening: primary care provider preferences

Nathaniel Hendrix 1, Brett Hauber 1,2, Christoph I Lee 3,4,5, Aasthaa Bansal 1, David L Veenstra 1
PMCID: PMC8200265  PMID: 33367670

Abstract

Background

Artificial intelligence (AI) is increasingly being proposed for use in medicine, including breast cancer screening (BCS). Little is known, however, about referring primary care providers’ (PCPs’) preferences for this technology.

Methods

We identified the most important attributes of AI BCS for ordering PCPs using qualitative interviews: sensitivity, specificity, radiologist involvement, understandability of AI decision-making, supporting evidence, and diversity of training data. We invited US-based PCPs to participate in an internet-based experiment designed to force participants to trade off among the attributes of hypothetical AI BCS products. Responses were analyzed with random parameters logit and latent class models to assess how different attributes affect the choice to recommend AI-enhanced screening.

Results

Ninety-one PCPs participated. Sensitivity was most important, and most PCPs viewed radiologist participation in mammography interpretation as important. Other important attributes were specificity, understandability of AI decision-making, and diversity of data. We identified 3 classes of respondents: “Sensitivity First” (41%) found sensitivity to be more than twice as important as other attributes; “Against AI Autonomy” (24%) wanted radiologists to confirm every image; “Uncertain Trade-Offs” (35%) viewed most attributes as having similar importance. A majority (76%) accepted the use of AI in a “triage” role that would allow it to filter out likely negatives without radiologist confirmation.

Conclusions and Relevance

Sensitivity was the most important attribute overall, but other key attributes should be addressed to produce clinically acceptable products. We also found that most PCPs accept the use of AI to make determinations about likely negative mammograms without radiologist confirmation.

Keywords: artificial intelligence, breast cancer screening, discrete choice, primary care, conjoint analysis

INTRODUCTION

The capacity of breast cancer screening (BCS) to reduce cancer mortality has been well-proven.1 Its high false positive rate, however, means that participating women have a substantial probability of undergoing unnecessary medical care.2–4 Using artificial intelligence (AI) to interpret screening mammograms has been proposed as a way of lowering the false positive rate while maintaining the benefits of BCS to cancer mortality.5,6 The US Food & Drug Administration’s first approval of AI BCS took place in 1998 for a technology called computer-aided detection, which later studies revealed to be ineffective in clinical settings.7–9 Recent advances in AI BCS, such as convolutional neural networks and deep learning, have led to the approval of a new generation of algorithms that promise to detect more cancers, reduce radiologist burden by triaging negative screens out of their workflow, and predict future cancer risk.10–12

Despite the current environment of optimism around AI BCS, substantial challenges with its integration into the clinic remain unsolved. For example, some of the most advanced methods in AI produce results without any understandable interpretation of how they were reached, potentially lowering trust and exposing clinicians to liability for errors.13,14 The quality of trials for clinical AI has been low, a problem perhaps exacerbated by the fact that these trials are not required for FDA approval.15 Finally, with many AI developers focusing on the availability of data rather than their representativeness, AI may exacerbate existing health disparities.16,17 For these reasons and more, simply improving AI’s accuracy to match or exceed that of human clinicians is only a first step in its implementation.18 Gaining the trust of multiple stakeholders, from clinicians to patients, payers, and regulators, is essential before the promised benefits of AI BCS can reach patients.19

In order to understand how AI developers can best meet the needs of one key stakeholder, we sought to quantify the relative importance of different attributes of AI BCS to primary care providers (PCPs) who commonly order screening mammography. We conducted a discrete choice experiment (DCE) among PCPs using hypothetical AI BCS algorithms with a range of different performance characteristics. Discrete choice experiments are used to quantify the strength of stakeholders’ preferences for different attributes of a technology, and their results can be applied to cost-effectiveness analyses, research prioritization, and regulatory decisions.20,21 We chose to focus on PCPs because patients often make breast cancer screening decisions with their PCPs. Our hypothesis was that sensitivity (ability to detect additional cancers) and the understandability of decisions made by AI would be the most important attributes to ordering PCPs.

MATERIALS AND METHODS

We used a DCE to quantify PCPs’ preferences for attributes of AI BCS by presenting them with a series of choices between two hypothetical AI products and screening by radiologist alone. We first used qualitative interviewing to determine the most important attributes, then constructed choice questions following an experimental design that included those attributes, and distributed the DCE as part of a survey to a sample of PCPs from across the US. We analyzed responses using two models: first, a random parameters logit (RPL) model to estimate the sample-average preferences, and then a latent class model to explain preference heterogeneity by assigning respondents to classes of similar decision makers. The study was approved by the University of Washington Human Subjects Division.

Instrument development

We conducted a literature search to identify candidate attributes and levels for the hypothetical products described in the DCE. The literature search informed a series of semi-structured qualitative interviews with PCPs about how they would make recommendations to their patients concerning the use of AI for BCS. Each interview transcript underwent coding using the general inductive approach to identify themes that emerged from the interviews.22 We then combined similar themes and eliminated themes that could not be independently varied or were less related to our scientific question. We chose a final set of six attributes from these themes that included sensitivity, specificity, radiologist involvement, understandability of AI decision-making, supporting evidence, and diversity of training data. These attributes are shown with their respective levels in Table 1. In order to provide a basis of comparison to participants, we included notes next to the sensitivity and specificity attributes indicating that the average US radiologist misses approximately 15% of cancers and has an 11% false-positive rate.23

Table 1.

Attributes and levels from the discrete choice experiment

Sensitivity
  • Misses 6% of cancers
  • Misses 11% of cancers
  • Misses 15% of cancers

Specificity
  • 6% of women without cancer receive a false positive
  • 11% of women without cancer receive a false positive
  • 15% of women without cancer receive a false positive

Radiologist involvement
  • All images reviewed by radiologist
  • 30% of images most likely to contain cancer reviewed by radiologist
  • No images reviewed by radiologist

Understandability of artificial intelligence (AI) decision-making
  • Decision-making rationale understandable by clinicians
  • Decision-making rationale understandable by AI experts only
  • Decision-making rationale not understandable

Supporting evidence
  • Supported by both observational data and randomized controlled trial (RCT)
  • Supported by RCT
  • Supported by observational data

Diversity of training and validation data
  • 100% of patients are well-represented in data
  • 75% of patients are well-represented in data
  • 50% of patients are well-represented in data

Following the identification of the final set of attributes, we created a draft instrument, which we tested in a series of usability interviews with a separate sample of 4 PCPs to ensure that the content was presented in an understandable fashion. Our final design comprised a set of 15 choices, each of which included 2 hypothetical AI alternatives and the option of screening by radiologist alone. Each respondent completed all 15 choice tasks. The levels of the 6 attributes were varied such that participants had to trade off better performance on some attributes against worse performance on others. We used SAS 9.4 (SAS Institute, Cary, North Carolina) to design the variation in levels so as to maximize D-efficiency. D-efficiency optimizes the information gathered across a participant’s choices by ensuring that each level of each attribute appears equally often and that comparisons between each possible pair of levels within a given attribute are equally frequent.24,25 By selecting the experimental design that maximizes D-efficiency, we therefore maximize the probability of being able to identify main effects. To control for excluded attributes, participants were asked to imagine that all alternatives cost the same and that all positive screens followed the same pattern of diagnostic work-up. Figure 1 contains a sample task from the DCE.
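To illustrate the two balance properties described above (level balance and equally frequent level comparisons within an attribute), the sketch below checks them for a small, entirely hypothetical fragment of a design; it is illustrative only and is not the SAS procedure used in the study, nor a D-efficiency calculation itself.

```python
from collections import Counter

# Hypothetical fragment of a choice design: each task pairs two alternatives,
# and each alternative lists the level (0, 1, or 2) of the six attributes.
tasks = [
    ([0, 1, 2, 0, 1, 2], [1, 2, 0, 1, 2, 0]),
    ([2, 0, 1, 2, 0, 1], [0, 1, 2, 0, 1, 2]),
    ([1, 2, 0, 1, 2, 0], [2, 0, 1, 2, 0, 1]),
]
n_attributes = 6

# Level balance: each level of each attribute should appear equally often
# across all alternatives in the design.
for attr in range(n_attributes):
    counts = Counter(alt[attr] for task in tasks for alt in task)
    print(f"Attribute {attr}: level counts {dict(counts)}")

# Comparison balance: within each attribute, every unordered pair of levels
# should be pitted against each other equally often across choice tasks.
for attr in range(n_attributes):
    pairs = Counter(tuple(sorted((a[attr], b[attr]))) for a, b in tasks)
    print(f"Attribute {attr}: level comparisons {dict(pairs)}")
```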

Figure 1. A sample task from the discrete choice instrument.

Study population

Physicians and nurse practitioners working as PCPs were recruited for qualitative and pretest interviews from a convenience sample of the authors’ professional networks. We calculated a minimum sample size of 30 for estimating main effects in the DCE.26 We randomly selected 350 physicians and nurse practitioners working as PCPs around the US to receive a mailed invitation to take part in the internet-based DCE. The names and work addresses of potential participants were gathered from the websites of university-affiliated practices, which we identified using a search engine. Snowball recruitment was also allowed.

Statistical analysis

The outcome of interest in our statistical analyses was the impact of each attribute level on the probability that PCPs would recommend AI-augmented screening over screening by radiologist alone. We first used an RPL model, which estimates the sample-average change in the probability of choosing an option with given characteristics while allowing the utility function underlying these choices to vary for each individual. We included in the regression an alternative-specific constant for the option to recommend radiologist alone. This was intended to account for unobserved attributes associated with screening by radiologist alone and to test for status quo bias. Status quo bias refers to the manner in which commitment to the current way of doing things and aversion to risk create resistance to change.27,28 All attributes of each alternative were modeled as categorical variables using effects coding.
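As a small illustration of effects coding (the column names and level labels below are hypothetical, not taken from the study’s analysis files), a 3-level attribute is expanded into two columns, with the reference level coded −1 in both so that each coefficient is interpreted relative to the mean across levels:

```python
import pandas as pd

# Hypothetical responses in long format: one row per alternative per choice task.
df = pd.DataFrame({
    "sensitivity": ["miss_6", "miss_11", "miss_15", "miss_6"],
})

# Effects coding for a 3-level attribute: choose a reference level ("miss_15"),
# create one column per non-reference level, and code the reference level as -1
# in every column.
levels = ["miss_6", "miss_11", "miss_15"]   # last level is the reference
for level in levels[:-1]:
    df[f"sens_{level}"] = df["sensitivity"].map(
        lambda x: 1 if x == level else (-1 if x == levels[-1] else 0)
    )

print(df)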

We next used a latent class model to identify latent preference segments in the data. The latent class model allows us to explain the preference heterogeneity that the RPL model treats only as random variation.29 We began with a 2-class model and increased the number of classes until the model no longer converged. The model included the same utility specification as the RPL model, plus the respondent-level covariates collected in the demographic questions after the DCE. Both regressions were conducted in Stata 16.1 (StataCorp, College Station, Texas). A more complete description of both models is available in the Supplementary Appendix.
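The class-membership component of a latent class model is a multinomial logit over respondent-level covariates. The sketch below, using entirely hypothetical coefficients and covariates, shows how class probabilities would be computed for one respondent; it is not the Stata estimation itself.

```python
import numpy as np

# Hypothetical class-membership coefficients for two covariates
# (years in practice, trust in radiologists), with class 3 as the
# reference class whose coefficients are fixed at zero.
theta = np.array([
    [0.17, 0.30],   # class 1
    [0.14, 0.55],   # class 2
    [0.00, 0.00],   # class 3 (reference)
])
intercepts = np.array([-1.0, -1.5, 0.0])

def class_membership_probabilities(covariates):
    """Multinomial-logit probability of each latent class for one respondent."""
    utilities = intercepts + theta @ covariates
    exp_u = np.exp(utilities - utilities.max())   # stabilize the softmax
    return exp_u / exp_u.sum()

# Example: a respondent with 10 years in practice and high trust (coded 1).
print(class_membership_probabilities(np.array([10.0, 1.0])))
```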

We calculated the relative importance of each attribute by subtracting the coefficient of the least-preferred level for that attribute from the coefficient of the most-preferred level of that attribute. Then, each difference was divided by the sum of differences for all six attributes to arrive at a percentage describing relative importance.
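A minimal sketch of this relative-importance calculation, using hypothetical coefficient values for the best- and worst-preferred level of each attribute:

```python
# Hypothetical best/worst level coefficients for each of the six attributes.
coefficients = {
    "sensitivity":              {"best": 0.90, "worst": -0.95},
    "specificity":              {"best": 0.45, "worst": -0.40},
    "radiologist_involvement":  {"best": 0.35, "worst": -0.55},
    "understandability":        {"best": 0.30, "worst": -0.30},
    "supporting_evidence":      {"best": 0.25, "worst": -0.20},
    "data_diversity":           {"best": 0.35, "worst": -0.35},
}

# Importance of an attribute = range between its best and worst coefficients;
# relative importance = that range divided by the sum of ranges across attributes.
ranges = {name: c["best"] - c["worst"] for name, c in coefficients.items()}
total = sum(ranges.values())
for name, r in ranges.items():
    print(f"{name}: {100 * r / total:.1f}%")
```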

We modeled baseline preference for AI in both regressions with the following formula:

Pr(AI) = 1 / (1 + e^(ASC))

where ASC is the coefficient of the alternative-specific constant for recommending radiologist alone.30,31

The influence of each attribute on the willingness to recommend AI-based screening was tested by entering their respective coefficients, one-by-one, into the following formula:

Pr(AI) = 1 / (1 + e^(ASC − β))

where β is a coefficient from the regression results. We report from the RPL model the difference in willingness to recommend AI-augmented screening versus screening by radiologist alone based on individual attribute levels. From the latent class model, we report the relative importance of each attribute for the different preference segments.
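The sketch below applies the two formulas above with hypothetical values of ASC and β to show how a single attribute-level coefficient translates into a change in the probability of recommending AI-augmented screening:

```python
import math

def pr_ai(asc, beta=0.0):
    """Probability of recommending AI over radiologist alone,
    following the logit formulas above: Pr(AI) = 1 / (1 + exp(ASC - beta))."""
    return 1.0 / (1.0 + math.exp(asc - beta))

# Hypothetical values: a small positive ASC (mild preference for radiologist
# alone) and a coefficient beta for one AI attribute level.
asc = 0.3
print(pr_ai(asc))                          # baseline preference for AI (beta = 0)
print(pr_ai(asc, beta=0.9))                # with a strongly preferred attribute level
print(pr_ai(asc, beta=0.9) - pr_ai(asc))   # change in willingness to recommend AI
```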

Complete replication data and code are available on Dataverse.32

RESULTS

Participants

Ninety-one PCPs responded to the survey. Among these, 22 responded to mailed invitations, and 69 took part in response to snowball sampling. An overall response rate cannot be calculated due to the use of snowball sampling, but 6% of recipients of mailed invitations had responded before we closed data collection. Respondents had a mean of 8.8 (standard deviation: 9.8) years of clinical practice. A plurality (44.0%) was from the Western US region, and a majority (62.6%) identified as female. Generally, respondents expressed a neutral attitude toward technology with 60 (65.9%) describing themselves as using technology when others around them do. They reported mostly positive experiences working with radiologist colleagues. Asked about the ease of following up on questions with a radiologist, 73 (80.2%) said that it was somewhat or very easy. Similarly, 82 (90.1%) reported having a somewhat or very high level of trust in their radiologist colleagues. Detailed respondent characteristics are available in Table 2.

Table 2.

Respondent characteristics (n = 91)

Parameter n (%)
Attitude toward technology
 I love new technology… 3 (3.3%)
 I like new technology… 19 (20.9%)
 I use technology when others around me do 60 (65.9%)
 I am usually one of the last to use a new technology 6 (6.6%)
 I am skeptical of new technology… 3 (3.3%)
Ease of contacting radiologist colleagues
 Very easy 37 (40.7%)
 Somewhat easy 36 (39.6%)
 Neither easy nor difficult 9 (9.9%)
 Somewhat difficult 9 (9.9%)
 Very difficult 0 (0%)
Trust in radiologist colleagues
 Very high 39 (42.9%)
 Somewhat high 43 (47.3%)
 Moderate 8 (8.8%)
 Somewhat low 1 (1.1%)
 Very low 0 (0%)
Region
 Midwest 24 (26.4%)
 Northeast 10 (11.0%)
 South 17 (18.7%)
 West 40 (44.0%)
Gender
 Female 57 (62.6%)
 Male 33 (36.3%)
 Other 1 (1.1%)
Years of clinical practice Mean: 8.8 (Range: 0–44)

Two respondents (2.2%) always selected the alternative with higher sensitivity, and one respondent (1.1%) always selected the alternative with higher specificity. Four (4.4%) participants always chose radiologist alone over both AI products.

Effect of AI attributes on screening recommendation

We found in the RPL model that the attribute most likely to decrease PCPs’ willingness to recommend AI was a lack of improvement in sensitivity over radiologist alone: this reduced the probability of recommending AI by 0.36 (95% confidence interval [CI]: 0.31–0.38). Improving sensitivity by 9 percentage points over radiologist alone, on the other hand, was the attribute that most increased willingness to recommend AI. This attribute was associated with an increase of 0.36 (95% CI: 0.29–0.42) in the probability of recommending AI. We detected no statistically significant status quo bias in this model: the probability of selecting AI over radiologist alone was 0.08 higher (95% CI: −0.01 to 0.16), given otherwise equal utility.

The changes in probability of recommending AI associated with all attributes can be seen in Figure 2, with full regression coefficients available in Supplementary Appendix Table 1. Respondents were equally likely to recommend an AI-based product if all its decisions were double-checked by a radiologist and if the AI alone interpreted likely negative screens without a radiologist ever examining the images. Finally, supporting AI-augmented screening with both observational studies and randomized controlled trials (RCTs) was not preferred over RCTs alone.

Figure 2. Change in probability of recommending artificial intelligence (AI)-augmented breast cancer screening over radiologist alone, given different performance attributes of the AI product. (Error bars represent 95% confidence intervals. RCT = randomized controlled trial.)

Classes of decision makers

We successfully modeled 2- and 3-class versions of the latent class model; the 4-class version did not converge. Both operational models had similar goodness-of-fit, but the 3-class model offered more information. Thus, we present the relative importance of attributes for the 3-class model here (Figure 3). The regression coefficients for both models and relative importance of attributes for the 2-class model are in the Supplementary Appendix.

Figure 3. Relative importance of attributes of artificial intelligence-augmented screening for members of the three classes modeled in the latent class model. (Error bars represent the 95% confidence interval. Values in parentheses after each class in the legend indicate the probability of class membership among the respondents.)

Each class had distinct preferences for the importance of the attributes. Respondents had a 41% probability of assignment to Class 1, which we call “Sensitivity First.” Sensitivity was at least twice as important as any other attribute for members of this class. Additionally, members of the “Sensitivity First” class preferred radiologists confirming only likely positives to confirming all screens. Members of Class 2, which we refer to as “Against AI Autonomy,” valued radiologist confirmation of all screens as the co-equal most important attribute, along with specificity. Sensitivity was less important than specificity or radiologist involvement for this class, to which respondents had a 24% probability of being assigned. Finally, we call Class 3 the “Uncertain Trade-Offs” group because all attributes were of similar importance to these respondents. As such, what could convince them to recommend AI-augmented screening was less clear than for other classes. Respondents had a 35% probability of assignment to this class. Members of the “Sensitivity First” and “Uncertain Trade-Offs” classes did not significantly prefer radiologist confirmation of all images over radiologist confirmation only of likely positives. Of these classes, only “Against AI Autonomy” did not show a statistically significant status quo bias. For members of the “Sensitivity First” class, the probability of selecting radiologist alone over AI-augmented screening was 0.18 higher (95% CI: 0.11–0.24) given otherwise equal utility; for members of the “Uncertain Trade-Offs” class, the probability of selecting radiologist alone was 0.23 higher (95% CI: 0.17–0.27).

Clinicians with more years in practice were significantly more likely to be in the “Sensitivity First” or “Against AI Autonomy” groups compared to the “Uncertain Trade-Offs” group (odds ratio: 1.18 per additional year [95% CI: 1.07–1.30] for “Sensitivity First,” 1.15 [95% CI: 1.03–1.27] for “Against AI Autonomy”). Most other predictors of class membership were nonsignificant, and participants’ trust in radiologist colleagues was removed from the class membership model due to incalculable confidence intervals. Supplementary Appendix Table 1 shows all coefficients from the class membership model.

DISCUSSION

This study demonstrated that there are several potential paths to the development of AI BCS that would be acceptable to PCPs. Improvements in sensitivity have been the focus of many AI researchers and were highly valued by most respondents as well. We found, however, that translating AI BCS into the clinic does not need to wait for these technological improvements. Instead, refining policies around how radiologists work with AI, what data are used in the development of AI, and how studies can support its effectiveness claims all contribute meaningfully to PCPs’ decisions around whether to recommend AI to their patients.

Respondents generally agreed on the value of sensitivity as the most important attribute of AI BCS. Developing AI BCS with sensitivity greater than that of radiologists was particularly valued. This prioritization of sensitivity over specificity by PCPs is in contrast to public health researchers’ focus on false positives as a major harm of screening. Another area of broad agreement among respondents was their preference for RCTs over observational studies. Using both types of studies to support the effectiveness claims of AI BCS was not preferred over RCT alone. This was despite the common stance among methodologists that clinical trials for diagnostic AI are impractical and among policy-makers that they may be unnecessary.33

We also identified meaningful heterogeneity in PCPs’ preferences, which we explained by using a latent class model to group respondents into 3 classes. The area of greatest difference between these groups was in their attitude toward radiologist confirmation of images. Respondents had a 76% probability of belonging to a latent class that supported the use of AI to triage negative screens and to refer likely positives to radiologists. One class, however, defined itself largely by its opposition to any unattended use of AI. The members of this “Against AI Autonomy” class were the only ones to not exhibit status quo bias, as shown by the statistically nonsignificant value of the alternative-specific constant of radiologist alone (Supplementary Appendix Table 1). We interpret this lack of bias as indicating openness to AI combined with caution around its implementation.

Our findings have several implications. First, we found that PCPs are largely supportive of using AI in a triage setting. This novel framework for human-AI collaboration would deploy AI at a high sensitivity/low specificity setting such that likely positives are sent to radiologists for review, while likely negatives receive an immediate determination.34,35 A recent publication described an AI-based system used in this way that achieved a 99.9% negative predictive value while filtering out approximately 40% of mammograms that did not need radiologist review. Primary care providers’ acceptance of using AI in a triage setting means that other workflows with clearer analogies to present-day operations are unnecessary. For example, some studies have used AI as a second reader for BCS images, where AI is treated as if it were a radiologist.36 Other studies have used radiologist input as if it were from an AI and integrated it into an ensemble with AI models.23 The triage workflow instead uses the input of radiologists and AI to optimize for their unique strengths.
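As an illustration of the triage workflow described above (the suspicion score scale and threshold below are hypothetical, not taken from any cited system), the routing rule can be expressed as a simple function:

```python
def triage(ai_suspicion_score, threshold=0.30):
    """Route a screening mammogram under a triage workflow: exams scored above
    the threshold go to a radiologist for review, while low-suspicion exams
    receive an immediate negative determination. Score and threshold are
    hypothetical."""
    if ai_suspicion_score >= threshold:
        return "radiologist review"
    return "screen-negative, routine follow-up"

print(triage(0.72))  # likely positive -> radiologist review
print(triage(0.05))  # likely negative -> immediate determination
```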

Our results also suggest that the PCPs who are most likely to accept the use of a triage system are those who are also most likely to view improved sensitivity as the main appeal of AI BCS. It is therefore necessary to study the overall sensitivity of a collaborative AI-radiologist team to ensure that it improves sensitivity over radiologist alone without negatively impacting specificity. Existing AI algorithms and radiologists detect somewhat different cases, suggesting that combining decisions from both could produce better performance than either one alone.35 However, putting the triage workflow into practice would meaningfully change the decision task of radiologists by increasing the prevalence of positives as well as by providing a tacit “positive” label from the algorithm. Prior work testing radiologists’ accuracy at mammogram interpretation for a nonrepresentative sample of women suggests that training may be necessary to avoid poor performance at interpreting images triaged by AI.37

Our work also shows how researchers can create clinically acceptable technologies by refining other attributes if they are only capable of producing minor improvements in AI’s sensitivity. In the absence of a technological breakthrough enabling sensitivity over 90%, our study provides evidence that improving other elements of AI BCS may make it clinically appealing for PCPs. Highly diverse training data and explanations of AI decision-making that are understandable by clinicians were both important to most respondents. Both of these attributes contribute to an algorithm’s generalizability: diverse data by ensuring that the AI operates correctly on a wide range of patients and understandable decision-making by showing when AI is considering inappropriate criteria for a judgment.38 Improving the quality of clinical trials for AI BCS to include well-designed RCTs would also increase the technology’s appeal to PCPs.15

Our analysis is novel, as relatively little work has been published on clinicians’ preferences for AI. One survey of senior specialist physicians in the United Kingdom found liability, accuracy, understandability, and quality of supporting evidence to be primary concerns.39 Qualitative work among PCPs in the United Kingdom found that, while many were concerned about AI’s potential to disrupt the doctor–patient bond or to miss atypical presentations, participants were hopeful that AI could reduce inefficiencies by triaging uncomplicated patients and providing faster results.40 Another study among physicians in New Zealand focused on understandability and found that 88% of respondents were more likely to trust an AI algorithm that produced an understandable explanation of its decisions.41 With the exception of liability, these studies are in general agreement with our results.

Our findings regarding the importance of understandability also deserve mention because of the prominence this topic has taken on among clinical commentators. Prior reports have suggested that the understandability of AI decision-making is vital for its application in clinics.14,42 This has been supported by the survey mentioned above that showed how understandability correlates with acting on AI’s recommendations.41 While our findings bolster current knowledge about clinicians’ preferences for explainable AI, they simultaneously undercut the idea of its paramount importance, at least among PCPs. We find, instead, that poor understandability can be offset by other attributes such as high sensitivity, high specificity, or excellent diversity in training data. Interpretable AI supports tests of generalizability that are essential in quality assurance and regulatory processes, however, and therefore should not be ignored.38,43 Understandability is also connected to providers’ liability for using AI.13

Our study has several limitations. The attitudes that informed our respondents’ choices are likely to change as they encounter more examples of clinical AI. Our study can be used as a baseline to measure these changes, but this means that its results may appear unduly positive or negative in retrospect. The results also cannot be used to predict uptake, which would depend on the input of other stakeholders such as patients, payers, and radiologists. We were unable to evaluate all potentially relevant attributes, including cost. We attempted to determine the most important attributes but may have missed attributes of similar importance. Changing the set of included attributes may have shifted the estimated impact of the attributes we included. Finally, the generalizability of our sample is unknown. Despite our efforts at recruiting a representative sample of PCPs, we relied on snowball sampling, which means that respondent selection was not entirely independent. It also means we may have underestimated the width of the confidence intervals.

While our study provides a useful start to quantitative translational research in AI for cancer screening, much work remains to be done. Future studies should examine the preferences of other stakeholders in the decision to implement AI BCS: patients, payers, and radiologists will all play important roles in deciding on the use of AI technologies. Work examining the performance of different configurations for the collaboration between AI and clinicians should be done and, with it, a more detailed examination of clinicians’ willingness to trade improvements in AI’s accuracy for granting it increased autonomy in decision-making. Finally, we recommend that similar studies be conducted periodically to assess how exposure to clinical AI alters PCPs’ willingness to recommend an expanded role for AI, particularly in cancer screening.

CONCLUSION

Much has been written about what AI needs before it can become clinically acceptable. In this study, we have quantified the impact of 6 highly relevant attributes of AI on PCPs’ decision to recommend it to their patients for BCS. We find evidence that technical advances that allow for greater diagnostic accuracy are important but are not the only way to produce appealing AI products. We also find support for using AI to interpret some mammograms without radiologist confirmation, which opens up the possibility of developing innovative human-machine collaborations that can reduce radiologist burden and improve the efficiency of BCS.

FUNDING

The authors gratefully acknowledge the University of Washington School of Pharmacy’s financial support for recruitment. NH is supported by a training grant from the National Cancer Institute (T32 CA009168) and by a predoctoral fellowship in outcomes research from The PhRMA Foundation. CIL is supported by a research grant from the National Cancer Institute (R37 CA240403). Study data were collected and managed using REDCap electronic data capture tools hosted at the Institute of Translational Health Sciences (ITHS). REDCap at ITHS is supported by the National Center for Advancing Translational Science of the National Institutes of Health (UL1 TR002319, KL2 TR002317, and TL1 TR002318).

AUTHOR CONTRIBUTIONS

NH and DLV conceived the study. NH, BH, CIL, AB, and DLV designed the study. NH conducted the qualitative interviews. NH, BH, CIL, AB, and DLV created the experimental design. NH, BH, AB, and DLV designed the analyses. NH performed the analyses and created the visualizations with input from BH, CIL, AB, and DLV. NH wrote the first draft of the manuscript. All authors critically appraised it and provided feedback prior to submission.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

DATA AVAILABILITY STATEMENT

The data underlying this article are available in the Harvard Dataverse at https://doi.org/10.7910/DVN/EX4NG2.

CONFLICT OF INTEREST STATEMENT

None declared.


REFERENCES

  • 1. Myers ER, Moorman P, Gierisch JM, et al. Benefits and harms of breast cancer screening: a systematic review. JAMA 2015; 314 (15): 1615–34. [DOI] [PubMed] [Google Scholar]
  • 2. Hubbard RA, Kerlikowske K, Flowers CI, Yankaskas BC, Zhu W, Miglioretti DL.. Cumulative probability of false-positive recall or biopsy recommendation after 10 years of screening mammography: a cohort study. Ann Intern Med 2011; 155 (8): 481–92. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Ong M-S, Mandl KD.. National expenditure for false-positive mammograms and breast cancer overdiagnoses estimated at $4 billion a year. Health Aff (Millwood) 2015; 34 (4): 576–83. [DOI] [PubMed] [Google Scholar]
  • 4. Nelson HD, Pappas M, Cantor A, Griffin J, Daeges M, Humphrey L.. Harms of breast cancer screening: systematic review to update the 2009 US Preventive Services Task Force recommendation. Ann Intern Med 2016; 164 (4): 256–67. [DOI] [PubMed] [Google Scholar]
  • 5. Houssami N, Lee CI, Buist DS, Tao D.. Artificial intelligence for breast cancer screening: opportunity or hype? Breast 2017; 36: 31–3. [DOI] [PubMed] [Google Scholar]
  • 6. Trister AD, Buist DS, Lee CI.. Will machine learning tip the balance in breast cancer screening? JAMA Oncol 2017; 3 (11): 1463–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Fenton JJ, Taplin SH, Carney PA, et al. Influence of computer-aided detection on performance of screening mammography. N Engl J Med 2007; 356 (14): 1399–409. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Dromain C, Boyer B, Ferre R, Canale S, Delaloge S, Balleyguier C.. Computed-aided diagnosis (CAD) in the detection of breast cancer. Eur J Radiol 2013; 82 (3): 417–23. [DOI] [PubMed] [Google Scholar]
  • 9. Lehman CD, Wellman RD, Buist DS, Kerlikowske K, Tosteson AN, Miglioretti DL.. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern Med 2015; 175 (11): 1828–37. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K.. The practical implementation of artificial intelligence technologies in medicine. Nat Med 2019; 25 (1): 30–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Business Wire. FDA-cleared artificial intelligence breast cancer diagnosis system launched by Paragon Biosciences. 2019. https://www.mpo-mag.com/contents/view_breaking-news/2019-09-12/fda-cleared-artificial-intelligence-breast-cancer-diagnosis-system-launched-by-paragon-biosciences/ Accessed May 25, 2020
  • 12. Geras KJ, Mann RM, Moy L.. Artificial intelligence for mammography and digital breast tomosynthesis: current concepts and future perspectives. Radiology 2019; 293 (2): 246–59. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Price WN, Gerke S, Cohen IG.. Potential Liability for Physicians Using Artificial Intelligence. JAMA 2019; 322 (18): 1765. [DOI] [PubMed] [Google Scholar]
  • 14. Reyes M, Meier R, Pereira S, et al. On the interpretability of artificial intelligence in radiology: challenges and opportunities. Radiol Artif Intell 2020; 2 (3): e190043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020; 368: m689. doi: 10.1136/bmj.m689. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Nsoesie EO. Evaluating artificial intelligence applications in clinical settings. JAMA Netw Open 2018; 1 (5): e182658. [DOI] [PubMed] [Google Scholar]
  • 17. Obermeyer Z, Powers B, Vogeli C, Mullainathan S.. Dissecting racial bias in an algorithm used to manage the health of populations. Science 2019; 366 (6464): 447–53. [DOI] [PubMed] [Google Scholar]
  • 18. Shah ND, Steyerberg EW, Kent DM.. Big data and predictive analytics: recalibrating expectations. JAMA 2018; 320 (1): 27. [DOI] [PubMed] [Google Scholar]
  • 19. Dzindolet MT, Peterson SA, Pomranky RA, Pierce LG, Beck HP.. The role of trust in automation reliance. Int J Hum-Comput Stud 2003; 58 (6): 697–718. [Google Scholar]
  • 20. Clark MD, Determann D, Petrou S, Moro D, de Bekker-Grob EW.. Discrete Choice Experiments in Health Economics: A Review of the Literature. PharmacoEconomics 2014; 32 (9): 883–902. [DOI] [PubMed] [Google Scholar]
  • 21. Ho MP, Gonzalez JM, Lerner HP, et al. Incorporating patient-preference evidence into regulatory decision making. Surg Endosc 2015; 29 (10): 2984–93. [DOI] [PubMed] [Google Scholar]
  • 22. Thomas DR. A general inductive approach for analyzing qualitative evaluation data. Am J Eval 2006; 27 (2): 237–46. [Google Scholar]
  • 23. Schaffter T, Buist DSM, Lee CI, et al. ; and the DM DREAM Consortium. Evaluation of combined artificial intelligence and radiologist assessment to interpret screening mammograms. JAMA Netw Open 2020; 3 (3): e200265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Kuhfeld WF. Experimental design: Efficiency, coding, and choice designs. In: Marketing Research Methods in SAS: Experimental Design, Choice, Conjoint, and Graphical Techniques. Cary, NC: SAS Institute; 2005: 47–97. [Google Scholar]
  • 25. Johnson F, Lancsar E, Marshall D, et al. Constructing experimental designs for discrete-choice experiments: report of the ISPOR conjoint analysis experimental design good research practices task force. Value Health 2013; 16 (1): 3–13. [DOI] [PubMed] [Google Scholar]
  • 26. de Bekker-Grob EW, Donkers B, Jonker MF, Stolk EA.. Sample size requirements for discrete-choice experiments in healthcare: a practical guide. Patient 2015; 8 (5): 373–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Kahneman D, Knetsch JL, Thaler RH.. Anomalies: the endowment effect, loss aversion, and status quo bias. J Econ Perspect 1991; 5 (1): 193–206. [Google Scholar]
  • 28. Kim H-W, Kankanhalli A.. Investigating user resistance to information systems implementation: a status quo bias perspective. MIS Q 2009; 33 (3): 567. [Google Scholar]
  • 29. Boeri M, Saure D, Schacht A, Riedl E, Hauber B.. Modeling heterogeneity in patients’ preferences for psoriasis treatments in a multicountry study: a comparison between random-parameters logit and latent class approaches. PharmacoEconomics 2020; 38 (6): 593–14. [DOI] [PubMed] [Google Scholar]
  • 30. Hall J, Kenny P, King M, Louviere J, Viney R, Yeoh A.. Using stated preference discrete choice modelling to evaluate the introduction of varicella vaccination. Health Econ 2002; 11 (5): 457–65. [DOI] [PubMed] [Google Scholar]
  • 31. van Dam L, Hol L, de Bekker-Grob EW, et al. What determines individuals’ preferences for colorectal cancer screening programmes? A discrete choice experiment. Eur J Cancer 2010; 46 (1): 150–9. [DOI] [PubMed] [Google Scholar]
  • 32. Hendrix N. Replication Data for: “Artificial intelligence in breast cancer screening: Primary care provider preferences”, v3. Harvard Dataverse. doi:10.7910/DVN/EX4NG2. [DOI]
  • 33. Park SH, Han K.. Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction. Radiology 2018; 286 (3): 800–9. [DOI] [PubMed] [Google Scholar]
  • 34. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Can we reduce the workload of mammographic screening by automatic identification of normal exams with artificial intelligence? A feasibility study. Eur Radiol 2019; 29 (9): 4825–32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature 2020; 577 (7788): 89–94. [DOI] [PubMed] [Google Scholar]
  • 36. Adamson AS, Welch HG.. Machine learning and the cancer-diagnosis problem—no gold standard. N Engl J Med 2019; 381 (24): 2285–7. [DOI] [PubMed] [Google Scholar]
  • 37. Miglioretti DL, Ichikawa L, Smith RA, et al. Correlation between screening mammography interpretive performance on a test set and performance in clinical practice. Acad Radiol 2017; 24 (10): 1256–64. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?”: explaining the predictions of any classifier. arXiv:1602.04938 [cs, stat]. 2016. http://arxiv.org/abs/1602.04938 Accessed October 17, 2019
  • 39. Petkus H, Hoogewerf J, Wyatt JC.. What do senior physicians think about AI and clinical decision support systems: quantitative and qualitative analysis of data from specialty societies. Clin Med 2020; 20 (3): 324–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Blease C, Kaptchuk TJ, Bernstein MH, Mandl KD, Halamka JD, DesRoches CM.. Artificial intelligence and the future of primary care: exploratory qualitative study of UK general practitioners’ views. J Med Internet Res 2019; 21 (3): e12802. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Diprose WK, Buist N, Hua N, Thurier Q, Shand G, Robinson R.. Physician understanding, explainability, and trust in a hypothetical machine learning risk calculator. J Am Med Inform Assoc 2020; 27 (4): 592–600. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Nundy S, Montgomery T, Wachter RM.. Promoting trust between patients and physicians in the era of artificial intelligence. JAMA 2019; 322 (6): 497. [DOI] [PubMed] [Google Scholar]
  • 43. Gastounioti A, Kontos D.. Is it time to get rid of black boxes and cultivate trust in AI? Radiol Artif Intell 2020; 2 (3): e200088. [DOI] [PMC free article] [PubMed] [Google Scholar]
