Table 1.
Study | Study design | Population | Mammography vendor | Index test | Comparator | Reference standard |
---|---|---|---|---|---|---|
Lotter 2021 [28] | Enriched test set MRMC laboratory study (accuracy of a read) | 285 women from 1 US health system with 4 centres (46.0% screen detected cancer); age and ethnic origin NR | Hologic 100% | In-house AI system (DeepHealth); threshold NR (set to match readers’ sensitivity and specificity, respectively) | 5 MQSA certified radiologists (US), single reading; threshold of BI-RADS scores 3, 4, and 5 considered recall | Cancer: pathology confirmed cancer within 3 months of screening; confirmed negative: a negative examination followed by an additional BI-RADS score 1 or 2 interpretation at the next screening examination 9-39 months later |
McKinney 2020 [29] | Retrospective test accuracy study (accuracy of a read) | 3097 women from 1 US centre (22.2% cancer within 27 months of screening); age <40, 181 (5.8%); 40-49, 1259 (40.7%); 50-59, 800 (25.8%); 60-69, 598 (19.3%); ≥70, 259 (8.4%) | Hologic / Lorad branded: >99%; Siemens or General Electric: <1% | In-house AI system (Google Health); threshold: to achieve superiority for both sensitivity and specificity compared with original single reading using validation set | Original single radiologist decision (US); threshold: BI-RADS scores 0, 4, 5 were treated as positive | Cancer: biopsy confirmed cancer within 27 months of imaging; non-cancer: one follow-up non-cancer screen or biopsied negative (benign pathologies) after ≥21 months |
Rodriguez-Ruiz 2019 [33] | Enriched test set MRMC laboratory study (accuracy of a read) | 199 examinations from a Dutch digital screening pilot project (39.7% cancer); age range 50-74 | Hologic 100% | Transpara version 1.4.0 (Screenpoint Medical BV, Nijmegen, Netherlands); threshold: 8.26/10, corresponding to the average radiologist’s specificity | Nine Dutch radiologists, single reading, as part of a previously completed MRMC study [38]; no threshold | Cancer: histopathology-proven cancer; non-cancer: ≥1 normal follow-up screening examination (2 year screening interval) |
Salim 2020 [35] | Retrospective test accuracy study (accuracy of a read) | 8805 women from a Swedish cohort study (8.4% cancer within 12 months of screening); median age 54.5 (IQR 47.4-63.5) | Hologic 100% | 3 commercial AI systems (anonymised: AI-1, AI-2, and AI-3); threshold: corresponding to the specificity of the first reader | Original radiologist decision (Sweden); (1) single reader (R1; R2), (2) consensus reading; no threshold | Cancer: pathology confirmed cancer within 12 months of screening; non-cancer: ≥2 years cancer free follow-up |
Schaffter 2020 [36] | Retrospective test accuracy study (accuracy of a read) | 68 008 consecutive women from 1 Swedish centre (1.1% cancer within 12 months of screening); mean age 53.3 (SD 9.4) | NR | 4 in-house AI systems: 1 top performing model submitted to the DREAM challenge, 1 ensemble method of the eight best performing models (CEM), CEM combined with reader decision (single reader or consensus reading); threshold: corresponding to the sensitivity of single and consensus reading, respectively | Original radiologist decision (Sweden); (1) single reader (R1; R2), (2) consensus reading; no threshold | Cancer: tissue diagnosis within 12 months of screening; non-cancer: no cancer diagnosis ≥12 months after screening |
AI=artificial intelligence; BI-RADS=breast imaging reporting and data system; CEM=challenge ensemble method; DREAM=Dialogue on Reverse Engineering Assessment and Methods; IQR=interquartile range; MQSA=Mammography Quality Standards Act; MRMC=multiple reader multiple case; NR=not reported; R1=first reader; R2=second reader; SD=standard deviation.
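
Several rows above describe an AI threshold chosen to correspond to a reader's specificity, and a recall rule defined on BI-RADS scores. The sketch below illustrates one simple way such rules could be expressed. It is not taken from any of the included studies: the function names, the convention that a case is recalled when its score is at or above the threshold, and the example scores are all assumptions made for illustration. Only the BI-RADS recall sets are taken from the table.

```python
# Illustrative sketch only; not the method of any included study.
# Assumption: an AI system outputs a continuous score per examination and
# an examination is recalled when score >= threshold.

# BI-RADS scores treated as recall/positive, as stated in Table 1.
RECALL_BIRADS = {
    "Lotter 2021": {3, 4, 5},
    "McKinney 2020": {0, 4, 5},
}


def reader_recalls(study, birads_score):
    """Return True if the reader's BI-RADS score counts as a recall in that study."""
    return birads_score in RECALL_BIRADS[study]


def threshold_for_specificity(negative_scores, target_specificity):
    """Return the lowest threshold whose specificity on the non-cancer
    cases is at least target_specificity."""
    if not negative_scores:
        raise ValueError("need at least one non-cancer case")
    candidates = sorted(set(negative_scores))
    # A threshold above every score recalls no one (specificity 1.0).
    candidates.append(candidates[-1] + 1.0)
    n = len(negative_scores)
    for t in candidates:
        true_negatives = sum(1 for s in negative_scores if s < t)
        if true_negatives / n >= target_specificity:
            return t
    return candidates[-1]


def sensitivity(positive_scores, threshold):
    """Fraction of cancer cases with a score at or above the threshold."""
    recalled = sum(1 for s in positive_scores if s >= threshold)
    return recalled / len(positive_scores)


if __name__ == "__main__":
    # Hypothetical scores on a 1-10 scale, invented for this example.
    non_cancer = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    cancer = [9.5, 3.0, 8.0, 9.0]
    t = threshold_for_specificity(non_cancer, target_specificity=0.8)
    print("threshold:", t)
    print("sensitivity at threshold:", sensitivity(cancer, t))
```

Fixing the threshold at the comparator's specificity means any difference between the AI and the reader shows up in sensitivity alone. Matching on sensitivity, as in the Schaffter 2020 row, would follow the same pattern with the roles of cancer and non-cancer cases swapped.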