Abstract
There has been increased excitement around the use of machine learning (ML) and artificial intelligence (AI) in dermatology for the diagnosis of skin cancers and assessment of other dermatologic conditions. As these technologies continue to expand, it is essential to ensure they do not create or widen sex- and gender-based disparities in care. While desirable bias may result from the explicit inclusion of sex or gender in diagnostic criteria of diseases with gender-based differences, undesirable biases can result from usage of datasets with an underrepresentation of certain groups. We believe that sex and gender differences should be taken into consideration in ML/AI algorithms in dermatology because there are important differences in the epidemiology and clinical presentation of dermatologic conditions including skin cancers, sex-specific cancers, and autoimmune conditions. We present recommendations for ensuring sex and gender equity in the development of ML/AI tools in dermatology to increase desirable bias and avoid undesirable bias.
Keywords: artificial intelligence, machine learning, gender, equity, dermatology, disparities
Machine learning (ML) and artificial intelligence (AI) have broad applications in healthcare, with the potential to improve diagnosis and management of disease and increase access to care. Excitement about these applications extends to the field of dermatology. The largest body of work relates to AI for diagnostic support in assessing lesions for skin cancer, including the use of deep learning convolutional neural networks to classify images of skin lesions as melanoma or other skin cancers.1–3 Early evidence has demonstrated performance on par with dermatologists for skin cancer classification.1–3 AI has also been applied to the diagnosis of hair loss, using microscopy and image analysis to perform hair density measurements.4,5 Additional applications include the characterization of other cutaneous conditions, such as chronic wounds6,7 and autoimmune inflammatory dermatoses such as psoriasis.8
As dermatologic applications of AI broaden and are incorporated into clinical care, especially for diagnostic support, it is essential to consider sex and gender equity to ensure that uptake of these new technologies does not create or widen disparities in clinical care. Within dermatology, there has been much discussion of the lack of racial diversity in ML algorithms as a potential source of bias that can perpetuate health disparities for skin of color.9 Similarly, there is potential for bias if certain sexes or genders are excluded from the datasets used to train and develop ML algorithms, or if sex- or gender-based differences in clinical presentation are not adequately accounted for. More broadly, AI systems trained predominantly on male subjects perform worse when tested on female subjects.10 However, there is limited literature on considerations of sex or gender and AI in dermatology. Given the vulnerability of AI to bias, increased and intentional consideration of sex and gender is critical to ensure equity in ML and AI applications in dermatology. We focus our discussion on the applications of AI in dermatology diagnostic support for cutaneous malignancies, autoimmune inflammatory conditions, and alopecia.
In considering bias in AI, two types of bias have been described in the ML literature: desirable and undesirable.11 Desirable biases result from accounting for differences between groups to allow for more accurate and tailored diagnostic and treatment plans, whereas undesirable biases occur when algorithms developed from inadequate or skewed evidence result in discrimination.11 For sex and gender, an example of a desirable bias is the inclusion of sex in the diagnostic criteria of diseases with sex-based differences, whereas the use of datasets in which certain genders are underrepresented exemplifies undesirable bias.
There are several reasons sex and gender differences should be taken into account in ML algorithms in dermatology in particular. First, there are important sex- and gender-based differences in skin diseases that can impact AI performance, including in the diagnosis of skin cancers. For example, melanomas occur more frequently on the hip and lower extremities in females than in males.12,13 Sex-specific cancers, such as vulvar neoplasms, also present differently. The majority of vulvar melanomas are mucosal lentiginous or nodular subtypes, while the superficial spreading subtype is much less common.14 These subtypes all have different gross and histologic features, and the distribution of subtypes in vulvar melanoma differs vastly from that of melanomas found on other parts of the body. Similarly, genital lichen sclerosus, a chronic disease that can progress to squamous cell carcinoma, presents differently in men and women.15 Early lichen sclerosus may be characterized by color alterations and a short frenulum in males, in contrast to circumscribed blanching and mucosal fragility in females. Later stages of lichen sclerosus present with stable fissures in both sexes, but in different anatomical areas. Women are often diagnosed with lichen sclerosus later in life and thus receive treatment later, highlighting the diagnostic challenges and differences by sex.15 This is consistent with Sun et al’s large-scale fairness analyses of diverse health outcomes, which found that women consistently experience longer time to diagnosis than men.16 Therefore, algorithms that are not trained and developed using lesions from a wide variety of locations or from a diversity of patients may be less accurate in the diagnosis of these dermatologic conditions for particular sexes.
This undesirable bias may result in delayed diagnosis of skin cancer and worsening outcomes due to differential performance based on sex and gender, especially for conditions like vulvar cancer, which already result in significant morbidity and mortality due to diagnostic challenges.17
Furthermore, there are sex-based differences in cutaneous disease prevalence, particularly for autoimmune conditions and connective tissue diseases such as lupus and scleroderma, which occur much more frequently in women.18,19 Though most AI research has focused on skin cancer diagnosis and classification, applications are also beginning to expand to the classification of inflammatory conditions.8 As AI applications broaden to the diagnosis of inflammatory dermatoses, considering sex and gender explicitly may represent a desirable bias. Specifically, it may help inform pretest probability and thereby improve algorithm accuracy and performance. For conditions that exclusively affect certain sexes, such as Bartholin gland cysts in females or balanitis in males, explicit consideration of sex is even more essential.
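To make the role of pretest probability concrete, the following is a minimal sketch of how a sex-specific prior changes the meaning of the same positive algorithm output under Bayes' theorem. The sensitivity, specificity, and prevalence values are hypothetical numbers chosen only for illustration; they are not drawn from any study cited here, and the function name `post_test_probability` is our own.

```python
def post_test_probability(pretest: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease given a positive result, using Bayes' theorem."""
    true_positive = sensitivity * pretest
    false_positive = (1.0 - specificity) * (1.0 - pretest)
    return true_positive / (true_positive + false_positive)


# Hypothetical classifier characteristics (assumed, not measured).
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# Hypothetical pretest probabilities for a condition that is more common in women.
PRETEST_BY_SEX = {"female": 0.09, "male": 0.01}

posterior_by_sex = {
    sex: post_test_probability(pretest, SENSITIVITY, SPECIFICITY)
    for sex, pretest in PRETEST_BY_SEX.items()
}

for sex, posterior in posterior_by_sex.items():
    print(f"{sex}: pretest {PRETEST_BY_SEX[sex]:.2f} -> post-test {posterior:.2f}")
```

Under these assumed values, an identical positive output corresponds to a much higher post-test probability in the group with the higher pretest probability, which is one way explicit use of sex could act as a desirable bias.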
Inclusion of marginalized populations is especially important given the intersectionality between skin tone (often used as a proxy for race) and gender in numerous dermatologic conditions. For example, many autoimmune causes of scarring alopecia, including central centrifugal cicatricial alopecia (CCCA) and discoid lupus erythematosus (DLE), disproportionately affect women of color, necessitating adequate representation. Furthermore, women often experience more severe quality of life impairment from alopecia.20 It is thus critical to ensure that AI and ML efforts utilize images from patients of diverse skin types and genders for such conditions. To illustrate, deep learning-based systems have recently been proposed for hair loss diagnosis, using convolutional neural networks to detect hair follicle density.4,5 However, studies have shown that hair density differs by ancestry, as Black hair may have lower density and slower growth than Caucasian hair,21,22 and by gender, as males have slightly lower hair density than females.21 These race- and sex-based differences in conditions disproportionately affecting women of color underscore the importance of including images from diverse ancestries and genders.
We have compiled several recommendations for ensuring sex and gender equity in the development of AI and ML tools in dermatology to increase desirable bias and avoid undesirable bias (Table 1). First, to increase desirable bias, sex- and gender-related information should be collected with images, incorporated into patient metadata, and reported when creating algorithms for skin cancer and dermatoses. In fact, studies have shown that integrating clinical metadata, including patient sex, alongside dermoscopic images generates more accurate clinical diagnoses than dermoscopy alone.3,26 Furthermore, sex and gender composition is not routinely reported or considered in dermatologic AI research, which could mask any underlying biases in the study population.
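As an illustration of what routine reporting of dataset composition could look like, here is a minimal sketch that tallies sex and gender fields stored alongside image records and counts missing values explicitly instead of dropping them. The record layout, the field names (`image_id`, `sex`, `gender`), and the example values are all hypothetical assumptions, not a description of any existing dataset.

```python
from collections import Counter

# Hypothetical image metadata; field names and values are assumptions.
records = [
    {"image_id": "img001", "sex": "female", "gender": "woman"},
    {"image_id": "img002", "sex": "male", "gender": "man"},
    {"image_id": "img003", "sex": "female", "gender": "nonbinary"},
    {"image_id": "img004", "sex": "male", "gender": None},
]


def composition(records, field):
    """Return {value: (count, proportion)} for one metadata field.

    Missing or empty values are reported as "not recorded" so that gaps
    in data collection stay visible in the summary.
    """
    counts = Counter(record.get(field) or "not recorded" for record in records)
    total = sum(counts.values())
    return {value: (n, n / total) for value, n in counts.items()}


for field in ("sex", "gender"):
    print(field)
    for value, (n, share) in composition(records, field).items():
        print(f"  {value}: n={n} ({share:.0%})")
```

A summary of this kind could accompany the description of a training or test set so that readers can judge representation directly.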
Table 1.
Examples of desirable and undesirable biases relevant to dermatologic disease with recommendations for AI/ML algorithms
| Desirable or undesirable bias | Type of bias | Dermatologic example | Recommendations |
|---|---|---|---|
| Desirable | Sex-based differences in disease presentations | | |
| Desirable | Sex and gender-based differences intersect with other patient demographics | | |
| Undesirable | Inadequate representation of all sexes and genders in ML/AI datasets | Lack of reporting in the literature on sex and gender representation in ML/AI datasets in dermatology | |
| Undesirable | Unknown sex and gender-based differences in dermatologic conditions | Mixed evidence whether gender minorities have different incidence rates of skin cancer than cis-gender patients25 | Promote future research on differences in dermatologic conditions in all genders, including transgender, nonbinary, and other gender-diverse patients |
Accordingly, we also advocate for adequate representation of all genders in AI and ML data and training sets. The central importance of the representativeness of the data itself in promoting fairness in ML diagnostic applications has been recognized more broadly in healthcare. For example, Chen et al proposed that fairness of predictions should be addressed through data collection rather than model constraints in order to minimize discrimination and maintain accuracy.27 Not only must algorithms be trained on diverse datasets, but it is also critical to intentionally examine model performance by sex and gender in the test set and to report whether any differences are found. This can only be accomplished if sex and gender information is routinely collected and reported, underscoring the importance of this transparency.
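One way such a stratified check could be carried out is sketched below: sensitivity and specificity are computed separately for each group in a test set. This is an illustrative sketch only; the function name `metrics_by_group` and the toy labels and predictions are assumptions made for the example, and a real evaluation would also need confidence intervals and adequate sample sizes per group.

```python
from collections import defaultdict


def metrics_by_group(y_true, y_pred, groups):
    """Compute sensitivity and specificity separately for each group.

    y_true and y_pred hold binary labels (1 = disease, 0 = no disease);
    groups holds the group label for each test case.
    """
    tallies = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "tn" if pred == 0 else "fp"
        tallies[group][key] += 1

    results = {}
    for group, t in tallies.items():
        positives = t["tp"] + t["fn"]
        negatives = t["tn"] + t["fp"]
        results[group] = {
            "n": positives + negatives,
            # None marks a metric that cannot be computed for this group.
            "sensitivity": t["tp"] / positives if positives else None,
            "specificity": t["tn"] / negatives if negatives else None,
        }
    return results


# Toy test set (hypothetical values, for illustration only).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["female"] * 4 + ["male"] * 4

report = metrics_by_group(y_true, y_pred, groups)
for group, metrics in report.items():
    print(group, metrics)
```

Because the group label can be any hashable value, the same function could be applied to combined labels such as a (skin type, gender) tuple when intersectional performance is of interest.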
It is also essential to consider differences in dermatologic conditions across all genders, including transgender, nonbinary, and other gender-diverse patient populations. These patients have unique clinical concerns, including hormone use (which can cause acne and hair disorders) and an increased risk of squamous cell carcinoma of the neogenitalia in patients who have undergone gender-affirming surgeries.28 However, these patient populations have historically been omitted from the medical literature, representing a knowledge gap that may perpetuate unknown, undesirable biases in ML applications. Thus, further research is needed to understand dermatologic considerations of gender-diverse patients.
Finally, beyond gender, algorithm developers should take into account the intersectionality of patient factors including race, sexuality, and gender for more accurate and tailored recommendations. In nonmedical applications, the consequences of disregarding issues of intersectionality manifest in examples such as lower accuracy of facial recognition software in the recognition of women of color in particular.29 For dermatology specifically, as discussed above, there are gender and ancestral differences in hair which merit consideration by AI developers interested in creating diagnostic technologies for alopecia.
Advances in ML and AI are poised to transform the delivery of dermatologic care, but the potential utility may be dampened without recognizing and addressing issues pertaining to diversity in all forms, including sex and gender. To ensure all patients can benefit from the promises of these novel technologies, adequate consideration and representation of all sexes and genders is critically important.
FUNDING
There is no funding source for this work.
AUTHOR CONTRIBUTIONS
Authors MSL, LNG, and VEN all made substantial contributions to the conception and design, drafting, and editing of the manuscript.
DATA AVAILABILITY STATEMENT
No new data were generated or analyzed in support of this research.
CONFLICT OF INTEREST STATEMENT
None declared.
REFERENCES
- 1. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017; 542 (7639): 115–8.
- 2. Zakhem GA, Fakhoury JW, Motosko CC, Ho RS. Characterizing the role of dermatologists in developing artificial intelligence for assessment of skin cancer: a systematic review. J Am Acad Dermatol 2020. doi:10.1016/j.jaad.2020.01.028
- 3. Haenssle HA, Fink C, Schneiderbauer R, et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol Off J Eur Soc Med Oncol 2018; 29 (8): 1836–42.
- 4. Olsen EA, Canfield D. SALT II: a new take on the Severity of Alopecia Tool (SALT) for determining percentage scalp hair loss. J Am Acad Dermatol 2016; 75 (6): 1268–70.
- 5. Gupta AK, Ivanova IA, Renaud HJ. How good is artificial intelligence (AI) at solving hairy problems? A review of AI applications in hair restoration and hair disorders. Dermatol Ther 2021; 34 (2): e14811.
- 6. Mukherjee R, Manohar DD, Das DK, Achar A, Mitra A, Chakraborty C. Automated tissue classification framework for reproducible chronic wound assessment. BioMed Res Int 2014; 2014: 851582.
- 7. Manohar Dhane D, Maity M, Mungle T, et al. Fuzzy spectral clustering for automated delineation of chronic wound region using digital images. Comput Biol Med 2017; 89: 551–60.
- 8. Shrivastava VK, Londhe ND, Sonawane RS, Suri JS. A novel and robust Bayesian approach for segmentation of psoriasis lesions and its risk stratification. Comput Methods Programs Biomed 2017; 150: 9–22.
- 9. Adamson AS, Smith A. Machine learning and health care disparities in dermatology. JAMA Dermatol 2018; 154 (11): 1247–8.
- 10. Larrazabal AJ, Nieto N, Peterson V, Milone DH, Ferrante E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proc Natl Acad Sci USA 2020; 117 (23): 12592–4.
- 11. Cirillo D, Catuara-Solarz S, Morey C, et al. Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare. NPJ Digit Med 2020; 3: 81.
- 12. Olsen CM, Thompson JF, Pandeya N, Whiteman DC. Evaluation of sex-specific incidence of melanoma. JAMA Dermatol 2020; 156 (5): 553–60.
- 13. Yuan T-A, Lu Y, Edwards K, Jakowatz J, Meyskens FL, Liu-Smith F. Race-, age-, and anatomic site-specific gender differences in cutaneous melanoma suggest differential mechanisms of early- and late-onset melanoma. IJERPH 2019; 16 (6): 908. doi:10.3390/ijerph16060908
- 14. Ragnarsson-Olding BK, Kanter-Lewensohn LR, Lagerlöf B, Nilsson BR, Ringborg UK. Malignant melanoma of the vulva in a nationwide, 25-year study of 219 Swedish females: clinical observations and histopathologic features. Cancer 1999; 86 (7): 1273–84.
- 15. Latini A, Cota C, Orsini D, Cristaudo A, Tedesco M. Male and female genital lichen sclerosus. Clinical and functional classification criteria. Postepy Dermatol Alergol 2018; 35 (5): 447–53.
- 16. Sun TY, Walk OJBD IV, Chen JL, Nieva HR, Elhadad N. Exploring gender disparities in time to diagnosis. ArXiv201106100 Cs Stat 2020. http://arxiv.org/abs/2011.06100 Accessed May 8, 2021.
- 17. Tan A, Bieber AK, Stein JA, Pomeranz MK. Diagnosis and management of vulvar cancer: a review. J Am Acad Dermatol 2019; 81 (6): 1387–96.
- 18. Steen VD, Oddis CV, Conte CG, Janoski J, Casterline GZ, Medsger TA. Incidence of systemic sclerosis in Allegheny County, Pennsylvania. a twenty-year study of hospital-diagnosed cases, 1963–1982. Arthritis Rheum 1997; 40 (3): 441–5.
- 19. Petri M. Epidemiology of systemic lupus erythematosus. Best Pract Res Clin Rheumatol 2002; 16 (5): 847–58.
- 20. Cartwright T, Endean N, Porter A. Illness perceptions, coping and quality of life in patients with alopecia. Br J Dermatol 2009; 160 (5): 1034–9.
- 21. Loussouarn G, Lozano I, Panhard S, Collaudin C, El Rawadi C, Genain G. Diversity in human hair growth, diameter, colour and shape. an in vivo study on young adults from 24 different ethnic groups observed in the five continents. Eur J Dermatol EJD 2016; 26 (2): 144–54.
- 22. Birnbaum MR, McLellan BN, Shapiro J, Ye K, Reid SD. Evaluation of hair density in different ethnicities in a healthy American population using quantitative trichoscopic analysis. Skin Appendage Disord 2018; 4 (4): 304–7.
- 23. Qian Y, Johannet P, Sawyers A, Yu J, Osman I, Zhong J. The ongoing racial disparities in melanoma: an analysis of the Surveillance, Epidemiology, and End Results database (1975–2016). J Am Acad Dermatol 2021; 84 (6): 1585–93.
- 24. Singer S, Tkachenko E, Hartman RI, Mostaghimi A. Association between sexual orientation and lifetime prevalence of skin cancer in the United States. JAMA Dermatol 2020; 156 (4): 441–5.
- 25. Yeung H, Braun H, Goodman M. Sexual and gender minority populations and skin cancer-new data and renewed priorities. JAMA Dermatol 2020; 156 (4): 367–9.
- 26. Kharazmi P, Kalia S, Lui H, Wang ZJ, Lee TK. A feature fusion system for basal cell carcinoma detection through data-driven feature learning and patient profile. Skin Res Technol 2018; 24 (2): 256–64.
- 27. Chen I, Johansson FD, Sontag D. Why is my classifier discriminatory? Adv Neural Inf Process Syst 2018; 31: 3543–54. http://arxiv.org/abs/1805.12002 Accessed May 8, 2021.
- 28. Sullivan P, Trinidad J, Hamann D. Issues in transgender dermatology: a systematic review of the literature. J Am Acad Dermatol 2019; 81 (2): 438–47.
- 29. Buolamwini J, Gebru T. Gender shades: intersectional accuracy disparities in commercial gender classification. In: proceedings of the Conference on Fairness, Accountability and Transparency; February 23–24, 2018; New York. http://proceedings.mlr.press/v81/buolamwini18a.html Accessed May 9, 2021.
