Alcohol use disorders (AUD) are very common in the developed world [1], yet only a minority of individuals with AUD seek treatment. Several factors influence the choice to seek treatment, including demographic, psychological and physical impediments, and integrating information from such disparate data sources is challenging. In this issue of EClinicalMedicine, Lee et al. [2] report a machine learning analysis that classified individuals with AUD as either treatment seekers or non-seekers. Notable strengths of this study include the examination of a wide range of predictor variables, the application of an innovative data analysis method (alternating decision trees; ADTs), and the use of an external validation sample to quantify reproducibility. There are, however, caveats that apply to the use of machine learning methods in biomedical research.
Traditional (inferential) statistics relies on having many more participants than variables, whereas machine learning methods can accommodate many variables relative to the number of participants (see Ref. [3] for an accessible treatment of the differences between these approaches). There are many types of machine learning algorithms: the choice of ADTs in Lee et al. is particularly apt because the ADT output is interpretable and therefore more likely to be useful in a clinical context. This interpretability stands in contrast to methods such as deep learning, which may yield better predictions but produce models that cannot easily be understood [4]. ADTs also promote ‘sparse’ solutions. That is, the final model makes a prediction based on only a subset of the original variables. For example, the ADT performed similarly to logistic regression in predicting treatment-seeking status, but required only 10 variables, compared with 24 for logistic regression, saving approximately 3 h in assessment time.
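To illustrate why an ADT is both interpretable and sparse, the following is a minimal Python sketch of how an already fitted ADT might score a participant. It assumes a hand-built tree: the variable names (symptom_count, trait_score, age, income), thresholds and prediction values are hypothetical and are not the model reported by Lee et al., and the boosting procedure that learns the tree is not shown. Each decision node adds a value to a running score, all nodes hanging from the root are evaluated, and the sign of the total gives the class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Splitter:
    """A decision node testing `variable > threshold`. Each outcome leads to a
    prediction node: a value added to the score, plus any splitters below it."""
    variable: str
    threshold: float
    yes_value: float
    no_value: float
    yes_children: List["Splitter"] = field(default_factory=list)
    no_children: List["Splitter"] = field(default_factory=list)

    def score(self, x: Dict[str, float]) -> float:
        # Follow only the branch whose condition holds, then add the
        # contributions of every splitter hanging beneath that branch.
        if x[self.variable] > self.threshold:
            return self.yes_value + sum(c.score(x) for c in self.yes_children)
        return self.no_value + sum(c.score(x) for c in self.no_children)

    def variables(self) -> Set[str]:
        found = {self.variable}
        for child in self.yes_children + self.no_children:
            found |= child.variables()
        return found


@dataclass
class AlternatingDecisionTree:
    root_value: float
    splitters: List[Splitter]

    def score(self, x: Dict[str, float]) -> float:
        # Unlike an ordinary decision tree, every root-level splitter is
        # evaluated and their contributions are summed.
        return self.root_value + sum(s.score(x) for s in self.splitters)

    def predict(self, x: Dict[str, float]) -> int:
        # Sign of the summed score; +1 = treatment seeker, -1 = non-seeker
        # (a hypothetical label coding).
        return 1 if self.score(x) > 0 else -1

    def variables(self) -> Set[str]:
        # The only inputs that must be collected to make a prediction.
        found: Set[str] = set()
        for s in self.splitters:
            found |= s.variables()
        return found


# Hypothetical hand-built tree; variables and values are illustrative only.
model = AlternatingDecisionTree(
    root_value=-0.2,
    splitters=[
        Splitter("symptom_count", 5, 0.6, -0.4,
                 yes_children=[Splitter("trait_score", 3.0, -0.5, 0.3)]),
        Splitter("age", 40, 0.2, -0.1),
    ],
)

participant_a = {"symptom_count": 7, "trait_score": 2.0, "age": 45, "income": 30000}
participant_b = {"symptom_count": 3, "trait_score": 5.0, "age": 30, "income": 52000}

print(model.score(participant_a), model.predict(participant_a))  # score close to 0.9, class 1
print(model.predict(participant_b))                              # -1
print(sorted(model.variables()))  # ['age', 'symptom_count', 'trait_score']
```

In this toy tree, income is recorded for each participant but never consulted, which is the sense in which the model is sparse; trait_score is consulted only when symptom_count exceeds its threshold, so its effect is conditional on another variable.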
Machine learning uses out-of-sample validation to evaluate model performance (i.e., prediction accuracy on previously unseen participants), a measure that more closely mimics clinical scenarios. This again contrasts with the traditional statistical approach, which relies primarily on p values and leaves the complication of deciding whether a significant p value is clinically meaningful. In machine learning applications to biomedicine, out-of-sample validation is usually achieved using internal cross-validation: the dataset is split into a number of portions, and each portion serves once as the test set while the model is trained on the remaining portions. Internal cross-validation can, in some situations, be over-optimistic about prediction accuracy, for example when variables are selected using the whole dataset before it is split. External validation (the application of the model to an entirely new dataset) is regarded as the gold-standard validation method (see Ref. [5] for an overview of validation methods), but is uncommon in biomedicine because of the challenges in collecting additional data. Lee et al.'s use of an external validation dataset, which produced results similar to those of internal cross-validation, supports the reproducibility of their findings.
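The difference between internal cross-validation and external validation can be made concrete with a short Python sketch using only the standard library. It assumes simulated two-class data and a nearest-centroid classifier chosen purely for brevity (not the ADT used by Lee et al.); the helper names are hypothetical. Internal k-fold cross-validation holds out each fold once, whereas external validation fits the model once on the development sample and applies it, unchanged, to an independent sample.

```python
import random
from statistics import mean


def simulate(n, shift, seed):
    """Hypothetical two-class data: class 1 is shifted by `shift` on both features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(shift * label, 1.0), rng.gauss(shift * label, 1.0)])
        y.append(label)
    return X, y


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds, so that every
    participant lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    return {
        label: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == label])]
        for label in set(y)
    }


def accuracy(centroids, X, y):
    """Fraction of participants assigned to the class with the closest centroid."""
    def predict(x):
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return mean(1.0 if predict(x) == t else 0.0 for x, t in zip(X, y))


def internal_cv(X, y, k=5, seed=0):
    """Each fold is held out once as the test set; the model is refit on the rest."""
    scores = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return mean(scores)


# Development sample (used for internal CV) and an independent "external" sample.
X_dev, y_dev = simulate(200, shift=1.5, seed=1)
X_ext, y_ext = simulate(100, shift=1.5, seed=2)

cv_acc = internal_cv(X_dev, y_dev, k=5)
final_model = fit_centroids(X_dev, y_dev)      # refit once on all development data
ext_acc = accuracy(final_model, X_ext, y_ext)  # applied to new data, never refit
print(f"internal 5-fold CV accuracy: {cv_acc:.2f}")
print(f"external validation accuracy: {ext_acc:.2f}")
```

Because the simulated external sample is drawn from the same distribution as the development sample, the two accuracies should be similar here. With real data, differences between samples (for example, in recruitment site) can produce a drop on the external sample that internal cross-validation would not reveal.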
A challenge for the application of machine learning to biomedical data is that, although ADT models are quite transparent (compared with deep learning), they contain subtleties that need careful consideration. For example, African-American race predicted treatment non-seeking, but only when depression symptoms were high, and removing race did not materially alter classification accuracy. Caution is needed to ensure that variables such as race are not misinterpreted [6]. Furthermore, using machine learning to inform medical decision-making can have downsides: these potentially include a loss of physician skills and an overreliance on data to the detriment of patient interaction [7].
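Lee et al. report that removing race did not materially alter classification accuracy. A check of this kind is often called a drop-column (or ablation) analysis: evaluate the model with and without one variable and compare accuracy. The Python sketch below assumes simulated data and a leave-one-out 1-nearest-neighbour classifier, chosen only because it is short; it is not the ADT analysis Lee et al. performed, and the helper names are hypothetical.

```python
import random


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that only
    looks at the feature indices in `features`."""
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other participant, measured on the retained features only.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def drop_column_check(X, y, drop):
    """Return (accuracy with all features, accuracy with feature `drop` removed)."""
    all_features = list(range(len(X[0])))
    kept = [f for f in all_features if f != drop]
    return loo_1nn_accuracy(X, y, all_features), loo_1nn_accuracy(X, y, kept)


# Simulated data: features 0 and 1 carry signal, feature 2 is pure noise.
rng = random.Random(3)
X, y = [], []
for i in range(150):
    label = i % 2
    X.append([rng.gauss(1.2 * label, 1.0), rng.gauss(1.2 * label, 1.0),
              rng.gauss(0.0, 1.0)])
    y.append(label)

full, reduced = drop_column_check(X, y, drop=2)
print(f"all features: {full:.2f}, without feature 2: {reduced:.2f}")
```

Because the dropped feature is pure noise in this simulation, removing it may leave accuracy unchanged or even improve it. A negligible change shows only that the variable is not needed for prediction in this sample; it does not explain why the variable entered the model, so it does not remove the need for careful interpretation.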
Future studies could further examine some intriguing results in Lee et al., notably the different models for males vs. females. The male-only model was similar to the model for the whole sample (as expected, because the overall sample was primarily male), with the psychological construct of sensation seeking as the exception. This is potentially important because sensation seeking is associated both with alcohol-use initiation and with heavier use in young adulthood [8]. The finding that high sensation seeking is also associated with a reluctance to seek treatment (in men) suggests that this psychological trait has relevance throughout the trajectory of alcohol use. In contrast, the female-only model differed from the whole-sample model, incorporating altruism, self-consciousness, a history of sexual abuse and the number of dependence criteria. Future work should endeavour to examine data from a larger sample of females, as Lee et al.'s data support the need to address sex and gender differences in biomedical research [9], [10].
Overall, Lee et al.'s study represents an exciting development in addiction research. Its inclusion of a diverse set of variables, its use of a sophisticated and appropriate machine learning algorithm and its external validation dataset together point the way for future studies of this kind.
Author Contributions
RW wrote this commentary.
Declaration of Competing Interest
RW has nothing to disclose.
References
1. Connor J.P., Haber P.S., Hall W.D. Alcohol use disorders. Lancet. 2016;387:988–998. doi: 10.1016/S0140-6736(15)00122-1.
2. Lee M.R., Sankar V., Hammer A. Using machine learning to classify individuals with alcohol use disorder based on treatment seeking status. EClinicalMedicine. 2019. doi: 10.1016/j.eclinm.2019.05.008.
3. Bzdok D., Altman N., Krzywinski M. Statistics versus machine learning. Nat Methods. 2018;15:233–234. doi: 10.1038/nmeth.4642.
4. Woo C.W., Chang L.J., Lindquist M.A., Wager T.D. Building better biomarkers: brain models in translational neuroimaging. Nat Neurosci. 2017;20:365–377. doi: 10.1038/nn.4478.
5. Gillan C.M., Whelan R. What big data can do for treatment in psychiatry. Curr Opin Behav Sci. 2017;18:34–42.
6. Zou J., Schiebinger L. AI can be sexist and racist — it's time to make it fair. Nature. 2018;559:324–326. doi: 10.1038/d41586-018-05707-8.
7. Cabitza F., Rasoini R., Gensini G.F. Unintended consequences of machine learning in medicine. JAMA. 2017;318(6):517–518. doi: 10.1001/jama.2017.7797.
8. Padilla M.M., O'Halloran L., Bennett M., Cao Z., Whelan R. Impulsivity and reward processing endophenotypes in youth alcohol misuse. Curr Addict Rep. 2017;4:350–363.
9. Tuchman E. Women and addiction: the importance of gender issues in substance abuse research. J Addict Dis. 2010;29:127–138. doi: 10.1080/10550881003684582.
10. Clayton J.A. Studying both sexes: a guiding principle for biomedicine. FASEB J. 2015;30:519–524. doi: 10.1096/fj.15-279554.