Health Promotion and Chronic Disease Prevention in Canada: Research, Policy and Practice. 2022 Jan;42(1):21–28. doi: 10.24095/hpcdp.42.1.04

A machine learning approach to predict e-cigarette use and dependence among Ontario youth

Jiamin Shi 1,2, Rui Fu 2,3, Hayley Hamilton 1,2, Michael Chaiton 1,2
PMCID: PMC9067014  PMID: 35044141

Abstract

Introduction:

We developed separate random forest algorithms to predict e-cigarette (vaping) ever use and daily use among Ontario youth, and subsequently examined predictor importance and statistical interaction.

Methods:

This cross-sectional study used a representative sample of Ontario elementary and high school students in 2019 (N=6471). Vaping frequency over the last 12 months was used to define ever-vaping and daily vaping. We considered a large set of individual characteristics as potential correlates for ever-vaping (176 variables) and daily vaping (179 variables). Using cross-validation, we developed random forest algorithms and evaluated model performance based on the C-index, a measure to assess the discriminatory ability of a model, for both outcomes. Further, we identified the top 10 correlates by calculating relative importance scores and examined their interactions with sociodemographic characteristics.

Results:

There were 2064 (31.9%) ever-vapers, and 490 (7.6%) of the respondents were daily users. The random forest algorithms for both outcomes achieved high performance, with C-index over 0.90. The top 10 correlates of daily vaping included use of caffeine, cannabis and tobacco, source and type of e-cigarette and absence in last 20 school days. Those of ever-vaping included school size, use of alcohol, cannabis and tobacco; 9 of the top 10 ever-vaping correlates demonstrated interactions with ethnicity.

Conclusion:

Machine learning is a promising methodology for identifying the risks of ever-vaping and daily vaping. Furthermore, it enables the identification of important correlates and the assessment of complex intersections, which may inform future longitudinal studies to customize public health policies for targeted population subgroups.

Keywords: machine learning, vaping, smoking, Ontario, youth


Highlights

  • This study applied a machine learning methodology that allowed the inclusion of a wide range of correlates in tobacco research among youth.

  • The top 10 correlates of daily vaping included use of caffeine, cannabis and tobacco, source and type of e-cigarette and absence in last 20 school days. Those of ever-vaping included school size, and use of alcohol, cannabis and tobacco.

  • Future longitudinal studies could verify the most important correlates of ever-vaping and daily vaping identified, potentially informing policies to prioritize strategies for issues related to substance use.

  • Analysis of interactions quantified interaction strengths amongst important correlates and sociodemographic characteristics, which could be further explored by future longitudinal studies.

Introduction

Research has shown that the prevalence of vaping nicotine increased rapidly among North American youth aged 16 to 19 years from 2017 to 2018.1 In particular, the ever-vaping percentage increased from 29.3% to 37.0%, and the percentage of vaping in the past 30 days increased from 8.4% to 14.6% among youth in Canada. Youth are also increasingly reporting symptoms of vaping dependence, defined as “the constellation of behaviors and symptoms that are distressing to the user and promote the compulsive use of vaping due to nicotine and non-nicotine factors.”2,p.257 A prospective cohort study suggests that vaping dependence is potentially related to future tobacco use persistence and escalation among Grade 12 students in the US.3 As of 2020, approximately 3000 hospitalizations and deaths reported by the US Centers for Disease Control and Prevention (CDC) were linked to use of vaping products.4

Previous studies of vaping dependence, including those that used validated scales such as the PROMIS-E and the Penn State Electronic Cigarette Dependence Index, have attributed the rise of vaping dependence symptoms to older age, longer duration of use, greater vaping frequency, higher nicotine concentrations and current cigarette smoking.5,6 However, these studies have limitations associated with traditional statistical regressions. The use of p-values to select features for model building based on statistical significance may limit insight into predictors not selected. Moreover, as vaping dependence may correlate with a wide variety of characteristics, it can be challenging for a regression model to completely capture these complex relationships. This complexity can further limit study findings with statistical issues such as multicollinearity and overfitting.

To address the aforementioned limitations, we applied a machine learning approach in this study. Machine learning (defined as “a group of data-driven analytical methods that rely on computational power to perform statistical tasks”7,p.1317) is an emerging technique in health research.8-11 Compared to conventional statistical methods, machine learning may prove better able to make accurate predictions, with proper guidelines to mitigate risks of overfitting.12 We use the term “predictor” in its machine learning sense throughout this paper, to refer to a variable used in a prediction model; it does not necessarily imply a temporal or causal relationship.

This methodology focusses on the variables that are most “important” to prediction, in the sense of improving model performance as measured by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, rather than relying on estimates of variance and p-value hypothesis testing. Although there are studies that have applied machine learning methods such as classification trees13 and random forest14 in tobacco research, a recent scoping review suggested that these applications are rarely linked to public health impacts.15

Thus, the aim of our study was to investigate further ever-vaping and daily vaping (as a proxy for vaping dependence) among the youth population, using machine learning methods with interpretable findings. In particular, our objectives were to develop machine learning algorithms that predict both ever-vaping and daily vaping among Ontario youth, and to perform post hoc analysis including ranking the importance of individual risk factors on both outcomes and illustrating statistical intersections to identify particularly susceptible youth subgroups.

Methods

Data and participants

This study used data from the 2019 Ontario Student Drug Use and Health Survey (OSDUHS), which included responses from 14142 students from 992 classes in 263 elementary or secondary schools from 47 Ontario school boards.16 The OSDUHS had a complex survey sampling design—schools were clustered within the 26 geographical strata. There were four different questionnaire types in total. We obtained a total of 6471 respondents after including only the survey types that contained the question “In the last 12 months, how often did you smoke e-cigarettes?” and excluding students who did not respond to this question. The sample used to examine daily vaping was limited to ever-vapers, including a total of 2064 respondents.

Measures

Outcome

We created binary outcome variables to represent daily vaping and ever-vaping using the same survey question. Participants who reported never having used an e-cigarette in their lifetime were “never-vapers,” while others were “ever-vapers.” Participants who vaped at least daily were classified as vaping dependent. Those who did not meet this criterion were considered to be participants without daily vaping.

Potential determinants

We regarded 179 and 176 variables capturing person-level characteristics as potentially predicting daily vaping and ever-vaping, respectively16 (see the Appendix at https://osf.io/x36p8/ for the full list of variables). These variables described administrative information, demographics, school life, family life, physical health, mental health, driving behaviours, experience of having been a passenger with an intoxicated driver, vaping behaviours, substance use, perceptions and exposures, sociodemographic characteristics and other risk behaviours of substance use. We excluded any variables that were conditional on either daily vaping or ever-vaping based on survey design (i.e. questions that were conditional on having ever vaped were not included as predictors of ever-vaping). We collapsed levels of several variables to facilitate subsequent analysis. Numeric variables were scaled using z-score normalization prior to model building.
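
As an illustration of the z-score normalization mentioned above, the following minimal Python sketch rescales a numeric variable to mean 0 and standard deviation 1. It is not the authors' R code; the function name, the sample values and the choice of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev

def z_score(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; the paper does not specify
    if sigma == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical school sizes, for illustration only.
school_sizes = [250, 500, 750, 1000, 1500]
scaled = z_score(school_sizes)
```

After scaling, variables measured in very different units (for example, school size and number of drinks) are placed on a comparable scale.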

Analysis

Descriptive statistics and imputation of missing values

We summarized demographic characteristics of the respondents and prevalence of ever-vaping and daily vaping. Over 90% of the variables had missingness of 10% or less (lower than 5%, or between 5% and 10%). A variable describing different types of special education had 10% missingness. Missing values of categorical variables were collapsed either into the reference level or into an existing response option representing uncertainty about how to respond. For all numeric variables, we imputed missing values with the median.
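
A minimal Python sketch of median imputation for a numeric variable follows. It only illustrates the idea; the variable name and values are hypothetical, and missing entries are represented by None as an assumption of the example.

```python
from statistics import median

def impute_median(values):
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def missing_share(values):
    """Fraction of entries that are missing."""
    return sum(v is None for v in values) / len(values)

# Hypothetical responses for "days absent in the last 20 school days".
days_absent = [0, 2, None, 5, 1, None, 3]
imputed = impute_median(days_absent)
share = missing_share(days_absent)
```

In this toy input the observed values are 0, 1, 2, 3 and 5, so both missing entries are filled with the median, 2.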

Random forest algorithm

Using the “caret” package17 in R version 3.6.3, we developed a random forest algorithm (an ensemble machine learning algorithm formed by a large number of classification trees) to classify respondents on each primary outcome.18 For instance, in the algorithm for daily vaping, each tree classified respondents either as being daily vapers or as not being daily vapers. When the class predictions from all trees were tallied, the class with the majority of votes became the prediction of the random forest. This “wisdom of the crowd” approach had the potential to make the random forest a highly accurate and robust algorithm for prediction.19
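
The majority-vote step can be sketched in a few lines of Python. The class labels and the list of tree votes below are hypothetical; in a real forest each vote would come from a fitted classification tree.

```python
from collections import Counter

def forest_predict(tree_votes):
    """Return the class chosen by the largest number of trees."""
    # most_common(1) returns [(label, count)] for the most frequent label;
    # ties are resolved in favour of the label seen first.
    return Counter(tree_votes).most_common(1)[0][0]

# Hypothetical votes from five trees for a single respondent.
votes = ["daily", "not_daily", "daily", "daily", "not_daily"]
prediction = forest_predict(votes)
```

Here three of the five trees vote "daily", so the forest classifies the respondent as a daily vaper.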

Development and validation of a random forest for daily vaping and ever-vaping

We included all the candidate predictors to train the model, excluding variables that were conditional on the outcome (i.e. we excluded questions for ever-vaping that were only asked to students who vaped). Using a ratio of 7:3, we randomly split the dataset into a training set (n=1612 or 4680) and a test set (n=691 or 2006) for the sample to classify daily vaping and ever-vaping. Both ever-vaping and daily vaping were imbalanced. To facilitate model training efficiency, we performed a Synthetic Minority Over-sampling Technique (SMOTE) procedure on the training data to reach two balanced samples for model training.20 In a 10-fold cross-validation procedure during model training, the dataset was randomly partitioned into 10 equally sized subsamples. At each iteration, nine subsamples were used to train the model, while the one subsample retained was used to validate the model. The above procedure was repeated 10 times. To evaluate model performance, we reported accuracy, sensitivity, specificity and AUC regarding the classification of daily vaping and ever-vaping on the test set. We considered the average performance of the 10 iterations as overall performance of the model. AUC exceeding 0.80 represented good discriminatory ability, a common threshold for classification models.21
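
The data-handling steps in this paragraph (a 7:3 split, oversampling of the minority class, and 10-fold partitioning) can be sketched as below. This is a simplified illustration in Python rather than the authors' caret pipeline: all function names are hypothetical, the oversampler handles numeric features only, and it reproduces just the core interpolation idea of SMOTE.

```python
import random

def train_test_split(rows, train_share=0.7, seed=0):
    """Shuffle the rows and split them by the given share (7:3 by default)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

def smote_like(minority, n_new, k=3, seed=0):
    """Create synthetic minority points by interpolating between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

def k_folds(n_rows, k=10, seed=0):
    """Partition row indices into k folds of near-equal size."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

rows = list(range(100))                      # stand-in for 100 respondents
train, test = train_test_split(rows)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=6)   # oversample the training data only
folds = k_folds(len(train))                  # each fold serves once as validation
```

Oversampling is applied to the training portion only, so the test set keeps the original class balance.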

Ranking of individual risk factors of daily vaping and ever-vaping

To identify the top 10 correlates of daily vaping and ever-vaping, we ranked all of the correlates based on scaled relative importance scores (0–100)—a measure calculated from total loss of accuracy due to exclusion of a correlate for every tree divided by the total number of trees.22,23 One-way partial dependence plots of the top 10 correlates were used to understand their marginal effects on the predicted risks of daily vaping and ever-vaping, while other correlates were kept constant.24 A partial dependence plot of one correlate illustrated probabilities of outcomes, given different values of that correlate. The higher the probability, the greater the risk of outcome observed under the influence of that correlate. These methods were applied to sociodemographic characteristics as well.
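
The two ideas in this paragraph, rescaling importance scores to a 0–100 range and computing a one-way partial dependence curve, can be sketched in Python as follows. The min-max rescaling, the toy prediction function standing in for a fitted forest, and all variable names are assumptions made for illustration.

```python
def scale_importance(raw):
    """Rescale raw importance scores so the smallest is 0 and the largest is 100."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {name: 0.0 for name in raw}
    return {name: 100 * (score - lo) / (hi - lo) for name, score in raw.items()}

def partial_dependence(predict, rows, feature, grid):
    """Average predicted probability when `feature` is fixed at each grid value
    and every other variable keeps its observed value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

def toy_predict(row):
    """Hypothetical stand-in for a fitted model's predicted probability."""
    return min(1.0, 0.1 + 0.4 * row["cannabis_ever"] + 0.2 * row["alcohol_12m"])

rows = [
    {"cannabis_ever": 0, "alcohol_12m": 0},
    {"cannabis_ever": 0, "alcohol_12m": 1},
    {"cannabis_ever": 1, "alcohol_12m": 1},
    {"cannabis_ever": 1, "alcohol_12m": 0},
]
pd_curve = partial_dependence(toy_predict, rows, "cannabis_ever", [0, 1])
scaled = scale_importance({"cannabis_ever": 0.08, "alcohol_12m": 0.05, "age": 0.02})
```

In this toy example the averaged prediction rises from about 0.2 to about 0.6 when cannabis use is switched from 0 to 1, which is the kind of contrast a partial dependence plot displays.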

Exploration of interactions

We examined two-way interactions of the top 10 correlates identified and sociodemographic correlates that can robustly predict inequities of smoking-related outcomes.25 Further, we explored the interaction effects of the following pairs of sociodemographic characteristics—age and sex, age and ethnicity, age and socioeconomic status (SES), sex and ethnicity, sex and SES, ethnicity and SES—using a simple feature importance ranking measure approach.26 SES is subjectively determined by respondents based on their rating of their own SES on a ladder scaled from zero to 10.27 Two-way partial dependence plots were used to illustrate daily vaping and ever-vaping risks on the proposed pairs with interaction strengths above a threshold of 0.1. The calculations of partial dependence probabilities were based on the variation of the two predictors, while holding other predictors constant.28
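
One way to read the partial-dependence-based interaction measure is sketched below in Python: compute the two-way partial dependence grid, measure how much the importance of one variable changes across values of the other, and average the two directions. This is a simplified interpretation rather than the authors' code; the toy models, variable names and the use of the sample standard deviation are assumptions.

```python
from statistics import stdev

def pd_grid(predict, rows, f1, grid1, f2, grid2):
    """Two-way partial dependence: average prediction with f1 and f2 fixed."""
    return [
        [sum(predict({**r, f1: a, f2: b}) for r in rows) / len(rows) for b in grid2]
        for a in grid1
    ]

def interaction_strength(grid):
    """Spread of one variable's importance across values of the other,
    averaged over both directions."""
    imp_f2_given_f1 = [stdev(row) for row in grid]        # one score per f1 value
    imp_f1_given_f2 = [stdev(col) for col in zip(*grid)]  # one score per f2 value
    return (stdev(imp_f2_given_f1) + stdev(imp_f1_given_f2)) / 2

def additive(row):      # hypothetical model with no interaction
    return 0.1 + 0.3 * row["substance"] + 0.2 * row["group"]

def interacting(row):   # hypothetical model where the effect exists in one group only
    return 0.1 + 0.5 * row["substance"] * row["group"]

rows = [{"substance": 0, "group": 0}, {"substance": 1, "group": 1}]
h_add = interaction_strength(pd_grid(additive, rows, "substance", [0, 1], "group", [0, 1]))
h_int = interaction_strength(pd_grid(interacting, rows, "substance", [0, 1], "group", [0, 1]))
```

The additive toy model gives a strength of essentially zero, while the interacting one gives 0.25, which would exceed the 0.1 threshold used to decide which pairs to plot.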

Sensitivity analysis

We conducted two sets of sensitivity analyses using the same oversampled training set for both outcomes. First, we fitted random forest algorithms with only the top 10 correlates identified. Second, we built base multivariate logistic regression models composed of age, sex, ethnicity and SES. Performance of these logistic models was assessed by accuracy, sensitivity, specificity and AUC on the test set and compared to these measures of the random forest.

Results

Sample characteristics

The 6471 respondents were divided into 10 age groups (0 to 11, individual years between ages 12 and 19, and 20+ years); 54.6% of them were females; the majority (68.6%) came from a family positioned from 6 to 8 on the SES ladder; and 62.1% of them were White (Table 1). There were 2064 (31.9%) ever-vapers and 490 (7.6% of the entire sample or 23.7% of ever-vapers) respondents who were daily vapers.
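
The reported percentages can be checked directly from the counts given above; a short Python calculation (variable names are illustrative):

```python
n_total = 6471   # respondents in the analytic sample
n_ever = 2064    # ever-vapers
n_daily = 490    # daily vapers

ever_pct = round(100 * n_ever / n_total, 1)          # share of the whole sample
daily_pct_all = round(100 * n_daily / n_total, 1)    # share of the whole sample
daily_pct_ever = round(100 * n_daily / n_ever, 1)    # share of ever-vapers
```

These evaluate to 31.9, 7.6 and 23.7, matching the figures in the text.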

Table 1. Demographic characteristics of eligible respondents in the sample, OSDUHS 2019.

Performance of the random forest algorithms

The random forest algorithms for both outcomes achieved high performance. The algorithm for ever-vaping had a testing accuracy of 0.82 (95% confidence interval [CI]: 0.81–0.84), sensitivity of 0.83 (0.80–0.86), specificity of 0.82 (0.80–0.84) and an AUC of 0.90. The algorithm for daily vaping had a testing accuracy of 0.83 (0.80–0.86), sensitivity of 0.85 (0.77–0.90), specificity of 0.82 (0.78–0.86) and an AUC of 0.90.
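
For readers less familiar with these measures, the Python sketch below computes accuracy, sensitivity, specificity and AUC for a small, made-up set of labels and predicted probabilities. For a binary outcome the AUC equals the C-index: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The data and the 0.5 classification cut-off are assumptions for illustration.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

def auc(y_true, scores):
    """Share of positive-negative pairs ranked correctly (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities for eight respondents.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

With these toy values, accuracy, sensitivity and specificity are each 0.75 and the AUC is 0.9375 (15 of 16 positive-negative pairs are ranked correctly).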

Top 10 correlates of ever-vaping and daily vaping

The algorithms demonstrated different top 10 correlates for daily vaping and ever-vaping (Figure 1). The top 10 correlates for ever-vaping were: having used cannabis in lifetime; having drunk alcohol in past 12 months; source of cannabis; having used waterpipe in lifetime; having used tobacco in lifetime; school size; having used cannabis in past 12 months; the number of drinks containing alcohol when typically drinking; having had an energy drink with alcohol in last 12 months; and having been drunk. The top 10 correlates for daily vaping were: source of e-cigarette/tried a friend’s; having smoked e-cigarettes with nicotine; having used cannabis in lifetime; source of cannabis; having smoked e-cigarettes without nicotine; having had a caffeine drink in the last 12 months; having had a caffeine drink in the last seven days; absence in the last 20 school days; source of e-cigarette/having bought e-cigarettes at a vape shop; and having used tobacco in lifetime. For both daily vaping and ever-vaping, all of the sociodemographic correlates showed minimal influence, with relative importance lower than three; thus, none of the corresponding partial dependence plots were reported.

Figure 1. Scaled relative importance plots of the top 10 correlates of daily vaping and ever-vaping, OSDUHS 2019.

Partial dependence on the top 10 predictors

According to partial dependence plots for ever-vaping, we found higher risks of ever-vaping among respondents who had used cannabis in the last 12 months or their lifetime, had drunk alcohol with or without high energy drinks in the last 12 months, had used tobacco or waterpipe in their lifetime, and had been drunk, compared to those who had not (see Appendix at https://osf.io/x36p8/). Across sources of cannabis, respondents who had ever used cannabis demonstrated a higher risk of ever-vaping than never-users. Respondents who had two to three drinks containing alcohol when they typically drank had approximately a 25% higher risk of ever-vaping than other alcohol and non-alcohol users. Risk of ever-vaping increased as school size increased in a range of up to 500 students, and remained high until the school size reached approximately 1850. There was a tiny decline in risk for schools with 1850 to 2000 students.

In regard to daily vaping, an increased risk of daily vaping was found among respondents who had used cannabis or tobacco in their lifetime or had drunk a caffeine drink in the last 12 months or seven days, compared with those who had not (see Appendix at https://osf.io/x36p8). Across sources of e-cigarette, there was a vast difference in the risk of being a daily vaper for respondents who borrowed an e-cigarette from a friend compared to those who purchased one in a retail environment. Across types of e-cigarettes, respondents who smoked e-cigarettes without nicotine had a 25% lower risk of being a daily vaper than those who did not. Never-users of cannabis showed a slightly lower risk of being a daily vaper than respondents who used cannabis across various sources. Any absence in the last 20 school days was associated with an increased risk of daily vaping; while it is possible that daily vaping could have led to more school absence, our model was not designed to demonstrate such a relationship.

Interactions

All of the top 10 correlates for ever-vaping, except for having been drunk, demonstrated interactions with ethnicity (see Appendix at https://osf.io/x36p8). Having used tobacco or cannabis in lifetime and having drunk alcohol in the last 12 months showed interactions with ethnicity, SES and age. Japanese ethnicity demonstrated a higher probability of ever-vaping than non-Japanese ethnicity for all school sizes, while the opposite relationship was found for Southeast Asian and Korean ethnicity. Across all sources of cannabis, being of non-Japanese ethnicity was associated with lower probabilities of ever-vaping than being of Japanese ethnicity. Regardless of ethnic group, having two to three drinks on a typical day had the highest probability of ever-vaping, compared to other typical drinking quantities. While being of Japanese ethnicity was positively associated with the probability of ever-vaping, being of Southeast Asian or Korean ethnicity was inversely associated with ever-vaping. There were smaller differences in the probability of ever-vaping between those of Japanese and non-Japanese ethnicity for having had cannabis or alcohol and having had alcohol combined with energy drinks in the last 12 months. This relationship was also found for having used tobacco or cannabis in lifetime. Across all the SES groups, being of Southeast Asian or Korean ethnicity was associated with a slightly lower probability of ever-vaping compared to being non-Southeast Asian or non-Korean.

Age interacted with past-year alcohol use, ever use of tobacco and ever use of cannabis; in these interactions, the use of a substance was a more important predictor among younger students compared to older students. Similarly, these variables were more important predictors among higher SES students compared to lower SES students.

Weak interaction was found between caffeine consumption and ethnicity for daily vaping (see the Appendix at https://osf.io/x36p8/). The interaction strength of having had a caffeine drink in the last seven days and being uncertain of ethnicity was 0.111. Having had a caffeine drink in the last seven days was associated with a slightly higher probability of daily vaping, regardless of the uncertainty of ethnicity.

Sensitivity analysis

In line with the results of the primary analysis, high performance was found in parsimonious random forest algorithms with only the top 10 correlates. The parsimonious model of daily vaping had an accuracy of 0.81 (95% CI: 0.78–0.84), a sensitivity of 0.80 (0.72–0.86), a specificity of 0.82 (0.78–0.85) and an AUC of 0.87; the parsimonious model of ever-vaping had an accuracy of 0.78 (0.76–0.79), a sensitivity of 0.78 (0.74–0.81), a specificity of 0.78 (0.75–0.80) and an AUC of 0.86. By contrast, base logistic regressions of both outcomes had lower performance than the random forest models from the primary analysis. Specifically, the logit model of daily vaping had an accuracy of 0.53 (0.49–0.57), a sensitivity of 0.63 (0.54–0.71), a specificity of 0.50 (0.45–0.54) and an AUC of 0.60; the logit model of ever-vaping had an accuracy of 0.61 (0.59–0.64), a sensitivity of 0.82 (0.79–0.85), a specificity of 0.52 (0.49–0.55) and an AUC of 0.73.

Discussion

We applied a machine learning approach to investigate correlates of daily vaping and ever-vaping, using data from the OSDUHS conducted on a representative sample of Ontario youth attending elementary or secondary schools. The final random forest algorithms demonstrated high performance. The top 10 correlates for daily vaping differed from those for ever-vaping, as is consistent with various predictors found for cigarette onset and escalation in tobacco research.29-31 While we found no interactions among pairs of predictors proposed for daily vaping, we did find interactions between predictors of ever-vaping, particularly by ethnicity.

Our study suggests the key correlates for ever-vaping and daily vaping were different. While a previous study concluded that social influences are the most powerful predictors for ever-vaping,32 our study highlights the importance of three substances, namely cannabis, alcohol and tobacco, to risk of ever-vaping. These findings align with the emerging trend of cannabis vaping,33 and indicate that nicotine, a highly addictive compound in tobacco, is the most common substance in vaping devices.34 We also identified school size as an important sociodemographic correlate to the risk of ever-vaping.

Across sources of e-cigarette, since the lowest risk of daily vaping was found among respondents who tried an e-cigarette from a friend or borrowed one, social influences may play a limited role in the development of daily vaping. The use of nicotine-containing e-cigarettes was found to be associated with the highest risk of daily vaping—unsurprisingly, since addiction to vaping depends on nicotine.35 Our results suggest that caffeine, cannabis and tobacco are important substances for increased risk of daily vaping. While the literature suggests school grade and age might be the strongest sociodemographic correlates of drug use,36 our study shows increased number of absences in the last 20 school days might contribute more to increased risk of daily vaping.

Strengths and limitations

Methodologically, our study provides further evidence on the utility of machine learning in devising predictive modelling in tobacco control.37 The high performance of random forests yields interpretable findings, such as identification of important features, that are potentially meaningful for policy makers. As research indicates that e-cigarette use in adolescence is associated with higher odds of smoking cigarettes,38 features selected can identify important correlates, potentially preventing youth from proceeding to cigarette use. Days absent from school and school size, indicators not commonly found in the literature, were identified as important correlates of outcomes, because of the use of machine learning methods.

Furthermore, the high performance found in this study is in line with research that demonstrates that machine learning can outperform conventional statistical modelling on some occasions. For example, a systematic review reports that machine learning models have higher performance than logistic regression in neurosurgical outcome predictions.39 Similarly, machine learning models exhibit higher C-indexes than clinical risk scores in prognostic performance among patients with acute gastrointestinal bleeding.40

Regarding limitations, as our study was cross-sectional, we were only able to identify the top 10 important correlates rather than the true predictors of daily vaping or ever-vaping. Despite the robustness of random forest algorithms,41 the relative importance of correlates did not imply causality, and we did not conduct hypothesis testing in this analysis. Future longitudinal studies with a causal design and analysis would help address this limitation. More research is also required to validate the findings about interactions, since the ethnic groups reported had relatively small sample sizes (n<150). While our models demonstrated high performance with simple imputation of missing data, it would be worthwhile for future research to consider more sophisticated pipelines such as multiple imputation if precision of correlates is of major interest.42

Furthermore, current tools for developing random forest algorithms are unable to incorporate a cluster sampling design. However, this limitation only affects the variance of the correlates, which was not the focus of this study. Finally, our analysis has limitations that are inherent to survey studies, such as potential recall bias and response bias. Nevertheless, we expect the results to remain robust, since we believe the OSDUHS has been structured with instruments that optimize response quality.

Conclusion

By training and testing random forest algorithms, we identified different sets of top 10 correlates for daily vaping and ever-vaping in a Canadian youth population. We found interactions among important correlates and sociodemographic characteristics for ever-vaping. Identification of correlates for daily vaping and ever-vaping for targeting purposes may inform future longitudinal studies to improve policies designed for subpopulations, irrespective of causality.

Acknowledgements

This project was funded by the Canadian Institutes of Health Research, funding reference number MS2-17073.

Conflicts of interest

The authors have no conflicts of interest.

Authors’ contributions and statement

JS, HH and MC conceptualized the manuscript. JS led the writing, statistical analysis and data interpretation, with the guidance of RF and MC. All authors provided feedback, edited drafts and approved the final version of the manuscript.

The content and views expressed in this article are those of the authors and do not necessarily reflect those of the Government of Canada.

References

  1. Hammond D, Reid JL, Rynard VL, et al. Prevalence of vaping and smoking among adolescents in Canada, England, and the United States: repeat national cross sectional surveys. BMJ. 2019:l2219. doi: 10.1136/bmj.l2219.
  2. Stratton K, Kwan LY, Eaton DL. Public health consequences of e-cigarettes. Washington (DC): National Academies Press (US); 2018. pp. 255–338.
  3. Vogel EA, Cho J, McConnell RS, Barrington-Trimis JL, Leventhal AM. Prevalence of electronic cigarette dependence among youth and its association with future use. JAMA Netw Open. 2020;3(2):e1921513. doi: 10.1001/jamanetworkopen.2019.21513.
  4. Almeida-da-Silva CL, Dakafay H, O’Brien K, Montierth D, Xiao N, Ojcius DM. Effects of electronic cigarette aerosol exposure on oral and systemic health. Biomed J. 2021:252–9. doi: 10.1016/j.bj.2020.07.003.
  5. Morean ME, Krishnan-Sarin S, O’Malley SS. Assessing nicotine dependence in adolescent e-cigarette users: the 4-item Patient-Reported Outcomes Measurement Information System (PROMIS) nicotine dependence item bank for electronic cigarettes. Drug Alcohol Depend. 2018:60–3. doi: 10.1016/j.drugalcdep.2018.03.029.
  6. Foulds J, Veldheer S, Yingst J, et al. Development of a questionnaire for assessing dependence on electronic cigarettes among a large sample of ex-smoking e-cigarette users. Nicotine Tob Res. 2015;17(2):186–92. doi: 10.1093/ntr/ntu204.
  7. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;319(13):1317–8. doi: 10.1001/jama.2017.18391.
  8. Tomašev N, Glorot X, Rae JW, et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature. 2019;572(7767):116–9. doi: 10.1038/s41586-019-1390-1.
  9. Avati A, Jung K, Harman S, Downing L, Ng A, Shah NH, et al. Improving palliative care with deep learning. BMC Med Inf Decis Mak. 2018;18((Suppl 4)):122–9. doi: 10.1186/s12911-018-0677-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. DuBrava S, Mardekian J, Sadosky A, et al, et al. Using random forest models to identify correlates of a diabetic peripheral neuropathy diagnosis from electronic health record data. Pain Med. 2017;18((1)):107–15. doi: 10.1093/pm/pnw096. [DOI] [PubMed] [Google Scholar]
  11. Caballero FF, Soulis G, Engchuan W, et al, et al. Advanced analytical methodologies for measuring healthy ageing and its determinants, using factor analysis and machine learning techniques: the ATHLOS project. Sci Rep. 2017 doi: 10.1038/srep43955. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Luo W, Phung D, Tran T, et al, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18((12)):e323–15. doi: 10.2196/jmir.5870. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Fu R, Mitsakakis N, Chaiton M, et al. A machine learning approach to identify correlates of current e-cigarette use in Canada. Explor Med. 2021 [Google Scholar]
  14. Choi J, Ferrell A, Woo S, Haddad L, et al. Machine learning-based nicotine addiction prediction models for youth e-cigarette and waterpipe (hookah) users. J Clin Med. 2021:972–15. doi: 10.3390/jcm10050972. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Fu R, Kundu A, Mitsakakis N, Chaiton M, et al. Machine learning applications in tobacco research: a scoping review. Fu R, Kundu A, Mitsakakis N, Chaiton M. doi: 10.1136/tobaccocontrol-2020-056438. [DOI] [PubMed] [Google Scholar]
  16. Park S, McCague H, Northrup D, Myles R, Chi T, et al. The design and implementation of the CAMH Ontario Student Drug Use and Health Survey (OSDUHS) 2019: Technical documentation for Centre for Addiction and Mental Health. Park S, McCague H, Northrup D, Myles R, Chi T. :Technical documentation for Centre for Addiction and Mental Health–15. [Google Scholar]
  17. Kuhn M, Jed W, Steve W, et al, et al. caret: classification and regression training. Kuhn M, Jed W, Steve W, et al. 2020 [Google Scholar]
  18. Breiman L, et al. Random forests. Mach Learn. 2001:5–32. [Google Scholar]
  19. Srinath K, et al. Ensemble machine learning: wisdom of the crowd [Internet] Srinath K. Available from: https://towardsdatascience.com/ensemble-machine-learning-wisdom-of-the-crowd-56df1c24e2f5. [Google Scholar]
  20. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002:321–57. [Google Scholar]
  21. Rice ME, Harris GT, Cohen’s d, and r, et al. Comparing effect sizes in follow-up studies: ROC area, Cohen’s d, and r. Law Hum Behav. 2005;29((5)):615–20. doi: 10.1007/s10979-005-6832-7. [DOI] [PubMed] [Google Scholar]
  22. Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. Belmont (CA): Wadsworth International Group; 1984.
  23. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statist. 2001;29(5):1189–232.
  24. Greenwell B. Partial dependence plots. 2018.
  25. Potter LN, Lam CY, Cinciripini PM, Wetter DW. Intersectionality and smoking cessation: exploring various approaches for understanding health inequities. Nicotine Tob Res. 2021:115–23. doi: 10.1093/ntr/ntaa052.
  26. Greenwell BM, Boehmke BC, McCarthy AJ. A simple and effective model-based variable importance measure. arXiv. 2018.
  27. Adler NE, Epel ES, Castellazzo G, Ickovics JR. Relationship of subjective and objective social status with psychological and physiological functioning: preliminary data in healthy, White women. Health Psychol. 2000;19(6):586–92. doi: 10.1037//0278-6133.19.6.586.
  28. Milborrow S. Plot a model’s residuals, response, and partial dependence plots. 2020. Available from: https://cran.r-project.org/web/packages/plotmo/plotmo.pdf.
  29. Pokhrel P, Fagan P, Kawamoto CT, Okamoto SK, Herzog TA. Predictors of marijuana vaping onset and escalation among young adults. Drug Alcohol Depend. 2020:108320. doi: 10.1016/j.drugalcdep.2020.108320.
  30. Wellman RJ, Dugas EN, Dutczak H, et al. Predictors of the onset of cigarette smoking: a systematic review of longitudinal population-based studies in youth. Am J Prev Med. 2016:767–78. doi: 10.1016/j.amepre.2016.04.003.
  31. Morean ME, Wedel AV. Vaping to lose weight: predictors of adult e-cigarette use for weight loss or control. Addict Behav. 2017:55–59. doi: 10.1016/j.addbeh.2016.10.022.
  32. Jayakumar N, O’Connor S, Diemert L, Schwartz R. Predictors of e-cigarette initiation: findings from the Youth and Young Adult Panel Study. Tob Use Insights. 2020:1179173X20977486. doi: 10.1177/1179173X20977486.
  33. Chadi N, Minato C, Stanwick R. Cannabis vaping: understanding the health risks of a rapidly emerging trend. Paediatr Child Health. 2020:S16–S20. doi: 10.1093/pch/pxaa016.
  34. FDA. Chemicals in tobacco products and your health. Nicotine: the addictive chemical in tobacco products [Internet]. Washington (DC). Available from: https://www.fda.gov/tobacco-products/health-effects-tobacco-use/chemicals-tobacco-products-and-your-health.
  35. Dinardo P, Rome E. Vaping: the new wave of nicotine addiction. Cleve Clin J Med. 2019;86(12):789–98. doi: 10.3949/ccjm.86a.19118.
  36. Boak A, Elton-Marshall T, Mann RE, Hamilton HA. Drug use among Ontario students 1977-2019: detailed findings from the Ontario Student Drug Use and Health Survey (OSDUHS). Toronto (ON): Centre for Addiction and Mental Health; 2020. Available from: https://www.camh.ca/-/media/files/pdf---osduhs/drugusereport_2019osduhs-pdf.pdf?la=en&hash=7F149240451E7421C3991121AEAD630F21B13784.
  37. Nam SJ, Kim HM, Kang T, Park CY. A study of machine learning models in predicting the intention of adolescents to smoke cigarettes. arXiv. 2019.
  38. Dutra LM, Glantz SA. Electronic cigarettes and conventional cigarette use among US adolescents: a cross-sectional study. JAMA Pediatr. 2014:610–7. doi: 10.1001/jamapediatrics.2013.5488.
  39. Senders JT, Staples PC, Karhade AV, et al. Machine learning and neurosurgical outcome prediction: a systematic review. World Neurosurg. 2018:476–86. doi: 10.1016/j.wneu.2017.09.149.
  40. Shung D, Simonov M, Gentry M, Au B, Laine L. Machine learning to predict outcomes in patients with acute gastrointestinal bleeding: a systematic review. Dig Dis Sci. 2019;64(8):2078–87. doi: 10.1007/s10620-019-05645-z.
  41. Sarica A, Cerasa A, Quattrone A. Random forest algorithm for the classification of neuroimaging data in Alzheimer’s disease: a systematic review. Front Aging Neurosci. 2017. doi: 10.3389/fnagi.2017.00329.
  42. Austin PC, White IR, Lee DS, van Buuren S. Missing data in clinical research: a tutorial on multiple imputation. Can J Cardiol. 2020;37(9):1322–31. doi: 10.1016/j.cjca.2020.11.010.

Articles from Health Promotion and Chronic Disease Prevention in Canada : Research, Policy and Practice are provided here courtesy of Public Health Agency of Canada
