Abstract
The number of health-related incidents caused by the use of illegal and legal psychoactive substances (PAS) has dramatically increased worldwide over the last two decades. In Colombia, the use of illicit substances has increased to 10.3%, while the consumption of alcohol and tobacco has increased to 84% and 12%, respectively. It is well known that identifying drug consumption patterns in the general population is essential for reducing overall drug consumption. However, existing approaches do not incorporate Machine Learning and/or Deep Data Mining methods in combination with spatial techniques. To enhance our understanding of mental health issues related to PAS and assist in the development of national policies, here we present a novel Deep Neural Network-based Clustering-oriented Embedding Algorithm that incorporates an autoencoder and spatial techniques. The primary goal of our model is to identify general and spatial patterns of drug consumption and abuse, while also extracting relevant features from the input data and identifying clusters during the learning process. As a test case, we used the largest publicly available database of legal and illegal PAS consumption, comprising 49,600 Colombian households. We estimated and geographically represented the prevalence of consumption and/or abuse of both PAS and non-PAS, while achieving statistically significant goodness-of-fit values. Our results indicate that region, sex, housing type, socioeconomic status, age, and variables related to household finances contribute to explaining the patterns of consumption and/or abuse of PAS. Additionally, we identified three distinct patterns of PAS consumption and/or abuse. At the spatial level, these patterns indicate concentrations of drug consumption in specific regions of the country, which are closely related to specific geographic locations and the prevailing social and environmental contexts. 
These findings can provide valuable insights to facilitate decision-making and develop national policies targeting specific groups given their cultural, geographic, and social conditions.
1. Introduction
Psychoactive substances (PAS) are chemical substances that change the function of the nervous system and cause alterations in people’s perception, mood, consciousness, cognition, or behavior [1]. PAS can be grouped according to their chemical structure as synthetic cannabinoids, synthetic cathinones, phenethylamines, arylcyclohexylamines, tryptamines, indolalkylamines, new synthetic opioids, piperazines, ketamine, and designer benzodiazepines. They can also be grouped according to their origin as natural or synthetic molecules [2–4].
Increasing drug use among young people is drawing the attention of national governments [5]. Because the number of health-related incidents caused by the use of legal and illegal PAS has dramatically increased worldwide over the last two decades [2, 6], this phenomenon has become one of the largest burdens of disease [7, 8]. Drug use imposes a high cost on society due to premature mortality, increased health expenditure, criminal justice (drug and micro-trafficking), social welfare costs, and other social consequences [9, 10].
Colombia is ranked as one of the largest drug producers in the world [11]. Unfortunately, the production and commercialization of drugs through drug- and micro-trafficking constantly expand in locations with high levels of poverty and limited government presence [12]. Statistics and indicators of drug consumption, production, and distribution, as well as reports from the National Statistical System (DANE) of Colombia, highlight a dramatic increase over the last 20 years in (i) drug production [13]; (ii) domestic consumption of PAS at early ages; and (iii) the prevalence of drug use and/or abuse [14, 15]. Furthermore, the country has had the highest prevalence of drug use among school students in recent years compared to other Latin American countries [16]. Thus, there is an urgent need to develop effective interventions to prevent the use and/or abuse of PAS.
The first step towards reducing this consumption is to identify drug consumption patterns in the general population [17]. Several studies have identified patterns associated with drug use and consumption [18, 19]. According to the Centers for Disease Control and Prevention, individuals who do not own their homes and live in rented accommodation are more likely to use drugs [20]. Other studies suggest that neighborhood contextual characteristics may increase the risk of substance abuse [21–26]. Additionally, population density may also influence substance use and overdose risk through a higher level of socialization in densely populated urban areas [27–29]. Other authors have identified anxiety, sleep disorders, suicide, depression, and other mental illnesses as risk factors for the consumption and abuse of PAS [30, 31]. Furthermore, early marijuana use has been shown to increase the risk of consuming other PAS [32]. In addition, people involved in sports and artistic activities perceive drugs as a means of enhancing their performance [33].
In Colombia, drug consumption patterns and risk factors have also been identified. For instance, Kalyanam et al. [34] analyzed the social impact of basuco and inhalant use among street youths. Narvaez-Chicaiza [35] assessed the social factors that lead to the adoption of harm reduction policies and how these factors influence treatments for substance abuse disorders. Additionally, Restrepo-Escobar & Cardona [36] demonstrated that university students with low satisfaction in their studies tend to be heavy users of alcohol, tobacco, and marijuana. However, these approaches do not consider the use of Machine Learning (ML) and/or Deep Data Mining techniques in combination with spatial models to analyze drug consumption data from the general population. To our knowledge, no robust models integrating ML and spatial models have been developed to identify drug consumption patterns using publicly available Colombian databases.
Although several techniques for analyzing drug consumption patterns are currently available (i.e., ML, Bayesian, spatial, traditional multivariate, or univariate statistical models, or, in some cases, a combination of these), new trends in pattern identification and analysis techniques focus on hybrid and ensemble models [37]. Currently, the most widely used ML techniques are Support Vector Machines (SVMs), Random Forest (RF), and Natural Language Processing (NLP) [38–42]. Among Bayesian models, Bayesian meta-regression (DisMod-MR), Bayesian hierarchical models, and Markov Chain Monte Carlo are the most attractive methods [43–45]. Regarding spatial models, Spatial Distributions, Spatial Regression Models, Spatial Scan Statistics, Variograms, and Social Mapping are the most frequently used techniques [46–49]. Finally, logistic regression, confirmatory factor analysis, and correlational analysis are the most commonly employed traditional statistical models for identifying drug-associated patterns [50–53].
Fraley and Raftery [54] suggest separating clustering approaches into hierarchical and partitioning techniques. Partitioning techniques are divided into density-, model-, and grid-based methods, the most popular of which are K-means, PAM, CLARA, DBSCAN, and CLIQUE. On the other hand, hierarchical techniques are divided into agglomerative and divisive methods. Of these, the best-known methods are BIRCH, CURE, ROCK, and CHAMELEON (see [55] for further reading). Although these techniques have been shown to perform well when irrelevant features are removed a priori, it is well known that irrelevant and redundant features in the data may degrade the quality of clusters and lead to high computational cost in clustering algorithms. Therefore, removing such features may alleviate these issues. Thus, we focus on identifying patterns of PAS consumption using an ensemble model integrating an autoencoder with both a clustering algorithm and a spatial model. As part of our approach, we used the most recent and representative works for data clustering, and different dimensionality reduction and feature selection methodologies proposed in the literature.
Feature selection approaches in clustering can be split into filter, wrapper, embedded, and hybrid approaches [37]. While wrappers depend on the clustering algorithm to evaluate the clustering quality of a selected feature subset, filters are independent of the clustering algorithm. Embedded approaches also work with a clustering algorithm and, unlike wrappers, incorporate knowledge about the clustering structure. Hybrid approaches combine filter and wrapper approaches into a single strategy. However, studies on embedded and hybrid feature selection approaches in clustering are limited [37]. Other feature learning-based approaches using Deep Neural Networks have been shown to work well for linear and nonlinear models [56]. For instance, Xie et al. [57] proposed performing feature extraction and clustering simultaneously using pre-trained Auto-Encoders. However, these are mainly used to process images. In general, deep clustering models use Auto-Encoders since they can learn input features without labels on the data; performance measures show that this approach is reliable for different data types [58]. Thus, deep clustering methods have become a growing field of research for feature selection [58]. In this regard, the use of convolutional networks in autoencoders and the application of feature selection for clustering are open questions that have not been fully addressed yet, especially when dealing with data from different statistical distributions [37].
Here, we propose a Deep Neural Network-based Clustering-oriented Embedding Algorithm that allows us to (i) identify consumption patterns of PAS; and (ii) build an ensemble algorithm integrating an autoencoder with a clustering algorithm and a spatial model to deal with the feature space and cluster memberships. Our approach is based on the model proposed by Xie et al. [57] and B. Li et al. [56], and expands their work by creating an autoencoder from a convolutional network to represent high-order interactions in the data accurately and simultaneously incorporate a spatial analysis to describe drug consumption patterns properly. Our main hypothesis is that incorporating these two critical elements in our proposal will help to identify and better understand drug consumption patterns and support national policy development processes.
2. Materials and methods
2.1 Study area
Located in South America, the Republic of Colombia is a diverse country with a population of over 50 million people distributed over a territory of 440,831 square miles [59], encompassing jungles, highlands, grasslands, deserts, coasts, and islands, distributed in six regions and 32 departments (states) [60] (See S1 Fig). It is worth noting that, unfortunately, Colombia has been a major producer of illegal drugs for a long time, which has had a significant impact on drug consumption and abuse.
According to the United Nations Office on Drugs and Crime, Colombia is the world's leading cocaine producer and ranks eighth in cannabis production [61]. In addition, the Colombian Drug Observatory indicates that the use of illicit substances in the territory has increased to 10.3%, with men between the ages of 18 and 24 being the heaviest consumers of these types of drugs. Reports also indicate that consumption of licit substances such as alcohol and tobacco has recently increased dramatically [62].
2.2 Data sources
We used two databases to identify drug consumption patterns in Colombia. The first database was retrieved from the 2019 National Survey of Psychoactive Substance Consumption in the General Population (DANE-DIMPE-ENCSPA-2019; URL: https://microdatos.dane.gov.co/index.php/catalog/680/data-dictionary) conducted by the National Statistical System (DANE) of Colombia [63]. This survey includes observations of 49,600 households and records information on housing, location, general characteristics of individuals, consumption of legal and illegal PAS, and implemented treatments. The second database comes from the Colombian Drug Observatory and contains information on the production of PAS per area during 2019. Both databases are fully available and completely anonymized. In this study, we used departments (states) as georeferenced areas, represented as polygons (i.e., a shapefile) as implemented in ArcGIS Hub [64]. An ethics statement approved by an ethics committee is therefore not required, since we use public information that does not identify, or contain individual information about, the people involved.
2.3 Convolutional Auto-Encoder-Deep Embedded Clustering algorithm
Fig 1 presents the proposed Convolutional Auto-Encoder-Deep Embedded Clustering (CAE-DEC) framework, based on the implementation presented by Xie et al. [57]. However, unlike the Xie et al. model, our structure applies convolutional layers in the deep autoencoder (DA) architecture instead of linear ones to represent high-order interactions in the data. In addition, a spectral clustering-based centroid estimation is proposed to achieve an improved initial centroid calculation. We chose the CAE-DEC framework for its ability to reduce both the number of model parameters and the dimensionality while simultaneously creating clusters.
Fig 1. Architecture of the proposed CAE-DEC model.
Republished from [64] under a CC BY license, with permission from [ArcGIS Hub], original copyright [2016].
In our approach, an encoder structure is first applied to map the input vector into a lower-dimensional feature space, called the latent feature space (LFS). Then, the LFS is independently passed through a decoder structure and a clustering layer to achieve an efficient clustering framework. The encoder-decoder combination (DA) attempts to extract an LFS that preserves the relevant information from the original input data. On the other hand, the clustering layer seeks to produce an improved clustering assignment by minimizing the divergence between a target distribution and a centroid-based probability distribution.
In the last stage of the framework, a spatial analysis was performed using the feature space generated from the autoencoder as input. Here, the spatial data exploration is initially performed using Global Spatial Autocorrelation (GSA) to determine the degree to which the similarity between observations in a dataset relates to the similarity of the locations of such observations [65]. To assess GSA, the Moran’s I [66], Geary’s C [67], and Getis and Ord’s G [68] statistics are estimated. We also measure the Local Spatial Autocorrelation, which focuses on the relationships between each observation and its surroundings, rather than providing a single-number summary of these relationships across the map [69]. This measure is used to determine whether spatial autocorrelation is present at specific locations within a geographically referenced data set. Finally, we perform regionalization, which corresponds to a special kind of clustering where the objective is to group similar observations based on their statistical attributes and spatial location [70]. In this sense, regionalization embeds the same logic as standard clustering techniques while applying a series of geographical constraints [71]. This framework was built using the TensorFlow (https://www.tensorflow.org/) and PyTorch (https://pytorch.org/) libraries in Python version 3.11 [72].
2.4 Convolutional Auto-Encoder (CAE)
The DA is a deep neural network architecture capable of learning unsupervised representations of an input data set. Typically, DA networks are used for dimensionality reduction or denoising tasks. The structure of a DA is based on two deep networks: a network to transform the original input data into a latent feature space, and a network trained to reconstruct the original input data using the extracted latent space as input. The first network, used to extract the latent space, is called the encoder, while the second is called the decoder. Rather than using only fully connected layers, the implemented DA architecture combines convolutional (CONV) layers and fully connected (FC) layers for LFS extraction and reconstruction (Fig 1). A DA that integrates convolutional layers is also called a CAE [73]. Compared to a DA built with only fully connected layers, the CAE structure can reduce the number of parameters [74].
2.4.1 Convolutional layer
The proposed CAE structure is designed using four convolutional layers (two CONV layers in the encoder stage and two CONV layers in the decoder stage) and two fully connected layers (Fig 1). The convolution operation can be denoted as:
$$z_{i,j,k}^{l} = \left(w_{k}^{l}\right)^{\top} x_{i,j}^{l} + b_{k}^{l} \tag{1}$$
where $z_{i,j,k}^{l}$ is the value of each feature at the (i, j) location in the k-th feature map of the l-th layer, $w_{k}^{l}$ and $b_{k}^{l}$ represent the weight and bias of the k-th filter of the l-th layer, and $x_{i,j}^{l}$ denotes the input value at location (i, j) of the l-th layer. For non-linear mapping, an activation function g(.) is applied over the convolutional feature as follows:
$$a_{i,j,k}^{l} = g\left(z_{i,j,k}^{l}\right) \tag{2}$$
where $a_{i,j,k}^{l}$ is the activation value resulting from applying the activation function g(.). The Rectified Linear Unit (ReLU) function is set as the activation function on each CONV layer, except in the final decoder CONV layer, where a sigmoid activation is applied.
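As an illustration of Eqs (1) and (2), the following minimal sketch applies a single filter to a one-dimensional input and then a ReLU activation. It assumes a stride of 1 and no padding; the function names `conv1d_feature_map` and `relu` and the toy values are hypothetical and are not taken from the implementation used in this study.

```python
def conv1d_feature_map(x, w, b):
    """Eq (1) for one filter: z_i = w^T x[i:i+len(w)] + b (stride 1, no padding)."""
    k = len(w)
    return [sum(w[t] * x[i + t] for t in range(k)) + b
            for i in range(len(x) - k + 1)]


def relu(z):
    """Eq (2) with g = ReLU, applied element-wise."""
    return [max(0.0, v) for v in z]


x = [0.2, 0.8, 0.5, 0.1]   # toy normalized input (assumption)
w = [1.0, -1.0]            # toy filter weights (assumption)
z = conv1d_feature_map(x, w, b=0.0)
a = relu(z)                # negative responses are set to zero
```

The same filter weights are reused at every position, which is why a convolutional layer needs fewer parameters than a fully connected layer over the same input.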
2.5 Clustering layer
The clustering layer is inspired by Xie et al. [57]. Initially, a soft assignment is computed between the latent space, also known as the embedded space, and the cluster centroids. Then, update steps are repeated to define the final cluster centroids and embedded space. The Kullback–Leibler (KL) divergence is used as the loss function during the optimization procedure. The objective is to minimize the KL divergence between a soft clustering distribution Q and an auxiliary target distribution P. The KL loss is calculated as:
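$$L_c = \mathrm{KL}(P \,\|\, Q) = \sum_{k} \sum_{m} p_{km} \log \frac{p_{km}}{q_{km}} \tag{3}$$
where $L_c$ is the clustering loss. To measure the similarity between embedded point $z_k$ and the cluster centroid $c_m$, the Student’s t-distribution is used as a kernel:
$$q_{km} = \frac{\left(1 + \lVert z_k - c_m \rVert^2 / \alpha\right)^{-\frac{\alpha + 1}{2}}}{\sum_{m'} \left(1 + \lVert z_k - c_{m'} \rVert^2 / \alpha\right)^{-\frac{\alpha + 1}{2}}} \tag{4}$$
with α the degrees of freedom of the Student’s t-distribution and $q_{km}$ a soft clustering assignment distribution of each embedded point (i.e., probability of assigning point k to cluster m). As in Xie et al., when setting α = 1 the similarity function $q_{km}$ can be calculated as:
$$q_{km} = \frac{\left(1 + \lVert z_k - c_m \rVert^2\right)^{-1}}{\sum_{m'} \left(1 + \lVert z_k - c_{m'} \rVert^2\right)^{-1}} \tag{5}$$
To compute the target distribution $p_{km}$, the second power of $q_{km}$ is calculated, and a cluster normalization is applied as follows:
$$p_{km} = \frac{q_{km}^{2} / \sum_{k} q_{km}}{\sum_{m'} \left(q_{km'}^{2} / \sum_{k} q_{km'}\right)} \tag{6}$$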
Then, by minimizing the divergence between P and Q, the embedding learning is achieved through highly confident assignments.
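The quantities used by the clustering layer can be sketched in a few lines of plain Python. The example below follows the formulation of Xie et al. [57] (Student's t kernel, squared and frequency-normalized target, KL divergence); the function names and the toy points and centroids are hypothetical, and the sketch is not the code used in this study.

```python
import math


def soft_assignment(z, centroids, alpha=1.0):
    """Soft assignment Q: Student's t kernel between points and centroids."""
    q = []
    for zk in z:
        row = []
        for cm in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(zk, cm))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """Target P: square Q, divide by soft cluster frequency, renormalize per point."""
    n_clusters = len(q[0])
    freq = [sum(row[m] for row in q) for m in range(n_clusters)]
    p = []
    for row in q:
        w = [row[m] ** 2 / freq[m] for m in range(n_clusters)]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_loss(p, q):
    """Clustering loss: KL(P || Q) summed over points and clusters."""
    return sum(pv * math.log(pv / qv)
               for p_row, q_row in zip(p, q)
               for pv, qv in zip(p_row, q_row) if pv > 0)


z = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]]   # toy embedded points (assumption)
centroids = [[0.0, 0.0], [2.0, 2.0]]       # toy cluster centroids (assumption)
q = soft_assignment(z, centroids)
p = target_distribution(q)
loss = kl_loss(p, q)
```

Because P is built from the squared assignments, it is sharper than Q for confidently assigned points, so minimizing the divergence pulls the embedding towards high-confidence assignments.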
2.5.1 Center initialization
As previously mentioned, the cluster centroids are initialized using a spectral clustering-based approach. Spectral clustering allows flexible distance metrics and provides better cluster estimations than K-means [57]. However, most spectral clustering algorithms have high computational requirements. To reduce this cost, random samples of the data are used to estimate the cluster centroids. As spectral clustering does not estimate any centroid during the learning process, once the clusters are defined, the mean of each cluster is used as the centroid estimator.
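The centroid estimation step can be sketched as follows. The labels here are hypothetical stand-ins for the output of a spectral clustering run on a random subsample, which is omitted; `cluster_means` is an illustrative helper name and not part of any library.

```python
import random


def cluster_means(points, labels):
    """Centroid estimate: the mean of the points assigned to each cluster."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for d, value in enumerate(point):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


# Toy latent points; in practice these would come from the pretrained encoder.
latent = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
# Hypothetical labels standing in for the result of spectral clustering.
labels = [0, 0, 1, 1]

# Random subsample of row indices; k equals the data size only because the toy set is tiny.
rng = random.Random(0)
subset = rng.sample(range(len(latent)), k=4)
centroids = cluster_means([latent[i] for i in subset], [labels[i] for i in subset])
```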
2.6 The CAE-DEC model
Initially, the input data are normalized to the interval [0, 1]. This normalization allows the network to use a higher learning rate, helps avoid vanishing gradient problems, and alleviates overfitting. Further, to achieve a better learning process, the last CONV layer in the decoder structure is activated by a sigmoid activation function. Two training steps are then executed during the CAE-DEC learning process. First, a CAE model is trained to minimize the reconstruction loss $L_r$, computed as
$$L_r = \lVert x - \hat{x} \rVert_2^2 \tag{7}$$
where x is the normalized input and $\hat{x}$ is the reconstructed output. This pretrained CAE model is then used as the DA structure in the CAE-DEC model.
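A minimal sketch of the preprocessing and pretraining objective is given below. It assumes min-max scaling as the normalization and a squared-error form for the reconstruction loss; both choices, the function names, and the toy values are assumptions made for illustration.

```python
def min_max_normalize(column):
    """Rescale one feature column to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]   # constant column: map every value to 0
    return [(v - lo) / (hi - lo) for v in column]


def reconstruction_loss(x, x_hat):
    """Squared-error reconstruction loss (assumed form of Lr)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


ages = [18, 30, 42, 68]          # toy feature values (assumption)
x = min_max_normalize(ages)
x_hat = [0.0, 0.25, 0.5, 1.0]    # hypothetical decoder output
loss = reconstruction_loss(x, x_hat)
```

Scaling the inputs to [0, 1] also matches the range of the sigmoid activation in the last decoder layer, so the reconstruction can in principle reach every input value.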
In the second step, the CAE-DEC model is trained to simultaneously minimize the reconstruction loss and the clustering loss. The total loss during this training step is set as
$$L = L_r + C \, L_c \tag{8}$$
where $L_r$ is the CAE-DEC reconstruction loss, $L_c$ is the CAE-DEC clustering loss, and C is a coefficient to control the loss balance. The training process is shown in Table 1. The goal is to obtain a latent space that minimizes the total loss. Finally, the label of each embedded point is established as
$$\mathrm{label}_j = \arg\max_{m} \, q_{jm} \tag{9}$$
where $q_{jm}$ is the probability that point j belongs to a specific cluster center m. The maximum number of iterations Mint and the target distribution P update condition P_change were chosen based on multiple experiments. The final Mint and P_change values were 3000 and 5, respectively. This final P_change value improves stability during the training process.
Table 1. Pseudo code for the CAE-DEC training process.
Pseudo code: The CAE-DEC training process |
Input data: Number of clusters n; Normalized input data x; Maximum number of iterations Mint; Balance coefficient C; Pretrained CAE; Stop condition Stop; Target distribution P update condition P_change. |
Training process: |
1. Generate an initial latent space (Z) through the pre-trained CAE |
2. Run spectral clustering with Z to generate the initial cluster centers (C) |
3. Initialize the CAE-DEC model with the pretrained CAE. |
4. Calculate soft assignment distribution Q and target distribution P based on Z and C |
for epoch < Mint do: |
    if epoch % P_change == 0 then: |
        Calculate soft assignment distribution Q and target distribution P based on Z and C |
    end if |
    Feed the CAE-DEC with the normalized input data x |
    Calculate the reconstruction loss and the clustering loss |
    Update the CAE-DEC parameters: weights, biases, and centers. |
    if Stop == True then: |
        Break |
    end if |
end for |
Obtain the label for each data point from the last optimized Q. |
Output: Latent space, labels |
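The control flow of Table 1 can be sketched independently of any deep learning library. In the example below, `refresh_target`, `train_step`, and `should_stop` are hypothetical callables that stand in for the Q and P update, the parameter update, and the stop condition; the toy loss and threshold are assumptions chosen only to exercise the loop.

```python
def train_cae_dec(max_iter, p_change, refresh_target, train_step, should_stop):
    """Control flow of Table 1; all callables are hypothetical placeholders.

    refresh_target() recomputes Q and P, train_step(epoch) performs one
    parameter update and returns the total loss, and should_stop(loss)
    implements the stop condition.
    """
    history = []
    for epoch in range(max_iter):
        if epoch % p_change == 0:
            refresh_target()        # update P only every p_change iterations
        loss = train_step(epoch)    # reconstruction loss plus clustering loss
        history.append(loss)
        if should_stop(loss):
            break
    return history


refreshes = []
history = train_cae_dec(
    max_iter=20,
    p_change=5,
    refresh_target=lambda: refreshes.append(1),
    train_step=lambda epoch: 1.0 / (epoch + 1),   # toy decreasing loss
    should_stop=lambda loss: loss < 0.08,         # toy stop condition
)
```

Keeping the target distribution fixed between refreshes gives the optimizer a stable objective for several consecutive updates.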
2.7 Framework evaluation
We trained the CAE-DEC method using data retrieved from the National Survey of Psychoactive Substance Consumption (DANE-DIMPE-ENCSPA-2019), which contains 49,600 observations. A second database, with PAS production figures, was used in the spatial analysis stage to correlate PAS consumption and production. To evaluate the framework, we compared our CAE-DEC approach with other approaches, including CAE and Principal Component Analysis integrated with clustering (PCA-DEC). For evaluation and comparison purposes, we used the Calinski-Harabasz [75], Davies-Bouldin [76], and Silhouette [77] indices as intrinsic clustering metrics. In addition, we used the χ2 statistic to investigate potential associations and differences among the patterns (clusters) identified using our approach.
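To make one of the intrinsic metrics concrete, the sketch below computes the mean silhouette coefficient in plain Python. It assumes Euclidean distance and at least two clusters with distinct points; the function name and the toy one-dimensional data are illustrative only.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    clusters = sorted(set(labels))
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [dist(p, q) for j, (q, other) in enumerate(zip(points, labels))
               if other == lab and j != i]
        if not own:                  # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)      # mean distance within the point's own cluster
        b = min(                     # mean distance to the nearest other cluster
            sum(dist(p, q) for q, other in zip(points, labels) if other == c)
            / sum(1 for other in labels if other == c)
            for c in clusters if c != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [[0.0], [1.0], [10.0], [11.0]]   # toy one-dimensional latent space
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
```

Values close to 1 indicate compact, well-separated clusters, which is the sense in which a higher silhouette is better when comparing latent spaces.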
3. Results
3.1 Model comparison for identifying drug consumption patterns
Fig 2 depicts the LFS resulting after applying the CAE and CAE-DEC models to the data. Among all individuals, we identified three different clusters; 14935 (30.19%) individuals belong to cluster 0, 11528 (23.30%) individuals belong to cluster 1, and 23005 (46.50%) individuals belong to cluster 2. Interestingly, the LFS generated with the CAE-DEC model has more clearly defined clusters than the one generated with the CAE model. Although the CAE model seeks to extract an LFS that preserves the essential characteristics of the input data, our proposed CAE-DEC model not only preserves these important characteristics but also forces the encoder structure to generate representative clusters while extracting the new feature space.
Fig 2. Derived Latent Feature Space based on the (a) CAE and (b) CAE-DEC models.
On the other hand, the reconstruction loss obtained through the CAE model is higher than that of the CAE-DEC model. This result may be related to the fact that the CAE-DEC model used the pre-trained CAE model during its construction. It should be noted that the CAE model alone cannot determine the labels of each point or define clusters in the data. Thus, the clusters in Fig 2 were obtained through spectral clustering and served as the basis for initializing the centroids in the CAE-DEC model.
3.2 Identification of clusters of psychoactive drugs consumption
Here we analyze the patterns in each cluster obtained using the CAE-DEC model. We defined, a priori, a dummy variable Yij indicating whether the i-th person in the j-th household has consumed PAS; Yij = 1 when an individual has never consumed PAS and Yij = 2 otherwise. Out of the 49468 individuals in the sample, only 5514 (11.15%) consume PAS. Fig 3a and 3b depict, respectively, the cluster structure derived from the CAE-DEC model for individuals consuming PAS and for those who reported not consuming. Our results indicate that individuals in clusters 0 and 2 are more likely to consume some PAS (Fig 3a), while most individuals in cluster 1 do not (Table 2). In particular, 1726 (11.56%) individuals in cluster 0, 392 (3.4%) individuals in cluster 1, and 3396 (14.76%) individuals in cluster 2 have used PAS (Table 2). A χ2-based test of independence reveals that the region where individuals are located, gender, age (years), the type of housing they live in, their socioeconomic status (SES), and whether they contribute to the household finances are statistically significantly associated with the cluster they belong to (Table 2).
Fig 3. Resulting clusters for individuals (a) consuming and (b) not consuming psychoactive substances based on the CAE-DEC model.
Table 2. Distribution of demographic and social variables across clusters.
| Variables | Category | Cluster 0 (n = 14935) | Cluster 1 (n = 11528) | Cluster 2 (n = 23005) | χ2 | df | P-value |
|---|---|---|---|---|---|---|---|
| Region | Caribbean | 3004 | 2991 | 4075 | 1472.9 | 10 | < .0001 |
| | Central-Eastern | 2526 | 2299 | 6175 | | | |
| | Central-Southern | 1843 | 1299 | 1702 | | | |
| | Eje Cafetero–Antioquia | 3902 | 2354 | 6371 | | | |
| | Llanos Orientales | 1687 | 1305 | 1496 | | | |
| | Pacific | 1973 | 1280 | 3186 | | | |
| Gender | Male | 6606 | 3927 | 10233 | 386.72 | 2 | < .0001 |
| | Female | 8329 | 7601 | 12772 | | | |
| Housing type | House | 8250 | 6538 | 11834 | 114.18 | 6 | < .0001 |
| | Apartment | 6275 | 4724 | 10595 | | | |
| | Room | 395 | 256 | 543 | | | |
| | Indigenous dwelling | 15 | 10 | 33 | | | |
| Socioeconomic status | 1 | 3889 | 3626 | 6945 | 843.92 | 10 | < .0001 |
| | 2 | 4627 | 3902 | 8851 | | | |
| | 3 | 4284 | 2867 | 5683 | | | |
| | 4 | 1341 | 718 | 979 | | | |
| | 5 | 510 | 256 | 362 | | | |
| | 6 | 284 | 159 | 185 | | | |
| Age (years) | (0, 20] | 2105 | 1576 | 2998 | 345.87 | 4 | < .0001 |
| | (20, 40] | 6814 | 4329 | 10889 | | | |
| | (40, 68] | 6016 | 5623 | 9118 | | | |
| Contribute to the household finances | Yes | 10128 | 7473 | 16041 | 85.24 | 2 | < .0001 |
| | No | 4807 | 4055 | 6964 | | | |
df: Degrees of freedom.
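As a worked check of the test of independence, the sketch below computes the Pearson χ2 statistic for the Gender rows of Table 2. The function name `chi_square` is illustrative; the counts are copied from the table.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Gender rows of Table 2 (columns: clusters 0, 1, and 2).
gender = [
    [6606, 3927, 10233],   # male
    [8329, 7601, 12772],   # female
]
stat, df = chi_square(gender)
```

For these counts the statistic is approximately 386.72 with 2 degrees of freedom, in agreement with the value reported in Table 2.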
Table 3 shows the adjusted residuals for our model. According to our results, the Central-Eastern region contributes significantly to the χ2 statistic for the Region variable. In this region, the observed value is higher than the expected value in cluster 2, while the observed value is lower than the expected value in cluster 0. Although to a lesser extent, the Llanos Orientales region also contributes significantly to the χ2 statistic. Indeed, this region shows fewer individuals than expected in cluster 2 and more individuals than expected in clusters 0 and 1 (Table 3).
Table 3. Adjusted residuals comparing the observed and expected frequencies based on the cluster analysis.
| Variables | Category | Cluster 0 (n = 14935) | Cluster 1 (n = 11528) | Cluster 2 (n = 23005) |
|---|---|---|---|---|
| Region | Caribbean | -0.88 | 17.02 | -13.61 |
| | Central-Eastern | -18.72 | -6.76 | 22.97 |
| | Central-Southern | 12.54 | 6.09 | -16.7 |
| | Eje Cafetero–Antioquia | 2.02 | -14.36 | 10.31 |
| | Llanos Orientales | 11.32 | 9.59 | -18.55 |
| | Pacific | 0.84 | -6.97 | 5.13 |
| Gender | Male | 6.68 | -19.66 | 10.52 |
| | Female | -6.68 | 19.66 | -10.52 |
| Housing type | House | 4.17 | 7.13 | -9.88 |
| | Apartment | -4.83 | -6.61 | 10.05 |
| | Room | 2.2 | -1.54 | -0.72 |
| | Indigenous dwelling | -0.72 | -1.09 | 1.59 |
| Socioeconomic status | 1 | -10.26 | 5.99 | 4.37 |
| | 2 | -12.72 | -3.3 | 14.51 |
| | 3 | 9.14 | -3 | -5.87 |
| | 4 | 17.29 | 0.44 | -16.29 |
| | 5 | 11.12 | -0.49 | -9.82 |
| | 6 | 8.26 | 1.2 | -8.62 |
| Age (years) | (0, 20] | 2.54 | 0.61 | -2.85 |
| | (20, 40] | 3.2 | -17.23 | 11.66 |
| | (40, 68] | -4.98 | 16.93 | -9.77 |
| Contribute to the household finances | Yes | -0.61 | -8.37 | 7.65 |
| | No | 0.61 | 8.37 | -7.65 |
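The adjusted residuals can be reproduced from the observed counts. The sketch below uses the standard adjusted standardized residual, (observed - expected) divided by the square root of expected × (1 - row share) × (1 - column share), applied to the Gender counts of Table 2; the function name is illustrative.

```python
import math


def adjusted_residuals(table):
    """Adjusted standardized residuals for each cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out.append((observed - expected) / math.sqrt(variance))
        result.append(out)
    return result


# Gender counts from Table 2 (columns: clusters 0, 1, and 2).
gender = [
    [6606, 3927, 10233],   # male
    [8329, 7601, 12772],   # female
]
residuals = adjusted_residuals(gender)
```

For males, the residuals are approximately 6.68, -19.66, and 10.52 in clusters 0, 1, and 2, matching the Gender rows of Table 3; absolute values above roughly 2 mark cells that depart noticeably from independence.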
On the other hand, for Gender, the number of males is higher than expected in clusters 0 and 2 and lower than expected in cluster 1. For females, the opposite occurs: the number is higher than expected in cluster 1 and lower than expected in clusters 0 and 2. Similarly, for Housing Type, the number of individuals living in houses is higher than expected in cluster 1 and lower than expected in cluster 2. Conversely, cluster 2 has more individuals living in apartments than expected, and cluster 1 has the fewest relative to expectation (Table 3).
Regarding SES, a higher-than-expected number of individuals in strata 3, 4, 5, and 6 was found in cluster 0 (Table 3). We also observed a lower-than-expected number of individuals in strata 3, 4, 5, and 6 in cluster 2 and a higher-than-expected number in strata 1, 4, and 6 in cluster 1 (Table 3). Moreover, the age variable shows a higher-than-expected observed value for the (0,20] range in cluster 0. For ages in the (20,40] range, cluster 2 has a higher-than-expected number of individuals, whereas cluster 1 has a lower-than-expected number. Finally, the household economy variable shows that cluster 2 has a higher-than-expected number of individuals contributing to the household finances, and cluster 1 has a lower-than-expected number of individuals contributing to them. Comparison of the Calinski-Harabasz, Davies-Bouldin, and Silhouette metrics between the Principal Component Analysis-based clustering model (PCA-DEC) and our proposed CAE-DEC model indicates the superiority of the latter (S2 Table).
3.3 Spatial analysis of psychoactive drugs consumption
Several alternative classification algorithms were used to determine the choropleth class limits (i.e., Equal Intervals, Quantiles, Maximum Breaks, Box plot, Head-Tail Breaks, Jenks-Caspall, Fisher-Jenks, and Max-p) and were compared using the absolute deviation around class medians (ADCM) criterion (Fig 4). According to our results, the Fisher-Jenks classifier performed best and hence was selected.
Fig 4. Absolute deviation around class medians (ADCM) statistic criterion for different alternative classifiers.
Here, lower is better.
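The ADCM criterion itself is simple to compute once each observation has been assigned to a class. The sketch below compares two hypothetical class assignments of toy values; the function name, the values, and the two schemes are assumptions, and the classifiers that would produce such assignments are not implemented here.

```python
from statistics import median


def adcm(values, class_ids):
    """Absolute deviation around class medians for one classification."""
    groups = {}
    for value, class_id in zip(values, class_ids):
        groups.setdefault(class_id, []).append(value)
    total = 0.0
    for members in groups.values():
        center = median(members)
        total += sum(abs(v - center) for v in members)
    return total


prevalence = [1, 2, 3, 10, 11, 12]   # toy values, not survey data
scheme_a = [0, 0, 0, 1, 1, 1]        # hypothetical class assignment A
scheme_b = [0, 0, 1, 1, 1, 1]        # hypothetical class assignment B
best = min((adcm(prevalence, s), name)
           for name, s in [("A", scheme_a), ("B", scheme_b)])
```

The scheme with the smaller ADCM groups values more tightly around their class medians, which is the sense in which a lower value is preferred.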
Following the same exploratory spatial analysis, we constructed a choropleth map with the percentage of PAS use for each of the 32 Colombian departments (Fig 5a). We found that the departments of Arauca, Vichada, Caquetá, Chocó, Magdalena, Cesar, Bolívar, Sucre, Córdoba, and Norte de Santander have low percentages of drug use. However, some of these departments are major drug producers (i.e., Córdoba and Guaviare), according to data from the Drug Observatory of Colombia [78]. In contrast, Putumayo is the department with the highest proportion of PAS use (Fig 5a).
Fig 5. (a) Consumption percentage of psychoactive substances according to the CAE-DEC model; (b) Moran’s I statistic; (c) Moran’s cluster map.
Here, HH, LH, LL and ns represent high-high, low-high, low-low, and not statistically significant quadrants, respectively. This clustering pattern leads to a statistically significant Moran’s I statistic of 0.2 (P-value <0.01). Republished from [64] under a CC BY license, with permission from [ArcGIS Hub], original copyright [2016].
The global Moran’s I results show the presence of a statistically significant positive global spatial autocorrelation (I = 0.2005, P<0.01). Thus, the null hypothesis of spatial randomness is rejected; that is, the map shows more spatial pattern than we would expect if the values had been randomly assigned to locations. In addition, other global indices such as Geary’s C (C = 0.693, P = 0.003) and Getis and Ord’s G (G = 0.800, P = 0.049) confirm the presence of statistically significant global spatial autocorrelation.
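For reference, the global Moran's I statistic can be computed as in the sketch below. It assumes binary contiguity weights supplied as a neighbor dictionary and omits the permutation test needed to obtain a P-value; the function name and the four-area toy layout are hypothetical and unrelated to the Colombian departments.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights.

    neighbors maps each area index to the indices of its adjacent areas.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    s0 = sum(len(adj) for adj in neighbors.values())   # sum of all weights
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical areas arranged in a chain: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = morans_i([1.0, 1.0, 0.0, 0.0], chain)     # similar values are adjacent
alternating = morans_i([1.0, 0.0, 1.0, 0.0], chain)   # dissimilar values are adjacent
```

Positive values arise when similar values sit next to each other, as in the clustered toy layout, and negative values when neighbors are dissimilar.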
To further explore the relationships between each observation and its environment, the Local Indicators of Spatial Association (LISA) were estimated (more information on LISA statistics is provided in S3 Fig). Fig 5b depicts the Moran diagram, indicating each quadrant’s positive (or negative) association. Specifically, the high-high (HH) and low-low (LL) quadrants indicate a positive spatial association, i.e., departments with high (or low) drug use surrounded by neighbors with similarly high (or low) drug use. On the other hand, the low-high (LH) and high-low (HL) quadrants indicate negative associations (Fig 5b). Following our results, we found that departments such as Nariño and Cauca belong to the HH cluster. In contrast, La Guajira, Atlántico, Magdalena, Cesar, Norte de Santander, Sucre, and Córdoba belong to the LL cluster. This clustering pattern leads to a statistically significant Moran’s I statistic (P-value <0.01). Thus, 39.4% of the departments are considered, by this analysis, to be part of a spatial cluster (i.e., statistically significant at P < 0.05). We also identified that, among legal drugs, alcohol and tobacco are the most frequently consumed in the national territory (Fig 6a). Among illegal drugs, marijuana is the most frequently consumed, followed by non-prescription tranquilizers and Yagé, with a slight consumption of opioids and poppers (Fig 6b).
Fig 6. Frequency of consumption of (a) legal and (b) illegal drugs in Colombia.
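The quadrant assignment behind the Moran diagram (Fig 5b) can be sketched as follows. This is a minimal example assuming a row-standardized spatial lag and hypothetical values on a chain of six areas; it omits the permutation test that marks a unit as not significant (ns).

```python
def moran_quadrants(values, neighbors):
    """Label each unit HH, HL, LH or LL.

    The first letter describes the unit's own value relative to the mean,
    the second describes the average deviation of its neighbors
    (row-standardized spatial lag).
    """
    mean = sum(values) / len(values)
    z = [v - mean for v in values]
    labels = []
    for i in range(len(values)):
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        if z[i] >= 0:
            labels.append("HH" if lag >= 0 else "HL")
        else:
            labels.append("LL" if lag < 0 else "LH")
    return labels

# Hypothetical chain of six areas (illustrative values only)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
values = [10.0, 8.0, 1.0, 7.0, 2.0, 2.0]
quadrants = moran_quadrants(values, neighbors)
```

In this toy example the first two areas fall in HH, the third is a low value among high neighbors (LH), the fourth is a high value among low neighbors (HL), and the last two fall in LL.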
Regarding legal drugs, alcohol has the highest consumption rates in Bogotá, Cundinamarca, and Chocó (Fig 7), with moderately high use in Vaupés, Nariño, Bolívar, Magdalena, La Guajira, and Atlántico. Energy drink consumption is highest in Casanare and Guaviare and slightly elevated in Boyacá, Nariño, Risaralda, and Arauca. Tobacco consumption is highest in Cundinamarca and moderately high in Bogotá, Boyacá, Nariño, Casanare, Tolima, Quindío, Risaralda, Guainía, Caldas, and Vaupés. These substances are also used in the rest of the country, but with a lower incidence (Fig 7).
Fig 7. Consumption of illegal drugs by department.
For interpretation purposes, numbers represent values scaled to a range of 0 to 1. For instance, Bogotá D.C. has the highest LSD consumption and Putumayo has the lowest.
Concerning illegal drugs, non-prescription tranquilizers and stimulants are most prevalent in Casanare (Fig 8). However, the consumption of tranquilizers is slightly higher in Nariño, while inhalants have the highest consumption in Quindío, followed by Cauca, Caldas, and Nariño. Methylene chloride has the highest consumption in Cauca and a high consumption in Quindío and Nariño; Antioquia, followed by Caldas and Risaralda, shows the highest consumption of popper. Marijuana has its highest consumption in Risaralda and moderately high consumption in Caldas, Bogotá, Antioquia, and Quindío. As for cocaine, its consumption is the highest in Risaralda and moderately high in Antioquia (Fig 8).
Fig 8. Consumption of legal drugs by department.
Numbers represent values scaled to a range of 0 to 1 for psychoactive substance use. Conventions as in Fig 7.
On the other hand, basuco (i.e., cocaine paste) has the highest consumption rate in Guaviare, and critical consumption in Nariño, Cauca, Quindío, Antioquia, and Amazonas; ecstasy has its highest consumption in Risaralda, followed by Bogotá and Caldas; heroin consumption is highest in Vaupés, Huila, Cauca, Quindío, and Arauca, and is slightly higher in Casanare; methamphetamine consumption is highest in Casanare and is moderately high in Boyacá; methadone is most widely used in Quindío, but has slightly high levels of use in Valle del Cauca and Caquetá; opioids are most prevalent in Casanare, followed by Sucre; LSD is most prevalent in Bogotá, but has high levels of use in Caldas, Risaralda, Quindío, and Nariño; mushrooms have their highest consumption in Boyacá and moderately high use in Quindío, Risaralda, Bogotá, Cauca, and Casanare; Yagé has a higher incidence in Putumayo; cacao sabanero has its highest consumption in Caldas, and has moderate consumption in Cundinamarca, Bogotá, Antioquia, and Quindío; ketamine has the highest consumption in Casanare, followed by Antioquia; and GHB has the highest consumption in Risaralda, followed by Santander, Valle del Cauca, and Norte de Santander. Finally, 2CB has the highest consumption rate in Risaralda, followed by Caldas. Although not every department is mentioned here, the remaining departments show low to moderate consumption of certain drugs (Fig 8).
3.4 Regionalization of clusters
We applied a regionalization method as a grouping technique that imposes a spatial restriction, i.e., the result of a regionalization algorithm contains clusters with geographically coherent areas and coherent data profiles. Our approach uses a spatially constrained hierarchical clustering algorithm, which identified three clusters representing the consumption of PAS in the country (Fig 9). The number of clusters was estimated based on the average silhouette indexes, the total intra-cluster variance, and dendrograms (S2 Fig). Following our results, cluster 0 comprises departments such as La Guajira, Cesar, Atlántico, Magdalena, Norte de Santander, Bolívar, Sucre, and Córdoba, all of them located in the Northern region of the country; cluster 1 comprises Antioquia, Santander, Boyacá, Caldas, Risaralda, and Quindío; and cluster 2 comprises the remaining departments (Fig 9). When testing geographical coherence, a measure of the “compactness” of a given shape, our results indicate that the clusters derived using the regionalization model represent moderately compact regions. In addition, the feature coherence (i.e., goodness-of-fit) test using different metrics showed that our 3-cluster regionalization structure properly fits the data (S1 Table).
Fig 9. Cluster map of drug use after regionalization.
Republished from [64] under a CC BY license, with permission from [ArcGIS Hub], original copyright [2016].
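The idea of a spatially constrained hierarchical clustering can be illustrated with the minimal sketch below, in which only geographically adjacent clusters are allowed to merge. It uses centroid linkage for simplicity, which is an assumption of this example and not necessarily the linkage used in the study; the function name, the contiguity structure, and the feature values are all hypothetical.

```python
def constrained_agglomerative(features, neighbors, n_clusters):
    """Merge adjacent clusters with the closest centroids until n_clusters remain."""
    clusters = {i: [i] for i in range(len(features))}        # cluster id -> member units
    adjacency = {i: set(nb) for i, nb in neighbors.items()}  # cluster id -> adjacent cluster ids

    def centroid(members):
        dim = len(features[0])
        return [sum(features[m][d] for m in members) / len(members) for d in range(dim)]

    def distance(a, b):
        ca, cb = centroid(clusters[a]), centroid(clusters[b])
        return sum((x - y) ** 2 for x, y in zip(ca, cb))

    while len(clusters) > n_clusters:
        # Only pairs of clusters that share a border are merge candidates
        candidates = [(distance(a, b), a, b)
                      for a in clusters for b in adjacency[a] if a < b]
        if not candidates:  # remaining clusters are not contiguous
            break
        _, a, b = min(candidates)
        clusters[a].extend(clusters.pop(b))
        # Neighbors of the absorbed cluster become neighbors of the merged one
        for c in adjacency.pop(b):
            adjacency[c].discard(b)
            if c != a:
                adjacency[c].add(a)
                adjacency[a].add(c)

    labels = [0] * len(features)
    for new_id, members in enumerate(sorted(clusters.values(), key=min)):
        for m in members:
            labels[m] = new_id
    return labels

# Hypothetical chain of six areas with one feature each (illustrative only)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
features = [[1.0], [1.2], [5.0], [5.1], [1.5], [9.0]]
region_labels = constrained_agglomerative(features, neighbors, 3)
```

In this toy case, area 4 has a profile closest to areas 0 and 1 but is not contiguous with them, so the constraint assigns it to the region formed by its neighbors 2 and 3. This is the trade-off between feature coherence and geographical coherence that regionalization makes.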
4. Discussion
In this study, we propose and test a Deep Neural Network-based Clustering-oriented Embedding algorithm (i.e., an ML-based model) for identifying psychoactive substance (PAS) use and abuse patterns in Colombia. This model allows the automatic extraction of features from the input data (such as sex, age, socioeconomic status, and housing type) to determine whether an individual has consumed PAS. It then creates clusters in the new data space generated during the learning process, following the methods outlined in [56, 57]. After the training process, a latent feature space (LFS) is generated, and the results are subsequently analysed.
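To make the clustering step in the latent space more concrete, the sketch below follows the deep embedded clustering formulation described in [57]: soft assignments based on a Student’s t kernel, a sharpened target distribution, and a Kullback-Leibler clustering loss. Whether the proposed model uses exactly these expressions is an assumption of this example; the embeddings, the centroids, and the value of alpha are hypothetical.

```python
from math import log

def soft_assignments(embeddings, centroids, alpha=1.0):
    """Similarity q_ij between embedded point i and centroid j (Student's t kernel)."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([r / total for r in row])  # each row sums to 1
    return q

def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    n_clusters = len(q[0])
    f = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(n_clusters)]
        total = sum(w)
        p.append([x / total for x in w])
    return p

def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over points and clusters."""
    return sum(pij * log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0)

# Hypothetical two-dimensional latent points and two centroids
embeddings = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
centroids = [[0.0, 0.0], [3.0, 3.0]]
q = soft_assignments(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full model, this loss would be minimized jointly with the autoencoder’s reconstruction loss, so that the encoder learns features that separate the groups.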
We have identified clearly marked clusters where the prevalence of individuals who use or do not use PAS is notable. Additionally, we found that region, sex, housing type, socioeconomic strata, age, and whether individuals contribute to household finances have a statistically significant impact on the clustering structure. These findings are consistent with previous studies aimed at identifying PAS consumption patterns [19, 79, 80]. Interestingly, when comparing the CAE-DEC model proposed in this study and the CAE-Spectral model using different metrics (i.e., Silhouette statistic, which measures the internal density of each cluster and the distance that separates them from each other, the Calinski-Harabasz index and the Davies-Bouldin index [DBI]), we found that our model performs better (Silhouette: 0.62 vs. 0.786; Calinski-Harabasz: 22468.26 vs. 775992.45; DBI: 0.2898 vs. 0.63; S2 Table).
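For reference, the three indices mentioned above can be computed as in the following minimal sketch, which uses Euclidean distances and the Python standard library only. The points and labelings are hypothetical; higher Silhouette and Calinski-Harabasz values and lower Davies-Bouldin values indicate better-separated clusters.

```python
from math import dist

def _groups(points, labels):
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups

def _centroid(pts):
    return [sum(c) / len(pts) for c in zip(*pts)]

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; higher is better."""
    groups = _groups(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for k, other in groups.items() if k != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def calinski_harabasz(points, labels):
    """Ratio of between- to within-cluster dispersion; higher is better."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = _centroid(points)
    between = sum(len(g) * dist(_centroid(g), overall) ** 2 for g in groups.values())
    within = sum(dist(p, _centroid(g)) ** 2 for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(points, labels):
    """Average worst-case similarity between clusters; lower is better."""
    groups = _groups(points, labels)
    cents = {k: _centroid(g) for k, g in groups.items()}
    scatter = {k: sum(dist(p, cents[k]) for p in g) / len(g) for k, g in groups.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in groups if j != i) for i in groups]
    return sum(worst) / len(worst)

# Hypothetical points with a well-separated and a poorly separated labeling
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]
poor = [0, 1, 0, 1, 0, 1]
```

Comparing `silhouette(points, good)` with `silhouette(points, poor)`, and likewise for the other two functions, shows the direction in which each index moves as cluster separation improves.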
Based on our findings, individuals more likely to consume PAS are grouped in cluster 2, while cluster 1 consists of individuals who did not consume PAS (Table 2). Not surprisingly, a significant proportion of females characterizes cluster 1. In addition, most individuals in this cluster belong to socioeconomic stratum 1, are 40 years old or older, and do not contribute economically to support their household. In contrast, cluster 2 is characterized by a higher proportion of males aged between 20 and 40 in socioeconomic strata 1 and 2, who do not contribute to the household finances (Table 2). Finally, cluster 0 is characterized by a small proportion of males, a higher proportion of individuals in strata 3, 4, 5, and 6, and individuals who are more likely to contribute to the household economy (Table 2).
At the level of spatial statistics, we identified that legal drugs such as alcohol have a high prevalence in all regions of Colombia, with a slight tendency toward higher consumption in coastal areas (Fig 7). In Colombia, coastal areas are popular tourist destinations, and many tourists visit them seeking a relaxing experience, which can increase alcohol consumption. Coastal areas typically have warmer temperatures and more sunshine, which increase thirst and make people more likely to consume beverages. Additionally, bars, clubs, and restaurants serve alcoholic beverages to meet the high demand from tourists and locals [81, 82]. Another characteristic of this area is its fishing and maritime culture, which is often associated with hard work and long working hours; alcohol may be seen as a way to relax and unwind after a tough day at sea [83]. Finally, this region is 69% urban and 31% rural [59]. Measured by gross domestic product (GDP), it ranks third among the country’s regions in economic development [84] (S3 Table). Interestingly, the consumption of illegal drugs is lower in the Northern region than in other regions of the country. However, there is a more representative consumption of non-prescription tranquilizers, opioids, ketamine, GHB, and heroin. In particular, the Atlántico department has the highest consumption proportion within this region (Fig 8).
Tobacco consumption is present in all regions, with a higher proportion in the Central region (Eje Cafetero–Antioquia), where climate conditions resemble temperate weather. This region also has a diverse consumption pattern, in which drugs such as marijuana, popper, cocaine, ecstasy, inhalants, methadone, heroin, LSD, GHB, 2CB, and mushrooms prevail. This region has Colombia’s largest cities (i.e., Bogotá and Medellín); Bogotá has the highest population density and is a hub for drug trafficking routes, while Medellín has an unfortunate history of drug cartels and gang violence. Finally, this region comprises 79% urban areas, and the most developed cities in the country are located there [59, 84] (S3 Table).
Energy drinks are more frequently used in the Central-Eastern region, characterized by a continental climate surrounded by flat territory. Our results are in line with the scientific literature suggesting that the location of regions within countries is directly associated with the consumption of PAS [26, 85–87]. The consumption of heroin, basuco, non-prescription tranquilizers, stimulants, methamphetamines, opioids, and ketamine characterizes this region. This zone is the second most developed region in the country and comprises 71% urban areas [59, 84] (S3 Table).
Our findings also show that illegal drugs, including basuco, heroin, and Yagé, are more likely to be consumed in the Southern region (Fig 8). One of the main reasons for this result is that, unfortunately, this region has environmental characteristics (i.e., predominantly rainforest) that favour the production and consumption of these drugs; it is the second largest illegal drug-producing region in Colombia [78]. Furthermore, this region has the highest percentage of rurality (55%) compared to the other regions, and its level of development is low as measured by the GDP [59, 84] (S3 Table).
In the Western region, also known as the Pacific region, consumption mainly includes methylene chloride, GHB, heroin, opioids, and methamphetamines. This region is mainly characterized by its geographical isolation, poverty, and ongoing conflict, which have contributed to the growth of drug production and trafficking in the area. Poverty is one of the main factors driving drug production in the Pacific region, which has led many people to turn to drug cultivation and trafficking for survival. Additionally, the region’s rugged terrain and limited infrastructure have made it difficult for the Colombian government to establish a strong presence, allowing drug traffickers to operate with relative impunity [88]. The percentages of urban (53%) and rural (47%) populations in this region are similar to those of the Southern region, and the region ranks second among the regions with the lowest levels of development (S3 Table).
Conclusion
In summary, the proposed CAE-DEC model integrates feature extraction within the clustering design, prioritizing features that improve the separation between groups and thus avoiding the manual extraction of features, which is a frequent process in traditional models. Additionally, a geospatial component is sequentially included to expand the resulting insights by considering geographic constraints. Currently, these types of architectures are rarely applied to the study of mental health problems. As part of future work, the architecture of the proposed model could be improved to integrate the automatic extraction of features while optimizing a geospatial loss. Following our experience with the proposed CAE-DEC in PAS consumption, the application of this model to other mental health problems, such as suicide, depression, and domestic violence, among other pathologies, could be explored. Based on these results, effective interventions and/or government policies to prevent and/or mitigate their impact could be promoted and evaluated, for example, by developing regional interventions based on the types of drugs most prevalent in the area and its cultural and socio-economic characteristics. These can include education, treatment, and harm reduction programs. This information can also be used to develop public health campaigns that raise awareness about the risks of drug use and reduce their negative impact, and to crack down on drug trafficking and distribution networks. Finally, this information can be used to alert healthcare providers and regulatory bodies to take appropriate action to prevent their use and discover new drugs.
Acknowledgments
K.P. is a doctoral student at Universidad del Norte, Barranquilla, Colombia, and received a Ph.D. scholarship from this institution. Some of this work is to be presented to the Ph.D. program in partial fulfillment of the requirements for the Ph.D. degree.
Data Availability
The data used in the manuscript were obtained from a third party, the Archivo Nacional de Datos (ANDA), and are fully available and anonymized. The authors confirm that others would be able to access these data in the same manner as themselves; and the authors did not have any special access privileges that others would not have. The data can be publicly retrieved from ANDA (https://microdatos.dane.gov.co/index.php/catalog/680/data-dictionary).
Funding Statement
The author(s) received no specific funding for this work.
References
- 1. OPS, “Abuso de sustancias,” Organización Panamericana de la Salud, 2022. https://www.paho.org/es/temas/abuso-sustancias (accessed Jan. 25, 2022).
- 2. Heesun C., Jaesin L., and Eunmi K., “Trends of novel psychoactive substances (NPSs) and their fatal cases,” Forensic Toxicol., vol. 34, no. 1, pp. 1–11, 2016, Accessed: Jan. 20, 2022. [Online]. https://jglobal.jst.go.jp/en/detail?JGLOBAL_ID=201602283742633549
- 3. Riley A. L. et al., “Abuse potential and toxicity of the synthetic cathinones (i.e., ‘Bath salts’),” Neurosci. Biobehav. Rev., vol. 110, pp. 150–173, Mar. 2020, doi: 10.1016/J.NEUBIOREV.2018.07.015
- 4. Assi S., Gulyamova N., Kneller P., and Osselton D., “The effects and toxicity of cathinones from the users’ perspectives: A qualitative study,” Hum. Psychopharmacol. Clin. Exp., vol. 32, no. 3, p. e2610, May 2017, doi: 10.1002/hup.2610
- 5. Lukić V., Micić R., Arsić B., Nedović B., and Radosavljević Ž., “Overview of the major classes of new psychoactive substances, psychoactive effects, analytical determination and conformational analysis of selected illegal drugs,” Open Chem., vol. 19, no. 1, pp. 60–106, Jan. 2021
- 6. Uchiyama N., Matsuda S., Kawamura M., Kikura-Hanajiri R., and Goda Y., “Two new-type cannabimimetic quinolinyl carboxylates, QUPIC and QUCHIC, two new cannabimimetic carboxamide derivatives, ADB-FUBINACA and ADBICA, and five synthetic cannabinoids detected with a thiophene derivative α-PVT and an opioid receptor agonist AH-7921 identified in illegal products,” Forensic Toxicol., vol. 31, no. 2, pp. 223–240, Mar. 2013, doi: 10.1007/S11419-013-0182-9
- 7. Grant B. F. et al., “Prevalence and co-occurrence of substance use disorders and independent mood and anxiety disorders—Results from the national epidemiologic survey on alcohol and related conditions,” Arch. Gen. Psychiatry, vol. 61, no. 8, pp. 807–816, 2004, doi: 10.1001/archpsyc.61.8.807
- 8. CDC, “Understanding the Epidemic,” 2020. https://www.cdc.gov/opioids/basics/epidemic.html (accessed Jan. 21, 2022).
- 9. Goetzel R. Z., Hawkins K., Ozminkowski R. J., and Wang S. H., “The health and productivity cost burden of the ‘top 10’ physical and mental health conditions affecting six large US employers in 1999,” J. Occup. Environ. Med., vol. 45, no. 1, pp. 5–14, 2003, doi: 10.1097/00043764-200301000-00007
- 10. Stewart W. F., Ricci J. A., Chee E., Hahn S. R., and Morganstein D., “Cost of lost productive work time among US workers with depression,” JAMA-JOURNAL Am. Med. Assoc., vol. 289, no. 23, pp. 3135–3144, Jun. 2003, doi: 10.1001/jama.289.23.3135
- 11. Garcia F. L. G. and Murillo J. C. A., “The United Nations and 21st century security challenges in Colombia,” Rev. Cient. Gen. Jose Maria Cordova, vol. 19, no. 36, pp. 929–940, 2021, doi: 10.21830/19006586.875
- 12. Aschner J. P. and Montero J. C., “Architectures, spaces, and territories of illicit drug trafficking in Colombia and Mexico,” vol. 17, no. 3, pp. 327–351, Mar. 2020, doi: 10.1177/1741659020910212
- 13. ODC, “Observatorio de drogas de Colombia,” 2022. https://www.minjusticia.gov.co/programas-co/ODC/Paginas/SIDCO-departamento-municipio.aspx (accessed Jun. 09, 2022).
- 14. DANE, “Encuesta Nacional de Consumo de Sustancias Psicoactivas,” 2020. Accessed: Apr. 23, 2021. [Online]. https://www.dane.gov.co/files/investigaciones/boletines/encspa/comunicado-encspa-2019.pdf
- 15. DANE, “Estudio nacional de consumo de sustancias psicoactivas en Colombia,” Bogotá, 2014. Accessed: Jan. 17, 2022. [Online]. https://www.unodc.org/documents/colombia/2014/Julio/Estudio_de_Consumo_UNODC.pdf
- 16. UNODC, “Drogas sintéticas y nuevas sustancias psicoactivas en América Latina y el Caribe 2021,” Viena, 2021. Accessed: Jan. 21, 2022. [Online]. https://www.minjusticia.gov.co/programas-co/ODC/Documents/Publicaciones/GlobalSmartLA(1).pdf?csf=1&e=MH9EHg
- 17. Griffiths P. and Mcketin R., “Developing a global perspective on drug consumption patterns and trends-the challenge for drug epidemiology,” Bull. Narcotics, vol. 5, no. 1, 2003.
- 18. Lanier W. A., Johnson E. M., Rolfs R. T., Friedrichs M. D., and Grey T. C., “Risk factors for prescription opioid-related death, Utah, 2008–2009,” Pain Med., vol. 13, no. 12, pp. 1580–1589, 2012, doi: 10.1111/J.1526-4637.2012.01518.X
- 19. Martins S. S., Sampson L., Cerdá M., and Galea S., “Worldwide Prevalence and Trends in Unintentional Drug Overdose: A Systematic Review of the Literature,” Am. J. Public Health, vol. 105, no. 11, pp. e29–e49, Nov. 2015, doi: 10.2105/AJPH.2015.302843
- 20. CDC, “Today’s Heroin Epidemic,” Centers for Disease Control and Prevention, 2015. https://www.cdc.gov/vitalsigns/heroin/index.html (accessed Jan. 16, 2022).
- 21. Fuller C. M. et al., “Effects of race, neighborhood, and social network on age at initiation of injection drug use,” Am. J. Public Health, vol. 95, no. 4, pp. 689–695, Apr. 2005, doi: 10.2105/AJPH.2003.02178
- 22. Fite P. J., Wynn P., Lochman J. E., and Wells K. C., “The Influence of Neighborhood Disadvantage and Perceived Disapproval on Early Substance Use Initiation,” Addict. Behav., vol. 34, no. 9, p. 769, Sep. 2009, doi: 10.1016/j.addbeh.2009.05.002
- 23. Friedman S. R. et al., “Income inequality, drug-related arrests, and the health of people who inject drugs: Reflections on seventeen years of research,” Int. J. Drug Policy, vol. 32, pp. 11–16, Jun. 2016, doi: 10.1016/j.drugpo.2016.03.003
- 24. Jensen M., Chassin L., and Gonzales N. A., “Neighborhood Moderation of Sensation Seeking Effects on Adolescent Substance Use Initiation,” J. Youth Adolesc., vol. 46, no. 9, p. 1953, Sep. 2017, doi: 10.1007/s10964-017-0647-y
- 25. Sarah C. and Leonard A J., “Contextual Perspectives on Heroin Addiction and Recovery: Classic and Contemporary Theories,” Int. Arch. Public Heal. Community Med., vol. 2, no. 1, Dec. 2018, doi: 10.23937/IAPHCM-2017/1710009
- 26. Bozorgi P., Porter D. E., Eberth J. M., Eidson J. P., and Karami A., “The leading neighborhood-level predictors of drug overdose: A mixed machine learning and spatial approach,” Drug Alcohol Depend., vol. 229, p. 109143, Dec. 2021, doi: 10.1016/j.drugalcdep.2021.109143
- 27. Galea S., Rudenstine S., and Vlahov D., “Drug use, misuse, and the urban environment,” Drug Alcohol Rev., vol. 24, no. 2, pp. 127–136, Mar. 2005, doi: 10.1080/09595230500102509
- 28. Latkin C. A., Forman V., Knowlton A., and Sherman S., “Norms, social networks, and HIV-related risk behaviors among urban disadvantaged drug users,” Soc. Sci. Med., vol. 56, no. 3, pp. 465–476, Feb. 2003, doi: 10.1016/s0277-9536(02)00047-3
- 29. Schroeder J. R., Latkin C. A., Hoover D. R., Curry A. D., Knowlton A. R., and Celentano D. D., “Illicit drug use in one’s social network and in one’s neighborhood predicts individual heroin and cocaine use,” Ann. Epidemiol., vol. 11, no. 6, pp. 389–394, 2001, doi: 10.1016/s1047-2797(01)00225-3
- 30. Campo-Arias A., Suárez-Colorado Y. P., and Caballero-Domínguez C. C., “Asociación entre el consumo de Cannabis y el riesgo de suicidio en adolescentes escolarizados de Santa Marta, Colombia,” Biomédica, vol. 40, no. 3, p. 569, 2020
- 31. Fajardo A. L., “Consumption of psychopharmaceuticals in the city of Bogota (Colombia): a new reality,” Arch. Med., vol. 18, no. 2, 2018, doi: 10.30554/archmed.18.2.2743.2018
- 32. Scoppetta O. and Castaño G. A., “Early drug consumption and subsequent risk of illicit drug use in Colombia,” Addict. Disord. their Treat., vol. 18, no. 1, pp. 10–14, Mar. 2019, doi: 10.1097/ADT.0000000000000144
- 33. Scheuer C. et al., “El consumo de sustancias psicoactivas en jóvenes estudiantes de una institución educativa del municipio de Neira (Caldas): un estudio de caso desde la mirada de la educación inclusiva,” Cult. y Drog., vol. 23, no. 26, pp. 343–354, Jul. 2018.
- 34. Kalyanam J., Katsuki T., Lanckriet G. R.G., and Mackey T. K., “Exploring trends of nonmedical use of prescription drugs and polydrug abuse in the Twittersphere using unsupervised machine learning,” Addict. Behav., vol. 65, pp. 289–295, Feb. 2017, doi: 10.1016/j.addbeh.2016.08.019
- 35. Narvaez-Chicaiza M. A., “Harm Reduction Policies Where Drugs Constitute a Security Issue,” Heal. Care Anal., vol. 28, no. 4, pp. 382–390, Dec. 2020, doi: 10.1007/s10728-020-00415-9
- 36. Restrepo-Escobar S. M. and Cardona E. A. S., “Educational and prevention campaigns. A review on the use of psychoactive substances in Colombian university students,” Interdisciplinaria, vol. 38, no. 2, pp. 199–208, 2021, doi: 10.16888/INTERD.2021.38.2.13
- 37. Hancer E., Xue B., and Zhang M., “A survey on feature selection approaches for clustering,” Artif. Intell. Rev., vol. 53, no. 6, pp. 4519–4545, Jan. 2020, doi: 10.1007/S10462-019-09800-W
- 38. Wager T. D., Atlas L. Y., Lindquist M. A., Roy M., Woo C.-W., and Kross E., “An fMRI-Based Neurologic Signature of Physical Pain,” N. Engl. J. Med., vol. 368, no. 15, pp. 1388–1397, 2013, doi: 10.1056/NEJMoa1204471
- 39. Henriksson A., Kvist M., Dalianis H., and Duneld M., “Identifying adverse drug event information in clinical notes with distributional semantic representations of context,” J. Biomed. Inform., vol. 57, pp. 333–349, Oct. 2015, doi: 10.1016/j.jbi.2015.08.013
- 40. Squeglia L. M. et al., “Neural Predictors of Initiating Alcohol Use During Adolescence,” Am. J. Psychiatry, vol. 174, no. 2, pp. 172–185, Feb. 2017, doi: 10.1176/appi.ajp.2016.15121587
- 41. Conway M. and O’Connor D., “Social media, big data, and mental health: current advances and ethical implications,” Curr. Opin. Psychol., vol. 9, pp. 77–82, Jun. 2016, doi: 10.1016/j.copsyc.2016.01.004
- 42. Katsuki T., Mackey T. K., and Cuomo R., “Establishing a Link Between Prescription Drug Abuse and Illicit Online Pharmacies: Analysis of Twitter Data,” J. Med. INTERNET Res., vol. 17, no. 12, 2015, doi: 10.2196/jmir.5144
- 43. Degenhardt L. et al., “The global epidemiology and burden of psychostimulant dependence: Findings from the Global Burden of Disease Study 2010,” Drug Alcohol Depend., vol. 137, pp. 36–47, 2014, doi: 10.1016/j.drugalcdep.2013.12.025
- 44. Whiteford H. A. et al., “Global burden of disease attributable to mental and substance use disorders: findings from the Global Burden of Disease Study 2010,” Lancet, vol. 382, no. 9904, pp. 1575–1586, Nov. 2013, doi: 10.1016/S0140-6736(13)61611-6
- 45. Bowman F. D., Caffo B., Bassett S. S., and Kilts C., “A Bayesian hierarchical framework for spatial modeling of fMRI data,” Neuroimage, vol. 39, no. 1, pp. 146–156, 2008, doi: 10.1016/j.neuroimage.2007.08.012
- 46. Shannon K., Rusch M., Shoveller J., Alexson D., Gibson K., and Tyndall M. W., “Mapping violence and policing as an environmental-structural barrier to health service and syringe availability among substance-using women in street-level sex work,” Int. J. DRUG POLICY, vol. 19, no. 2, pp. 140–147, 2008, doi: 10.1016/j.drugpo.2007.11.024
- 47. Freisthler B., Needell B., and Gruenewald P. J., “Is the physical availability of alcohol and illicit drugs related to neighborhood rates of child maltreatment?,” Child Abuse Negl., vol. 29, no. 9, pp. 1049–1060, Sep. 2005, doi: 10.1016/j.chiabu.2004.12.014
- 48. Bass J. K. and Lambert S. F., “Urban adolescents’ perceptions of their neighborhoods: An examination of spatial dependence,” J. Community Psychol., vol. 32, no. 3, pp. 277–293, May 2004, doi: 10.1002/jcop.20005
- 49. Chaix B. et al., “Spatial clustering of mental disorders and associated characteristics of the neighbourhood context in Malmo, Sweden, in 2001,” J. Epidemiol. Community Health, vol. 60, no. 5, pp. 427–435, May 2006, doi: 10.1136/jech.2005.040360
- 50. Mowbray C. T., Holter M. C., Teague G. B., and Bybee D., “Fidelity criteria: Development, measurement, and validation,” Am. J. Eval., vol. 24, no. 3, pp. 315–340, 2003, doi: 10.1177/109821400302400303
- 51. Peet M. and Stokes C., “Omega-3 fatty acids in the treatment of psychiatric disorders,” Drugs, vol. 65, no. 8, pp. 1051–1059, 2005, doi: 10.2165/00003495-200565080-00002
- 52. Chichester K. et al., “Pharmacies and features of the built environment associated with opioid overdose: A geospatial comparison of rural and urban regions in Alabama, USA,” Int. J. Drug Policy, vol. 79, May 2020, doi: 10.1016/j.drugpo.2020.102736
- 53. Geissert P. et al., “High-risk prescribing and opioid overdose: prospects for prescription drug monitoring program-based proactive alerts,” Pain, vol. 159, no. 1, pp. 150–156, Jan. 2018, doi: 10.1097/j.pain.0000000000001078
- 54. Fraley C. and Raftery A. E., “How many clusters? Which clustering method? Answers via model-based cluster analysis,” Comput. J., vol. 41, no. 8, pp. 586–588, 1998, doi: 10.1093/COMJNL/41.8.578
- 55. Saxena A. et al., “A review of clustering techniques and developments,” Neurocomputing, vol. 267, pp. 664–681, Dec. 2017, doi: 10.1016/J.NEUCOM.2017.06.053
- 56. Li B., Pi D., Lin Y., and Cui L., “DNC: A Deep Neural Network-based Clustering-oriented Network Embedding Algorithm,” J. Netw. Comput. Appl., vol. 173, Jan. 2021, doi: 10.1016/J.JNCA.2020.102854
- 57. J. Xie, R. Girshick, and A. Farhadi, “Unsupervised Deep Embedding for Clustering Analysis,” 33rd Int. Conf. Mach. Learn. ICML 2016, vol. 1, pp. 740–749, Nov. 2016, Accessed: Jan. 15, 2022. [Online]. https://arxiv.org/abs/1511.06335v2
- 58. S. Sharifipour, H. Fayyazi, and M. Sabokro, “Unsupervised Feature Selection using Encoder-Decoder Networks,” 6th Iran. Conf. Signal Process. Intell. Syst. ICSPIS 2020, Dec. 2020.
- 59. DANE, “Departamento Administrativo Nacional de Estadística. Censo Nacional de Población y Vivienda 2018. Proyecciones de Población 2018–2020, total municipal por área Junio 30.” Bogotá D.C, Colombia, 2018.
- 60. DNP, “Avances y complementariedades estratégicas de los Distritos en el marco de los esquemas asociativos territoriales,” Bogotá D.C, 2018. [Online]. https://colaboracion.dnp.gov.co/CDT/DesarrolloTerritorial/ConversatorioDistritoCali04_10_2018-SantiagoArroyo.pdf
- 61.UNODC, “Monitoreo de territorios afectados por cultivos ilícitos 2020,” Bogotá, 2021. Accessed: Jan. 14, 2022. [Online]. https://www.unodc.org/documents/crop-monitoring/Colombia/Colombia_Monitoreo_de_territorios_afectados_por_cultivos_ilicitos_2020.pdf
- 62.ODC, “Estudio nacional de consumo de sustancias psicoactivas,” Bogotá, 2019. Accessed: Jan. 14, 2022. [Online]. https://www.odc.gov.co/Portals/1/publicaciones/pdf/estudioNacionaldeconsumo2019.pdf
- 63.DANE, “Encuesta Nacional de Consumo de Sustancias Psicoactivas en Población General 2019,” 2020. https://microdatos.dane.gov.co/index.php/catalog/680/get_microdata (accessed Jan. 14, 2022).
- 64.J. Espinosa, “Shapefile,” 2022. https://hub.arcgis.com/datasets/de0e829ddbf743c895ba6dcee1b74fae/about (accessed Jun. 09, 2022).
- 65.Anselin L., Spatial Econometrics: Methods and Models. Springer Netherlands, 1988.
- 66.Moran P., The Interpretation of Statistical Maps, 2nd ed., vol. 10. Journal of the Royal Statistical Society, 1948. Accessed: Jan. 23, 2022. [Online]. https://www.jstor.org/stable/2983777
- 67.Geary R. C., “The Contiguity Ratio and Statistical Mapping,” Inc. Stat., vol. 5, no. 3, p. 115, Nov. 1954, doi: 10.2307/2986645
- 68.Getis A. and Ord J. K., “The Analysis of Spatial Association by Use of Distance Statistics,” Geogr. Anal., vol. 24, no. 3, pp. 189–206, Jul. 1992, doi: 10.1111/J.1538-4632.1992.TB00261.X
- 69.Anselin L., “Local Indicators of Spatial Association—LISA,” Geogr. Anal., vol. 27, no. 2, pp. 93–115, Apr. 1995, doi: 10.1111/J.1538-4632.1995.TB00338.X
- 70.Duque J. C., Ramos R., and Suriñach J., “Supervised Regionalization Methods: A Survey,” Int. Reg. Sci. Rev., vol. 30, no. 3, pp. 195–220, Jul. 2016, doi: 10.1177/0160017607301605
- 71.S. Rey, D. Arribas-Bel, and L. Wolf, Geographic Data Science with Python. 2020. Accessed: Jan. 23, 2022. [Online]. https://geographicdata.science/book/intro.html
- 72.Van Rossum G. and Drake F. L., Python 3 Reference Manual. Scotts Valley, CA: CreateSpace, 2009.
- 73.J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 6791 LNCS, no. PART 1, pp. 52–59, 2011.
- 74.Shrestha A. and Mahmood A., “Review of deep learning algorithms and architectures,” IEEE Access, vol. 7, pp. 53040–53065, 2019, doi: 10.1109/ACCESS.2019.2912200
- 75.Caliński T. and Harabasz J., “A Dendrite Method for Cluster Analysis,” Commun. Stat., vol. 3, no. 1, pp. 1–27, 1974, doi: 10.1080/03610927408827101
- 76.Davies D. L. and Bouldin D. W., “A Cluster Separation Measure,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-1, no. 2, pp. 224–227, 1979, doi: 10.1109/TPAMI.1979.4766909
- 77.Rousseeuw P. J., “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, no. C, pp. 53–65, Nov. 1987, doi: 10.1016/0377-0427(87)90125-7
- 78.ODC, “Density of drug production in Colombia,” 2021. https://www.datos.gov.co/d/acs4-3wgp/visualization (accessed Jul. 05, 2022).
- 79.Clarke H., Soneji N., Ko D. T., Yun L., and Wijeysundera D. N., “Rates and risk factors for prolonged opioid use after major surgery: population based cohort study,” BMJ, vol. 348, Feb. 2014, doi: 10.1136/bmj.g1251
- 80.Kuo Y. F., Raji M. A., Chen N. W., Hasan H., and Goodwin J. S., “Trends in Opioid Prescriptions Among Part D Medicare Recipients From 2007 to 2012,” Am. J. Med., vol. 129, no. 2, pp. 221.e21–221.e30, Feb. 2016, doi: 10.1016/j.amjmed.2015.10.002
- 81.Puigcorbé S. et al., “Assessing the association between tourism and the alcohol urban environment in Barcelona: a cross-sectional study,” BMJ Open, vol. 10, no. 9, p. e037569, Sep. 2020, doi: 10.1136/bmjopen-2020-037569
- 82.Easwaran M., Bazroy J., Jayaseelan V., and Singh Z., “Prevalence and determinants of alcohol consumption among adult men in a coastal area of south India,” Int. J. Med. Sci. Public Heal., vol. 4, no. 3, p. 360, 2015, doi: 10.5455/IJMSPH.2015.1010201479
- 83.Chinnakali P., Thekkur P., Manoj Kumar A., Ramaswamy G., Bharadwaj B., and Roy G., “Alarmingly high level of alcohol use among fishermen: A community based survey from a coastal area of south India,” J. Forensic Leg. Med., vol. 42, pp. 41–44, Aug. 2016, doi: 10.1016/j.jflm.2016.05.006
- 84.DANE, “Producto Interno Bruto por departamento,” 2021.
- 85.García M. C. et al., “Opioid Prescribing Rates in Nonmetropolitan and Metropolitan Counties Among Primary Care Providers Using an Electronic Health Record System—United States, 2014–2017,” MMWR. Morb. Mortal. Wkly. Rep., vol. 68, no. 2, pp. 25–30, Jan. 2019, doi: 10.15585/mmwr.mm6802a1
- 86.Keyes K. M., Cerdá M., Brady J. E., Havens J. R., and Galea S., “Understanding the Rural–Urban Differences in Nonmedical Prescription Opioid Use and Abuse in the United States,” Am. J. Public Health, vol. 104, no. 2, p. e52, Feb. 2014, doi: 10.2105/AJPH.2013.301709
- 87.King N. B., Fraser V., Boikos C., Richardson R., and Harper S., “Determinants of Increased Opioid-Related Mortality in the United States and Canada, 1990–2013: A Systematic Review,” Am. J. Public Health, vol. 104, no. 8, p. e32, 2014, doi: 10.2105/AJPH.2014.301966
- 88.UNODC, “Persistencia de los cultivos de coca en la Región Pacífica,” 2010.