PLOS One. 2023 Aug 18;18(8):e0290098. doi: 10.1371/journal.pone.0290098

Leading consumption patterns of psychoactive substances in Colombia: A deep neural network-based clustering-oriented embedding approach

Kevin Palomino 1,*, Carmen R Berdugo 1, Jorge I Vélez 1
Editor: Vinícius Silva Belo 2
PMCID: PMC10438020  PMID: 37594973

Abstract

The number of health-related incidents caused by the use of illegal and legal psychoactive substances (PAS) has increased dramatically worldwide over the past two decades. In Colombia, the use of illicit substances has increased to 10.3%, while the consumption of alcohol and tobacco has increased to 84% and 12%, respectively. Identifying drug consumption patterns in the general population is known to be essential for reducing overall drug consumption. However, existing approaches do not combine Machine Learning and/or Deep Data Mining methods with spatial techniques. To enhance our understanding of mental health issues related to PAS and to assist in the development of national policies, we present a novel Deep Neural Network-based Clustering-oriented Embedding Algorithm that incorporates an autoencoder and spatial techniques. The primary goal of our model is to identify general and spatial patterns of drug consumption and abuse while extracting relevant features from the input data and identifying clusters during the learning process. As a test case, we used the largest publicly available database of legal and illegal PAS consumption, comprising 49,600 Colombian households. We estimated and mapped the prevalence of consumption and/or abuse of both PAS and non-PAS, achieving statistically significant goodness-of-fit values. Our results indicate that region, sex, housing type, socioeconomic status, age, and variables related to household finances help explain the patterns of consumption and/or abuse of PAS. Additionally, we identified three distinct patterns of PAS consumption and/or abuse. At the spatial level, these patterns indicate concentrations of drug consumption in specific regions of the country, which are closely related to geographic location and the prevailing social and environmental contexts. 
These findings can provide valuable insights to facilitate decision-making and develop national policies targeting specific groups given their cultural, geographic, and social conditions.

1. Introduction

Psychoactive substances (PAS) are chemical substances that change the function of the nervous system and alter people’s perception, mood, consciousness, cognition, or behavior [1]. By chemical structure, PAS can be grouped into synthetic cannabinoids, synthetic cathinones, phenethylamines, arylcyclohexylamines, tryptamines, indolalkylamines, new synthetic opioids, piperazines, ketamine, and designer benzodiazepines. By origin, they can be grouped into natural and synthetic substances [2–4].

The increase in drug use among young people is drawing the attention of national governments [5]. Because the number of health-related incidents caused by the use of legal and illegal PAS has increased dramatically worldwide over the last two decades [2, 6], drug use has become one of the largest contributors to the burden of disease [7, 8]. Drug use imposes a high cost on society through premature mortality, increased health expenditure, criminal justice costs (drug and micro-trafficking), social welfare costs, and other social consequences [9, 10].

Colombia is ranked as one of the largest drug producers in the world [11]. The production and commercialization of drugs through drug- and micro-trafficking constantly expand in locations with high levels of poverty and limited government presence [12]. Statistics and indicators of drug consumption, production, and distribution, as well as reports from the National Statistical System (DANE) of Colombia, highlight dramatic increases over the last 20 years in (i) drug production [13]; (ii) domestic consumption of PAS at early ages; and (iii) the prevalence of drug use and/or abuse [14, 15]. Furthermore, in recent years the country has had the highest prevalence of drug use among school students compared with other Latin American countries [16]. Thus, there is an urgent need to develop effective interventions to prevent the use and/or abuse of PAS.

The first step towards reducing consumption is to identify drug consumption patterns in the general population [17]. Several studies have identified patterns associated with drug use and consumption [18, 19]. According to the Centers for Disease Control and Prevention, individuals who do not own their homes and live in rented accommodation are more likely to use drugs [20]. Other studies suggest that neighborhood contextual characteristics may increase the risk of substance abuse [21–26]. Population density may also influence substance use and overdose risk through higher levels of socialization in densely populated urban areas [27–29]. Other authors have identified anxiety, sleep disorders, suicide, depression, and other mental illnesses as risk factors for the consumption and abuse of PAS [30, 31]. Early marijuana use has been shown to increase the risk of consuming other PAS [32]. In addition, people involved in sports and artistic activities perceive drugs as enhancers of their performance [33].

In Colombia, drug consumption patterns and risk factors have also been identified. For instance, Kalyanam et al. [34] analyzed the social impact of basuco and inhalant use among street youths. Narvaez-Chicaiza [35] assessed the social factors that lead to the adoption of harm reduction policies and how these factors influence treatments for substance abuse disorders. Additionally, Restrepo-Escobar & Cardona [36] showed that university students with low satisfaction in their studies tend to be heavy users of alcohol, tobacco, and marijuana. However, these approaches do not use Machine Learning (ML) and/or Deep Data Mining techniques in combination with spatial models to analyze drug consumption data from the general population. To our knowledge, no robust models integrating ML and spatial models have been used to identify drug consumption patterns from publicly available Colombian databases.

Although several techniques for analyzing drug consumption patterns are currently available (e.g., ML, Bayesian, spatial, and traditional multivariate or univariate statistical models, or, in some cases, combinations of these), new trends in pattern identification and analysis focus on hybrid and ensemble models [37]. Currently, the most widely used ML techniques are Support Vector Machines (SVMs), Random Forests (RF), and Natural Language Processing (NLP) [38–42]. Among Bayesian models, Bayesian meta-regression (DisMod-MR), Bayesian hierarchical models, and Markov Chain Monte Carlo are the most attractive methods [43–45]. Regarding spatial models, spatial distributions, spatial regression models, spatial scan statistics, variograms, and social mapping are the most frequently used techniques [46–49]. Finally, logistic regression, confirmatory factor analysis, and correlational analysis are the most commonly employed traditional statistical models for identifying drug-associated patterns [50–53].

Fraley and Raftery [54] suggest separating clustering approaches into hierarchical and partitioning techniques. Partitioning techniques are divided into density-, model-, and grid-based methods, the most popular of which are K-means, PAM, CLARA, DBSCAN, and CLIQUE. Hierarchical techniques are divided into agglomerative and divisive methods, the best known of which are BIRCH, CURE, ROCK, and CHAMELEON (see [55] for further reading). Although these techniques have been shown to perform well when irrelevant features are removed a priori, irrelevant and redundant features are well known to degrade cluster quality and increase computational cost in clustering algorithms; removing such features may alleviate these issues. We therefore focus on identifying patterns of PAS consumption using an ensemble model that integrates an autoencoder with both a clustering algorithm and a spatial model. In developing our approach, we drew on recent and representative work on data clustering, dimensionality reduction, and feature selection.

Feature selection approaches in clustering can be divided into filter, wrapper, embedded, and hybrid approaches [37]. Wrappers depend on the clustering algorithm to evaluate the clustering quality of a selected feature subset, whereas filters are independent of the clustering algorithm. Embedded approaches also work with a clustering algorithm but, unlike wrappers, incorporate knowledge about the clustering structure. Hybrid approaches combine filter and wrapper approaches into a single strategy. However, studies on embedded and hybrid feature selection approaches in clustering are limited [37]. Other feature learning-based approaches using Deep Neural Networks have been shown to work well for both linear and nonlinear models [56]. For instance, Xie et al. [57] (2016) proposed performing feature extraction and clustering simultaneously using pretrained autoencoders. However, such methods have mainly been applied to image data. In general, deep clustering models use autoencoders because they can learn input features from unlabeled data, and performance measures show that this approach is reliable for different data types [58]. Thus, deep clustering has become a growing field of research for feature selection [58]. Nevertheless, the use of convolutional networks in autoencoders and the application of feature selection for clustering remain open questions, especially for data from different statistical distributions [37].

Here, we propose a Deep Neural Network-based Clustering-oriented Embedding Algorithm that allows us to (i) identify consumption patterns of PAS; and (ii) build an ensemble algorithm that integrates an autoencoder with a clustering algorithm and a spatial model to handle the feature space and cluster memberships. Our approach builds on the models proposed by Xie et al. [57] and B. Li et al. [56] and extends them by building the autoencoder from a convolutional network, to accurately represent high-order interactions in the data, and by incorporating a spatial analysis to describe drug consumption patterns. Our main hypothesis is that incorporating these two elements will help identify and better understand drug consumption patterns and support national policy development.

2. Materials and methods

2.1 Study area

Located in South America, the Republic of Colombia is a diverse country with a population of over 50 million people and a territory of 440,831 square miles [59] that encompasses jungles, highlands, grasslands, deserts, coasts, and islands, organized into six regions and 32 departments (states) [60] (see S1 Fig). Colombia has long been a major producer of illegal drugs, which has had a significant impact on drug consumption and abuse.

According to the United Nations Office on Drugs and Crime, Colombia is the world’s largest producer of cocaine and the eighth-largest producer of cannabis [61]. In addition, the Colombian Drug Observatory indicates that the use of illicit substances in the country has increased to 10.3%, with men aged 18 to 24 being the heaviest consumers of these drugs. Reports also indicate that consumption of licit substances such as alcohol and tobacco has recently increased dramatically [62].

2.2 Data sources

We used two databases to identify drug consumption patterns in Colombia. The first was retrieved from the 2019 National Survey of Psychoactive Substance Consumption in the General Population (DANE-DIMPE-ENCSPA-2019; URL: https://microdatos.dane.gov.co/index.php/catalog/680/data-dictionary) conducted by the National Statistical System (DANE) of Colombia [63]. This survey covers 49,600 households and records information on housing, location, general characteristics of individuals, consumption of legal and illegal PAS, and treatments received. The second database comes from the Colombian Drug Observatory and contains information on PAS production per area during 2019. Both databases are publicly available and fully anonymized. In this study, departments (states) were used as georeferenced areas, represented as polygons (i.e., a shapefile) from ArcGIS Hub [64]. Because we used public information that contains no identifying or individual-level personal information, approval by an ethics committee was not required.

2.3 Convolutional Auto-Encoder-Deep Embedded Clustering algorithm

Fig 1 presents the proposed Convolutional Auto-Encoder-Deep Embedded Clustering (CAE-DEC) framework, based on the implementation of Xie et al. [57]. Unlike the Xie et al. model, however, our deep autoencoder (DA) uses convolutional layers rather than only fully connected (linear) layers, in order to represent high-order interactions in the data. In addition, we propose a spectral clustering-based centroid estimation to improve the initial cluster centroids. We chose the CAE-DEC framework because it reduces both the number of model parameters and the dimensionality while creating clusters simultaneously.

Fig 1. Architecture of the proposed CAE-DEC model.


Republished from [64] under a CC BY license, with permission from [ArcGIS Hub], original copyright [2016].

In our approach, an encoder first maps the input vector into a lower-dimensional feature space, called the latent feature space (LFS). The LFS is then passed independently through a decoder and a clustering layer. The encoder-decoder combination (DA) attempts to extract an LFS that preserves the relevant information in the original input data, while the clustering layer seeks to improve cluster assignments by minimizing the divergence between a target distribution and a centroid-based probability distribution.

In the last stage of the framework, a spatial analysis was performed using the feature space generated by the autoencoder as input. Spatial data exploration begins with Global Spatial Autocorrelation (GSA), which measures the extent to which the similarity between observations in a dataset is related to the similarity of their locations [65]. To assess GSA, the Moran’s I [66], Geary’s C [67], and Getis and Ord’s G [68] statistics are estimated. We also measure Local Spatial Autocorrelation, which focuses on the relationship between each observation and its surroundings rather than providing a single summary of these relationships across the map [69]; this makes it possible to determine where in a geographically referenced data set spatial autocorrelation is present. Finally, we perform regionalization, a special kind of clustering whose objective is to group similar observations based on both their statistical attributes and their spatial location [70]. Regionalization thus follows the same logic as standard clustering techniques while applying a series of geographical constraints [71]. This framework was built using the TensorFlow (https://www.tensorflow.org/) and PyTorch (https://pytorch.org/) libraries in Python version 3.11 [72].
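To make the three global statistics concrete, here is a minimal NumPy sketch of Moran’s I, Geary’s C, and Getis and Ord’s G computed from a vector of areal values and a spatial weights matrix, with a permutation pseudo p-value for Moran’s I. The binary contiguity weights, the function names, the four-area example, and the one-sided permutation test are assumptions for illustration; the paper does not state which weights (e.g., queen or rook contiguity) or which software routines it used.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x (n,) and spatial weights w (n, n)."""
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

def gearys_c(x, w):
    """Global Geary's C; values below 1 indicate positive autocorrelation."""
    z = x - x.mean()
    diff2 = (x[:, None] - x[None, :]) ** 2
    return (len(x) - 1) * (w * diff2).sum() / (2 * w.sum() * (z @ z))

def getis_ord_g(x, w):
    """General G statistic; assumes non-negative x and a zero diagonal in w."""
    cross = np.outer(x, x).sum() - (x ** 2).sum()  # sum of x_i * x_j over i != j
    return (x @ w @ x) / cross

def moran_permutation_pvalue(x, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation via random relabeling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    sims = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    return (np.sum(sims >= observed) + 1) / (n_perm + 1)

# Hypothetical example: four areas in a row, each adjacent to its neighbours
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(morans_i(x, w), gearys_c(x, w), getis_ord_g(x, w))
print(moran_permutation_pvalue(x, w))
```

For this toy example the sketch gives I ≈ 0.333, C = 0.3, and G ≈ 0.571; a Geary’s C below 1, like a positive Moran’s I, indicates positive spatial autocorrelation.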

2.4 Convolutional Auto-Encoder (CAE)

The DA is a deep neural network architecture capable of learning unsupervised representations of an input data set and is typically used for dimensionality reduction or denoising. A DA consists of two deep networks: an encoder, which transforms the original input data into a latent feature space, and a decoder, which is trained to reconstruct the original input data from that latent space. Rather than using only fully connected layers, the implemented DA combines convolutional (CONV) and fully connected (FC) layers for LFS extraction and reconstruction (Fig 1). A DA that integrates convolutional layers is also called a CAE [73]. Compared with a DA built only with fully connected layers, a CAE can reduce the number of parameters [74].

2.4.1 Convolutional layer

The proposed CAE structure uses four convolutional layers (two in the encoder and two in the decoder) and two fully connected layers (Fig 1). The convolution operation can be written as:

$$ z_{i,j,k}^{l} = \left(W_{k}^{l}\right)^{\top} X_{i,j}^{l} + b_{k}^{l} \tag{1} $$

where $z_{i,j,k}^{l}$ is the value of the feature at location $(i, j)$ in the $k$-th feature map of the $l$-th layer, $W_{k}^{l}$ and $b_{k}^{l}$ are the weight and bias of the $k$-th filter of the $l$-th layer, and $X_{i,j}^{l}$ denotes the input patch at location $(i, j)$ of the $l$-th layer. For non-linear mapping, an activation function $g(\cdot)$ is applied to the convolutional feature $z_{i,j,k}^{l}$:

$$ a_{i,j,k}^{l} = g\left(z_{i,j,k}^{l}\right) \tag{2} $$

where $a_{i,j,k}^{l}$ is the activation value resulting from applying $g(\cdot)$. The Rectified Linear Unit (ReLU) is used as the activation function in each CONV layer, except in the final decoder CONV layer, where a sigmoid activation is applied.
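The NumPy sketch below evaluates Eqs (1) and (2) for a single layer to make the indexing concrete. It assumes a 2D input with channels, stride 1, and no padding; the paper does not report kernel sizes, strides, padding, or the number of filters, and the actual model was built with TensorFlow/PyTorch rather than this explicit loop. As in most deep-learning libraries, the "convolution" here is the inner product of each filter with an input patch (cross-correlation). The function name is hypothetical.

```python
import numpy as np

def conv_layer(X, W, b, activation="relu"):
    """One convolutional layer following Eqs (1)-(2).

    X: input of shape (H, W_in, C_in); W: filters of shape (K, kh, kw, C_in);
    b: biases of shape (K,). Assumes stride 1 and no padding ("valid").
    """
    H, W_in, _ = X.shape
    K, kh, kw, _ = W.shape
    Z = np.empty((H - kh + 1, W_in - kw + 1, K))
    for i in range(Z.shape[0]):
        for j in range(Z.shape[1]):
            patch = X[i:i + kh, j:j + kw, :]  # X^l_{i,j}
            # Eq (1): inner product of every filter W^l_k with the patch, plus b^l_k
            Z[i, j, :] = np.tensordot(W, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    if activation == "relu":      # Eq (2) with g = ReLU (hidden CONV layers)
        return np.maximum(Z, 0.0)
    if activation == "sigmoid":   # g = sigmoid (final decoder CONV layer)
        return 1.0 / (1.0 + np.exp(-Z))
    return Z

X = np.ones((3, 3, 1))     # toy 3x3 single-channel input
W = np.ones((1, 2, 2, 1))  # one 2x2 filter of ones
print(conv_layer(X, W, np.array([0.0]))[..., 0])  # each 2x2 patch sums to 4
```

Because the sigmoid maps any pre-activation into (0, 1), using it only in the last decoder layer keeps reconstructions on the same scale as the normalized input.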

2.5 Clustering layer

The clustering layer is inspired by Xie et al. [57]. Initially, a soft assignment is computed between the points in the latent space (also known as the embedded space) and the cluster centroids. Update steps are then repeated to refine the cluster centroids and the embedded space. The Kullback–Leibler (KL) divergence is used as the loss function during optimization: the objective is to minimize the KL divergence between a soft clustering distribution $Q$ and an auxiliary target distribution $P$. The KL loss is calculated as:

$$ L_c = \mathrm{KL}(P \,\|\, Q) = \sum_{k} \sum_{m} p_{km} \log \frac{p_{km}}{q_{km}} \tag{3} $$

where $L_c$ is the clustering loss. To measure the similarity between the embedded point $z_k$ and the cluster centroid $c_m$, Student’s t-distribution is used as a kernel:

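$$ q_{km} = \frac{\left(1 + \lVert z_k - c_m \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{i} \left(1 + \lVert z_k - c_i \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}} \tag{4} $$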

where $\alpha$ is the number of degrees of freedom of Student’s t-distribution and $q_{km}$ is the soft clustering assignment of each embedded point (i.e., the probability of assigning point $k$ to cluster $m$). As in Xie et al., setting $\alpha = 1$ simplifies the similarity function to:

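$$ q_{km} = \frac{\left(1 + \lVert z_k - c_m \rVert^2\right)^{-1}}{\sum_{i} \left(1 + \lVert z_k - c_i \rVert^2\right)^{-1}} \tag{5} $$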

To compute the target distribution $p_{km}$, $q_{km}$ is squared, divided by the soft cluster frequency $\sum_{k'} q_{k'm}$, and normalized across clusters:

$$ p_{km} = \frac{q_{km}^{2} / \sum_{k'} q_{k'm}}{\sum_{i} \left( q_{ki}^{2} / \sum_{k'} q_{k'i} \right)} \tag{6} $$

Then, by minimizing the divergence between P and Q, the embedding learning is achieved through highly confident assignments.
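Eqs (3) to (6) translate directly into array operations. The NumPy sketch below is an illustrative re-implementation, not the authors’ TensorFlow/PyTorch code; the function names and the small `eps` guard against log(0) are assumptions.

```python
import numpy as np

def soft_assignment(Z, centers, alpha=1.0):
    """Eq (4): Student's t similarity between points Z (n, d) and centers (k, d)."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Eq (6): square q, divide by soft cluster frequencies, renormalize per point."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q, eps=1e-12):
    """Eq (3): L_c = KL(P || Q), summed over points and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

Z = np.array([[0.0, 0.0]])
centers = np.array([[0.0, 0.0], [1.0, 0.0]])
q = soft_assignment(Z, centers)  # alpha = 1 reduces Eq (4) to Eq (5)
print(q)
```

With one point at the origin and centroids at (0, 0) and (1, 0), Eq (5) gives unnormalized similarities 1 and 0.5, so q = (2/3, 1/3). Squaring in Eq (6) pushes each point’s target toward its most likely cluster, while dividing by the cluster frequency keeps large clusters from dominating.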

2.5.1 Center initialization

As previously mentioned, the cluster centroids are initialized using a spectral clustering-based approach. Spectral clustering allows flexible distance metrics and provides better cluster estimates than K-means [57]. However, most spectral clustering algorithms are computationally demanding, so spectral clustering is run on a random sample of the data to estimate the cluster centroids. Because spectral clustering does not estimate centroids during learning, the mean of each cluster, once the clusters are defined, is used as the centroid estimate.
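A minimal NumPy sketch of this initialization follows: spectral clustering (here the normalized-affinity variant of Ng, Jordan, and Weiss, with a Gaussian kernel and a simple k-means on the spectral embedding) is run on a random subsample of the latent points, and the mean of each resulting cluster in latent space is returned as its initial centroid. The kernel width `sigma`, the sample size, the farthest-point k-means initialization, and the function names are assumptions for illustration; the paper does not specify which spectral clustering variant or parameters were used.

```python
import numpy as np

def _kmeans(X, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_init_centroids(Z, k, sample_size=1000, sigma=1.0, seed=0):
    """Initial centroids: spectral clustering on a random subsample, then cluster means."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(Z), size=min(sample_size, len(Z)), replace=False)
    Zs = Z[idx]
    # Gaussian affinity matrix with zero diagonal
    sq = ((Zs[:, None, :] - Zs[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))          # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = vecs[:, -k:]                         # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    labels = _kmeans(U, k)
    # Spectral clustering yields no centroids, so use each cluster's latent-space mean
    return np.array([Zs[labels == j].mean(axis=0) for j in range(k)])
```

Subsampling keeps the dense n × n affinity matrix and its eigendecomposition affordable; only the returned means, not the subsample labels, are passed on to the CAE-DEC.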

2.6 The CAE-DEC model

Initially, the input data are normalized to the interval [0, 1]. This normalization allows the network to use a higher learning rate, helps avoid vanishing gradients, and alleviates overfitting. In addition, the last CONV layer of the decoder uses a sigmoid activation function so that the reconstructed output lies in the same [0, 1] range as the normalized input. Two training steps are then executed during the CAE-DEC learning process. First, a CAE model is trained to minimize the reconstruction loss $L_r$, computed as

$$ L_r = \lVert x - \tilde{x} \rVert^2 \tag{7} $$

where $x$ is the normalized input and $\tilde{x}$ is the reconstructed output. This pretrained CAE is then used as the DA structure of the CAE-DEC model.

In the second step, the CAE-DEC model is trained to minimize the reconstruction loss and the clustering loss simultaneously. The total loss in this training step is

$$ L_t = L_r + C \, L_c \tag{8} $$

where $L_r$ is the CAE-DEC reconstruction loss, $L_c$ is the CAE-DEC clustering loss, and $C$ is a coefficient that balances the two losses. The training process is shown in Table 1. The goal is to obtain a latent space that minimizes the total loss. Finally, the label of each embedded point is assigned as

$$ \mathrm{Label}_j = \arg\max_{m} \, q_{jm} \tag{9} $$

where $q_{jm}$ is the probability that point $j$ belongs to cluster $m$. The maximum number of iterations, Mint, and the update interval of the target distribution P, P_change, were chosen based on multiple experiments; the final values were 3000 and 5, respectively. This P_change value improves stability during training.
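A minimal NumPy sketch of Eqs (7) to (9), together with the [0, 1] normalization, follows. Averaging Eq (7) over samples, mapping constant columns to 0, and the function names are assumptions; the paper does not report the value of the balance coefficient C, so any value passed here is hypothetical.

```python
import numpy as np

def minmax_normalize(X):
    """Scale each column to [0, 1]; constant columns are mapped to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def reconstruction_loss(x, x_rec):
    """Eq (7), averaged over samples: mean of ||x - x_rec||^2."""
    return float(np.mean(np.sum((x - x_rec) ** 2, axis=1)))

def total_loss(x, x_rec, p, q, C, eps=1e-12):
    """Eq (8): L_t = L_r + C * L_c, with L_c = KL(P || Q) from Eq (3)."""
    l_c = float(np.sum(p * np.log((p + eps) / (q + eps))))
    return reconstruction_loss(x, x_rec) + C * l_c

def assign_labels(q):
    """Eq (9): each point takes the cluster with the highest soft assignment."""
    return np.argmax(q, axis=1)

x = minmax_normalize(np.array([[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]))
print(x)
print(assign_labels(np.array([[0.7, 0.3], [0.2, 0.8]])))
```

A larger C weights cluster separation more heavily relative to faithful reconstruction; keeping the reconstruction term prevents the clustering loss from distorting the latent space arbitrarily.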

Table 1. Pseudo code for the CAE-DEC training process.

Pseudo code: The CAE-DEC training process
Input data: Number of clusters n; Normalized input data x; Maximum number of iterations Mint; Balance coefficient C; Pretrained CAE; Stop condition Stop; Target distribution P update interval P_change.
Training process:
1. Generate an initial latent space Z with the pretrained CAE.
2. Run spectral clustering on Z to generate the initial cluster centers c.
3. Initialize the CAE-DEC model with the pretrained CAE.
4. Calculate the soft assignment distribution Q and the target distribution P based on Z and c.
for epoch < Mint do:
 if epoch % P_change == 0 then:
  Calculate the soft assignment distribution Q and the target distribution P based on the current Z and c
 end if
 Feed the CAE-DEC with the normalized input data x
 Calculate the reconstruction loss and the clustering loss
 Update the CAE-DEC parameters (weights, biases, and cluster centers)
 if Stop == True then:
  break
 end if
end for
Obtain the label of each data point from the last optimized Q.
Output: Latent space, labels
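The clustering part of the pseudo code in Table 1 can be sketched in NumPy as follows. To stay self-contained, this sketch assumes a frozen encoder, so the latent points Z do not change and only the cluster centers are updated by gradient descent on L_c, using the gradient given by Xie et al. for α = 1; in the full CAE-DEC, the encoder and decoder weights are also updated through L_t with an autodiff framework. The learning rate, the label-change stopping rule (similar in spirit to the one used in DEC), and the function names are assumptions for illustration.

```python
import numpy as np

def soft_assignment(Z, centers):
    """Eq (5): Student's t soft assignment with alpha = 1."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Eq (6)."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def refine_centers(Z, centers, max_iter=3000, p_change=5, lr=0.01, tol=0.001):
    """Minimize KL(P || Q) with respect to the centers only (frozen-encoder assumption)."""
    centers = np.array(centers, dtype=float)
    prev_labels = None
    for it in range(max_iter):
        q = soft_assignment(Z, centers)
        if it % p_change == 0:
            p = target_distribution(q)  # P is held fixed between refreshes
            labels = q.argmax(axis=1)
            # Stop when fewer than a fraction tol of points changed cluster since last refresh
            if prev_labels is not None and np.mean(labels != prev_labels) < tol:
                break
            prev_labels = labels
        diff = Z[:, None, :] - centers[None, :, :]        # z_k - c_m
        kernel = 1.0 / (1.0 + (diff ** 2).sum(axis=-1))  # (1 + ||z_k - c_m||^2)^-1
        # dL_c/dc_m = -2 * sum_k kernel_km * (p_km - q_km) * (z_k - c_m)
        grad = -2.0 * np.einsum("km,kmd->md", kernel * (p - q), diff)
        centers -= lr * grad
    q = soft_assignment(Z, centers)
    return centers, q.argmax(axis=1)  # labels via Eq (9)
```

Recomputing P only every p_change iterations gives the centers a fixed target for several steps, which is the stabilizing role the text attributes to P_change.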

2.7 Framework evaluation

We trained the CAE-DEC method using data from the National Survey of Psychoactive Substance Consumption (DANE-DIMPE-ENCSPA-2019), which contains 49,600 observations. A second database with PAS production figures was used in the spatial analysis stage to correlate PAS consumption and production. To evaluate the framework, we compared our CAE-DEC approach with other approaches, including CAE and Principal Component Analysis integrated with clustering (PCA-DEC). For evaluation and comparison, we used the Calinski-Harabasz [75], Davies-Bouldin [76], and Silhouette [77] indices as intrinsic clustering metrics. In addition, we used the χ2 statistic to investigate potential associations and differences among the patterns (clusters) identified by our approach.
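For reference, the three intrinsic metrics can be computed from latent points and cluster labels as in the NumPy sketch below, which uses Euclidean distances and the standard definitions (the same quantities reported by scikit-learn’s `calinski_harabasz_score`, `davies_bouldin_score`, and `silhouette_score`). It is an illustrative re-implementation with hypothetical toy data; the paper does not say which implementation was used. Higher Calinski-Harabasz and silhouette values, and lower Davies-Bouldin values, indicate better-separated clusters.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Ratio of between- to within-cluster dispersion (higher is better)."""
    ks, mean = np.unique(labels), X.mean(axis=0)
    between = within = 0.0
    for k in ks:
        Xk = X[labels == k]
        ck = Xk.mean(axis=0)
        between += len(Xk) * np.sum((ck - mean) ** 2)
        within += np.sum((Xk - ck) ** 2)
    return (between / (len(ks) - 1)) / (within / (len(X) - len(ks)))

def davies_bouldin(X, labels):
    """Average worst-case similarity between each cluster and another (lower is better)."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                        for k, c in zip(ks, cents)])
    dist = np.linalg.norm(cents[:, None] - cents[None], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude a cluster's comparison with itself
    return ((scatter[:, None] + scatter[None]) / dist).max(axis=1).mean()

def silhouette(X, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:  # singleton cluster: silhouette defined as 0
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance, excluding self
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

X = np.array([[0.0], [1.0], [10.0], [11.0]])  # toy 1-D latent points
labels = np.array([0, 0, 1, 1])
print(calinski_harabasz(X, labels), davies_bouldin(X, labels), silhouette(X, labels))
```

The silhouette computation needs all pairwise distances, so for the full survey sample it is usually evaluated on a random subsample.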

3. Results

3.1 Model comparison for identifying drug consumption patterns

Fig 2 depicts the LFS obtained by applying the CAE and CAE-DEC models to the data. Among all individuals, we identified three clusters: 14935 (30.19%) individuals belong to cluster 0, 11528 (23.30%) to cluster 1, and 23005 (46.50%) to cluster 2. Interestingly, the clusters in the LFS generated by the CAE-DEC are more clearly defined than those of the CAE model. Whereas the CAE model seeks only to extract an LFS that preserves the essential characteristics of the input data, our CAE-DEC model preserves these characteristics while also forcing the encoder to generate representative clusters as it extracts the new feature space.

Fig 2. Derived Latent Feature Space based on the (a) CAE and (b) CAE-DEC models.


In addition, the reconstruction loss of the CAE model is higher than that of the CAE-DEC model, possibly because the CAE-DEC model was built on the pretrained CAE. Note that the CAE model alone cannot label points or define clusters in the data; the clusters shown in Fig 2 were therefore obtained through spectral clustering and served as the basis for initializing the centroids of the CAE-DEC model.

3.2 Identification of clusters of psychoactive drugs consumption

Here we analyze the patterns in each cluster obtained with the CAE-DEC model. We defined an indicator variable $Y_{ij}$ recording whether the i-th person in the j-th household has consumed PAS: $Y_{ij} = 1$ if the individual has never consumed PAS and $Y_{ij} = 2$ otherwise. Of the 49468 individuals in the sample, only 5514 (11.15%) consume PAS. Fig 3a and 3b depict the cluster structure derived from the CAE-DEC model for individuals who reported consuming PAS and for those who did not, respectively. Our results indicate that individuals in clusters 0 and 2 are more likely to consume PAS (Fig 3a), while most individuals in cluster 1 do not (Table 2). In particular, 1726 (11.56%) individuals in cluster 0, 392 (3.40%) in cluster 1, and 3396 (14.76%) in cluster 2 have used PAS (Table 2). A χ2 test of independence shows that region, sex, age, housing type, socioeconomic status (SES), and contribution to household finances are all statistically significantly associated with cluster membership (Table 2).

Fig 3. Resulting clusters for individuals (a) consuming and (b) not consuming psychoactive substances based on the CAE-DEC model.


Table 2. Distribution of demographic and social variables across clusters.

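| Variable | Category | Cluster 0 (n = 14935) | Cluster 1 (n = 11528) | Cluster 2 (n = 23005) | χ2 | df | P-value |
|---|---|---|---|---|---|---|---|
| Region | Caribbean | 3004 | 2991 | 4075 | 1472.9 | 10 | < .0001 |
| | Central-Eastern | 2526 | 2299 | 6175 | | | |
| | Central-Southern | 1843 | 1299 | 1702 | | | |
| | Eje Cafetero–Antioquia | 3902 | 2354 | 6371 | | | |
| | Llanos Orientales | 1687 | 1305 | 1496 | | | |
| | Pacific | 1973 | 1280 | 3186 | | | |
| Gender | Male | 6606 | 3927 | 10233 | 386.72 | 2 | < .0001 |
| | Female | 8329 | 7601 | 12772 | | | |
| Housing type | House | 8250 | 6538 | 11834 | 114.18 | 6 | < .0001 |
| | Apartment | 6275 | 4724 | 10595 | | | |
| | Room | 395 | 256 | 543 | | | |
| | Indigenous dwelling | 15 | 10 | 33 | | | |
| Socioeconomic status | 1 | 3889 | 3626 | 6945 | 843.92 | 10 | < .0001 |
| | 2 | 4627 | 3902 | 8851 | | | |
| | 3 | 4284 | 2867 | 5683 | | | |
| | 4 | 1341 | 718 | 979 | | | |
| | 5 | 510 | 256 | 362 | | | |
| | 6 | 284 | 159 | 185 | | | |
| Age (years) | (0, 20] | 2105 | 1576 | 2998 | 345.87 | 4 | < .0001 |
| | (20, 40] | 6814 | 4329 | 10889 | | | |
| | (40, 68] | 6016 | 5623 | 9118 | | | |
| Contributes to the household finances | Yes | 10128 | 7473 | 16041 | 85.24 | 2 | < .0001 |
| | No | 4807 | 4055 | 6964 | | | |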

df: Degrees of freedom.

Table 3 shows the adjusted residuals for our model. The Central-Eastern region contributes most to the χ2 statistic for Region: in this region, the observed count is higher than expected in cluster 2 and lower than expected in cluster 0. To a lesser extent, the Llanos Orientales region also contributes substantially to the χ2 statistic, with fewer individuals than expected in cluster 2 and more than expected in clusters 0 and 1 (Table 3).

Table 3. Adjusted residuals comparing the observed and expected frequencies based on the cluster analysis.

| Variable | Category | Cluster 0 (n = 14935) | Cluster 1 (n = 11528) | Cluster 2 (n = 23005) |
|---|---|---|---|---|
| Region | Caribbean | -0.88 | 17.02 | -13.61 |
| | Central-Eastern | -18.72 | -6.76 | 22.97 |
| | Central-Southern | 12.54 | 6.09 | -16.7 |
| | Eje Cafetero–Antioquia | 2.02 | -14.36 | 10.31 |
| | Llanos Orientales | 11.32 | 9.59 | -18.55 |
| | Pacific | 0.84 | -6.97 | 5.13 |
| Gender | Male | 6.68 | -19.66 | 10.52 |
| | Female | -6.68 | 19.66 | -10.52 |
| Housing type | House | 4.17 | 7.13 | -9.88 |
| | Apartment | -4.83 | -6.61 | 10.05 |
| | Room | 2.2 | -1.54 | -0.72 |
| | Indigenous dwelling | -0.72 | -1.09 | 1.59 |
| Socioeconomic status | 1 | -10.26 | 5.99 | 4.37 |
| | 2 | -12.72 | -3.3 | 14.51 |
| | 3 | 9.14 | -3 | -5.87 |
| | 4 | 17.29 | 0.44 | -16.29 |
| | 5 | 11.12 | -0.49 | -9.82 |
| | 6 | 8.26 | 1.2 | -8.62 |
| Age (years) | (0, 20] | 2.54 | 0.61 | -2.85 |
| | (20, 40] | 3.2 | -17.23 | 11.66 |
| | (40, 68] | -4.98 | 16.93 | -9.77 |
| Contributes to the household finances | Yes | -0.61 | -8.37 | 7.65 |
| | No | 0.61 | 8.37 | -7.65 |
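To show how Tables 2 and 3 relate, the following NumPy sketch recomputes the Pearson χ2 test and the adjusted residuals for the Gender rows of Table 2. The residual formula is the standard adjusted (Haberman) residual, (O − E) / sqrt(E (1 − row share)(1 − column share)); that the paper used exactly this definition is an inference from the fact that it reproduces the Gender values in Table 3. The function name is hypothetical.

```python
import numpy as np

def chi2_with_adjusted_residuals(table):
    """Pearson chi-square test of independence and adjusted residuals.

    table: observed counts, rows = categories, columns = clusters.
    """
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    row = obs.sum(axis=1, keepdims=True)
    col = obs.sum(axis=0, keepdims=True)
    expected = row @ col / n
    chi2 = ((obs - expected) ** 2 / expected).sum()
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    # Adjusted residual: (O - E) / sqrt(E * (1 - row share) * (1 - column share))
    adj = (obs - expected) / np.sqrt(expected * (1 - row / n) * (1 - col / n))
    return chi2, dof, adj

# Gender counts by cluster, taken from Table 2
gender = [[6606, 3927, 10233],   # Male
          [8329, 7601, 12772]]   # Female
chi2, dof, adj = chi2_with_adjusted_residuals(gender)
print(round(chi2, 2), dof)
print(np.round(adj, 2))
```

This yields χ2 ≈ 386.72 with 2 df and adjusted residuals of about 6.68, -19.66, and 10.52 for males, matching Tables 2 and 3. Because adjusted residuals are approximately standard normal under independence, values beyond about ±1.96 flag cells that differ notably from expectation.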

Regarding Gender, there are more males than expected in clusters 0 and 2 and fewer in cluster 1; for females, the pattern is reversed, with more than expected in cluster 1 and fewer in clusters 0 and 2. Similarly, for Housing type, more individuals than expected live in houses in cluster 1 and fewer in cluster 2. Conversely, cluster 2 has more individuals living in apartments than expected, while cluster 1 shows the largest deficit (Table 3).

Regarding SES, we found more individuals than expected in strata 3, 4, 5, and 6 in cluster 0 (Table 3). We also observed fewer individuals than expected in strata 3, 4, 5, and 6 in cluster 2, and more than expected in stratum 1 in cluster 1; the residuals for strata 4 and 6 in cluster 1 are also positive but small (Table 3). Moreover, the Age variable shows more individuals than expected in the (0, 20] range in cluster 0. For ages in (20, 40], cluster 2 has more individuals than expected, whereas cluster 1 has fewer. Finally, for household finances, cluster 2 has more individuals than expected who contribute to the household finances, while cluster 1 has fewer contributors (and correspondingly more non-contributors) than expected. A comparison of the Calinski-Harabasz, Davies-Bouldin, and silhouette metrics between PCA integrated with clustering (PCA-DEC) and our proposed CAE-DEC model indicates the superiority of the latter (S2 Table).

3.3 Spatial analysis of psychoactive drugs consumption

Several classification algorithms (Equal Intervals, Quantiles, Maximum Breaks, Box plot, Head-Tail Breaks, Jenks-Caspall, Fisher-Jenks, and Max-p) were used to determine the choropleth class limits and were compared using the absolute deviation around class medians (ADCM) criterion (Fig 4). The Fisher-Jenks classifier performed best and was therefore selected.

Fig 4. Absolute deviation around class medians (ADCM) statistic criterion for different alternative classifiers.


Here, lower is better.
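Fisher-Jenks is the exact dynamic-programming solution to splitting sorted values into k contiguous classes with minimum within-class sum of squared deviations, and ADCM scores any classification by summing absolute deviations from each class median. The NumPy sketch below implements both for illustration; the function names and the toy data are hypothetical, the paper does not say which software computed the classifications (PySAL’s mapclassify is one library that provides these classifiers), and the O(k·n²) loop here is only practical for small inputs such as 32 departments.

```python
import numpy as np

def adcm(values, labels):
    """Absolute deviation around class medians (lower is better)."""
    return float(sum(np.abs(values[labels == c] - np.median(values[labels == c])).sum()
                     for c in np.unique(labels)))

def fisher_jenks(values, k):
    """Optimal 1-D classification into k classes minimizing within-class squared deviation.

    Dynamic programming over the sorted values; returns class labels in the original order.
    """
    x = np.sort(values)
    n = len(x)
    csum = np.concatenate([[0.0], np.cumsum(x)])
    csq = np.concatenate([[0.0], np.cumsum(x ** 2)])

    def sse(i, j):  # squared deviation of x[i:j] around its mean
        s, m = csum[j] - csum[i], j - i
        return csq[j] - csq[i] - s * s / m

    cost = np.full((k + 1, n + 1), np.inf)
    split = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # class c covers x[i:j]
                v = cost[c - 1, i] + sse(i, j)
                if v < cost[c, j]:
                    cost[c, j], split[c, j] = v, i
    # Recover the upper bound of each class, then label the original values
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(x[j - 1])
        j = split[c, j]
    return np.searchsorted(np.array(bounds[::-1]), values, side="left")

values = np.array([10.0, 1.0, 21.0, 2.0, 11.0, 3.0, 20.0, 12.0, 22.0])
labels = fisher_jenks(values, 3)
print(labels, adcm(values, labels))
```

Other classifiers can be scored with the same `adcm` function, which is how a comparison like the one in Fig 4 ranks them.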

Following the same exploratory spatial analysis, we constructed a choropleth map of the percentage of PAS use in each of the 32 Colombian departments (Fig 5a). The departments of Arauca, Vichada, Caquetá, Chocó, Magdalena, Cesar, Bolívar, Sucre, Cordoba, and Norte de Santander have low percentages of drug use. However, according to data from the Drug Observatory of Colombia [78], some of these departments are major drug producers (e.g., Cordoba and Guaviare). In contrast, Putumayo has the highest proportion of PAS use (Fig 5a).

Fig 5. (a) Consumption percentage of psychoactive substances according to the CAE-DEC model; (b) Moran’s I statistic; (c) Moran’s cluster map.


Here, HH, LH, LL, and ns represent the high-high, low-high, low-low, and not statistically significant quadrants, respectively. This clustering pattern leads to a statistically significant Moran’s I statistic of 0.2 (P-value <0.01). Republished from [64] under a CC BY license, with permission from [ArcGIS Hub], original copyright [2016].

The global Moran’s I results show statistically significant positive global spatial autocorrelation (I = 0.2005, P < 0.01). Thus, the null hypothesis of spatial randomness is rejected: the map shows more spatial pattern than would be expected if the values had been randomly assigned to locations. In addition, two other global indices, Geary’s C (C = 0.693, P = 0.003) and Getis and Ord’s G (G = 0.800, P = 0.049), confirm the presence of statistically significant global spatial autocorrelation.

To further explore the relationship between each observation and its neighbours, we estimated Local Indicators of Spatial Association (LISA; more information on LISA statistics is provided in S3 Fig). Fig 5b depicts the Moran scatterplot, whose quadrants indicate positive or negative spatial association. The high-high (HH) and low-low (LL) quadrants indicate positive association, i.e., departments with high (or low) drug use surrounded by departments with similarly high (or low) use, whereas the low-high (LH) and high-low (HL) quadrants indicate negative association, i.e., spatial outliers (Fig 5b). Nariño and Cauca belong to the HH cluster, whereas La Guajira, Atlántico, Magdalena, Cesar, Norte de Santander, Sucre, and Córdoba belong to the LL cluster. This clustering pattern yields a statistically significant Moran’s I statistic (P < 0.01). Overall, approximately 39.4% of the departments are classified by this analysis as part of a spatial cluster (i.e., their local statistic is significant at P < 0.05). Among legal drugs, alcohol and tobacco are the most frequently consumed in the national territory (Fig 6a). Among illegal drugs, marijuana is the most frequently consumed, followed by non-prescription tranquilizers and Yagé, with slight consumption of opioids and poppers (Fig 6b).
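The LISA quadrant classification described above can be sketched as a local Moran's I per area, a conditional permutation test that holds each area's own value fixed and reshuffles the values of its neighbours, and a quadrant label for areas significant at the 5% level. The row-standardised contiguity weights, 999 permutations, and toy grid are assumptions for illustration; PySAL's `esda` package provides a full implementation.

```python
import random


def local_moran(x, neighbors, perms=999, alpha=0.05, seed=0):
    """Local Moran's I with row-standardised weights and conditional permutation.

    neighbors: dict area -> list of adjacent areas (every area needs at least one).
    Returns (I_i, pseudo_p, label) per area; label is HH, LL, LH, HL or ns.
    """
    rng = random.Random(seed)
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(v * v for v in z) / n
    results = []
    for i in range(n):
        nb = neighbors[i]
        lag = sum(z[j] for j in nb) / len(nb)  # average deviation of the neighbours
        local_i = z[i] * lag / m2
        others = [z[j] for j in range(n) if j != i]
        extreme = 0
        for _ in range(perms):
            sim_lag = sum(rng.sample(others, len(nb))) / len(nb)
            extreme += z[i] * sim_lag / m2 >= local_i
        extreme = min(extreme, perms - extreme)  # fold onto the more extreme tail
        p = (extreme + 1) / (perms + 1)
        if p >= alpha:
            label = "ns"
        elif z[i] > 0:
            label = "HH" if lag > 0 else "HL"
        else:
            label = "LH" if lag > 0 else "LL"
        results.append((local_i, p, label))
    return results


def rook_grid(rows, cols):
    """Rook contiguity for a rows x cols lattice of square areas."""
    nb = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nb[r * cols + c] = [rr * cols + cc for rr, cc in cand
                                if 0 <= rr < rows and 0 <= cc < cols]
    return nb


# Hypothetical 6 x 6 map with a block of high values in one corner.
values = [10.0 if r < 3 and c < 3 else 0.0 for r in range(6) for c in range(6)]
lisa = local_moran(values, rook_grid(6, 6))
share = sum(label != "ns" for _, _, label in lisa) / len(lisa)
print([label for _, _, label in lisa][:6], f"{share:.1%} of areas in a significant cluster")
```

The final share mirrors the kind of summary reported in the text (the proportion of departments that fall in a significant spatial cluster), here computed on toy data.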

Fig 6. Frequency of consumption of (a) legal and (b) illegal drugs in Colombia.

Regarding legal drugs, alcohol consumption is highest in Bogotá, Cundinamarca, and Chocó (Fig 7) and moderately high in Vaupés, Nariño, Bolívar, Magdalena, La Guajira, and Atlántico. Energy drink consumption is highest in Casanare and Guaviare and slightly elevated in Boyacá, Nariño, Risaralda, and Arauca. Tobacco consumption is highest in Cundinamarca and moderately high in Bogotá, Boyacá, Nariño, Casanare, Tolima, Quindío, Risaralda, Guainía, Caldas, and Vaupés. These substances are also used throughout the rest of the country, but at lower levels (Fig 7).

Fig 7. Consumption of illegal drugs by department.

For interpretation purposes, numbers represent values scaled to a range of 0 to 1. For instance, Bogotá D.C. has the highest LSD consumption and Putumayo the lowest.

Concerning illegal drugs, non-prescription tranquilizers and stimulants are most prevalent in Casanare (Fig 8), and tranquilizer consumption is also slightly elevated in Nariño. Inhalant consumption is highest in Quindío, followed by Cauca, Caldas, and Nariño. Methylene chloride consumption is highest in Cauca and high in Quindío and Nariño, while Antioquia, followed by Caldas and Risaralda, shows the highest consumption of poppers. Marijuana consumption is highest in Risaralda and moderately high in Caldas, Bogotá, Antioquia, and Quindío. Cocaine consumption is likewise highest in Risaralda and moderately high in Antioquia (Fig 8).

Fig 8. Consumption of legal drugs by department.

Numbers represent psychoactive substance use values scaled to a range of 0 to 1. Conventions as in Fig 7.

Basuco (i.e., cocaine paste) has the highest consumption rate in Guaviare, with critical levels of consumption in Nariño, Cauca, Quindío, Antioquia, and Amazonas. Ecstasy consumption is highest in Risaralda, followed by Bogotá and Caldas. Heroin consumption is highest in Vaupés, Huila, Cauca, Quindío, and Arauca, and slightly elevated in Casanare. Methamphetamine consumption is highest in Casanare and moderately high in Boyacá. Methadone is most widely used in Quindío, with slightly elevated use in Valle del Cauca and Caquetá. Opioids are most prevalent in Casanare, followed by Sucre. LSD is most prevalent in Bogotá, with high levels of use in Caldas, Risaralda, Quindío, and Nariño. Mushroom consumption is highest in Boyacá and moderately high in Quindío, Risaralda, Bogotá, Cauca, and Casanare. Yagé has its highest incidence in Putumayo. Cacao sabanero consumption is highest in Caldas and moderate in Cundinamarca, Bogotá, Antioquia, and Quindío. Ketamine consumption is highest in Casanare, followed by Antioquia, and GHB consumption is highest in Risaralda, followed by Santander, Valle del Cauca, and Norte de Santander. Finally, 2CB has the highest consumption rate in Risaralda, followed by Caldas. Departments not mentioned here show low to moderate consumption of certain drugs (Fig 8).

3.4 Regionalization of clusters

We applied regionalization, a clustering technique that imposes a spatial constraint so that the resulting clusters consist of geographically coherent areas with coherent data profiles. Our approach uses a spatially constrained hierarchical clustering algorithm, which identified three clusters representing PAS consumption in the country (Fig 9). The number of clusters was selected based on the average silhouette index, the total intra-cluster variance, and dendrograms (S2 Fig). Cluster 0 comprises La Guajira, Cesar, Atlántico, Magdalena, Norte de Santander, Bolívar, Sucre, and Córdoba, all located in the Northern region of the country; cluster 1 comprises Antioquia, Santander, Boyacá, Caldas, Risaralda, and Quindío; and cluster 2 comprises the remaining departments (Fig 9). The geographical coherence test, which assesses the compactness of each region’s shape, indicates that the regions derived from the model are moderately compact. In addition, the feature coherence (i.e., goodness-of-fit) test using different metrics showed that the 3-cluster regionalization structure fits the data well (S1 Table).
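A spatially constrained hierarchical clustering can be sketched as ordinary agglomerative clustering in which only adjacent regions may merge. The sketch below assumes Ward linkage and a contiguity dictionary (the section names neither) and adds the isoperimetric quotient 4πA/P² as one common compactness measure; whether the study used this particular measure is not stated. Library implementations of the constrained step exist, for example scikit-learn's `AgglomerativeClustering` with its `connectivity` argument.

```python
import math


def constrained_ward(features, neighbors, k):
    """Agglomerative clustering where only spatially adjacent clusters may merge.

    features: one feature vector per area; neighbors: dict area -> iterable of
    adjacent areas. Ward linkage is an assumption made for this sketch.
    """
    members = {i: [i] for i in range(len(features))}
    centroid = {i: list(f) for i, f in enumerate(features)}
    adj = {i: set(neighbors[i]) for i in members}

    def ward_cost(a, b):  # increase in within-cluster variance if a and b merge
        na, nb = len(members[a]), len(members[b])
        d2 = sum((u - v) ** 2 for u, v in zip(centroid[a], centroid[b]))
        return na * nb / (na + nb) * d2

    while len(members) > k:
        candidates = [(ward_cost(a, b), a, b) for a in members for b in adj[a] if a < b]
        if not candidates:  # remaining clusters are not connected to each other
            break
        _, a, b = min(candidates)
        na, nb = len(members[a]), len(members[b])
        centroid[a] = [(na * u + nb * v) / (na + nb) for u, v in zip(centroid[a], centroid[b])]
        members[a] += members.pop(b)
        del centroid[b]
        adj[a] = (adj[a] | adj.pop(b)) - {a, b}
        for c in adj[a]:  # neighbours of b now border the merged cluster a
            adj[c].discard(b)
            adj[c].add(a)
    labels = [0] * len(features)
    for lab, group in enumerate(sorted(members.values())):
        for m in group:
            labels[m] = lab
    return labels


def isoperimetric_quotient(polygon):
    """4*pi*area / perimeter**2: 1 for a circle, closer to 0 for elongated shapes."""
    area2, perimeter = 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        area2 += x1 * y2 - x2 * y1  # shoelace formula (twice the signed area)
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return 4 * math.pi * (abs(area2) / 2) / perimeter ** 2


# Hypothetical corridor of six areas; area 5 resembles areas 0-2 but does not touch them.
feats = [[1.0], [1.2], [0.9], [8.0], [8.3], [1.1]]
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(constrained_ward(feats, chain, 3))                               # [0, 0, 0, 1, 1, 2]
print(round(isoperimetric_quotient([(0, 0), (1, 0), (1, 1), (0, 1)]), 3))  # 0.785
```

Without the adjacency restriction, area 5 would join areas 0 to 2 on feature similarity alone; the constraint keeps it separate because it shares no border with them, which is the behaviour that makes regionalized clusters geographically coherent.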

Fig 9. Cluster map of drug use after regionalization.

Republished from [64] under a CC BY license, with permission from ArcGIS Hub, original copyright 2016.

4. Discussion

In this study, we propose and test a Deep Neural Network-based Clustering-oriented Embedding algorithm (an ML-based model) for identifying patterns of psychoactive substance (PAS) use and abuse in Colombia. The model automatically extracts features from the input data (such as sex, age, socioeconomic status, and housing type) to determine whether an individual has consumed PAS. Following the methods outlined in [56, 57], it then forms clusters in the new data space learned during training. After training, the resulting latent feature space (LFS) is analysed.
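The clustering step that [56, 57] build on can be outlined with the formulas published by Xie et al. [57] for Deep Embedded Clustering (DEC): soft assignments q from a Student's t kernel, a sharpened target distribution p, and a KL-divergence loss. The sketch below covers only these formulas in pure Python; the autoencoder, the optimiser, and the study's hyperparameters are not shown, and α = 1 follows [57] rather than any setting reported in this section.

```python
import math


def soft_assign(z, mu, alpha=1.0):
    """q_ij: Student's t similarity between embedded point z_i and centroid mu_j."""
    q = []
    for zi in z:
        row = [(1 + sum((a - b) ** 2 for a, b in zip(zi, mj)) / alpha) ** (-(alpha + 1) / 2)
               for mj in mu]
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'), with f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]  # soft cluster sizes
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_loss(p, q):
    """Clustering loss KL(P || Q) minimised while refining the encoder and centroids."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0)


# Hypothetical 2-D embeddings (e.g. an autoencoder bottleneck) and two centroids.
z = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.9, 3.2]]
mu = [[0.0, 0.0], [3.0, 3.0]]
q = soft_assign(z, mu)
p = target_distribution(q)
print([round(v, 3) for v in q[0]], [round(v, 3) for v in p[0]], round(kl_loss(p, q), 4))
```

In DEC, p is recomputed periodically from the current q and held fixed while the encoder weights and centroids are updated by gradient descent on this loss, which pushes confident assignments to become sharper.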

We identified well-defined clusters in which the prevalence of individuals who do or do not use PAS is notable. Additionally, region, sex, housing type, socioeconomic stratum, age, and whether individuals contribute to household finances have a statistically significant impact on the clustering structure. These findings are consistent with previous studies aimed at identifying PAS consumption patterns [19, 79, 80]. When comparing the CAE-DEC model proposed in this study with the CAE-Spectral model using different metrics (i.e., the Silhouette statistic, which measures the internal density of each cluster and its separation from the others; the Calinski-Harabasz index; and the Davies-Bouldin index [DBI]), we found that our model performs better (Silhouette: 0.62 vs. 0.786; Calinski-Harabasz: 22468.26 vs. 775992.45; DBI: 0.2898 vs. 0.63; S2 Table).
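For reference, the three comparison metrics can be computed as follows. This is a pure-Python sketch with Euclidean distances on toy data, following the standard definitions; scikit-learn exposes equivalents as `silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score` in `sklearn.metrics`. Higher Silhouette and Calinski-Harabasz values and lower Davies-Bouldin values indicate more compact, better-separated clusters.

```python
import math


def _clusters(X, labels):
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    return groups


def _centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def silhouette(X, labels):
    """Mean of (b - a) / max(a, b); higher is better (range -1 to 1)."""
    groups = _clusters(X, labels)
    scores = []
    for x, lab in zip(X, labels):
        own = groups[lab]
        if len(own) == 1:  # convention: singletons score 0
            scores.append(0.0)
            continue
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)  # x itself contributes 0
        b = min(sum(math.dist(x, y) for y in pts) / len(pts)
                for other, pts in groups.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def calinski_harabasz(X, labels):
    """Between- to within-cluster dispersion ratio; higher is better."""
    groups = _clusters(X, labels)
    n, k = len(X), len(groups)
    overall = _centroid(X)
    between = within = 0.0
    for pts in groups.values():
        c = _centroid(pts)
        between += len(pts) * math.dist(c, overall) ** 2
        within += sum(math.dist(x, c) ** 2 for x in pts)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(X, labels):
    """Average worst-case similarity between each cluster and another; lower is better."""
    groups = list(_clusters(X, labels).values())
    cents = [_centroid(p) for p in groups]
    spread = [sum(math.dist(x, c) for x in p) / len(p) for p, c in zip(groups, cents)]
    k = len(groups)
    return sum(max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                   for j in range(k) if j != i) for i in range(k)) / k


# Two well-separated toy clusters (hypothetical data).
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = [0, 0, 0, 1, 1, 1]
print(round(silhouette(X, labels), 3), round(calinski_harabasz(X, labels), 1),
      round(davies_bouldin(X, labels), 3))
```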

Based on our findings, individuals more likely to consume PAS are grouped in cluster 2, while cluster 1 consists of individuals who did not consume PAS (Table 2). Not surprisingly, cluster 1 has a high proportion of females; in addition, most of its members belong to socioeconomic stratum 1, are 40 years old or older, and do not contribute economically to their household. In contrast, cluster 2 has a higher proportion of males aged 20 to 40 in socioeconomic strata 1 and 2 who do not contribute to household finances (Table 2). Finally, cluster 0 has a small proportion of males and a higher proportion of individuals in strata 3, 4, 5, and 6, who are more likely to contribute to the household economy (Table 2).

At the level of spatial statistics, we found that legal drugs such as alcohol have a high prevalence in all regions of Colombia, with a slight tendency towards higher consumption in coastal areas (Fig 7). In Colombia, coastal areas are popular tourist destinations, and many tourists visit them seeking a relaxing experience, which can increase alcohol consumption. Coastal areas also tend to have warmer temperatures and more sunshine, which increase thirst and make people more likely to consume beverages. Additionally, bars, clubs, and restaurants serve alcoholic beverages to meet high demand from tourists and locals [81, 82]. Another characteristic of this area is its fishing and maritime culture, which is often associated with hard work and long working hours; alcohol may be seen as a way to relax and unwind after a tough day at sea [83]. Finally, this region is 69% urban and 31% rural [59], and it ranks third among the regions in economic development as measured by gross domestic product (GDP) [84] (S3 Table). Interestingly, illegal drug consumption is lower in the Northern region than in other regions of the country, although consumption of non-prescription tranquilizers, opioids, ketamine, GHB, and heroin is more prominent there. In particular, the Atlántico department has the highest consumption proportion within this region (Fig 8).

Tobacco consumption is present in all regions, with a higher proportion in the Central region (Eje Cafetero–Antioquia), where the climate is temperate. This region also has a diverse consumption pattern in which marijuana, poppers, cocaine, ecstasy, inhalants, methadone, heroin, LSD, GHB, 2CB, and mushrooms prevail. It includes Colombia’s largest cities (i.e., Bogotá and Medellín); Bogotá has the highest population density and is a hub for drug trafficking routes, while Medellín has an unfortunate history of drug cartels and gang violence. Finally, 79% of this region is urban, and the most developed cities in the country are located there [59, 84] (S3 Table).

Energy drinks are more frequently used in the Central-Eastern region, which has a continental climate and flat terrain. Our results are in line with the scientific literature suggesting that the location of regions within countries is directly associated with PAS consumption [26, 85–87]. This region is characterized by the consumption of heroin, basuco, non-prescription tranquilizers, stimulants, methamphetamines, opioids, and ketamine. It is the second most developed region in the country, and 71% of its area is urban [59, 84] (S3 Table).

Our findings also show that consumption of illegal drugs, including basuco, heroin, and Yagé, is more likely in the Southern region (Fig 8). One of the main reasons for this result is that, unfortunately, this region’s environment (mostly rainforest) favours drug production and consumption, and it is the second largest illegal drug-producing region in Colombia [78]. Furthermore, this region has the highest percentage of rurality (55%) of all regions, and its level of development, as measured by GDP, is low [59, 84] (S3 Table).

In the Western region, also known as the Pacific region, consumption mainly includes methylene chloride, GHB, heroin, opioids, and methamphetamines. This region is mainly characterized by geographical isolation, poverty, and ongoing conflict, which have contributed to the growth of drug production and trafficking in the area. Poverty is one of the main factors driving drug production in the Pacific region and has led many people to turn to drug cultivation and trafficking for survival. Additionally, the region’s rugged terrain and limited infrastructure have made it difficult for the Colombian government to establish a strong presence, allowing drug traffickers to operate with relative impunity [88]. The proportions of urban (53%) and rural (47%) populations in this region are similar to those of the Southern region, and it ranks second among the regions with the lowest levels of development (S3 Table).

Conclusion

In summary, the proposed CAE-DEC model simultaneously integrates feature extraction into the clustering design, prioritizing features that improve the separation between groups and thus avoiding the manual feature extraction common in traditional models. Additionally, a geospatial component is included sequentially to expand the resulting insights by considering geographic constraints. Currently, such architectures are rarely used to study mental health problems. As future work, the proposed architecture could be improved to integrate automatic feature extraction while optimizing a geospatial loss. Following our experience with the proposed CAE-DEC for PAS consumption, its application to other mental health problems, such as suicide, depression, and domestic violence, among others, could be explored.

Based on these results, effective interventions and/or government policies to prevent and/or mitigate the impact of PAS use could be promoted and evaluated, for example, by developing regional interventions based on the drugs most prevalent in each area and on its cultural and socioeconomic characteristics. Such interventions could include education, treatment, and harm reduction programs. This information could also be used to develop public health campaigns that raise awareness of the risks of drug use and reduce its negative impact, and to crack down on drug trafficking and distribution networks. Finally, it could help alert healthcare providers and regulatory bodies so that they take appropriate action to prevent drug use and identify new drugs.

Supporting information

S1 Table. Feature coherence measurements.

(DOCX)

S2 Table. Model comparison.

(DOCX)

S3 Table. Characteristics of the level of development, urbanity, rurality, and drug production in the regions of Colombia.

(DOCX)

S1 Fig. Location map.

Republished from [64] under a CC BY license, with permission from ArcGIS Hub, original copyright 2016.

(DOCX)

S2 Fig. Optimal number of clusters determined using a dendrogram.

(DOCX)

S3 Fig. Maps of Local Indicators of Spatial Association (LISA).

Republished from [64] under a CC BY license, with permission from ArcGIS Hub, original copyright 2016.

(DOCX)

Acknowledgments

K.P. is a doctoral student at Universidad del Norte, Barranquilla, Colombia, and received a Ph.D. scholarship from this institution. Some of this work is to be presented to the Ph.D. program in partial fulfillment of the requirements for the Ph.D. degree.

Data Availability

The data used in the manuscript were obtained from a third party, the Archivo Nacional de Datos (ANDA), and are fully available and anonymized. The authors confirm that others would be able to access these data in the same manner as themselves; and the authors did not have any special access privileges that others would not have. The data can be publicly retrieved from ANDA (https://microdatos.dane.gov.co/index.php/catalog/680/data-dictionary).

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. OPS, “Abuso de sustancias,” Organización Panamericana de la Salud, 2022. https://www.paho.org/es/temas/abuso-sustancias (accessed Jan. 25, 2022).
  • 2. Heesun C., Jaesin L., and Eunmi K., “Trends of novel psychoactive substances (NPSs) and their fatal cases,” Forensic Toxicol., vol. 34, no. 1, pp. 1–11, 2016, Accessed: Jan. 20, 2022. [Online]. https://jglobal.jst.go.jp/en/detail?JGLOBAL_ID=201602283742633549
  • 3. Riley A. L. et al., “Abuse potential and toxicity of the synthetic cathinones (i.e., ‘Bath salts’),” Neurosci. Biobehav. Rev., vol. 110, pp. 150–173, Mar. 2020, doi: 10.1016/J.NEUBIOREV.2018.07.015
  • 4. Assi S., Gulyamova N., Kneller P., and Osselton D., “The effects and toxicity of cathinones from the users’ perspectives: A qualitative study,” Hum. Psychopharmacol. Clin. Exp., vol. 32, no. 3, p. e2610, May 2017, doi: 10.1002/hup.2610
  • 5. Lukić V., Micić R., Arsić B., Nedović B., and Radosavljević Ž., “Overview of the major classes of new psychoactive substances, psychoactive effects, analytical determination and conformational analysis of selected illegal drugs,” Open Chem., vol. 19, no. 1, pp. 60–106, Jan. 2021.
  • 6. Uchiyama N., Matsuda S., Kawamura M., Kikura-Hanajiri R., and Goda Y., “Two new-type cannabimimetic quinolinyl carboxylates, QUPIC and QUCHIC, two new cannabimimetic carboxamide derivatives, ADB-FUBINACA and ADBICA, and five synthetic cannabinoids detected with a thiophene derivative α-PVT and an opioid receptor agonist AH-7921 identified in illegal products,” Forensic Toxicol., vol. 31, no. 2, pp. 223–240, Mar. 2013, doi: 10.1007/S11419-013-0182-9
  • 7. Grant B. F. et al., “Prevalence and co-occurrence of substance use disorders and independent mood and anxiety disorders—Results from the national epidemiologic survey on alcohol and related conditions,” Arch. Gen. Psychiatry, vol. 61, no. 8, pp. 807–816, 2004, doi: 10.1001/archpsyc.61.8.807
  • 8. CDC, “Understanding the Epidemic,” 2020. https://www.cdc.gov/opioids/basics/epidemic.html (accessed Jan. 21, 2022).
  • 9. Goetzel R. Z., Hawkins K., Ozminkowski R. J., and Wang S. H., “The health and productivity cost burden of the ‘top 10’ physical and mental health conditions affecting six large US employers in 1999,” J. Occup. Environ. Med., vol. 45, no. 1, pp. 5–14, 2003, doi: 10.1097/00043764-200301000-00007
  • 10. Stewart W. F., Ricci J. A., Chee E., Hahn S. R., and Morganstein D., “Cost of lost productive work time among US workers with depression,” JAMA, vol. 289, no. 23, pp. 3135–3144, Jun. 2003, doi: 10.1001/jama.289.23.3135
  • 11. Garcia F. L. G. and Murillo J. C. A., “The United Nations and 21st century security challenges in Colombia,” Rev. Cient. Gen. Jose Maria Cordova, vol. 19, no. 36, pp. 929–940, 2021, doi: 10.21830/19006586.875
  • 12. Aschner J. P. and Montero J. C., “Architectures, spaces, and territories of illicit drug trafficking in Colombia and Mexico:,” vol. 17, no. 3, pp. 327–351, Mar. 2020, doi: 10.1177/1741659020910212
  • 13. ODC, “Observatorio de drogas de Colombia,” 2022. https://www.minjusticia.gov.co/programas-co/ODC/Paginas/SIDCO-departamento-municipio.aspx (accessed Jun. 09, 2022).
  • 14. DANE, “Encuesta Nacional de Consumo de Sustancias Psicoactivas,” 2020. Accessed: Apr. 23, 2021. [Online]. https://www.dane.gov.co/files/investigaciones/boletines/encspa/comunicado-encspa-2019.pdf
  • 15. DANE, “Estudio nacional de consumo de sustancias psicoactivas en Colombia,” Bogotá, 2014. Accessed: Jan. 17, 2022. [Online]. https://www.unodc.org/documents/colombia/2014/Julio/Estudio_de_Consumo_UNODC.pdf
  • 16. UNODC, “Drogas sintéticas y nuevas sustancias psicoactivas en América Latina y el Caribe 2021,” Viena, 2021. Accessed: Jan. 21, 2022. [Online]. https://www.minjusticia.gov.co/programas-co/ODC/Documents/Publicaciones/GlobalSmartLA(1).pdf?csf=1&e=MH9EHg
  • 17. Griffiths P. and McKetin R., “Developing a global perspective on drug consumption patterns and trends-the challenge for drug epidemiology,” Bull. Narcotics, vol. 5, no. 1, 2003.
  • 18. Lanier W. A., Johnson E. M., Rolfs R. T., Friedrichs M. D., and Grey T. C., “Risk factors for prescription opioid-related death, Utah, 2008–2009,” Pain Med., vol. 13, no. 12, pp. 1580–1589, 2012, doi: 10.1111/J.1526-4637.2012.01518.X
  • 19. Martins S. S., Sampson L., Cerdá M., and Galea S., “Worldwide Prevalence and Trends in Unintentional Drug Overdose: A Systematic Review of the Literature,” Am. J. Public Health, vol. 105, no. 11, pp. e29–e49, Nov. 2015, doi: 10.2105/AJPH.2015.302843
  • 20. CDC, “Today’s Heroin Epidemic,” Centers for Disease Control and Prevention, 2015. https://www.cdc.gov/vitalsigns/heroin/index.html (accessed Jan. 16, 2022).
  • 21. Fuller C. M. et al., “Effects of race, neighborhood, and social network on age at initiation of injection drug use,” Am. J. Public Health, vol. 95, no. 4, pp. 689–695, Apr. 2005, doi: 10.2105/AJPH.2003.02178
  • 22. Fite P. J., Wynn P., Lochman J. E., and Wells K. C., “The Influence of Neighborhood Disadvantage and Perceived Disapproval on Early Substance Use Initiation,” Addict. Behav., vol. 34, no. 9, p. 769, Sep. 2009, doi: 10.1016/j.addbeh.2009.05.002
  • 23. Friedman S. R. et al., “Income inequality, drug-related arrests, and the health of people who inject drugs: Reflections on seventeen years of research,” Int. J. Drug Policy, vol. 32, pp. 11–16, Jun. 2016, doi: 10.1016/j.drugpo.2016.03.003
  • 24. Jensen M., Chassin L., and Gonzales N. A., “Neighborhood Moderation of Sensation Seeking Effects on Adolescent Substance Use Initiation,” J. Youth Adolesc., vol. 46, no. 9, p. 1953, Sep. 2017, doi: 10.1007/s10964-017-0647-y
  • 25. Sarah C. and Leonard A J., “Contextual Perspectives on Heroin Addiction and Recovery: Classic and Contemporary Theories,” Int. Arch. Public Heal. Community Med., vol. 2, no. 1, Dec. 2018, doi: 10.23937/IAPHCM-2017/1710009
  • 26. Bozorgi P., Porter D. E., Eberth J. M., Eidson J. P., and Karami A., “The leading neighborhood-level predictors of drug overdose: A mixed machine learning and spatial approach,” Drug Alcohol Depend., vol. 229, p. 109143, Dec. 2021, doi: 10.1016/j.drugalcdep.2021.109143
  • 27. Galea S., Rudenstine S., and Vlahov D., “Drug use, misuse, and the urban environment,” Drug Alcohol Rev., vol. 24, no. 2, pp. 127–136, Mar. 2005, doi: 10.1080/09595230500102509
  • 28. Latkin C. A., Forman V., Knowlton A., and Sherman S., “Norms, social networks, and HIV-related risk behaviors among urban disadvantaged drug users,” Soc. Sci. Med., vol. 56, no. 3, pp. 465–476, Feb. 2003, doi: 10.1016/s0277-9536(02)00047-3
  • 29. Schroeder J. R., Latkin C. A., Hoover D. R., Curry A. D., Knowlton A. R., and Celentano D. D., “Illicit drug use in one’s social network and in one’s neighborhood predicts individual heroin and cocaine use,” Ann. Epidemiol., vol. 11, no. 6, pp. 389–394, 2001, doi: 10.1016/s1047-2797(01)00225-3
  • 30. Campo-Arias A., Suárez-Colorado Y. P., and Caballero-Domínguez C. C., “Asociación entre el consumo de Cannabis y el riesgo de suicidio en adolescentes escolarizados de Santa Marta, Colombia,” Biomédica, vol. 40, no. 3, p. 569, 2020.
  • 31. Fajardo A. L., “Consumption of psychopharmaceuticals in the city of Bogota (Colombia): a new reality,” Arch. Med., vol. 18, no. 2, 2018, doi: 10.30554/archmed.18.2.2743.2018
  • 32. Scoppetta O. and Castaño G. A., “Early drug consumption and subsequent risk of illicit drug use in Colombia,” Addict. Disord. their Treat., vol. 18, no. 1, pp. 10–14, Mar. 2019, doi: 10.1097/ADT.0000000000000144
  • 33. Scheuer C. et al., “El consumo de sustancias psicoactivas en jóvenes estudiantes de una institución educativa del municipio de Neira (Caldas): un estudio de caso desde la mirada de la educación inclusiva,” Cult. y Drog., vol. 23, no. 26, pp. 343–354, Jul. 2018.
  • 34. Kalyanam J., Katsuki T., Lanckriet G. R. G., and Mackey T. K., “Exploring trends of nonmedical use of prescription drugs and polydrug abuse in the Twittersphere using unsupervised machine learning,” Addict. Behav., vol. 65, pp. 289–295, Feb. 2017, doi: 10.1016/j.addbeh.2016.08.019
  • 35. Narvaez-Chicaiza M. A., “Harm Reduction Policies Where Drugs Constitute a Security Issue,” Heal. Care Anal., vol. 28, no. 4, pp. 382–390, Dec. 2020, doi: 10.1007/s10728-020-00415-9
  • 36. Restrepo-Escobar S. M. and Cardona E. A. S., “Educational and prevention campaigns. A review on the use of psychoactive substances in Colombian university students,” Interdisciplinaria, vol. 38, no. 2, pp. 199–208, 2021, doi: 10.16888/INTERD.2021.38.2.13
  • 37. Hancer E., Xue B., and Zhang M., “A survey on feature selection approaches for clustering,” Artif. Intell. Rev., vol. 53, no. 6, pp. 4519–4545, Jan. 2020, doi: 10.1007/S10462-019-09800-W
  • 38. Wager T. D., Atlas L. Y., Lindquist M. A., Roy M., Woo C.-W., and Kross E., “An fMRI-Based Neurologic Signature of Physical Pain,” N. Engl. J. Med., vol. 368, no. 15, pp. 1388–1397, 2013, doi: 10.1056/NEJMoa1204471
  • 39. Henriksson A., Kvist M., Dalianis H., and Duneld M., “Identifying adverse drug event information in clinical notes with distributional semantic representations of context,” J. Biomed. Inform., vol. 57, pp. 333–349, Oct. 2015, doi: 10.1016/j.jbi.2015.08.013
  • 40. Squeglia L. M. et al., “Neural Predictors of Initiating Alcohol Use During Adolescence,” Am. J. Psychiatry, vol. 174, no. 2, pp. 172–185, Feb. 2017, doi: 10.1176/appi.ajp.2016.15121587
  • 41. Conway M. and O’Connor D., “Social media, big data, and mental health: current advances and ethical implications,” Curr. Opin. Psychol., vol. 9, pp. 77–82, Jun. 2016, doi: 10.1016/j.copsyc.2016.01.004
  • 42. Katsuki T., Mackey T. K., and Cuomo R., “Establishing a Link Between Prescription Drug Abuse and Illicit Online Pharmacies: Analysis of Twitter Data,” J. Med. Internet Res., vol. 17, no. 12, 2015, doi: 10.2196/jmir.5144
  • 43. Degenhardt L. et al., “The global epidemiology and burden of psychostimulant dependence: Findings from the Global Burden of Disease Study 2010,” Drug Alcohol Depend., vol. 137, pp. 36–47, 2014, doi: 10.1016/j.drugalcdep.2013.12.025
  • 44. Whiteford H. A. et al., “Global burden of disease attributable to mental and substance use disorders: findings from the Global Burden of Disease Study 2010,” Lancet, vol. 382, no. 9904, pp. 1575–1586, Nov. 2013, doi: 10.1016/S0140-6736(13)61611-6
  • 45. Bowman F. D., Caffo B., Bassett S. S., and Kilts C., “A Bayesian hierarchical framework for spatial modeling of fMRI data,” Neuroimage, vol. 39, no. 1, pp. 146–156, 2008, doi: 10.1016/j.neuroimage.2007.08.012
  • 46. Shannon K., Rusch M., Shoveller J., Alexson D., Gibson K., and Tyndall M. W., “Mapping violence and policing as an environmental-structural barrier to health service and syringe availability among substance-using women in street-level sex work,” Int. J. Drug Policy, vol. 19, no. 2, pp. 140–147, 2008, doi: 10.1016/j.drugpo.2007.11.024
  • 47. Freisthler B., Needell B., and Gruenewald P. J., “Is the physical availability of alcohol and illicit drugs related to neighborhood rates of child maltreatment?,” Child Abuse Negl., vol. 29, no. 9, pp. 1049–1060, Sep. 2005, doi: 10.1016/j.chiabu.2004.12.014
  • 48. Bass J. K. and Lambert S. F., “Urban adolescents’ perceptions of their neighborhoods: An examination of spatial dependence,” J. Community Psychol., vol. 32, no. 3, pp. 277–293, May 2004, doi: 10.1002/jcop.20005
  • 49. Chaix B. et al., “Spatial clustering of mental disorders and associated characteristics of the neighbourhood context in Malmo, Sweden, in 2001,” J. Epidemiol. Community Health, vol. 60, no. 5, pp. 427–435, May 2006, doi: 10.1136/jech.2005.040360
  • 50. Mowbray C. T., Holter M. C., Teague G. B., and Bybee D., “Fidelity criteria: Development, measurement, and validation,” Am. J. Eval., vol. 24, no. 3, pp. 315–340, 2003, doi: 10.1177/109821400302400303
  • 51. Peet M. and Stokes C., “Omega-3 fatty acids in the treatment of psychiatric disorders,” Drugs, vol. 65, no. 8, pp. 1051–1059, 2005, doi: 10.2165/00003495-200565080-00002
  • 52. Chichester K. et al., “Pharmacies and features of the built environment associated with opioid overdose: A geospatial comparison of rural and urban regions in Alabama, USA,” Int. J. Drug Policy, vol. 79, May 2020, doi: 10.1016/j.drugpo.2020.102736
  • 53. Geissert P. et al., “High-risk prescribing and opioid overdose: prospects for prescription drug monitoring program-based proactive alerts,” Pain, vol. 159, no. 1, pp. 150–156, Jan. 2018, doi: 10.1097/j.pain.0000000000001078
  • 54. Fraley C. and Raftery A. E., “How many clusters? Which clustering method? Answers via model-based cluster analysis,” Comput. J., vol. 41, no. 8, pp. 586–588, 1998, doi: 10.1093/COMJNL/41.8.578
  • 55. Saxena A. et al., “A review of clustering techniques and developments,” Neurocomputing, vol. 267, pp. 664–681, Dec. 2017, doi: 10.1016/J.NEUCOM.2017.06.053
  • 56. Li B., Pi D., Lin Y., and Cui L., “DNC: A Deep Neural Network-based Clustering-oriented Network Embedding Algorithm,” J. Netw. Comput. Appl., vol. 173, Jan. 2021, doi: 10.1016/J.JNCA.2020.102854
  • 57. Xie J., Girshick R., and Farhadi A., “Unsupervised Deep Embedding for Clustering Analysis,” 33rd Int. Conf. Mach. Learn. ICML 2016, vol. 1, pp. 740–749, Nov. 2016, Accessed: Jan. 15, 2022. [Online]. https://arxiv.org/abs/1511.06335v2
  • 58. Sharifipour S., Fayyazi H., and Sabokro M., “Unsupervised Feature Selection using Encoder-Decoder Networks,” 6th Iran. Conf. Signal Process. Intell. Syst. ICSPIS 2020, Dec. 2020.
  • 59. DANE, “Departamento Administrativo Nacional de Estadística. Censo Nacional de Población y Vivienda 2018. Proyecciones de Población 2018–2020, total municipal por área Junio 30.” Bogotá D.C, Colombia, 2018.
  • 60. DNP, “Avances y complementariedades estratégicas de los Distritos en el marco de los esquemas asociativos territoriales,” Bogotá D.C, 2018. [Online]. https://colaboracion.dnp.gov.co/CDT/DesarrolloTerritorial/ConversatorioDistritoCali04_10_2018-SantiagoArroyo.pdf
  • 61. UNODC, “Monitoreo de territorios afectados por cultivos ilícitos 2020,” Bogotá, 2021. Accessed: Jan. 14, 2022. [Online]. https://www.unodc.org/documents/crop-monitoring/Colombia/Colombia_Monitoreo_de_territorios_afectados_por_cultivos_ilicitos_2020.pdf
  • 62. ODC, “Estudio nacional de consumo de sustancias psicoactivas,” Bogotá, 2019. Accessed: Jan. 14, 2022. [Online]. https://www.odc.gov.co/Portals/1/publicaciones/pdf/estudioNacionaldeconsumo2019.pdf
  • 63. DANE, “Encuesta Nacional de Consumo de Sustancias Psicoactivas en Población General 2019,” 2020. https://microdatos.dane.gov.co/index.php/catalog/680/get_microdata (accessed Jan. 14, 2022).
  • 64. Espinosa J., “Shapefile,” 2022. https://hub.arcgis.com/datasets/de0e829ddbf743c895ba6dcee1b74fae/about (accessed Jun. 09, 2022).
  • 65. Anselin L., Spatial Econometrics: Methods and Models, Springer; Netherlands. 1988.
  • 66. Moran P., The Interpretation of Statistical Maps, 2nd ed., vol. 10. Journal of the Royal Statistical Society, 1948. Accessed: Jan. 23, 2022. [Online]. https://www.jstor.org/stable/2983777
  • 67. Geary R. C., “The Contiguity Ratio and Statistical Mapping,” Inc. Stat., vol. 5, no. 3, p. 115, Nov. 1954, doi: 10.2307/2986645
  • 68. Getis A. and Ord J. K., “The Analysis of Spatial Association by Use of Distance Statistics,” Geogr. Anal., vol. 24, no. 3, pp. 189–206, Jul. 1992, doi: 10.1111/J.1538-4632.1992.TB00261.X
  • 69.Anselin L., “Local Indicators of Spatial Association—LISA,” Geogr. Anal., vol. 27, no. 2, pp. 93–115, Apr. 1995, doi: 10.1111/J.1538-4632.1995.TB00338.X [DOI] [Google Scholar]
  • 70.Duque J. C., Ramos R., and Suriñach J., “Supervised Regionalization Methods: A Survey:,” vol. 30, no. 3, pp. 195–220, Jul. 2016, doi: 10.1177/0160017607301605 [DOI] [Google Scholar]
  • 71.S. Rey, D. Arribas-Bel, and L. Wolf, Geographic Data Science with Python. 2020. Accessed: Jan. 23, 2022. [Online]. https://geographicdata.science/book/intro.html
  • 72.Van Rossum G. and Drake F. L., Python 3 Reference Manual. Scotts Valley, CA: CreateSpace, 2009. [Google Scholar]
  • 73.J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 6791 LNCS, no. PART 1, pp. 52–59, 2011.
  • 74.Shrestha A. and Mahmood A., “Review of deep learning algorithms and architectures,” IEEE Access, vol. 7, pp. 53040–53065, 2019, doi: 10.1109/ACCESS.2019.2912200 [DOI] [Google Scholar]
  • 75.Caliñski T. and Harabasz J., “A Dendrite Method Foe Cluster Analysis,” Commun. Stat., vol. 3, no. 1, pp. 1–27, 1974, doi: 10.1080/03610927408827101 [DOI] [Google Scholar]
  • 76.Davies D. L. and Bouldin D. W., “A Cluster Separation Measure,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-1, no. 2, pp. 224–227, 1979, doi: 10.1109/TPAMI.1979.4766909 [DOI] [PubMed] [Google Scholar]
  • 77.Rousseeuw P. J., “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, no. C, pp. 53–65, Nov. 1987, doi: 10.1016/0377-0427(87)90125-7 [DOI] [Google Scholar]
  • 78.ODC, “Density of drug production in Colombia,” 2021. https://www.datos.gov.co/d/acs4-3wgp/visualization (accessed Jul. 05, 2022).
  • 79.Clarke H., Soneji N., Ko D. T., Yun L., and Wijeysundera D. N., “Rates and risk factors for prolonged opioid use after major surgery: population based cohort study,” BMJ, vol. 348, Feb. 2014, doi: 10.1136/bmj.g1251 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Kuo Y. F., Raji M. A., Chen N. W., Hasan H., and Goodwin J. S., “Trends in Opioid Prescriptions Among Part D Medicare Recipients From 2007 to 2012,” Am. J. Med., vol. 129, no. 2, pp. 221.e21–221.e30, Feb. 2016, doi: 10.1016/j.amjmed.2015.10.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Puigcorbé S. et al., “Assessing the association between tourism and the alcohol urban environment in Barcelona: a cross-sectional study,” BMJ Open, vol. 10, no. 9, p. e037569, Sep. 2020, doi: 10.1136/bmjopen-2020-037569 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Easwaran M., Bazroy J., Jayaseelan V., and Singh Z., “Prevalence and determinants of alcohol consumption among adult men in a coastal area of south India,” Int. J. Med. Sci. Public Heal., vol. 4, no. 3, p. 360, 2015, doi: 10.5455/IJMSPH.2015.1010201479 [DOI] [Google Scholar]
  • 83.Chinnakali P., Thekkur P., Manoj Kumar A., Ramaswamy G., Bharadwaj B., and Roy G., “Alarmingly high level of alcohol use among fishermen: A community based survey from a coastal area of south India,” J. Forensic Leg. Med., vol. 42, pp. 41–44, Aug. 2016, doi: 10.1016/j.jflm.2016.05.006 [DOI] [PubMed] [Google Scholar]
  • 84.DANE, “Producto Interno Bruto por departamento,” 2021.
  • 85.García M. C. et al. , “Opioid Prescribing Rates in Nonmetropolitan and Metropolitan Counties Among Primary Care Providers Using an Electronic Health Record System—United States, 2014–2017,” MMWR. Morb. Mortal. Wkly. Rep., vol. 68, no. 2, pp. 25–30, Jan. 2019, doi: 10.15585/mmwr.mm6802a1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Keyes K. M., Cerdá M., Brady J. E., Havens J. R., and Galea S., “Understanding the Rural–Urban Differences in Nonmedical Prescription Opioid Use and Abuse in the United States,” Am. J. Public Health, vol. 104, no. 2, p. e52, Feb. 2014, doi: 10.2105/AJPH.2013.301709 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.King N. B., Fraser V., Boikos C., Richardson R., and Harper S., “Determinants of Increased Opioid-Related Mortality in the United States and Canada, 1990–2013: A Systematic Review,” Am. J. Public Health, vol. 104, no. 8, p. e32, 2014, doi: 10.2105/AJPH.2014.301966 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.UNODC, “Persistencia de los cultivos de coca en la Región Pacífica,” 2010.

Decision Letter 0

Vinícius Silva Belo

16 Feb 2023

PONE-D-22-28262

Leading Consumption Patterns of Psychoactive Substances in Colombia: A Deep Neural Network-based Clustering-oriented Embedding Approach

PLOS ONE

Dear Dr. Palomino,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please also take note of the comments in the attached file.

Please submit your revised manuscript by Apr 02 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Vinícius Silva Belo

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please confirm that all data sources you used were publicly available and anonymized. Furthermore, please clarify how the data were accessed for the purpose of this study.

3. In the ethics statement in the manuscript and in the online submission form, please provide additional information about the patient records/samples used in your retrospective study. Specifically, please ensure that you have discussed whether all data/samples were fully anonymized before you accessed them and/or whether the IRB or ethics committee waived the requirement for informed consent. If patients provided informed written consent to have data/samples from their medical records used in research, please include this information.

3. Thank you for stating the following financial disclosure: 

"The author(s) received no specific funding for this work."

At this time, please address the following queries:

a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution. 

b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

c) If any authors received a salary from any of your funders, please state which authors and which funders.

d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript: 

"K.P. is a doctoral student at Universidad del Norte, Barranquilla, Colombia, and received a Ph.D. scholarship from this institution. Some of this work is to be presented to the Ph.D. program in partial fulfillment of the requirements for the Ph.D. degree. The sponsor of the study has no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the article for publication."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. 

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: 

"The author(s) received no specific funding for this work."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

6. We note that Figures 1, 5, 9 and S2 in your submission contain map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 1, 5, 9 and S2 to publish the content specifically under the CC BY 4.0 license.  

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

7. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript was well constructed. However, the methodology seems very technical, with no explanation of what data were used or how and where the analyses were carried out. I suggest making the methodology clearer; for example, for the CAE, do not only cite authors who used the technique, but make clear why it was chosen instead of another deep learning technique. In the results, improve the identification of tables and figures, and include subtitles to make the files more understandable. In the discussion, explore the impact of the results further and assess the environmental issues of these sites. There was a stratification of regions and consumption profiles, but there is no information about the location: is the area urban or rural, and what is its level of development? These and other local (environmental) aspects may be related to and/or favor the consumption of these substances.

Reviewer #2: 1- The abstract should be rewritten to include more of the technical approach.

2- Page 5: "Feature selection approaches for clustering can be split into filter, wrapper, embedded,"

Are these methods used only for clustering?

3- In the introduction, please state the problem and the challenges clearly.

4-introduction is written separately.

5- In the introduction, avoid citing many references together: "IDEC (Guo et al., 2017), DEPICT (Dizaji et al., 2017), DBC (F. Li et al., 2017), DualAAE (Ge et al., 2020), VAED (Lim et al., 2020), and DNC (B. Li et al., 2021)"

6- What is the difference between a convolutional auto-encoder (CAE) and a stacked sparse auto-encoder?

7- The manuscript does not compare against previous works.

8- Is your problem one of classification or clustering?

9- Please report your performance metrics.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-22-28262_reviewer.pdf

PLoS One. 2023 Aug 18;18(8):e0290098. doi: 10.1371/journal.pone.0290098.r002

Author response to Decision Letter 0


7 Jun 2023

Dr. Vinícius Silva Belo

Academic Editor, PLOS ONE

RE: Revised Manuscript # PONE-D-22-28262

Dear Dr. Silva,

Thank you for the opportunity to submit a revised version of our manuscript, “Leading Consumption Patterns of Psychoactive Substances in Colombia: A Deep Neural Network-based Clustering-oriented Embedding Approach,” which now addresses the concerns, inquiries, comments, and suggestions raised by two anonymous reviewers. Please find our responses below.

We very much appreciate all your efforts as Academic Editor and the detailed and extraordinary review this manuscript received. We enjoyed the thorough review of our manuscript, and it was a pleasure to respond to the reviewers. Please, Dr. Silva, allow us to mention that this kind of fair and demanding peer review is disappearing, and only good journals like PLOS ONE keep it. We hope that our new version is suitable for publication in PLOS ONE.

Thank you very much for your time and consideration.

Yours sincerely,

Kevin R. Palomino, PhD(c)

Corresponding author

Comments from the Editor

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

We very much appreciate your comments. We have updated our manuscript according to the journal’s formatting. In particular,

We changed the figure citation, i.e., from “Figure” to “Fig”.

We numbered the equations.

We added the supporting information captions.

We changed the in-text citations to bracketed form.

2. Please confirm that all data sources you used were publicly available and anonymized. Furthermore, please clarify how the data were accessed for the purpose of this study.

Thank you very much for your comments. The data are fully available and anonymized and can be retrieved from https://microdatos.dane.gov.co/index.php/catalog/680/data-dictionary

For practicality, we downloaded the data and stored it on the cloud. The data can be retrieved from https://acortar.link/ktgCD9

3. In the ethics statement in the manuscript and in the online submission form, please provide additional information about the patient records/samples used in your retrospective study. Specifically, please ensure that you have discussed whether all data/samples were fully anonymized before you accessed them and/or whether the IRB or ethics committee waived the requirement for informed consent. If patients provided informed written consent to have data/samples from their medical records used in research, please include this information.

Thank you for your comment. As per Colombian regulations, the data were fully anonymized by the Colombian government before we could use them.

4. Thank you for stating the following financial disclosure:

"The author(s) received no specific funding for this work."

At this time, please address the following queries:

a) Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

K.P. is a doctoral student at Universidad del Norte, Barranquilla, Colombia, and received a Ph.D. scholarship from this institution to cover tuition expenses. However, Universidad del Norte did not provide any additional funds for this study.

b) State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

Thank you for your comment. The following statement was included in the revised version of the manuscript to address this:

“The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript”.

c) If any authors received a salary from any of your funders, please state which authors and which funders.

As stated previously, we did not receive a salary from any funder.

d) If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Done. We have added the following statement in the revised version of the manuscript:

“The authors received no specific funding for this work”.

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"K.P. is a doctoral student at Universidad del Norte, Barranquilla, Colombia, and received a Ph.D. scholarship from this institution. Some of this work is to be presented to the Ph.D. program in partial fulfilment of the requirements for the Ph.D. degree. The sponsor of the study has no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the article for publication."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

"The author(s) received no specific funding for this work."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

We appreciate your correction. As requested, we have removed all funding-related text from the manuscript and clarified that Universidad del Norte covered only the tuition expenses of our PhD student.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

Thank you!

6. We note that Figures 1, 5, 9 and S2 in your submission contain map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 1, 5, 9 and S2 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an “Other” file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

We are very appreciative of your comments.

Figures containing maps were created using the plot function from GeoPandas (https://geopandas.org/en/stable/), an open-source Python library. This library is distributed under a permissive BSD-style license that allows distribution, private use, modification, and commercial use (see https://github.com/geopandas/geopandas/blob/main/LICENSE.txt).

In addition, to create all maps we used a shapefile that contains the locations’ geometry. This shapefile was retrieved from https://hub.arcgis.com/datasets/de0e829ddbf743c895ba6dcee1b74fae/about. According to its author, this file is freely accessible and may be used without restriction, which indicates that neither a license nor permission is needed to use it.
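As a minimal sketch of the mapping workflow described above, assuming a GeoDataFrame of regions loaded from the shapefile (for example with `geopandas.read_file`) and a table of prevalence estimates that shares a region code with it, a choropleth can be drawn with `GeoDataFrame.plot`. The key column `DPTO`, the value column `alcohol`, the toy square polygons, and the prevalence values are hypothetical placeholders, not the authors' data or code.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless

import geopandas as gpd
import pandas as pd
from shapely.geometry import box


def plot_prevalence_map(regions, prevalence, key, column, ax=None):
    """Join a prevalence table onto region geometries and draw a choropleth.

    regions    -- GeoDataFrame with one polygon per region (e.g. read from a shapefile)
    prevalence -- DataFrame with one row per region and a numeric `column`
    key        -- hypothetical column name shared by both tables
    """
    merged = regions.merge(prevalence, on=key, how="left")  # keep every region
    return merged.plot(
        column=column,
        cmap="Reds",
        legend=True,                           # continuous colour bar
        edgecolor="black",
        linewidth=0.3,
        ax=ax,
        missing_kwds={"color": "lightgrey"},   # regions without an estimate
    )


def toy_example():
    """Two hypothetical square 'regions' standing in for the real shapefile."""
    regions = gpd.GeoDataFrame(
        {"DPTO": ["A", "B"]},
        geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
        crs="EPSG:4326",
    )
    prevalence = pd.DataFrame({"DPTO": ["A", "B"], "alcohol": [0.8, 0.6]})
    return plot_prevalence_map(regions, prevalence, key="DPTO", column="alcohol")
```

With real data, `regions` would come from the downloaded shapefile and `prevalence` from the survey estimates; the returned Matplotlib axes can then be saved with `ax.figure.savefig(...)`.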

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

Kindly see our response to Comment #6. As mentioned above, a license to use the maps is not required.

7. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

We very much appreciate your supporting comments. We have updated our manuscript according to the journal’s formatting requirements.

We added the Supporting Information captions to the manuscript after the references.

We separated the supplementary materials into four (4) files: S1_Table, S2_Table2, S3_Fig, and S4_Fig.

We changed the Supporting Information citations in the manuscript (e.g., from “Table 1S, supplementary material” to “S1 Table”).

Comments from Reviewer #1

The manuscript was well constructed. However, the methodology seems very technical, with no explanation of what data were used or how and where the analyses were carried out. I suggest making the methodology clearer. For example, for the CAE, do not only cite authors who used the technique, but also make clear why it was chosen over other deep learning techniques.

We very much appreciate the reviewer’s comments.

We used data from the Colombian National Survey on Psychoactive Substance Consumption. Data were extracted from https://microdatos.dane.gov.co/index.php/catalog/680/data-dictionary. Analyses were performed using the Python libraries GeoPandas (https://geopandas.org/en/stable/), TensorFlow (https://www.tensorflow.org/), PyTorch (https://pytorch.org/), and scikit-learn (https://scikit-learn.org/stable/), among others. All notebooks and code written for processing and analyzing the data are available from the first author upon reasonable request.

Following the reviewer’s suggestion, we have modified the methodology in the revised version of the manuscript. We added a subsection to the methodology titled “Framework evaluation.”

Now, the text reads:

“2.8 Framework Evaluation.

We trained the CAE-DEC method using data retrieved from the National Survey of Psychoactive Substance Consumption (DANE-DIMPE-ENCSPA-2019), which contains 49,600 observations. A second database with PAS production figures was used in the spatial analysis stage to correlate PAS consumption and production. To evaluate the framework, we compared our CAE-DEC approach with other approaches, including CAE and Principal Component Analysis integrated with clustering (PCA-DEC). For evaluation and comparison purposes, we used the Calinski-Harabasz [76], Davies-Bouldin [77], and Silhouette [78] indices as intrinsic clustering metrics. In addition, we used the χ² statistic to investigate potential associations and differences among the patterns (clusters) identified using our approach.”
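A minimal sketch of this evaluation step with scikit-learn and SciPy follows. It assumes that `features` holds the latent representation produced by the encoder and that `labels` holds the cluster assignments. The helper names `intrinsic_scores` and `cluster_association` are hypothetical, and the random data only stand in for the survey. The χ² test shown here is a test of independence between cluster membership and one categorical covariate (such as sex), which is one plausible reading of the analysis described above.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def intrinsic_scores(features, labels):
    """The three intrinsic clustering scores named in the Framework Evaluation."""
    return {
        "calinski_harabasz": calinski_harabasz_score(features, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(features, labels),        # lower is better
        "silhouette": silhouette_score(features, labels),                # in [-1, 1]
    }


def cluster_association(labels, covariate):
    """Chi-square test of independence between cluster labels and a categorical covariate."""
    clusters, cluster_idx = np.unique(labels, return_inverse=True)
    levels, level_idx = np.unique(covariate, return_inverse=True)
    table = np.zeros((clusters.size, levels.size), dtype=int)
    np.add.at(table, (cluster_idx, level_idx), 1)  # contingency counts
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return chi2, p_value, dof


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for encoder output: three well-separated groups.
    latent = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 3.0, 6.0)])
    labels = np.repeat([0, 1, 2], 50)
    sex = rng.choice(["F", "M"], size=labels.size)  # hypothetical covariate
    print(intrinsic_scores(latent, labels))
    print(cluster_association(labels, sex))
```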

On the other hand, we rewrote a paragraph in the methodology to make clear why the model was chosen. Now, the text reads:

“Fig 1 presents the proposed Convolutional Auto-Encoder-Deep Embedded Clustering (CAE-DEC) framework, based on the implementation presented by Xie et al. [57]. However, unlike the Xie et al. model, our structure uses convolutional layers in the deep autoencoder (DA) architecture instead of linear (fully connected) ones, in order to represent high-order interactions in the data. In addition, we propose a spectral clustering-based centroid estimation to improve the initial centroids. We chose the CAE-DEC framework because it reduces both the number of model parameters and the dimensionality of the data while creating clusters simultaneously.”
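The DEC part of the framework follows Xie et al. Embedded points are softly assigned to centroids with a Student's t kernel, and these assignments are sharpened into an auxiliary target distribution. Training minimises the KL divergence between the target and the soft assignments.

The NumPy sketch below shows those formulas. As an assumption about how the spectral initialisation could be done, it takes the initial centroids as the mean latent vector of each scikit-learn `SpectralClustering` cluster. The convolutional encoder and the training loop are omitted. The values `alpha=1.0` and the nearest-neighbour affinity are assumed defaults, not settings reported by the authors.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def spectral_init_centroids(z, n_clusters, seed=0):
    """Initial centroids: mean latent vector of each spectral cluster (assumed scheme)."""
    labels = SpectralClustering(
        n_clusters=n_clusters,
        affinity="nearest_neighbors",  # assumed affinity; not stated in the manuscript
        random_state=seed,
    ).fit_predict(z)
    return np.vstack([z[labels == k].mean(axis=0) for k in range(n_clusters)])


def soft_assignment(z, centroids, alpha=1.0):
    """DEC soft assignment q_ij using a Student's t kernel with alpha degrees of freedom."""
    sq_dist = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Auxiliary target p_ij: q_ij squared over soft cluster frequency f_j, renormalised per row."""
    weight = q ** 2 / q.sum(axis=0)  # f_j = sum_i q_ij
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over samples and clusters: the DEC clustering loss."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

During training, the encoder weights and the centroids would be updated by gradient descent on this loss, with the target recomputed periodically. That loop needs an autodiff framework such as TensorFlow or PyTorch, which the authors report using, and is not shown here.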

In the results, improve the identification of tables and figures. Include subtitles to make files more understandable.

Thank you for your comment. As suggested, and following the journal’s formatting for figure and table file naming, we have improved the table and figure captions and file names in the revised version of the manuscript (e.g., from “Palomino-fig-1” to “Fig1”).

In the Discussion, explore further the impact of the results and assess the environmental issues of these sites. There was a stratification of regions and consumption profiles, but there is no information about the locations. Is the area urban or rural, and what is its level of development? Perhaps these and other local (environmental) aspects are related to and/or favour the consumption of these substances.

Thank you for your suggestion. We improved the discussion section accordingly in the revised version of the manuscript. The relevant text now reads:

“In this study, we propose and test a Deep Neural Network-based Clustering-oriented Embedding algorithm (i.e., an ML-based model) for identifying psychoactive substance (PAS) use and abuse patterns in Colombia. This model automatically extracts features from the input data (such as sex, age, socioeconomic status, and housing type) to determine whether an individual has consumed PAS. It then creates clusters in the new data space generated during the learning process, following the methods outlined in [57, 59]. After training, a latent feature space (LFS) is generated and the results are subsequently analysed. We identified clearly marked clusters in which the prevalence of individuals who use or do not use PAS is notable. Additionally, we found that region, sex, housing type, socioeconomic strata, age, and whether individuals contribute to household finances have a statistically significant impact on the clustering structure. These findings are consistent with previous studies aimed at identifying PAS consumption patterns [19, 80, 81]. Interestingly, we compared the CAE-DEC model proposed in this study with the CAE-Spectral model using three metrics: the Silhouette statistic, which measures the internal density of each cluster and the distance that separates clusters from each other; the Calinski-Harabasz index; and the Davies-Bouldin index [DBI]. In this comparison, our model performed better (CAE-DEC vs. CAE-Spectral: Silhouette, 0.786 vs. 0.62; Calinski-Harabasz, 775992.45 vs. 22468.26; DBI, 0.2898 vs. 0.63; S2 Table, Supplementary Material).

Based on our findings, individuals more likely to consume PAS are grouped in cluster 2, while cluster 1 consists of individuals who did not consume PAS (Table 1). Not surprisingly, cluster 1 is characterized by a significant proportion of females. In addition, most individuals in this cluster belong to socioeconomic stratum 1, are 40 years old or older, and do not contribute economically to support their household. In contrast, cluster 2 is characterized by a higher proportion of males aged 20 to 40 in socioeconomic strata 1 and 2 who do not contribute to the household finances (Table 1). Finally, cluster 0 is characterized by a small proportion of males, a higher proportion of individuals in strata 3, 4, 5, and 6, and individuals who are more likely to contribute to the household economy (Table 1).

At the level of spatial statistics, we found that legal drugs such as alcohol have a high prevalence in all regions of Colombia, with slightly higher consumption in coastal areas (Fig 7). In our country, the coastal areas are often popular tourist destinations, and many tourists come to these areas looking for a relaxing experience, which can increase alcohol consumption. Coastal areas typically have warmer temperatures and more sunshine, increasing thirst and making people more likely to consume alcohol. Additionally, bars, clubs, and restaurants serve alcoholic beverages due to the high demand from tourists and locals [82, 83]. Another characteristic of this area is its fishing and maritime culture. This culture is often associated with hard work and long working hours, and alcohol may be seen as a way to relax and unwind after a tough day at sea [84]. Finally, this region is 68% urban and 32% rural [60]. It ranks third among the country’s regions in economic development as measured by gross domestic product (GDP) [85] (S3 Table, Supplementary Material). Interestingly, the consumption of illegal drugs is lower in the Northern region than in other regions of the country. However, consumption of non-prescription tranquilizers, opioids, ketamine, GHB, and heroin is more prominent there. In particular, the Atlántico department has the highest consumption proportion within this region (Fig 8).

Tobacco consumption is present in all regions, with a higher proportion in the Central region, where the climate is temperate. This region also has a diverse consumption pattern, in which drugs such as marijuana, popper, cocaine, ecstasy, inhalants, methadone, heroin, LSD, GHB, 2CB, and mushrooms prevail. It contains Colombia’s largest cities (i.e., Bogotá and Medellín). Bogotá has the highest population density and is a hub for drug trafficking routes, while Medellín has an unfortunate history of drug cartels and gang violence. Finally, this region is 77% urban, and the most developed cities in the country are located there [60, 85] (S3 Table, Supplementary Material).

Energy drinks are more frequently used in the Eastern region, which has a continental climate and flat terrain. Our results are in line with the scientific literature suggesting that the location of regions within countries is directly associated with the consumption of PAS [26, 86–88]. The Eastern region is also characterized by the consumption of heroin, basuco, non-prescription tranquilizers, stimulants, methamphetamines, opioids, and ketamine. It is the second most developed region in the country and is 72% urban [60, 85] (S3 Table, Supplementary Material).

Our findings also show that individuals in the Southern region are more likely to consume illegal drugs, including basuco, heroin, and Yagé (Fig 8). One of the main reasons for this result is that, unfortunately, this region has environmental characteristics (i.e., mostly rainforest) that favour the consumption and production of these drugs. It is also the second largest illegal drug-producing region in Colombia [79]. Furthermore, this region has the highest percentage of rurality (52%) of all regions, and its level of development, as measured by GDP, is low [60, 85] (S3 Table, Supplementary Material).

In the Western region, also known as the Pacific region, consumption mainly includes methylene chloride (DICK), GHB, heroin, opioids, and methamphetamines. This region is mainly characterized by its geographical isolation, poverty, and ongoing conflict, which have contributed to the growth of drug production and trafficking in the area. Poverty is one of the main factors driving drug production in the Pacific region, and it has led many people to turn to drug cultivation and trafficking for survival. Additionally, the region’s rugged terrain and limited infrastructure have made it difficult for the Colombian government to establish a strong presence, allowing drug traffickers to operate with relative impunity [89]. The region’s shares of urban (53%) and rural (47%) population are similar to those of the Southern region. It ranks second among the regions with the lowest levels of development (S3 Table, Supplementary Material).”
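The cluster profiles described in the quoted Discussion are within-cluster proportions, for example the share of females in cluster 1 or of strata 3 to 6 in cluster 0. The sketch below shows one way to produce such a profile table with pandas. It assumes one row per respondent with a cluster label. The column names `cluster`, `sex`, and `stratum` and the toy rows are hypothetical.

```python
import pandas as pd


def profile_clusters(df, cluster_col, covariates):
    """Within-cluster percentage of each category, one table per covariate."""
    return {
        var: (pd.crosstab(df[cluster_col], df[var], normalize="index") * 100).round(1)
        for var in covariates
    }


if __name__ == "__main__":
    # Hypothetical toy respondents; real rows would come from the survey microdata.
    df = pd.DataFrame({
        "cluster": [0, 0, 1, 1, 1, 2],
        "sex": ["M", "F", "F", "F", "M", "M"],
        "stratum": [3, 4, 1, 1, 2, 1],
    })
    for var, table in profile_clusters(df, "cluster", ["sex", "stratum"]).items():
        print(var, table, sep="\n")
```

Each row of a profile table sums to 100, so the clusters can be compared directly regardless of their size.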

Comments from Reviewer #2

The abstract should be rewritten to include more of the technical approach.

Done.

Page 5: "Feature selection approaches for clustering can be split into filter, wrapper, embedded." Are these methods only for clustering?

Thank you for your comment.

No, feature selection approaches do not apply only to clustering algorithms.

Feature selection is the process of selecting a subset of relevant features (variables and/or attributes) to be used in a supervised or unsupervised model. It is an important step in Machine Learning, because including irrelevant or redundant features in a model can lead to overfitting, decreased model performance, and increased computational complexity. In this sense, feature selection approaches can be useful for classification, clustering, or regression.
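A filter method scores features without training any downstream model. This is why the same step can precede classification, regression, or clustering. The NumPy sketch below applies two simple unsupervised filters: a variance threshold and a pairwise-correlation filter. The thresholds and feature names are hypothetical. Wrapper and embedded methods, which do involve a model, are not shown.

```python
import numpy as np


def variance_filter(X, names, threshold=0.01):
    """Keep columns whose variance exceeds `threshold` (near-constant features carry no signal)."""
    keep = X.var(axis=0) > threshold
    return X[:, keep], [n for n, k in zip(names, keep) if k]


def correlation_filter(X, names, max_abs_corr=0.95):
    """Greedily keep a column only if it is not highly correlated with any column already kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= max_abs_corr for i in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    age = rng.normal(40, 10, 100)
    # Hypothetical features: a constant column and a rescaled duplicate of age.
    X = np.column_stack([age, np.ones(100), age * 2 + 1, rng.integers(1, 7, 100)])
    names = ["age", "constant", "age_scaled", "stratum"]
    Xv, nv = variance_filter(X, names)        # drops "constant"
    Xc, nc = correlation_filter(Xv, nv)       # drops "age_scaled"
    print(nc)
```

The variance filter is applied first because a constant column has an undefined correlation with every other column.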

In the Introduction, please state the problem and the challenges clearly. The Introduction is written in separate parts.

Thank you for your input. Following your advice, several changes have been made in the Introduction of the revised version of the manuscript to address this.

In the Introduction, do not cite many references together: "IDEC (Guo et al., 2017), DEPICT (Dizaji et al., 2017), DBC (Li et al., 2017), DualAAE (Ge et al., 2020), VAED (Lim et al., 2020), and DNC (B. Li et al., 2021)"

Done.

What is difference between convolutional auto-encoder (CAE) and stacked space auto encoder?

Both Convolutional Autoencoder (CAE) and Stacked Autoencoder (SAE) are types of autoencoders, a type of neural network architecture that is used for unsupervised learning and data compression.

The main difference between CAE and SAE is the way they handle the input data. A CAE is typically used for processing image data. It uses convolutional layers to extract spatial features from the input image and then uses deconvolutional layers to reconstruct the image. CAEs are well-suited for image data because they can capture the spatial relationships between pixels in an image and can learn to recognize visual patterns and shapes. On the other hand, SAE is typically used for processing structured or unstructured data. It consists of multiple layers of neural networks that encode the input data into a lower-dimensional representation and then decode it back to the original dimensions. SAEs are useful for feature learning and data compression in many different types of data, including text, audio, and structured data.

In summary, while CAE is focused on processing image data using convolutional layers, SAE can be applied to various types of data and uses multiple layers of neural networks for encoding and decoding.
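The Methods give parameter sharing as a reason for choosing CAE-DEC, and it is one practical consequence of the convolutional design. A convolutional kernel is reused at every position of the input, whereas a fully connected layer learns a separate weight for every input-output pair. The arithmetic below uses the standard parameter counts for layers with bias terms. The layer sizes are hypothetical and are not the architecture used in the manuscript.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv1d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    """Weights plus biases of a 1-D convolution; the kernel is shared across all positions."""
    return kernel_size * in_channels * filters + filters


if __name__ == "__main__":
    n_features = 64  # hypothetical length of the encoded survey vector
    print("dense, 64 -> 64 units:", dense_params(n_features, 64))
    print("conv1d, kernel 3, 16 filters:", conv1d_params(3, 1, 16))
```

The dense count grows with the input width, while the convolutional count depends only on the kernel size and the number of channels and filters.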

The approach was not compared with previous works.

Thank you for the comments. In the Supplementary Material, S2 Table shows the comparison of our model with PCA-K-means and CAE-Spectral.

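S2 Table. Performance metrics for different models.

| Performance metric | CAE-DEC | PCA-K-means | CAE-Spectral |
|---|---|---|---|
| Calinski-Harabasz (higher is better) | 775992.45 | 128651.83 | 22468.26 |
| Davies-Bouldin (lower is better) | 0.2898 | 0.567 | 0.63 |
| Silhouette (higher is better) | 0.786 | 0.6061 | 0.62 |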

In this sense, S2 Table shows that our CAE-DEC model yields the highest Calinski-Harabasz score, which means that the identified clusters are dense and well separated. Our model also has the lowest Davies-Bouldin score, which indicates a better partition into clusters. Regarding the Silhouette index, our model gives the highest value, which implies that the clusters are dense and well separated from one another.
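The reading of S2 Table above can be checked mechanically by encoding each metric's direction. Higher is better for Calinski-Harabasz and Silhouette, and lower is better for Davies-Bouldin. The values below are copied from S2 Table. The helper name is hypothetical.

```python
# Values copied from S2 Table (model comparison).
S2_TABLE = {
    "Calinski-Harabasz": {"CAE-DEC": 775992.45, "PCA-K-means": 128651.83, "CAE-Spectral": 22468.26},
    "Davies-Bouldin": {"CAE-DEC": 0.2898, "PCA-K-means": 0.567, "CAE-Spectral": 0.63},
    "Silhouette": {"CAE-DEC": 0.786, "PCA-K-means": 0.6061, "CAE-Spectral": 0.62},
}
HIGHER_IS_BETTER = {"Calinski-Harabasz": True, "Davies-Bouldin": False, "Silhouette": True}


def best_model_per_metric(table, higher_is_better):
    """Return the best-scoring model for each metric, respecting the metric's direction."""
    best = {}
    for metric, scores in table.items():
        pick = max if higher_is_better[metric] else min
        best[metric] = pick(scores, key=scores.get)
    return best


if __name__ == "__main__":
    print(best_model_per_metric(S2_TABLE, HIGHER_IS_BETTER))
```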

Performance metric interpretation

Calinski-Harabasz score: A high score indicates that clusters are dense and well separated.

Davies-Bouldin score: Lower values indicate better clustering; the minimum value is 0.

Silhouette index: The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar.
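The interpretations above rest on the standard definitions given below. These are the definitions used, for example, by scikit-learn's `silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score`. They apply to n samples partitioned into k clusters. The notation is introduced here and does not come from the manuscript.

```latex
% Silhouette of sample i: a(i) = mean distance to the other members of its own cluster,
% b(i) = smallest mean distance to the members of any other cluster
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
S = \frac{1}{n} \sum_{i=1}^{n} s(i)

% Calinski-Harabasz: clusters C_q with n_q points and centroid c_q; c is the global centroid
\mathrm{CH} = \frac{\operatorname{tr}(B_k)/(k-1)}{\operatorname{tr}(W_k)/(n-k)}, \quad
B_k = \sum_{q=1}^{k} n_q (c_q - c)(c_q - c)^{\top}, \quad
W_k = \sum_{q=1}^{k} \sum_{x \in C_q} (x - c_q)(x - c_q)^{\top}

% Davies-Bouldin: s_q = mean distance of the points of C_q to c_q, d_{qr} = \lVert c_q - c_r \rVert
\mathrm{DB} = \frac{1}{k} \sum_{q=1}^{k} \max_{r \neq q} \frac{s_q + s_r}{d_{qr}}
```

These formulas explain the stated directions. A large between-cluster dispersion relative to within-cluster dispersion raises CH. Compact, distant clusters lower DB. When a(i) is much smaller than b(i), s(i) moves toward 1.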

Is your problem one of classification or clustering?

We appreciate your question.

The problem we address in this study is a clustering problem, as we aim to identify consumption patterns of psychoactive substances (PAS) in the Colombian territory. In particular, we used clustering techniques (i.e., an ensemble model that integrates an autoencoder and a clustering method) to find the different patterns of PAS consumption among Colombian citizens, considering the consumers’ location.

Please report your performance metrics.

The performance metrics are stated in the Discussion of the manuscript. The relevant text reads:

“Interestingly, we compared the CAE-DEC model proposed herein with the CAE-Spectral model using different score metrics: the Silhouette score, which measures the internal density of each cluster and the distance that separates clusters from each other; the Calinski-Harabasz index; and the Davies-Bouldin index [DBI]. This comparison showed that our proposed model performs better than the CAE-Spectral model alone (CAE-DEC vs. CAE-Spectral: Silhouette, 0.786 vs. 0.62; Calinski-Harabasz, 775992.45 vs. 22468.26; DBI, 0.2898 vs. 0.63).”

More information about the performance metrics can be found in S2 Table of the Supplementary Material. According to our results, the proposed CAE-DEC model yields well-separated and highly dense clusters, meaning that we can define better groups and identify PAS consumption patterns more precisely.

Attachment

Submitted filename: rebuttal_letter.docx

Decision Letter 1

Vinícius Silva Belo

11 Jul 2023

PONE-D-22-28262R1

Leading Consumption Patterns of Psychoactive Substances in Colombia: A Deep Neural Network-based Clustering-oriented Embedding Approach

PLOS ONE

Dear Dr. Palomino,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 25 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Vinícius Silva Belo

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Please review the comments in the attached file.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE-D-22-28262_R1.pdf

PLoS One. 2023 Aug 18;18(8):e0290098. doi: 10.1371/journal.pone.0290098.r004

Author response to Decision Letter 1


20 Jul 2023

1. I suggest including a location map in the methodology section. The map will offer a clear spatial reference, enabling readers to visualize precise locations of study points or mentioned areas. The inclusion of a location map can improve overall clarity and comprehension, making the article more accessible to a broad audience, including non-specialist readers.

We very much appreciate your comments. We have added a location map to the methodology section. The relevant text now reads:

“Located in South America, the Republic of Colombia is a diverse country with a population of over 50 million people living in a territory of 440,831 square miles [60]. This territory encompasses jungles, highlands, grasslands, deserts, coasts, and islands, and it is divided into six regions and 32 departments (states) [61] (see S1 Fig). It is worth noting that, unfortunately, Colombia has long been a major producer of illegal drugs, which has had a significant impact on drug consumption and abuse.”

2. On page 25, Line 7, I suggest improving the following sentence: “Coastal areas are known to have warmer temperatures and higher levels of sunshine, which can contribute to increased thirst and a higher likelihood of alcohol consumption among people.” The increase in temperature contributes to increased beverage consumption, but not necessarily alcoholic beverages.

Thank you for your suggestion. We improved the discussion section accordingly in the revised version of the manuscript. The relevant text now reads:

In our country, the coastal areas are often popular tourist destinations, and many tourists come to these areas looking for a relaxing experience, which can increase alcohol consumption. Coastal areas typically have warmer temperatures and more sunshine, which increase thirst and make people more likely to consume beverages. Additionally, bars, clubs, and restaurants serve alcoholic beverages due to the high demand from tourists and locals.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Vinícius Silva Belo

2 Aug 2023

Leading Consumption Patterns of Psychoactive Substances in Colombia: A Deep Neural Network-based Clustering-oriented Embedding Approach

PONE-D-22-28262R2

Dear Dr. Palomino,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Vinícius Silva Belo

Academic Editor

PLOS ONE


Acceptance letter

Vinícius Silva Belo

10 Aug 2023

PONE-D-22-28262R2

Leading Consumption Patterns of Psychoactive Substances in Colombia: A Deep Neural Network-based Clustering-oriented Embedding Approach

Dear Dr. Palomino:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Vinícius Silva Belo

Academic Editor

PLOS ONE

Associated Data


    Supplementary Materials

    S1 Table. Feature coherence measurements.

    (DOCX)

    S2 Table. Model comparison.

    (DOCX)

    S3 Table. Characteristics of the level of development, urbanity, rurality, and drug production in the regions of Colombia.

    (DOCX)

    S1 Fig. Location map.

    Republished from [65] under a CC BY license, with permission from ArcGIS Hub, original copyright 2016.

    (DOCX)

    S2 Fig. The optimal number of clusters using dendrogram.

    (DOCX)

    S3 Fig. Maps of Local Indicators of Spatial Association (LISA).

    Republished from [65] under a CC BY license, with permission from ArcGIS Hub, original copyright 2016.

    (DOCX)

    Attachment

    Submitted filename: PONE-D-22-28262_reviewer.pdf

    Attachment

    Submitted filename: rebuttal_letter.docx

    Attachment

    Submitted filename: PONE-D-22-28262_R1.pdf

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    The data used in the manuscript were obtained from a third party, the Archivo Nacional de Datos (ANDA), and are fully available and anonymized. The authors confirm that others can access these data in the same manner as the authors did, and that the authors had no special access privileges that others would not have. The data can be publicly retrieved from ANDA (https://microdatos.dane.gov.co/index.php/catalog/680/data-dictionary).

