ABSTRACT
Background
The demand for fresh strategies to analyze intricate multidimensional data in neuroscience is increasingly evident. One of the most complex periods of neurodevelopment is adolescence, when the nervous system undergoes constant changes, not only in neuroanatomical traits but also in neurophysiological components. One of the most impactful factors during this time is the environment, especially external factors such as social behaviors or substance consumption. Binge drinking (BD) has emerged as a widespread pattern of alcohol consumption in teenagers, not only affecting their future lifestyle but also changing their neurodevelopment. Recent studies have shifted their focus toward identifying predisposition factors that may lead adolescents into this kind of consumption pattern.
Methods
In this article, using unsupervised machine learning (UML) algorithms, we analyze the relationship between the electrophysiological activity of healthy teenagers and their levels of alcohol consumption 2 years later. We used hierarchical agglomerative UML techniques based on Ward's minimum variance criterion to cluster the relations of power spectrum and functional connectivity with alcohol consumption, based on similarity in their correlations, in frequency bands from theta to gamma.
Results
We found that all frequency bands studied had a pattern of clusterization based on anatomical regions of interest related to neurodevelopment and cognitive and behavioral aspects of addiction, highlighting the dorsolateral and medial prefrontal, the sensorimotor, the medial posterior, and the occipital cortices. All these patterns, of great cohesion and coherence, showed an abnormal electrophysiological activity, representing a dysregulation in the development of core resting‐state networks. The clusters found maintained not only plausibility in nature but also robustness, making this a great example of the usage of UML in the analysis of electrophysiological activity—a new perspective into analysis that, while contributing to classical statistics, can clarify new characteristics of the variables of interest.
Keywords: binge drinking, clustering, electrophysiology, predisposition factors, unsupervised machine learning
Electrophysiological activity in certain brain networks, characterized by unsupervised machine learning, predisposes certain adolescents to develop heavy drinking patterns in later years. The clustering of these functional networks in the brain allows us to understand risk factors in society, as well as to plan new forms of prevention.
1. Introduction
As the data load of neurosciences keeps growing, working with its complexity and dimensionality becomes more difficult. This is why the need for original perspectives of analysis is becoming more pressing. In this regard, the emergence of innovative technologies, such as machine learning (ML), has proven to be a valuable alternative to the traditional statistical analyses (Bzdok and Meyer‐Lindenberg 2018; Landolfi et al. 2021; Ray, Wijesekera, and Cirstea 2022).
Originating in the mid‐20th century, ML appeared in pursuit of predicting new outcomes, using intrinsic characteristics from the data where the algorithm is designed to generalize and classify unseen data (Badillo et al. 2020). This approach, supervised ML (SML), may be the best‐known use of ML. In addition, there is unsupervised ML (UML), a set of techniques that allows us to analyze raw data without preexisting labels, not predicting any outcome, but grouping data by its similarities or reducing the dimensionality of the general dataset by eliminating redundancy. The application of UML has increased as novel technologies (ML, artificial intelligence…) have been refined, influencing several lines of research, like disease identification (Eshaghi et al. 2021; Faghri et al. 2022; Kung et al. 2022) and differential analyses of traits of interest in healthy populations (Ghomroudi, Scaltritti, and Grecucci 2023; Palacios, Noreña, and Londero 2020; Sorella et al. 2022).
The UML approaches include hierarchical agglomeration algorithms, which group variables within the data based on similarities in their characteristics. This framework clusters variables into groups, from 1 (where all data are enclosed) to k groups (where each variable is its own group). These approaches allow us to discretize the dataset into k groups with an optimal trade‐off between information and coherence to achieve better explainability (Sasirekha and Baby 2013).
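As an illustration of this idea, the following minimal sketch builds one merge tree with SciPy and cuts it at several numbers of groups. The toy data, the variable names, and the choice of `fcluster` with the `maxclust` criterion are assumptions made for the example, not a description of the pipeline used in this study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical toy data: 6 variables (rows) described by 4 features each,
# drawn around two well-separated centres.
data = np.vstack([
    rng.normal(0.0, 0.1, size=(3, 4)),
    rng.normal(5.0, 0.1, size=(3, 4)),
])

# Build the full agglomerative merge tree once.
tree = linkage(data, method="ward", metric="euclidean")

# Cut the same tree so that at most k flat groups are formed.
labels_by_k = {k: fcluster(tree, t=k, criterion="maxclust") for k in (1, 2, 3)}

for k, labels in labels_by_k.items():
    print(k, labels)
```

The same tree serves every value of k, so the number of groups can be chosen after the hierarchy has been computed.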
Numerous criteria have been designed to group variables based on their similarity. When aiming to cluster complex data (such as neuroimaging and electrophysiology data), it is crucial to minimize internal noise within the clusters. Ward's minimum variance criterion (Ward 1963), an enhanced version of the Lance–Williams dissimilarity formula, facilitates the coherent grouping of data improving compactness of clustering and maximizing the information captured within them (Murtagh and Legendre 2014). This technique facilitates extracting relevant information from complex interactions.
Sadaghiani, Brookes, and Baillet (2022) noted that electrophysiology and, above all, brain connectivity contain a high degree of complexity, especially when the system is affected by abnormal situations, like a disease or a misregulation. Moreover, the brain's complex dynamics are extremely sensitive to both internal and external factors that may contribute to altering its functioning and subsequent behavioral outcomes. In this scenario, UML emerges as a useful approach to disentangle the complex associations between brain functioning data and behavioral profiles.
Adolescence is a critical developmental stage where brain dynamics change toward a more organized system (Arain et al. 2013). During this period, the brain is undergoing a general maturation across several functional systems, radically changing its electrophysiology, with important consequences on cognition and behavior. One of the most common lifestyle factors emerging during adolescence is risk‐taking behaviors, and more concerning, intensive alcohol consumption or binge drinking (BD) (Hammer et al. 2018). BD is characterized by the ingestion of high doses of alcohol (at least 4 standard alcohol units (SAUs) for women and 5 for men) within short periods of time of 2–3 h (Courtney and Polich 2009). This pattern of consumption has been demonstrated to be harmful to the adolescent's brain, due to its acute vulnerability against external factors during its development (Guerri and Pascual 2010). More precisely, BD has been shown to lead to abnormal structural (Doallo et al. 2014; Sousa et al. 2017), functional (Almeida‐Antunes et al. 2022; Blanco‐Ramos et al. 2022; Correas et al. 2016), and neuropsychological impairments (Gil‐Hernandez and Garcia‐Moreno 2016), which can have enduring consequences for individuals.
One of the key advancements in the study of adolescent BD is the identification of predisposition profiles that cause some individuals to engage in BD during adolescence, an area of research that is gaining increasing importance (Green et al. 2023). Neuroanatomical studies have identified structural brain differences, including a reduction in the average volume of prefrontal, temporal, and parietal regions before BD onset (Brumback et al. 2016; Squeglia et al. 2017). Regarding neurofunctional evidence, fMRI studies have reported decreased BOLD signal among prefrontal and parietal regions (Norman et al. 2011; Wetherill et al. 2013), tightly associated with behavioral control circuits. In addition, electrophysiological studies using magnetoencephalography (MEG) have found increased functional connectivity (FC) profiles associated with future BD patterns later in adolescence, both in inhibitory control networks (Antón‐Toro et al. 2021; Antón‐Toro et al. 2022; Del Cerro‐León et al. 2024) and resting‐state networks (RSn) (Antón‐Toro et al. 2023). All these findings suggest neurodevelopmental abnormalities linked to the development of alcohol misuse. Nevertheless, despite the growing body of evidence concerning this matter, the specific relationship between complex electrophysiological profiles and future BD behaviors remains poorly understood.
With that objective, this prospective study applies UML techniques in the identification of electrophysiological patterns associated with future alcohol BD in teenagers. We expect, based on previous results (referenced above), that higher FC and power spectra patterns, related to alcohol consumption, will be found. We used a database containing resting‐state MEG signals from adolescents before the onset of alcohol consumption, and their relationship with consumption habits 2 years later. Using Ward's minimum variance clustering, we explored the interaction between FC and power spectra profiles in association with future alcohol consumption. This framework allowed us to identify patterns of electrophysiological activity, distinctive of higher alcohol intake years later during adolescence, topographically consistent with the organization of several functional systems. This method would offer a new perspective regarding electrophysiological traits associated with BD development by analyzing the interaction patterns between multiple variables and complementing the previous statistical analysis and results found in the field.
2. Methods
2.1. Participants
The sample for this study comprised adolescents recruited from various schools in the Region of Madrid, as part of two independent longitudinal projects funded by the Spanish Ministerio de Sanidad in 2015 and 2017. Both projects followed the same assessment protocol (Figure 1A), consisting of two evaluation stages separated by a 2‐year follow‐up period. Prior to the experiment, all participants reported no history of alcohol consumption and no family history of alcohol use disorder. In addition, all subjects successfully completed the alcohol use disorder identification test (AUDIT) (Guillamón, Solé, and Farran 1999), and individuals who reported prior alcohol use were excluded from the study. In the first stage (before alcohol use onset), 148 adolescents agreed to participate in the first arm of the study, undergoing a semi‐structured interview about their drug use habits to exclude any potential consumers; later, they underwent an MEG study, consisting of a 5‐min eyes‐closed resting‐state recording, and 142 of them also agreed to participate in an MRI study.
FIGURE 1.
Schematic overview of the general workflow, from data acquisition to data treatment, analysis and representation. (A) Representation of the data sampling and acquisition, with MEG recording and interviewing steps, prior and post 2‐year time leap. (B) Representation of the machine learning workflow, from the initial data sources (Spearman correlations between power spectrum/functional connectivity and alcohol consumption) to the general steps within the pipeline, with the packages and the programming language used for them; and the data output, with the visual representation, inscribed in the AAL atlas, of the clusters formed.
After a follow‐up period of 2 years, in the second stage, 104 of these participants underwent the AUDIT test and a semi‐structured interview again, aiming to measure their alcohol consumption habits. Using this information, we calculated the quantity of SAUs consumed during regular drinking episodes for each participant (with a mean consumption of 3.7 ± 2.98), considering the number of beverages consumed within a 2–3‐h period (a histogram quantifying the values for each alcohol consumption group can be found in Figure S1). Tobacco and cannabis use were controlled by excluding those participants with regular use of these substances. Finally, after quality control of the MEG and MRI data, a final sample of 103 subjects (mean age 13.75 ± 0.64, 53 females) completed the entire protocol and were selected for analysis.
The sample consisted almost entirely of native‐born participants and was balanced in terms of weight and height; elements such as parental education, participants' self‐reported quality of life, or dedication to the study were also controlled for (for a more detailed explanation of demographics, see Table S1). Informed consent was obtained from all participants and their parents or legal guardians during the first stage of the study, following the guidelines outlined in the Declaration of Helsinki. The ethical committee of Universidad Complutense de Madrid granted approval for the study.
2.2. MRI Recordings
The participants underwent a 3D T1‐weighted high‐resolution brain MRI scan at a field strength of 1.5 T, either at the Santa Elena Foundation (General Electric Optima MR450 w, echo time = 4.2 ms, repetition time = 11.2 ms, inversion time = 450 ms, flip angle = 12°, field of view = 100, acquisition matrix = 256 × 256, and slice thickness = 1 mm) or at the Clinical Hospital of Madrid (General Electric Signa HDxt, echo time = 4.2 ms, repetition time = 11.2 ms, inversion time = 450 ms, flip angle = 20°, field of view = 100, acquisition matrix = 256 × 256, and slice thickness = 1 mm).
2.3. MEG Recordings
The recordings were conducted at the Cognitive and Computational Neuroscience Laboratory (UCM‐UPM) of the Biomedical Technology Centre (CTB) (Madrid, Spain) using an Elekta Neuromag system with 306 channels (Elekta AB, Stockholm, Sweden) placed inside a magnetically shielded room (VacuumSchmelze GmbH, Hanau, Germany). We acquired 5 min of task‐free (resting‐state) data with eyes closed using an online bandpass filter between 0.1 and 330 Hz, and a sampling rate of 1000 Hz. The participants were asked to remain still, with eyes closed, during the time of the study, as relaxed as possible. After 5 min, the participants were instructed to open their eyes and look at a fixation cross for another 5 min.
To later estimate the electrical components of eye blinks and heartbeats and subtract them from the brain signal, we also recorded both using two sets of bipolar channels with the same configuration. To aid in the source reconstruction stage, the head shape of the participants was also obtained using a Fastrak three‐dimensional digitizer (Polhemus, Colchester, Vermont).
We initially preprocessed the data using a spatial filtering provided by the manufacturer (tSSS) (Taulu and Hari 2009) with MaxFilter software (v 2.2, Elekta AB, Sweden), tuning the parameters to 0.9 as correlation limit and 10 s as length of the correlation window. Then, we used the FieldTrip packages (Oostenveld et al. 2011) to detect artifacts and remove interfering signals (noise, heart beats, ocular activity) with a SOBI‐based ICA (Belouchrani et al. 1997). Finally, we segmented the data into 4‐second epochs, avoiding the artifacted segments.
2.4. Source Reconstruction
As a source model, we defined a homogeneous grid with 1 cm of separation between sources using the MNI template, yielding 2459 source positions inside the cranial cavity. These sources' positions were labeled using the AAL atlas (Tzourio‐Mazoyer et al. 2002), and only those 1206 positions designated as 1 of the 78 cortical areas were considered for further analysis. With the help of a linear normalization between the MNI space standard T1 image and the subject‐specific T1 image, we transformed this source model into a subject space, segmenting the image into the different tissues using the unified segmentation algorithm in SPM12 (Ashburner and Friston 2005). This combination was used to build a realistic single‐shell interface representing the inner skull cavity. Then, we transformed this source and volume conduction model into the MEG space using the head shape as aid.
We solved the forward problem by building a lead field based on a modified spherical solution (Nolte 2003), and the inverse problem by using an LCMV beamformer (Van Veen et al. 1997), finally recreating the source‐level activity. The spatial filter was built using LCMV and the covariance matrix generated from the broadband (2–45 Hz) sensor‐space activity, filtered using an 1800th‐order FIR filter. To avoid any possible distortion, data were filtered in two passes, using 2 s (2000 samples) of real data at each side of the epoch as padding.
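The two‐pass filtering with real‐data padding can be sketched as follows. This is only an illustration using SciPy (`firwin` and `filtfilt`), which is an assumption: the text does not state which toolbox or window was used for the filter, and the test signal is hypothetical.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 1000            # sampling rate in Hz (from the text)
pad = 2 * fs         # 2 s of real data on each side of the epoch
epoch_len = 4 * fs   # 4-second epoch

# Hypothetical signal: 10 Hz (inside 2-45 Hz) plus 100 Hz (outside it).
t = np.arange(epoch_len + 2 * pad) / fs
in_band = np.sin(2 * np.pi * 10 * t)
signal = in_band + np.sin(2 * np.pi * 100 * t)

# 1800th-order FIR band-pass filter (order 1800 means 1801 taps).
# The default Hamming window is an assumption of this sketch.
taps = firwin(1801, [2, 45], pass_zero=False, fs=fs)

# Forward and backward passes cancel the phase delay of the filter.
# padtype=None: no synthetic padding, the real-data margins play that role.
filtered = filtfilt(taps, [1.0], signal, padtype=None)

# Drop the margins so that edge transients never reach the analysed epoch.
epoch = filtered[pad:-pad]
reference = in_band[pad:-pad]
print(epoch.shape, np.max(np.abs(epoch - reference)))
```

Because the filter has 1801 taps, its edge transients span at most 1800 samples per pass, which is why a 2000‐sample margin is enough to keep the central epoch clean in this sketch.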
2.5. Power Spectrum
We estimated the relative source‐level activity for specific frequencies of each band (theta (4–8 Hz), alpha (8–12 Hz), beta (12–20 Hz), and gamma (30–45 Hz)) using an mtmfft method with dpss as windowing function, with a 1 Hz smoothing. These data served as the dataset of brain activation in each of the 78 AAL cortical regions for each frequency band.
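A rough Python analogue of a DPSS multitaper estimate of relative band power is sketched below. It is not the FieldTrip routine named in the text: the helper names, the normalization by the 2–45 Hz broadband power, and the rule for the number of tapers (2NW − 1) are assumptions made for illustration.

```python
import numpy as np
from scipy.signal.windows import dpss

BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 20), "gamma": (30, 45)}

def multitaper_psd(x, fs, smoothing_hz=1.0):
    """Average periodogram over DPSS tapers (hypothetical helper)."""
    n = x.size
    nw = smoothing_hz * n / fs           # time-half-bandwidth product
    k = max(int(2 * nw) - 1, 1)          # usual choice for the taper count
    tapers = dpss(n, nw, Kmax=k)         # shape (k, n)
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spectra.mean(axis=0)

def relative_band_power(x, fs, broadband=(2, 45)):
    """Band power divided by broadband power (normalization is an assumption)."""
    freqs, psd = multitaper_psd(x, fs)
    total = psd[(freqs >= broadband[0]) & (freqs <= broadband[1])].sum()
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() / total
            for name, (lo, hi) in BANDS.items()}

# Hypothetical 4-second epoch dominated by a 10 Hz rhythm.
fs = 1000
t = np.arange(4 * fs) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + 0.01 * rng.standard_normal(t.size)
rel = relative_band_power(x, fs)
print(rel)
```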
2.6. Functional Connectivity
We used the phase locking value (PLV), which examines the consistency of the phase differences between two time series, to evaluate FC. To determine the PLV (Bruña, Maestú, and Pereda 2018; Lachaux et al. 1999) for each frequency and epoch, we used the Hilbert transformation to extract the instantaneous phase of each signal node at each specified time and then estimated the synchrony between each pair of signals using the difference of their phases. Both the band‐pass and the Hilbert filters were applied over the epochs including 2 s of real data as padding (removed before continuing the analysis). We obtained the whole‐brain FC matrix by estimating the PLV between each pair of cortical source positions. Then, we estimated the PLV between each pair of the 78 cortical regions using the root mean square of the PLVs. Finally, we calculated the FC strength of each ROI as the average PLV value of that ROI.
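The PLV computation described above can be sketched for a single pair of signals as follows. The function name and the test signals are hypothetical, and the inputs are assumed to be already band‐pass filtered; padding removal and the region‐level root mean square are left out.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """PLV between two band-limited signals of equal length."""
    phase_x = np.angle(hilbert(x))   # instantaneous phase of x
    phase_y = np.angle(hilbert(y))   # instantaneous phase of y
    # Length of the mean unit vector of the phase differences.
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

fs = 1000
t = np.arange(4 * fs) / fs           # one 4-second epoch

# Same frequency with a constant lag: phases stay locked.
a = np.sin(2 * np.pi * 10 * t)
b = np.sin(2 * np.pi * 10 * t + 0.8)
plv_locked = phase_locking_value(a, b)

# Different frequencies: the phase difference drifts over the epoch.
c = np.sin(2 * np.pi * 13 * t)
plv_unlocked = phase_locking_value(a, c)

print(plv_locked, plv_unlocked)
```

A constant phase lag still gives a PLV close to 1, since the measure depends on the consistency of the phase difference rather than on its value.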
2.7. Unsupervised Machine Learning Workflow
Although the usage of independent variables (FC values, power spectrum, and alcohol consumption) was plausible, our work focused on the relationship between these variables, not the variables themselves. To this aim, we decided to work with Spearman correlation matrices between the electrophysiological variables and the alcohol consumption, as the relationships between variables are expected to be nonlinear.
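A minimal sketch of this step is given below, assuming one electrophysiological value per subject and ROI and one consumption value per subject. The function name, array shapes, and toy data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

def roi_consumption_correlations(features, consumption):
    """Spearman rho between each ROI column and alcohol consumption.

    features: array of shape (n_subjects, n_rois)
    consumption: array of shape (n_subjects,)
    """
    rhos = np.empty(features.shape[1])
    for j in range(features.shape[1]):
        rho, _ = spearmanr(features[:, j], consumption)
        rhos[j] = rho
    return rhos

# Hypothetical data: two ROIs related to consumption in a monotonic,
# clearly nonlinear way (one increasing, one decreasing).
consumption = np.array([0.0, 1.0, 2.5, 3.0, 4.5, 6.0])
features = np.column_stack([consumption ** 3, -np.exp(consumption)])
rhos = roi_consumption_correlations(features, consumption)
print(rhos)
```

Because Spearman's coefficient works on ranks, both toy relationships reach the extreme values even though neither is linear.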
We followed a pipeline (Figure 1B) using predominantly Python libraries. First, to determine the optimal number of groups into which our dataset could be divided, we assessed the cluster quality with varying numbers of groups. This involved using k‐means to automatically group the original dataset, iterating over a range of groups from 2 (the minimal number of groups possible) to 10 (a quantity estimated as large enough to be adequate). The efficiency of the groups produced was then assessed by computing the within‐cluster sum of squares (WCSS), as well as cohesion measures such as the Silhouette score (Rousseeuw 1987), the Calinski–Harabasz index (Caliński and Harabasz 1974), and the Davies–Bouldin index (Davies and Bouldin 1979). Consequently, the optimal number of groups was determined for the dataset, considering it not as the strictly best number but rather as the one yielding the highest consistency.
We performed this analysis using the tools provided by the Sci‐kit learn metrics package (https://scikit‐learn.org/stable/modules/classes.html#module‐sklearn.metrics). All metric values for the range of groups k described above are depicted at large in Table S2.
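The scoring loop over candidate numbers of groups could look like the sketch below. The toy data, the seed, the `n_init` value, and the rule of picking the highest Silhouette score are assumptions for illustration; the study weighed all four metrics together.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

def score_group_counts(data, k_values=range(2, 11), seed=0):
    """Fit k-means for each k and collect the four quality metrics."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data)
        labels = model.labels_
        scores[k] = {
            "wcss": model.inertia_,                             # lower is tighter
            "silhouette": silhouette_score(data, labels),       # higher is better
            "calinski_harabasz": calinski_harabasz_score(data, labels),  # higher
            "davies_bouldin": davies_bouldin_score(data, labels),        # lower
        }
    return scores

# Hypothetical data: two compact, well-separated groups of 30 points.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 0.3, size=(30, 3)),
    rng.normal(5.0, 0.3, size=(30, 3)),
])
scores = score_group_counts(data)
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(best_k)
```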
To enhance the information contained within the working dataframe, normalize the data, and reduce internal noise, thereby improving the coherence of the clustering, we added a further step: a normalization of the data based on mutual information (Cover and Thomas 1991). This technique, which emerged from information theory, characterizes the amount of information about a variable x contained in a variable y, measuring the redundancy between variables, and yields a measure of the uncertainty of all our variables once this redundant information is eliminated. The correction applied to the original dataset is based on this measurement, specifically on the variation of information (Meilă 2003) within the system, a measure of the total entropy (or novel information) at each step of the correction. By applying a correction to the whole data (power spectra and FC variables) in each iteration, we filter the noise and increase the novel information content, thereby enhancing the system's stability. The mutual information score was calculated with the Sci‐kit learn package mentioned before.
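The exact form of the correction is not detailed here, so the following sketch only shows how the two quantities named above relate: the variation of information between two variables is VI(X, Y) = H(X) + H(Y) − 2 I(X; Y). The equal‐width binning, the number of bins, and the helper names are assumptions; `mutual_info_score` is the scikit‐learn function and returns values in nats.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(values, n_bins=8):
    """Equal-width binning so that discrete entropies can be estimated."""
    edges = np.histogram_bin_edges(values, bins=n_bins)
    return np.digitize(values, edges[1:-1])

def variation_of_information(x, y, n_bins=8):
    """VI(X, Y) = H(X) + H(Y) - 2 I(X; Y), in nats."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    h_x = mutual_info_score(bx, bx)   # I(X; X) equals the entropy H(X)
    h_y = mutual_info_score(by, by)
    return h_x + h_y - 2.0 * mutual_info_score(bx, by)

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = rng.standard_normal(500)          # independent of x

vi_same = variation_of_information(x, x)    # fully redundant pair
vi_indep = variation_of_information(x, y)   # almost no shared information
print(vi_same, vi_indep)
```

A fully redundant pair has a variation of information of zero, whereas unrelated variables approach the sum of their entropies.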
Finally, with this corrected dataset, we performed data clustering by similarity using agglomerative hierarchical UML algorithms. We considered the previously obtained division value k for the group number, to direct the algorithm toward obtaining k separated clusters. We found, across all bands, that the most efficient number of groups in which our dendrogram should be cut was 2. We performed the same pipeline for several methods and, given its superior stability across various iterations of the model and its robustness in generating coherent groupings, we opted for the minimum variance model or the Ward's method:
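$$d(u,v) = \sqrt{\frac{|v|+|s|}{T}\,d(v,s)^2 + \frac{|v|+|t|}{T}\,d(v,t)^2 - \frac{|v|}{T}\,d(s,t)^2}$$
where u is the newly joined cluster consisting of clusters s and t, v is an unused cluster within the data frame, and $T = |v| + |s| + |t|$, with $|\cdot|$ denoting the number of elements in a cluster,
as depicted in the literature (Bar‐Joseph 2001; Müllner 2011). We used Euclidean distance to quantify the distance between clusters (as a measure of dissimilarity between variables).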
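As a small worked check, the sketch below applies the Ward update rule as documented for SciPy's `linkage(method="ward")`, namely d(u,v)² = [(|v|+|s|) d(v,s)² + (|v|+|t|) d(v,t)² − |v| d(s,t)²] / (|v|+|s|+|t|), to three hypothetical points and compares the hand‐computed value with the merge height returned by the library. The helper name and the points are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import euclidean

def ward_update(d_vs, d_vt, d_st, n_v, n_s, n_t):
    """Distance between v and the cluster u formed by merging s and t."""
    total = n_v + n_s + n_t
    return np.sqrt(
        ((n_v + n_s) * d_vs ** 2 + (n_v + n_t) * d_vt ** 2 - n_v * d_st ** 2)
        / total
    )

# Three hypothetical observations; points 0 and 1 are the closest pair.
points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 4.0]])
tree = linkage(points, method="ward", metric="euclidean")

# First merge: s = {0}, t = {1} form u. The remaining cluster is v = {2}.
manual = ward_update(
    euclidean(points[2], points[0]),
    euclidean(points[2], points[1]),
    euclidean(points[0], points[1]),
    n_v=1, n_s=1, n_t=1,
)

# Column 2 of the linkage matrix holds the merge heights.
print(tree[0, 2], tree[1, 2], manual)
```

For these points the second merge height is the square root of 145/3, which is what both the formula and the library return.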
The ultimate representation and analysis of the data following the data clustering process was carried out using dendrogram visualizations, tree‐like structures where data are grouped based on the distance between variables. For each frequency band, we obtained a dendrogram, which identifies, within distinct groups, the cortical ROIs associated with each type of variable (strength‐consumption or power spectrum‐consumption).
All the methods utilized for unsupervised hierarchical agglomeration, as well as the representation in dendrograms, were sourced from the SciPy toolkit (https://docs.scipy.org/doc/scipy/index.html).
3. Results
3.1. Theta Band
Theta band clustering defined two separate clusters with an evident and uneven separation (see Figure 2A).
FIGURE 2.
Dendrogram‐based depiction of the distribution of correlations among electrophysiological variables (power spectrum and strength) and consumption (SAUs), generated through unsupervised machine learning, utilizing Ward's minimum variance criterion, for each frequency band (A: theta, B: alpha, C: beta, D: gamma). For each band, within the overall grouping: 1 (in blue) = Cluster 1 grouping; 2 (in red) = Cluster 2 grouping.
Cluster 1 has an average intergroup distance of 1.41 ± 0.84 (SD), with a general cutoff at 4.62. We found that negative power–BD correlations appeared in this cluster in the parietal cortex (somatosensory cortex, supramarginal gyrus), as well as in the middle cingulate and temporal cortices. Specifically, Heschl's and the superior temporal gyri were included only for the right hemisphere. On the other hand, we found that positive strength–BD variables appeared in the occipital regions (left lingual and inferior occipital gyri), as well as in the posterior cingulate cortex and superior parietal and angular gyri (see Figure 3A–C).
FIGURE 3.
Visual representation of the grouping, into two distinct clusters, of correlations among electrophysiological variables (power spectrum and strength) and alcohol consumption (SAUs) for the four frequency bands of interest (theta, alpha, beta, and gamma, in descending order). (A, D, G, J): Visual representation of variables within Cluster 1 for each frequency band. (B, E, H, K): Visual representation of variables within Cluster 2 for each frequency band. (C, F, I, L): Correlation graph between electrophysiological variables and alcohol consumption for each lobe and the cingulate cortex, for both clusters within each frequency band. The values of the overlapped ROIs within each group of areas are shown in a striped pattern.
The vast majority of the variables were found within Cluster 2. This cluster is characterized by an average intergroup distance of 1.36 ± 1.08 (SD), breaking away from the general group at 6.95. Negative power–BD and positive strength–BD correlations overlapped, in this cluster, at the prefrontal regions, anterior cingulate cortex, superior temporal gyrus, and occipital lobe. Individually, negative power–BD correlations appeared at the angular gyrus, the posterior cingulate cortex and precuneus; on the other hand, positive strength–BD variables were found in the parietal cortex (somatosensory cortex, supramarginal gyrus) as well as in the middle cingulate cortex and paracentral and fusiform gyri (see Figure 3B,C).
For a detailed report of each individual region in both clusters, see Figure S2A.
3.2. Alpha Band
Clustering in the alpha band represents an unbalanced division of variables (see Figure 2B).
Cluster 1 has an average intergroup distance of 1.44 ± 1.06 (SD), with a general block cutoff at 6.17. Negative power–BD correlations were found in parieto‐temporal regions (including the precuneus, the posterior paracentral, and angular gyri), with the addition of positive power–BD correlations in the dorsal superior frontal gyrus. Positive strength–BD correlations appeared overlapping with power–BD variables in the supplementary motor area, middle cingulate cortex, parietal regions, and temporal areas (plus the hippocampus and parahippocampus) (see Figure 3D–F).
Cluster 2 exhibits an average intergroup distance of 1.32 ± 1.05 (SD), with a general block cutoff occurring at 6.47. Positive power–BD variables were shown around occipital regions, as well as the temporal poles and the anterior cingulate cortex; negative power–BD correlations were found in the frontal lobe. On the other hand, positive strength–BD variables appeared in the same areas but covered the entire frontal lobe (with the exception of the right precentral gyrus) as well as the entire occipital lobe and part of the temporal poles. Both variables overlapped at the orbitofrontal gyri, as well as in the temporal poles and the most posterior part of the occipital lobe (see Figure 3E,F).
For a detailed report of each individual region in both clusters, see Figure S2B.
3.3. Beta Band
Clustering in the beta band represents an uneven but distributed clusterization (see Figure 2C).
Cluster 1 was identified as a highly cohesive cluster, with an average distance of 1.45 ± 0.93 (SD), breaking away from the general block at 5.18. Positive power–BD correlations appeared in parieto‐temporal regions (all temporal gyri, as well as Heschl's gyrus), and the middle cingulate cortex. On the other hand, positive strength–BD correlations were found with a more occipital orientation, reaching also the posterior cingulate cortex, as well as the middle and inferior temporal gyri. Both variables overlapped there, as well as in the supramarginal and the fusiform gyri (see Figure 3G–I).
Cluster 2 has an average intergroup distance of 1.3 ± 1.13 (SD), with a cutoff of the general group at 7.88. Both positive power–BD and positive strength–BD correlations overlapped at the temporal poles, as well as in the frontal lobe (except for the precentral gyrus, not present for power–BD) and in the anterior cingulate cortex and insula. Positive power–BD also appeared in the occipital lobe, as well as in the posterior cingulate cortex. Positive strength–BD, on the other hand, appeared in the parietal lobe (superior parietal and angular gyri, and precuneus) and the middle cingulate cortex (see Figure 3H,I).
For a detailed report of each individual region in both clusters, see Figure S2C.
3.4. Gamma Band
Gamma band clustering defines an even and balanced division of variables (see Figure 2D).
Cluster 1 exhibits an exceptionally high internal coherence, with an average distance of 1.48 ± 0.76 (SD), and an early separation of the general block at 4.12. Positive power–BD correlations were shifted toward the right hemisphere, taking the entire frontal lobe (except for the middle and orbital gyri) as well as the temporal and the anterior part of the parietal lobe (paracentral, postcentral, and supramarginal gyri). Positive strength–BD correlations were shown at the occipital lobe completely. Both variables overlapped at the left precentral gyrus, as well as the left postcentral and supramarginal gyri, the paracentral gyrus, and the right middle temporal gyrus (see Figure 3J–L).
Cluster 2 has an average intergroup distance of 1.36 ± 0.96 (SD), diverging from the general block at 6.02. Positive power–BD correlations appeared in the occipital lobe, whereas negative power–BD correlations were found at orbitofrontal areas. On the other hand, positive strength–BD correlations are shifted toward the right hemisphere, appearing in the temporal lobes as well as in the frontal and right parietal regions (right postcentral and supramarginal gyri). Both variables overlapped at the anterior cingulate cortex, as well as in the orbitofrontal areas and the gyrus rectus (see Figure 3K,L).
For a detailed report of each individual region in both clusters, see Figure S2D.
3.5. General Analysis of Precision of Clustering
To ascertain the viability and explainability of the clustering performed, we replicated the general pipeline on a random set of data. The results, shown in Figure S3, clearly demonstrate a random patching of cerebral areas in both power–BD and strength–BD correlations, different from the results found when clustering real electrophysiological data. We also assessed the efficiency of different clustering criteria, choosing Ward's minimum variance as the most plausible algorithm we could use. The clusterings produced by the different criteria can be visualized in Figure S4.
4. Discussion
The application of ML techniques has witnessed a progressive increase in recent years. Their growing renown can be attributed to their ability to discern previously undetected patterns within datasets, which can pave the way for novel hypotheses and subsequently inform future analytical processes. In the present study, UML methodologies were utilized to elucidate patterns of correlation between electrophysiological variables, including power spectra and FC, and the subsequent emergence of alcohol consumption habits in adolescents. Through this methodology, we were able to discern unique clustering conformations, each exhibiting a uniform cortical distribution and displaying distinct associations with alcohol consumption across different frequency bands.
4.1. Machine Learning Results
Most of the technical analysis in this paper is based on the examination of the cohesion and coherence between the different variables under investigation. Electrophysiological brain activity presents nonlinear relationships that, when computed in association with environmental traits (like alcohol intake), increase its complexity, making it crucial to provide a new perspective other than classical statistics. The type of data we encounter in the various frequency bands is inherently distinct, as was demonstrated by the clustering results depicted above. Depending on the frequency range we are dealing with, the achieved cohesion varies from the maximum homogeneity observed in the slowest band (theta), to the maximum heterogeneity observed in the division of the fastest, and possibly noisiest, band (gamma). These unequal responses speak not only to the nature of the different frequencies but also to the precision of the algorithm's division; even when analyzing noisy frequency bands, the clustering depicted keeps a consistency with ecological characteristics of the brain. In this sense, the richness in the nature of the different frequency bands does not reduce the explainability of the algorithm, which still finds plausible groups.
Applying rigorous clustering criteria such as Ward's minimum variance method has proven highly suitable for the information considered in this work. As we can observe when comparing the results obtained from random data to those obtained from actual data, the separation and grouping of variables within the anatomically defined areas of the brain is coherent, while also maintaining bilateral symmetry. Other less stringent (yet not simplistic) algorithms (such as single linkage), or methods based on geometric criteria (like centroid or median linkage), yield less precise groupings. In fact, ad hoc analyses were performed, finding that, as we can observe in Figure S4, the clusterings found using these other criteria are practically random; algorithms that apply weights to the clusterings (like weighted linkage) introduce smoothing in the divisions, subsequently losing information. It is necessary, therefore, to obtain groupings that consider not only the intragroup cohesion of each cluster but also the comparative coherence of one cluster with the next; in this sense, Ward's algorithm provides both capabilities, rendering it ideal for our scenario (Murtagh and Legendre 2014).
4.2. Electrophysiological Results
Overall, our results indicate that power spectra and FC variables coalesce into two distinct similarity clusters, primarily characterized by four unique, anatomically driven patterns of activity: dorsolateral and medial prefrontal, sensorimotor and temporal, medial and posterior parietal, and occipital patterns. These patterns are uniformly distributed in the cortex across frequency bands, yet they exhibit diverse interactions among themselves. Notably, these patterns are in line with the disposition of resting‐state networks identified in electrophysiological MEG data by Brookes et al. (2011), clustering cortical activity in crucial nodes of the default mode network: the frontoparietal network, the sensorimotor network, the medial parietal network, and the visual network. Concerning the relationship of the power and strength variables with alcohol misuse, the distributions of these RSn within the two similarity clusters contrast with each other; that is, regions pertinent to one variable in Cluster 1 correspond to those related to the other variable in Cluster 2. These results provide evidence of the complex and nonlinear interactions between power and FC dynamics within cortical networks.
Overall, patterns of higher FC across frequency bands and functional networks corresponded to higher levels of future alcohol use. Power, on the other hand, exhibited diverse associations depending on the frequency and the cortical network. Interestingly, slower frequency bands tended to exhibit an inverse relationship of power and strength variables with alcohol use. Conversely, within faster frequency bands, both power and strength variables showed patterns of positive correlations with alcohol consumption. However, there are notable exceptions concerning prefrontal power within the alpha and gamma bands, which reversed these patterns. These divergent patterns of the prefrontal alpha and gamma bands coincide with their unique distribution within the prefrontal cortex, separating the activity of medial and orbital regions from the dorsolateral part. Such regions have been demonstrated to be the core of the neuromaturational changes of the adolescent brain (Blakemore and Choudhury 2006) and have been found to be particularly associated with the potential development of risk behaviors such as substance consumption (Blakemore and Choudhury 2006; Caballero, Granberg, and Tseng 2016; Crane et al. 2018).
4.3. Relevance and Relationship With Prior Findings
Prior studies have pointed to the neurological underpinnings that predispose adolescents to substance consumption, often highlighting differences in the neural pathways regulating self‐control and risk assessment (Crews, He, and Hodge 2007; Spear 2000). This relationship gains complexity when considering prior research indicating variances in RSn among adolescents, which are thought to underlie various cognitive and behavioral aspects of addiction. Studies have shown alterations in RSn in adolescents with substance abuse disorders, including those prone to BD (Antón‐Toro et al. 2022; Chen and Lasek 2020; Correas et al. 2016; Sousa et al. 2019). These networks are instrumental in self‐referential thinking, emotional regulation, and cognitive control, and their dysregulation has been associated with increased impulsivity and risk‐taking behaviors, which are prevalent in individuals with adolescent alcohol misuse (Green et al. 2023). Our observation of distinct cortical activity clusters coincides with these studies, suggesting a distinct interplay between various cortical networks and their developmental trajectories. For instance, the heightened sensitivity in the dorsolateral and medial prefrontal regions, as corroborated by our power and FC dynamics, resonates with their established role in decision‐making and emotional regulation, faculties often compromised in adolescents prone to BD (Casey, Jones, and Hare 2008). Furthermore, this cortical complexity works as a potential mediator of behavior, reflecting the neuroadaptive processes that are critical during adolescence (Blakemore and Choudhury 2006).
Research on the electrophysiological signatures of predisposition to BD is currently limited, and it is therefore challenging to connect the present findings with existing literature. Nevertheless, the observed outcomes can be evaluated in the context of developmental trajectories. Established research has characterized the typical neurodevelopment of electrophysiological markers during adolescence (Brookes et al. 2018; Hunt et al. 2019). Broadly, throughout adolescence, there is a decline in power spectral density in the theta band (in posterior regions), whereas there is an increase in the beta and gamma bands (in the latter, in the prefrontal regions), with a mixed evolution in the alpha band (increase in temporoparietal regions and reduction in prefrontal regions). Given this backdrop, our results suggest that a precocious maturation profile within specific cortical areas and frequency bands could correlate with increased future consumption tendencies. In contrast, patterns discerned in the alpha band for both prefrontal and parieto‐occipital zones seem to present an inverse correlation with the conventional neuromaturational trajectory. This inverse correlation is also noticeable in the prefrontal pattern of power in the gamma band, suggesting a contradictory link between consumption and cortical power compared to standard neurodevelopmental progression. In relation to FC, a consistent increase across all frequency bands is noted during adolescence, with the exception of the gamma band, which exhibits a reduction with age. Interestingly, our results indicate that pronounced alcohol misuse aligns with elevated FC in the gamma band, especially within the frontal cortex. All these neuroadaptive mechanisms might identify individuals more susceptible to substance use as a function of both the unique neural maturational profiles and the concurrent psychosocial pressures typical of this developmental stage.
4.4. Relevancy of Machine Learning, Future Implications, and Limitations
The results obtained in this work underscore two key points regarding the utility of ML in the analysis of neurophysiological variables. First, they prompt consideration of how to optimize and refine analyses that would otherwise involve substantial effort, complementing those performed with classical statistics. The use of such tools has enabled studies that would otherwise have been impossible or exceedingly laborious and costly (Drysdale et al. 2017; Längkvist, Karlsson, and Loutfi 2012; Molano‐Mazon et al. 2018). It also mitigates potential biases that researchers may introduce when analyzing or processing their data. Within the field of neuroscience, reducing noise and enhancing the coherence of analyses is important, and the application of these automated techniques might contribute to increasing precision and objectivity (Glaser et al. 2019; Ullman 2019; Vu et al. 2018), giving support to the existing knowledge. Second, neuroscience aims to go beyond the analysis of isolated variables and to examine the interactions among them from a holistic perspective. In this context, the application of nonlinear methodologies that allow for the visualization or identification of links among variables is essential. Techniques such as those used in this work, or ML strategies like reinforcement learning (RL) (Botvinick et al. 2020; Matsuo et al. 2022), are invaluable. They significantly diminish the dimensionality and intricacy of the data, presenting the information in a comprehensible structure. These advanced approaches are crucial for refining complex datasets, particularly in studies dealing with multifaceted electrophysiological information and its behavioral outcomes.
The findings presented here are consistent with empirical observations in natural settings. The extraction of patterns could be used in the future as a foundational source of information for training convolutional neural networks (CNNs) that predict the occurrence of intensive alcohol consumption behaviors in adolescents solely using noninvasive neuroimaging measures. This would enable the identification of potential future consumers, allowing institutions to increase the resources provided to prevent these behaviors.
These findings, while pointing in a direction that aligns with prior knowledge and providing new insights into the next steps to be taken, have several limitations worth highlighting. The use of ML techniques, as employed in this article, poses significant issues regarding explainability. In neuroscience, the use of induction algorithms like these can be highly beneficial, but it is crucial to minimize reliance on the “black boxes” that underlie many ML algorithms. Opening these boxes, so as not to be left with prediction values lacking relevant information, can be costly and requires either simpler techniques or a thorough understanding of the methods employed. The use of UML or RL and deep RL techniques may be a step toward adding explainability, albeit at the expense of some efficiency and robustness compared to the more commonly used methods.
5. Conclusion
This work points out that areas closely related to the neurodevelopment of teenagers, such as the prefrontal cortex, the DMN, and the somatosensory cortex, are pivotal within the clustering performed by UML techniques, serving as patterns within the observed dynamics and providing new information regarding abnormal neurodevelopment as a predisposing factor toward alcohol consumption. These results open the way for future studies in the search for predisposition patterns and the assessment of alcohol consumption in adolescents, and they provide new insights into the usage of novel automatic techniques.
Author Contributions
Marcos Uceta: conceptualization, data curation, formal analysis, methodology, visualization, writing–original draft, writing–review and editing. Alberto del Cerro‐León: conceptualization, data curation, formal analysis, investigation, writing–original draft, writing–review and editing. Danylyna Shpakivska‐Bilán: data curation, formal analysis, methodology, writing–original draft, writing–review and editing. Luis M. García‐Moreno: conceptualization, project administration, resources, supervision, writing–review and editing. Fernando Maestú: conceptualization, project administration, resources, supervision, validation, writing–review and editing. Luis Fernando Antón‐Toro: conceptualization, data curation, formal analysis, methodology, project administration, supervision, validation, visualization, writing–original draft, writing–review and editing.
Ethics Statement
The studies involving human participants were reviewed and approved by Universidad Complutense de Madrid.
Consent
Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.
Conflicts of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Peer Review
The peer review history for this article is available at https://publons.com/publon/10.1002/brb3.70157.
Supporting information
Figure S1. Distribution of alcohol consumption (Standard Alcohol Units, SAUs) throughout the population, represented in a histogram. The X axis represents the SAU values in the population. The Y axis represents the number of participants in each group.
Figure S2. Graphs of correlations, broken down for all areas of the AAL atlas, between electrophysiological variables (power spectrum and strength) and consumption (SAUs) for the four frequency bands of interest (A: alpha, B: beta, C: gamma, and D: theta). For each, on the left are the correlation graphs for Cluster 1, with: power‐UBEs graph (in blue), strength‐UBEs graph (in green), and the graph showing areas of overlap between both variables. On the right, the correlation graphs for Cluster 2, with: power‐UBEs graph (in blue), strength‐UBEs graph (in green), and the graph showing areas of overlap between both variables. For each of the graphs, beneath the title, the mean rho is displayed, along with its standard deviation.
Figure S3. Visualization of clusterization of random data compared to visualization of real electrophysiological data (theta, alpha, beta, and gamma band, respectively). (A) Visualization of data from Cluster 1. (B) Visualization of data from Cluster 2. (C) Dendrogram and cohesion/coherence parameters of the random data clustering. In blue, the parameters of the first cluster; in red, the parameters of the second cluster.
Figure S4. Representation of the different clusterizations of gamma frequency band data obtained with specific linkage criteria, visualizing the random or plausible patterns generated by each criterion. (A) Simple and geometric linkage criteria; these criteria generate random‐like clusterizations. (B) Complex criteria, such as Ward's minimum variance (the one used in the study). (C) Weighted linkage criterion. Although it produces Ward‐like clusterizations, it smooths the data, reducing the information provided by the clusterization.
Table S1. Sample demographics. Number of participants belonging to each sex, with percentage relative to the total sample; number of participants with country of birth in Spain, and with country of birth other than Spain, with percentage relative to the total sample; number of participants with mother tongue Spanish and with mother tongue other than Spanish, with percentage relative to the total sample; mean age, in years (with standard deviation); mean weight, in kilograms (with standard deviation); mean height, in centimeters (with standard deviation); mean study time, in hours, self‐reported by the participant (with standard deviation); mean level of paternal and maternal education (with standard deviation), from 1 to 4, where 1 represents basic levels of education, and 4 represents advanced levels of education; mean health status (with standard deviation), self‐reported by the participants, from 1 to 9, where 1 represents low levels of health, and 9 represents ideal health status; and main scores of the neuropsychological tests run throughout the study, with sensation‐seeking tests (SSS‐V), executive function tests (BDEFS‐20, BRIEF‐SR, DEX) and impulsivity tests (BIS‐11) (reported values are from the total scores of the tests, not sub‐scores of said tests).
Table S2. Metrics of efficiency for the unsupervised machine learning model for each frequency band. For each band, the used metrics were: within‐cluster sum of squares (WCSS), silhouette score, Calinski‐Harabasz index, and Davies‐Bouldin index. Highlighted in grey, for each frequency band and metric, is the most efficient k number of groups.
Funding: This article is funded by the Plan Nacional de Drogas of the Ministry of Health of the Spanish Government (Grant/Award Number: PND2021I075).
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation. The source code of the methods is available at https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics and https://docs.scipy.org/doc/scipy/index.html, both of which are public, open-access libraries.
References
- Almeida‐Antunes, N., Antón‐Toro L., Crego A., Rodrigues R., Sampaio A., and Lopez‐Caneda E. 2022. “‘It's a Beer!’: Brain Functional Hyperconnectivity During Processing of Alcohol‐Related Images in Young Binge Drinkers.” Addiction Biology 27, no. 2: e13152. 10.1111/ADB.13152.
- Antón‐Toro, L. F., Bruña R., Del Cerro‐León A., et al. 2022. “Electrophysiological Resting‐State Hyperconnectivity and Poorer Behavioural Regulation as Predisposing Profiles of Adolescent Binge Drinking.” Addiction Biology 27, no. 4: e13199. 10.1111/adb.13199.
- Antón‐Toro, L. F., Bruña R., Suárez‐Méndez I., Correas Á., García‐Moreno L. M., and Maestú F. 2021. “Abnormal Organization of Inhibitory Control Functional Networks in Future Binge Drinkers.” Drug and Alcohol Dependence 218: 108401. 10.1016/j.drugalcdep.2020.108401.
- Antón‐Toro, L. F., Shpakivska‐Bilán D., Del Cerro‐León A., et al. 2023. “Longitudinal Change of Inhibitory Control Functional Connectivity Associated With the Development of Heavy Alcohol Drinking.” Frontiers in Psychology 14: 1069990. 10.3389/fpsyg.2023.1069990.
- Arain, M., Haque M., Johal L., et al. 2013. “Maturation of the Adolescent Brain.” Neuropsychiatric Disease and Treatment 9: 449–461. 10.2147/NDT.S39776.
- Ashburner, J., and Friston K. 2005. “Unified Segmentation.” NeuroImage 26, no. 3: 839–851. 10.1016/j.neuroimage.2005.02.018.
- Badillo, S., Banfai B., Birzele F., et al. 2020. “An Introduction to Machine Learning.” Clinical Pharmacology & Therapeutics 107, no. 4: 871–885. 10.1002/cpt.1796.
- Bar‐Joseph, Z. 2001. “Fast Optimal Leaf Ordering for Hierarchical Clustering.” Bioinformatics 17, no. 1: 22–29. 10.1093/bioinformatics/17.suppl_1.S22.
- Belouchrani, A., Abed‐Meraim K., Cardoso J. F., and Moulines E. 1997. “A Blind Source Separation Technique Using Second‐Order Statistics.” IEEE Transactions on Signal Processing 45, no. 2: 434–444. 10.1109/78.554307.
- Blakemore, S. J., and Choudhury S. 2006. “Development of the Adolescent Brain: Implications for Executive Function and Social Cognition.” Journal of Child Psychology and Psychiatry 47, no. 3‐4: 296–312. 10.1111/j.1469-7610.2006.01611.x.
- Blanco‐Ramos, J., Antón‐Toro L. F., Cadaveira F., Doallo S., Suárez‐Suárez S., and Rodríguez Holguín S. 2022. “Alcohol‐Related Stimuli Modulate Functional Connectivity During Response Inhibition in Young Binge Drinkers.” Addiction Biology 27, no. 2: e13141. 10.1111/adb.13141.
- Botvinick, M., Wang J. X., Dabney W., Miller K. J., and Kurth‐Nelson Z. 2020. “Deep Reinforcement Learning and Its Neuroscientific Implications.” Neuron 107, no. 4: 603–616. 10.1016/j.neuron.2020.06.014.
- Brookes, M. J., Groom M. J., Liuzzi L., et al. 2018. “Altered Temporal Stability in Dynamic Neural Networks Underlies Connectivity Changes in Neurodevelopment.” NeuroImage 174: 563–575. 10.1016/j.neuroimage.2018.03.008.
- Brookes, M. J., Woolrich M., Luckhoo H., et al. 2011. “Investigating the Electrophysiological Basis of Resting State Networks Using Magnetoencephalography.” PNAS 108, no. 40: 16783–16788. 10.1073/pnas.1112685108.
- Brumback, T. Y., Worley M., Nguyen‐Louie T. T., Squeglia L. M., Jacobus J., and Tapert S. F. 2016. “Neural Predictors of Alcohol Use and Psychopathology Symptoms in Adolescents.” Development and Psychopathology 28, no. 4: 1209–1216. 10.1017/S0954579416000766.
- Bruña, R., Maestú F., and Pereda E. 2018. “Phase Locking Value Revisited: Teaching New Tricks to an Old Dog.” Journal of Neural Engineering 15, no. 5: 056011. 10.1088/1741-2552/aacfe4.
- Bzdok, D., and Meyer‐Lindenberg A. 2018. “Machine Learning for Precision Psychiatry: Opportunities and Challenges.” Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 3, no. 3: 223–230. 10.1016/j.bpsc.2017.11.007.
- Caballero, A., Granberg R., and Tseng K. Y. 2016. “Mechanisms Contributing to Prefrontal Cortex Maturation During Adolescence.” Neuroscience and Biobehavioral Reviews 70: 4–12. 10.1016/j.neubiorev.2016.05.013.
- Caliński, T., and Harabasz J. 1974. “A Dendrite Method for Cluster Analysis.” Communications in Statistics 3: 1–27. 10.1080/03610927408827101.
- Casey, B. J., Jones R. M., and Hare T. A. 2008. “The Adolescent Brain.” Annals of the New York Academy of Sciences 1124: 111–126. 10.1196/annals.1440.010.
- Chen, H., and Lasek A. W. 2020. “Perineuronal Nets in the Insula Regulate Aversion‐Resistant Alcohol Drinking.” Addiction Biology 25, no. 6: e12821. 10.1111/adb.12821.
- Correas, A., Cuesta P., Lopez‐Caneda E., et al. 2016. “Functional and Structural Brain Connectivity of Young Binge Drinkers: A Follow‐Up Study.” Scientific Reports 6: 31293. 10.1038/srep31293.
- Courtney, K. E., and Polich J. 2009. “Binge Drinking in Young Adults: Data, Definitions, and Determinants.” Psychological Bulletin 135, no. 1: 142–156. 10.1037/a0014414.
- Cover, T., and Thomas J. 1991. Elements of Information Theory. Hoboken, NJ: Wiley.
- Crane, N. A., Gorka S. M., Phan K. L., and Childs E. 2018. “Amygdala‐Orbitofrontal Functional Connectivity Mediates the Relationship Between Sensation Seeking and Alcohol Use Among Binge‐Drinking Adults.” Drug and Alcohol Dependence 192: 208–214. 10.1016/j.drugalcdep.2018.07.044.
- Crews, F., He J., and Hodge C. 2007. “Adolescent Cortical Development: A Critical Period of Vulnerability for Addiction.” Pharmacology Biochemistry and Behavior 86, no. 2: 189–199. 10.1016/j.pbb.2006.12.001.
- Davies, D. L., and Bouldin D. W. 1979. “A Cluster Separation Measure.” IEEE Transactions on Pattern Analysis and Machine Intelligence 1, no. 2: 224–227. 10.1109/TPAMI.1979.4766909.
- Del Cerro‐León, A., Fernando Antón‐Toro L., Shpakivska‐Bilan D., et al. 2024. “Adolescent Alcohol Consumption Predicted by Differences in Electrophysiological Functional Connectivity and Neuroanatomy.” Proceedings of the National Academy of Sciences of the United States of America 121, no. 42: e2320805121. 10.1073/pnas.2320805121.
- Doallo, S., Cadaveira F., Corral M., Mota N., Lopez‐Caneda E., and Holguín S. R. 2014. “Larger Mid‐Dorsolateral Prefrontal Gray Matter Volume in Young Binge Drinkers Revealed by Voxel‐Based Morphometry.” PLoS ONE 9, no. 5: e96380. 10.1371/journal.pone.0096380.
- Drysdale, A. T., Grosenick L., Downar J., et al. 2017. “Resting‐State Connectivity Biomarkers Define Neurophysiological Subtypes of Depression.” Nature Medicine 23, no. 1: 28–38. 10.1038/nm.4246.
- Eshaghi, A., Young A. L., Wijeratne P. A., et al. 2021. “Identifying Multiple Sclerosis Subtypes Using Unsupervised Machine Learning and MRI Data.” Nature Communications 12, no. 1: 2078. 10.1038/s41467-021-22265-2.
- Faghri, F., Brunn F., Dadu A., et al. 2022. “Identifying and Predicting Amyotrophic Lateral Sclerosis Clinical Subgroups: A Population‐Based Machine‐Learning Study.” Lancet Digital Health 4, no. 5: e359–e369. 10.1016/S2589-7500(21)00274-0.
- Ghomroudi, P. A., Scaltritti M., and Grecucci A. 2023. “Decoding Reappraisal and Suppression From Neural Circuits: A Combined Supervised and Unsupervised Machine Learning Approach.” Cognitive, Affective & Behavioral Neuroscience 23: 1095–1112. 10.3758/s13415-023-01076-6.
- Gil‐Hernandez, S., and Garcia‐Moreno L. M. 2016. “Executive Performance and Dysexecutive Symptoms in Binge Drinking Adolescents.” Alcohol 51: 79–87. 10.1016/j.alcohol.2016.01.003.
- Glaser, J. I., Benjamin A. S., Farhoodi R., and Kording K. P. 2019. “The Roles of Supervised Machine Learning in Systems Neuroscience.” Progress in Neurobiology 175: 126–137. 10.1016/j.pneurobio.2019.01.008.
- Green, R. J., Meredith L. R., Mewton L., and Squeglia L. M. 2023. “Adolescent Neurodevelopment Within the Context of Impulsivity and Substance Use.” Current Addiction Reports 10: 166–177. 10.1007/s40429-023-00485-4.
- Guerri, C., and Pascual M. 2010. “Mechanisms Involved in the Neurotoxic, Cognitive, and Neurobehavioral Effects of Alcohol Consumption During Adolescence.” Alcohol 44, no. 1: 15–26. 10.1016/j.alcohol.2009.10.003.
- Guillamón, M. C., Solé A. G., and Farran C. 1999. “Alcohol Use Disorders Identification Test (AUDIT): Translation and Validation to Catalan and Spanish.” Adicciones 11, no. 4.
- Hammer, J. H., Parent M. C., Spiker D. A., and World Health Organization. 2018. Global Status Report on Alcohol and Health 2018. Geneva: World Health Organization. https://iris.who.int/bitstream/handle/10665/274603/9789241565639-eng.pdf.
- Hunt, B. A. E., Wong S. M., Vandewouw M. M., Brookes M. J., Dunkley B. T., and Taylor M. J. 2019. “Spatial and Spectral Trajectories in Typical Neurodevelopment From Childhood to Middle Age.” Network Neuroscience 3, no. 2: 497–520. 10.1162/netn_a_00077.
- Kung, B., Chiang M., Perera G., Pritchard M., and Stewart R. 2022. “Unsupervised Machine Learning to Identify Depressive Subtypes.” Healthcare Informatics Research 28, no. 3: 256–266. 10.4258/hir.2022.28.3.256.
- Lachaux, J. P., Rodriguez E., Martinerie J., and Varela F. J. 1999. “Measuring Phase Synchrony in Brain Signals.” Human Brain Mapping 8, no. 4: 194–208.
- Landolfi, A., Ricciardi C., Donisi L., et al. 2021. “Machine Learning Approaches in Parkinson's Disease.” Current Medicinal Chemistry 28, no. 32: 6548–6568. 10.2174/0929867328999210111211420.
- Längkvist, M., Karlsson L., and Loutfi A. 2012. “Sleep Stage Classification Using Unsupervised Feature Learning.” Advances in Artificial Neural Systems 2012: 107046. 10.1155/2012/107046.
- Matsuo, Y., LeCun Y., Sahani M., et al. 2022. “Deep Learning, Reinforcement Learning, and World Models.” Neural Networks 152: 267–275. 10.1016/j.neunet.2022.03.037.
- Meilă, M. 2003. “Comparing Clusterings by the Variation of Information.” In Learning Theory and Kernel Machines. Lecture Notes in Computer Science, edited by Schölkopf B. and Warmuth M. K., vol 2777. Berlin, Heidelberg: Springer. 10.1007/978-3-540-45167-9_14.
- Molano‐Mazon, M., Onken A., Piasini E., and Panzeri S. 2018. “Synthesizing Realistic Neural Population Activity Patterns Using Generative Adversarial Networks.” arXiv. 10.48550/arXiv.1803.00338.
- Müllner, D. 2011. “Modern Hierarchical, Agglomerative Clustering Algorithms.” arXiv. 10.48550/arXiv.1109.2378.
- Murtagh, F., and Legendre P. 2014. “Ward's Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward's Criterion?” Journal of Classification 31: 274–295. 10.1007/s00357-014-9161-z.
- Nolte, G. 2003. “The Magnetic Lead Field Theorem in the Quasi‐Static Approximation and Its Use for Magnetoencephalography Forward Calculation in Realistic Volume Conductors.” Physics in Medicine and Biology 48, no. 22: 3637–3652. 10.1088/0031-9155/48/22/002.
- Norman, A. L., Pulido C., Squeglia L. M., Spadoni A. D., Paulus M. P., and Tapert S. F. 2011. “Neural Activation During Inhibition Predicts Initiation of Substance Use in Adolescence.” Drug and Alcohol Dependence 119, no. 3: 216–223. 10.1016/j.drugalcdep.2011.06.019.
- Oostenveld, R., Fries P., Maris E., and Schoffelen J. M. 2011. “FieldTrip: Open‐Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data.” Computational Intelligence and Neuroscience 2011, no. 1: 156869. 10.1155/2011/156869.
- Palacios, G., Noreña A., and Londero A. 2020. “Assessing the Heterogeneity of Complaints Related to Tinnitus and Hyperacusis From an Unsupervised Machine Learning Approach: An Exploratory Study.” Audiology and Neurotology 25, no. 4: 174–189. 10.1159/000504741.
- Ray, J., Wijesekera L., and Cirstea S. 2022. “Machine Learning and Clinical Neurophysiology.” Journal of Neurology 269, no. 12: 6678–6684. 10.1007/s00415-022-11283-9.
- Rousseeuw, P. J. 1987. “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis.” Journal of Computational and Applied Mathematics 20: 53–65. 10.1016/0377-0427(87)90125-7.
- Sadaghiani, S., Brookes M. J., and Baillet S. 2022. “Connectomics of Human Electrophysiology.” NeuroImage 247: 118788. 10.1016/j.neuroimage.2021.118788.
- Sasirekha, K., and Baby P. 2013. “Agglomerative Hierarchical Clustering Algorithm—A Review.” International Journal of Scientific and Research Publications 3: 3.
- Sorella, S., Vellani V., Siugzdaite R., Feraco P., and Grecucci A. 2022. “Structural and Functional Brain Networks of Individual Differences in Trait Anger and Anger Control: An Unsupervised Machine Learning Study.” European Journal of Neuroscience 55, no. 2: 510–527. 10.1111/ejn.15537.
- Sousa, S. S., Sampaio A., Marques P., Gonçalves Ó. F., and Crego A. 2017. “Gray Matter Abnormalities in the Inhibitory Circuitry of Young Binge Drinkers: A Voxel‐based Morphometry Study.” Frontiers in Psychology 8: 1–8. 10.3389/fpsyg.2017.01567.
- Sousa, S. S., Sampaio A., Marques P., López‐Caneda E., Gonçalves Ó. F., and Crego A. 2019. “Functional and Structural Connectivity of the Executive Control Network in College Binge Drinkers.” Addictive Behaviors 99: 106009. 10.1016/j.addbeh.2019.05.033.
- Spear, L. P. 2000. “The Adolescent Brain and Age‐Related Behavioral Manifestations.” Neuroscience and Biobehavioral Reviews 24, no. 4: 417–463. 10.1016/s0149-7634(00)00014-2.
- Squeglia, L. M., Ball T. M., Jacobus J., et al. 2017. “Neural Predictors of Initiating Alcohol Use During Adolescence.” American Journal of Psychiatry 174, no. 2: 172–185. 10.1176/appi.ajp.2016.15121587.
- Taulu, S., and Hari R. 2009. “Removal of Magnetoencephalographic Artifacts With Temporal Signal‐Space Separation: Demonstration With Single‐Trial Auditory‐Evoked Responses.” Human Brain Mapping 30, no. 5: 1524–1534. 10.1002/hbm.20627.
- Tzourio‐Mazoyer, N., Landeau B., Papathanassiou D., et al. 2002. “Automated Anatomical Labelling of Activations in SPM Using a Macroscopic Anatomical Parcellation of the MNI MRI Single‐Subject Brain.” NeuroImage 15, no. 1: 273–289. 10.1006/nimg.2001.0978.
- Ullman, S. 2019. “Using Neuroscience to Develop Artificial Intelligence.” Science 363, no. 6428: 692–693. 10.1126/science.aau6595.
- Van Veen, B. D., Van Drongelen W., Yuchtman M., and Suzuki A. 1997. “Localization of Brain Electrical Activity via Linearly Constrained Minimum Variance Spatial Filtering.” IEEE Transactions on Bio‐Medical Engineering 44, no. 9: 867–880. 10.1109/10.623056.
- Vu, M. T., Adalı T., Ba D., et al. 2018. “A Shared Vision for Machine Learning in Neuroscience.” Journal of Neuroscience 38, no. 7: 1601–1607. 10.1523/JNEUROSCI.0508-17.2018.
- Ward, J. H., Jr. 1963. “Hierarchical Grouping to Optimize an Objective Function.” Journal of the American Statistical Association 58, no. 301: 236–244. 10.1080/01621459.1963.10500845.
- Wetherill, R. R., Squeglia L. M., Yang T. T., and Tapert S. F. 2013. “A Longitudinal Examination of Adolescent Response Inhibition: Neural Differences Before and After the Initiation of Heavy Drinking.” Psychopharmacology 230, no. 4: 663–671. 10.1007/s00213-013-3198-2.
Associated Data
Supplementary Materials
Figure S1. Histogram of alcohol consumption (Standard Alcohol Units, SAUs) across the sample. The X axis shows SAU values; the Y axis shows the number of participants in each bin.
Figure S2. Correlations between the electrophysiological variables (power spectrum and strength) and consumption (SAUs), broken down by area of the AAL atlas, for the four frequency bands of interest (A: alpha, B: beta, C: gamma, and D: theta). For each band, the left side shows the correlation graphs for Cluster 1: the power‐SAUs graph (in blue), the strength‐SAUs graph (in green), and a graph showing the areas where both variables overlap. The right side shows the same three graphs for Cluster 2. Beneath the title of each graph, the mean rho is displayed along with its standard deviation.
Figure S3. Clusterization of random data compared with clusterization of the real electrophysiological data (theta, alpha, beta, and gamma bands, respectively). (A) Data from Cluster 1. (B) Data from Cluster 2. (C) Dendrogram and cohesion/coherence parameters of the random‐data clustering, with the parameters of the first cluster in blue and those of the second cluster in red.
Figure S4. Clusterizations of the gamma‐band data obtained with different linkage criteria, showing the random‐like or plausible patterns that each criterion generates. (A) Simple and geometric linkage criteria, which generate random‐like clusterizations. (B) Complex criteria, such as Ward's minimum variance (the one used in the study). (C) Weighted linkage criterion, which produces a Ward‐like clusterization but smooths the data, reducing the information the clusterization provides.
Table S1. Sample demographics. Number of participants belonging to each sex, with percentage relative to the total sample; number of participants with country of birth in Spain, and with country of birth other than Spain, with percentage relative to the total sample; number of participants with mother tongue Spanish and with mother tongue other than Spanish, with percentage relative to the total sample; mean age, in years (with standard deviation); mean weight, in kilograms (with standard deviation); mean height, in centimeters (with standard deviation); mean study time, in hours, self‐reported by the participant (with standard deviation); mean level of paternal and maternal education (with standard deviation), from 1 to 4, where 1 represents basic levels of education, and 4 represents advanced levels of education; mean health status (with standard deviation), self‐reported by the participants, from 1 to 9, where 1 represents low levels of health, and 9 represents ideal health status; and main scores of the neuropsychological tests run throughout the study, with sensation‐seeking tests (SSS‐V), executive function tests (BDEFS‐20, BRIEF‐SR, DEX) and impulsivity tests (BIS‐11) (reported values are the total scores of the tests, not their sub‐scores).
Table S2. Efficiency metrics for the unsupervised machine learning model in each frequency band. The metrics used for each band were the within‐cluster sum of squares (WCSS), the silhouette score, the Calinski‐Harabasz index, and the Davies‐Bouldin index. For each frequency band and metric, the most efficient number of groups k is highlighted in grey.
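The metrics listed for Table S2 can be computed with the two libraries named in the Data Availability Statement. What follows is a minimal sketch, not the authors' code: the input matrix is synthetic, its shape (90 regions by 2 correlation features) is an assumption made for illustration, and `wcss` and `evaluate_ward` are hypothetical helper names.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import (
    silhouette_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)


def wcss(X, labels):
    """Within-cluster sum of squared distances to each cluster centroid."""
    total = 0.0
    for lab in np.unique(labels):
        members = X[labels == lab]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def evaluate_ward(X, k_values=range(2, 7)):
    """Cut one Ward dendrogram at several k and score each partition."""
    Z = linkage(X, method="ward")  # Ward's minimum variance criterion
    results = {}
    for k in k_values:
        labels = fcluster(Z, t=k, criterion="maxclust")
        results[k] = {
            "wcss": wcss(X, labels),                                   # lower is tighter
            "silhouette": silhouette_score(X, labels),                 # higher is better
            "calinski_harabasz": calinski_harabasz_score(X, labels),   # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),         # lower is better
        }
    return results


# Synthetic stand-in (assumption): 90 regions x 2 features, e.g. the
# power-SAU and strength-SAU correlation of each region.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-0.2, scale=0.03, size=(45, 2)),
    rng.normal(loc=0.2, scale=0.03, size=(45, 2)),
])

scores = evaluate_ward(X)
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
```

Because the cuts of a single dendrogram are nested, WCSS can only decrease as k grows, so it is usually read through its elbow, whereas the other three metrics can be compared directly across values of k.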
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation. The methods rely on source code from two public, open‐access libraries: scikit‐learn (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics) and SciPy (https://docs.scipy.org/doc/scipy/index.html).
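The comparison of linkage criteria described for Figure S4 can be approximated with the `method` argument of SciPy's `linkage` function. The sketch below is illustrative only: the data are synthetic, the list of methods and how they map onto the panels of Figure S4 are assumptions, and `two_cluster_sizes` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Linkage names accepted by scipy.cluster.hierarchy.linkage. Which of these
# correspond to each panel of Figure S4 is an assumption, not stated in the text.
METHODS = ["single", "complete", "average", "weighted", "ward"]


def two_cluster_sizes(X, method):
    """Build a dendrogram with one linkage criterion and cut it into 2 clusters."""
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=2, criterion="maxclust")
    return np.bincount(labels)[1:]  # fcluster labels start at 1


# Synthetic stand-in (assumption): 90 regions x 2 features, two overlapping groups.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=-0.1, scale=0.1, size=(45, 2)),
    rng.normal(loc=0.1, scale=0.1, size=(45, 2)),
])

sizes = {method: two_cluster_sizes(X, method) for method in METHODS}
for method, counts in sizes.items():
    print(f"{method:>8}: cluster sizes = {counts.tolist()}")
```

Comparing the cluster sizes across methods is only a rough proxy for the visual inspection described in the caption; very unbalanced sizes under a given criterion suggest that it is chaining points together rather than separating groups.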