Author manuscript; available in PMC: 2025 Jul 21.
Published in final edited form as: Cell. 2024 Dec 19;188(2):515–529.e15. doi: 10.1016/j.cell.2024.11.012

Digital phenotyping from wearables using AI characterizes psychiatric disorders and identifies genetic associations

Jason J Liu 1,2,11, Beatrice Borsari 1,2,11, Yunyang Li 3,12, Susanna X Liu 1,2,12, Yuan Gao 1,2, Xin Xin 1,2, Shaoke Lou 1,2, Matthew Jensen 1,2, Diego Garrido-Martín 4, Terril L Verplaetse 5, Garrett Ash 6,7,8, Jing Zhang 9, Matthew J Girgenti 5, Walter Roberts 5,8,*, Mark Gerstein 1,2,3,8,10,13,*
PMCID: PMC12278733  NIHMSID: NIHMS2042111  PMID: 39706190

Summary

Psychiatric disorders are influenced by genetic and environmental factors. However, their study is hindered by limitations on precisely characterizing human behavior. New technologies such as wearable sensors show promise in surmounting these limitations in that they measure heterogeneous behavior in a quantitative and unbiased fashion. Here, we analyze wearable and genetic data from the Adolescent Brain Cognitive Development (ABCD) study. Leveraging >250 wearable-derived features as digital phenotypes, we show that an interpretable AI framework can objectively classify adolescents with psychiatric disorders more accurately than previously possible. To relate digital phenotypes to the underlying genetics, we show how they can be employed in univariate and multivariate GWAS. Doing so, we identify 16 significant genetic loci and 37 psychiatric-associated genes, including ELFN1 and ADORA3, demonstrating that continuous, wearable-derived features give greater detection power than traditional, case-control GWAS. Overall, we show how wearable technology can help uncover new linkages between behavior and genetics.

In brief

Complex disorders require precise strategies for their characterization. AI-based digital phenotypes from biosensors can be used to predict psychiatric disorders and identify GWAS loci.


Introduction

Psychiatric disorders of childhood and adolescence currently affect 1 in 7 youths in the United States and globally.1,2 Externalizing disorders such as attention-deficit/hyperactivity disorder (ADHD), and internalizing disorders such as anxiety, are among the most prevalent and represent a wide spectrum of dysfunctional behavior patterns.3 Treatment barriers are complex and multifaceted, but major contributors include our limited understanding of psychiatric phenotypes and the difficulty of identifying youths who experience these disorders.

Traditionally, psychiatric disorders have been conceptualized as categorical macrophenotypes based on clinical manifestations of a disease, which are defined according to the number and type of symptoms and the presence of distress or impairment.4–6 While this has practical benefits in terms of reliability and ease of diagnosis, it poses several challenges to the research of these disorders and, consequently, to the development of treatments. Furthermore, given the high heritability of psychiatric disorders, dissecting their underlying genetic architecture is of interest to researchers.7–10 While cost-effective and accurate genotyping technologies in large cohorts of individuals have significantly advanced the field, barriers associated with missing heritability and the need for improved phenotyping strategies are still present.10–13 In fact, many psychiatric genome-wide association studies (GWASs) to date rely on subjective, dichotomized (i.e., binary) traits.14–18 However, psychiatric disorders are complex and often comorbid, and this high degree of heterogeneity is not always accurately translated into categorical diagnostic labels, which may be defined by arbitrary cut-offs. Digital phenotypes derived from biosensor data can address these challenges by more precisely capturing an underlying macrophenotype and better describing the heterogeneity potentially missed by existing diagnostic categories.12,19,20 Therefore, when compared to macrophenotypes, the quantitative nature of these digital phenotypes could enable improved dissection of the genetic architecture underlying psychiatric disorders.21

To improve our understanding of psychiatric disorders, it is important that we identify digital phenotypes that not only offer a more comprehensive representation of an individual’s behavior with respect to the environment but also relate well with existing clinical definitions and aid in diagnosis. Once identified, these digital phenotypes can also be used to guide more comprehensive studies to identify genetic associations and biomarkers that may ultimately improve precision treatments.12,22

To achieve this goal, it is important to leverage new emerging technologies that can quantitatively assess an individual’s behavioral patterns.23 Wearable sensors such as smartwatches collect data that reflect physical and physiological processes (e.g., movement, pulse, metabolic intake), and can be used to infer higher-order behavioral events (e.g., sleep, exercise) and their temporal dynamics. Because of the documented relationship between such higher-order behavioral events and mental health, and given their low cost and minimal invasiveness, wearable devices have emerged as promising tools for mental health monitoring and psychiatric evaluation.24–26 Indeed, there is a rich literature leveraging wearable sensors for the characterization, detection, and even treatment of childhood and adolescent disorders, such as common externalizing (e.g., ADHD) and internalizing (e.g., anxiety disorders) disorders.27

Prior studies have demonstrated the utility of wearables for detecting key behavioral and physiological traits in youths with psychiatric disorders.27 For example, markers of physical fitness measured by these devices have been associated with psychopathology and may offer insights into treatment interventions.28,29 Consequently, wearable biosensors show promise for capturing digital phenotypes relevant to behavior and psychiatric disorders, ultimately enabling improved GWASs. However, significant computational challenges remain in using AI to generate digital phenotypes that leverage the temporal nature of the raw wearable data and describe the full spectrum of a given psychiatric disorder.27 Moreover, further curation of these digital phenotypes is necessary to identify genetic associations that have clinical and biological relevance.

To address these limitations, we developed an AI modeling framework that flexibly leverages data from wearable devices to generate digital phenotypes in the form of static and dynamic digital features. We establish the validity of these features as digital phenotypes by classifying externalizing and internalizing disorders with an accuracy beyond baseline expectation, and even surpassing the performance of some other gold-standard digital phenotypes, such as functional magnetic resonance imaging (fMRI) measurements.30–33 Interpretability modules in our AI framework enable us to identify key temporal and physiological insights between clinical diagnosis and wearable-derived features, further supporting their validity as digital phenotypes. We further curate these digital phenotypes and employ them in GWAS models to identify genetic associations and biomarkers that capture the continuous spectrum of psychiatric disorders and behavioral patterns. Finally, we identify 16 significant loci, several of which overlap previously reported genetic variants associated with behavioral traits and mental illnesses and are proximal to genes with a documented role in neurodevelopmental and psychiatric disorders.

In sum, this work shows how wearable devices can advance our understanding of psychiatric disorders by establishing a more objective and dimensional approach that can ultimately lead to improved treatments in precision psychiatry.

Results

Leveraging the Adolescent Brain Cognitive Development cohort

To improve our understanding of psychiatric disorders, we leveraged and analyzed a dataset from a cohort of US adolescents recruited by the NIH Adolescent Brain Cognitive Development Consortium (ABCD) project, consisting of clinical, wearable, and genetic data (Figure 1 and Figure S1A and Data S1; STAR Methods, “Dataset Description” section).34 The ABCD cohort consists of a total of 11,878 adolescents (5,682 males and 6,196 females), aged between nine and fourteen years and belonging to four different ethnicities. We identified nine categories of psychiatric phenotypes (Table S1), which were established using a gold-standard semi-structured parent diagnostic interview (Kiddie Schedule for Affective Disorders and Schizophrenia-5).35 The healthy controls were adolescents who did not meet the criteria for any of those nine psychiatric disorders. We defined these clinical labels as the categorical macrophenotypes in the study (Figure 1A and Figure 1B). Our modeling framework also included extensive covariates typical of psychiatric studies, such as demographics, cognitive tests (e.g., NIH Toolbox), and behavioral checklists (Figure 2A and Data S2 and Data S3-S7).

Figure 1. Leveraging clinical, digital, and genetic data of the ABCD cohort to improve characterization of psychiatric disorders.

A) Framework schematic describing how digital phenotypes from wearable-derived data are leveraged to better understand the association between macrophenotype and genotype. The link between digital phenotype and macrophenotype supports construct validity and aids diagnostics. Wearable GWAS is performed through genotype-to-digital-phenotype association studies.

B) The Adolescent Brain Cognitive Development (ABCD) cohort contains 11,878 individuals spanning nine different categorical macrophenotypes based on clinical diagnosis from the Kiddie Schedule for Affective Disorders and Schizophrenia-5. A breakdown of the counts of each disorder is shown in the bottom bar graph, with anxiety disorder and ADHD being the most prevalent. “Bipolar” refers to bipolar or psychotic disorders. (Details in Table S1.)

C) Digital data from FitBit biosensors are collected for 5,339 individuals. The collected time series data are then processed into dynamic and static features, with information spanning various physiological and higher-order processes.

D) Genetic data are collected by the ABCD consortium using the Smokescreen genotyping array. Imputed genotypes are used for downstream GWAS analyses. The genotype arrays are subjected to best-practice processing and QC to ensure included individuals and SNPs are of high quality. PCA performed on 8,791 individuals and 157,556 genotyped SNPs reveals distinct ancestral clusters across the cohort, and the inferred genotype principal components (PCs) are used as covariates in downstream analyses. (Details in Data S31-S34.)

See also Figure S1.

Figure 2. Workflow for data processing, feature engineering, and model architecture.

A) ABCD cohort metadata including various demographic features, cognitive test scores, and clinical characteristics are used as covariates and represent the input features used in our baseline comparison model. Features shown in this plot correspond to the filtered set of individuals with wearable data. (Details in Data S1-S5.)

B) Digital data collected by wearable biosensors are used to generate dynamic features after signal processing and imputation steps. Together with the processed covariates, these time series features represent the input features for the dynamic model. (Details in Data S10-S11 and Data S14.)

C) Summary statistics applied to digital data collected by wearables are used to generate a total of 258 static features. In addition to the covariates, these are the input features used in the static model. The static model leverages the machine learning framework, XGBoost, for downstream tasks such as wearable combination score generation and classification. (Details in Data S9 and Table S2.)

D) Hierarchical clustering of the static features yields seven distinct physiological clusters of wearable data. (Details in Data S12-S13.)

E) The dynamic model is based on the Xception deep learning framework, and uses the generated 48 channels from the dynamic features and covariates as input into a convolution-like model. The architecture consists of six inception layers and residual connections. Global average pooling and a fully connected layer allow for similar downstream tasks as mentioned in C).

See also Figure S2.

Generating digital phenotypes from wearable-derived data

We processed data obtained from FitBit smartwatches, which comprise measurements of heart rate, calories, activity intensity, steps, metabolic equivalents (METs), sleep level and sleep intensity (Figure 1C and Figure S1B and Data S8-S9; STAR Methods, “Dataset Description” section). These measurements quantify an individual’s physiological processes and their real-time changes in response to environmental stimuli, and can thus provide key information about an individual’s behavior.

To reconstruct the full spectrum of an individual’s behavioral functioning from these data, we applied two different feature engineering techniques, allowing us to generate wearable-derived dynamic and static features, which we consider as digital phenotypes. The dynamic features preserve the time-varying nature of the original data as a time series, enabling sequential and temporal patterns of the data to be retained. In contrast, the static features summarize patterns of the digital data and produce time-invariant, quantitative features that are commonly used in downstream modeling.25,36

To generate dynamic features, we performed signal imputation and processing after filtering the individuals with sparse data, and obtained 48 channels of time series (Figure 2B and Figure S2A and Data S10-S11; STAR Methods, “Dataset Description” section). Compared to the static features, this further processing allowed us to preserve both local and global temporal patterns potentially relevant to characterizing behavior and neurological response to stimuli.20,37

To generate static features, we first collected a total of 49 FitBit summary-based features (Data S8-S9; STAR Methods, “Machine Learning Classifier” section). We next applied descriptive statistics (e.g., mean and median) to each of these features and generated a total of 258 static features for each individual (Figure 2C and Table S2).25,38 We then grouped these static features into seven main clusters, each of which summarizes different aspects of physiological and behavioral processes, such as heart rate, sleep duration and quality, metabolic intake or physical activity (Figure 2D and Data S12-S13; STAR Methods, “Machine Learning Classifier” section).
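As an illustration of the static feature idea, the following minimal sketch collapses each time series into a handful of time-invariant summaries. The channel names, example values, and the particular set of five statistics are assumptions made for illustration; this section does not list the exact statistics that yield the 258 features.

```python
import statistics


def summarize_channel(name, values):
    """Collapse one wearable time series into time-invariant summary features."""
    return {
        f"{name}_mean": statistics.fmean(values),
        f"{name}_median": statistics.median(values),
        f"{name}_std": statistics.pstdev(values),
        f"{name}_min": min(values),
        f"{name}_max": max(values),
    }


def static_features(channels):
    """Build one flat feature dictionary from all channels of one individual."""
    features = {}
    for name, values in channels.items():
        features.update(summarize_channel(name, values))
    return features


# Hypothetical daily summaries for one individual (values are made up).
example = {
    "resting_heart_rate": [62, 64, 61, 65, 63],
    "sedentary_minutes": [600, 640, 580, 620, 610],
}
features = static_features(example)
```

Each individual then contributes a single row of such features, which is the kind of tabular input a gradient boosting classifier expects.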

Altogether, static and dynamic features represent the physiological and behavioral profiles of the adolescents, and can be leveraged as digital phenotypes in a wide range of analyses to better characterize psychiatric phenotypes. For instance, we generated wearable combination scores, performed macrophenotype classification, and assessed model interpretability. In particular, the wearable combination scores integrate our digital features using non-linear models to predict the macrophenotype (Figure 2E; STAR Methods, “Model Training and Evaluation” section). Practically, these scores summarize the likelihood that an individual has a given disorder and further enable biomarker identification via wearable GWAS.

Classifying psychiatric macrophenotypes using wearable-derived digital phenotypes

To demonstrate the validity of static and dynamic features as clinically relevant digital phenotypes and to evaluate their utility as a diagnostic tool, we employed these features in an array of classification tasks to identify individuals with either an externalizing (ADHD) or internalizing (anxiety) disorder from their typically developing peers. We selected ADHD and anxiety due to their high prevalence in adolescents, which is mirrored in the cohort (Figure 1B and Table S1).39

We applied a gradient boosting machine learning algorithm, XGBoost, for classification tasks using static features (Figure 2C and Figure 2D; STAR Methods, “Machine Learning Classifier” section).40 On the other hand, to fully leverage the time series nature of the dynamic features, we used a convolutional neural network for time series, featuring depthwise separable convolution, called Xception (Figure 2E and Figure S2B; STAR Methods, “Multichannel Time Series Classifier” section).41 Variable convolutional filters and residual (skip) connections, coupled with efficient parametrization, allow short and long time series patterns of physiology and behavior to be optimally leveraged when performing downstream classification of psychiatric disorders. In both modeling approaches we also considered our full list of covariates (Figure 2A and Figure S2A and Table S2 and Data S14-S17, S40-S44, S50). To assess the benefit, in terms of model performance, of including wearable-derived data, we also trained a baseline model using just the covariates, which served as a comparison to the models including static or dynamic features (Data S18-S19). In practice, this comparison allowed us to determine whether wearable-derived features can improve diagnostic accuracy relative to that achievable using only a widely used broadband behavior rating scale.

After data filtering, we first used static features to classify 216 individuals with ADHD (an externalizing disorder) versus 1,737 of their typically developing peers (healthy controls) (Figure 3A and Figure S3 and Data S20-S22; STAR Methods, “Model Training and Evaluation” section). Using static features with XGBoost, we achieved an average area under the receiver operating characteristic curve (AUROC) of 0.87 and precision of 0.79. When using the dynamic features and Xception, we achieved an average AUROC of 0.89 and precision of 0.83. The baseline model consisting of only the covariates achieved an average AUROC of 0.83, suggesting that the inclusion of wearable-derived features facilitates a clinically meaningful improvement in diagnostic accuracy. The improvement of the dynamic features model over the baseline is statistically significant (one-sided t-test between baseline model and dynamic features model, p value = 0.0022).
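For readers less familiar with the two reported metrics, the sketch below computes them from scratch on made-up labels and scores. It is not the evaluation code used in the study; the 0.5 decision threshold for precision is an assumption chosen for illustration.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random case outranks a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def precision(labels, scores, threshold=0.5):
    """Fraction of predicted cases (score >= threshold) that are true cases."""
    predicted = [s >= threshold for s in scores]
    true_pos = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
    false_pos = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Made-up example: 3 cases (label 1) and 4 controls (label 0).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
example_auroc = auroc(labels, scores)
example_precision = precision(labels, scores)
```

Because AUROC depends only on the ranking of scores, it is insensitive to the class imbalance seen here (216 cases versus 1,737 controls), whereas precision depends on both the threshold and the case fraction.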

Figure 3. Performance and interpretability of psychiatric phenotype classification models.

A-B) Model performance for baseline, static, and dynamic models employed for classifying individuals with ADHD (blue, top) or individuals with anxiety disorder (purple, bottom) versus healthy controls. P values were calculated using one-sided t-test. (Details in Data S20-S23.)

C-D) Feature importance based on ablation studies for the dynamic model for ADHD (blue, top) and anxiety disorder (purple, bottom) classification. Wearable-derived dynamic features are shown in red font and clinical features (covariates) are shown in black font. Feature importance is equivalent to the decrease in model performance (AUROC) after removal of the given feature. (Details in Data S24-S28.)

E-F) Temporal importance during a 48-hour period for dynamic features in ADHD (blue, top) or anxiety disorder (purple, bottom) classification based on the Grad-CAM interpretability module. Importance is represented as the Grad-CAM score, based on each time point’s contribution towards model performance. (Details in Data S29-S30.)

See also Figure S3.

Second, we evaluated the performance of our model using static or dynamic features in the classification of 666 individuals diagnosed with anxiety disorder (internalizing disorder) versus 1,737 of their typically developing peers (healthy controls) (Figure 3B and Data S22-S23; STAR Methods, “Model Training and Evaluation” section). Here, we again used the same modeling framework, i.e., static features with XGBoost and dynamic features with Xception, and compared it to the baseline covariate model. We found that static and dynamic features achieve an average AUROC of 0.69 and 0.71 and precision of 0.64 and 0.68, respectively. In both models, the performance was greater than that of the baseline model (average AUROC of 0.67), with the dynamic features model showing the largest and most significant improvement in performance (one-sided t-test between baseline model and dynamic feature model, p value = 0.00016). Overall, the fact that the models using dynamic features achieved the highest performance suggests that the temporal patterns intrinsic to wearable-derived data are useful towards understanding human behavior.

Interpreting wearable features prioritized by the deep learning model

Deep learning methods are typically characterized by complex internal structures that cannot be easily interpreted by humans. While maximizing the classification accuracy is one crucial aspect for characterizing complex phenotypes, it is also critical to understand which features are most important for the classification task. To this end, we utilized ablation techniques to determine the relative contribution of each individual feature to model performance (Figure 3C–3D and Data S24-S28; STAR Methods, “Model Interpretability” section). For the ADHD classification task, heart rate was the most important feature (largest change in AUROC), followed by other dynamic features (i.e., sleep, steps, METs) as well as covariates such as demographics, family history, and cognitive scores from picture memory and stop-signal reaction time tests (Figure 3C and Data S24-S27).

On the other hand, the ablation study for the anxiety classification task revealed a different set of important features. In this case, sleep quality and stage, calories, and step count were the most important dynamic features, whereas heart rate features, which were extremely important for classifying ADHD, were not prioritized in the anxiety model (Figure 3D and Data S28). Additionally, while the anxiety model prioritized some covariates that were relevant also for the ADHD model (e.g., sex, family history, and family divorce), cognitive scores from tests such as picture memory did not appear to be important for the identification of individuals diagnosed with anxiety, consistent with theory-driven accounts of neurocognitive aspects of anxiety disorders.42
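The ablation logic described above (importance as the drop in AUROC once a feature is removed) can be sketched as follows. The scorer here is a deliberately trivial stand-in that sums the remaining features; it is an assumption for illustration and not the Xception model, and the feature names and values are made up.

```python
def auroc(labels, scores):
    """Chance that a random case outranks a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if c > h else 0.5 if c == h else 0.0
               for c in cases for h in controls)
    return wins / (len(cases) * len(controls))


def toy_score(row, kept_features):
    """Hypothetical stand-in model: sum of the features that are kept."""
    return sum(row[name] for name in kept_features)


def ablation_importance(rows, labels, feature_names):
    """Importance of a feature = baseline AUROC minus AUROC without it."""
    baseline = auroc(labels, [toy_score(r, feature_names) for r in rows])
    importance = {}
    for name in feature_names:
        kept = [f for f in feature_names if f != name]
        ablated = auroc(labels, [toy_score(r, kept) for r in rows])
        importance[name] = baseline - ablated
    return importance


# Made-up data: heart_rate separates the groups, steps is mostly noise.
rows = [
    {"heart_rate": 9, "steps": 2},
    {"heart_rate": 8, "steps": 7},
    {"heart_rate": 7, "steps": 4},
    {"heart_rate": 3, "steps": 5},
    {"heart_rate": 2, "steps": 1},
    {"heart_rate": 4, "steps": 8},
    {"heart_rate": 1, "steps": 3},
]
labels = [1, 1, 1, 0, 0, 0, 0]
importance = ablation_importance(rows, labels, ["heart_rate", "steps"])
```

In this toy case removing the noisy feature raises the AUROC, so its importance is negative. With a trained model, each ablation would typically involve re-evaluating or retraining the model without the feature.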

Additionally, we assessed the importance that dynamic features at various timepoints of the day have on model performance. Specifically, we calculated the relative importance of each time point using gradient-weighted class activation mapping (Grad-CAM) and ablation techniques (Figure 3E-F and Data S29-S30; STAR Methods, “Model Interpretability” section).43 For ADHD, we observed enriched significance of the heart rate dynamic feature around the early afternoon, potentially suggesting stronger behavioral differences between adolescents with ADHD and their typically developing peers (healthy controls) during this time of day (Figure 3E and Data S29). This is consistent with clinical research demonstrating time-of-day effects on ADHD symptom expression.44 In contrast, sleep-related dynamic features during the night are much more informative in classifying anxiety, consistent with clinical expectations (Figure 3F and Data S30).45 Together, these results suggest that wearable-derived features can not only serve as digital phenotypes but also reveal insights into the behavioral and physiological temporal patterns related to categorical macrophenotypes.

Using wearable-derived features as digital phenotypes in GWAS for ADHD

Our AI modeling framework used wearable-derived features as predictors (independent variables) of psychiatric macrophenotypes. Our accurate predictions suggest that these quantitative features could be useful for studying other aspects of psychiatric disorders, such as their underlying genetic architecture. Therefore, we leveraged these features as digital phenotypes (dependent/response variable) in the following GWAS to identify genetic associations relevant to psychiatric conditions (Figure 1D, Figure 4 and Data S31-S36; STAR Methods, “Quality Control of Genetic Data” and “Covariates included in the GWAS” sections). We focused specifically on ADHD (STAR Methods, “GWASs for ADHD” section), given the higher predictive power observed with our models (Figures 3A–3B) and its higher estimated heritability compared to anxiety (>75% vs. 30–60%).8,9 We selected 1,191 individuals (137 individuals with ADHD and 1,054 healthy control individuals) with genetic and wearable data available. We first treated the clusters of wearable-derived features as a digital phenotype (i.e., the response variable) and assessed whether genetic variants impact these features differently in ADHD vs. control individuals. Specifically, we performed a continuous multivariate GWAS regressing the vector of wearable features on the genotype, the covariates, and an interaction term between the genotype and an individual’s macrophenotype (ADHD or control) (Figure S4A).46 By including this interaction term, we were able to identify genetic variants that not only have a significant impact on the digital phenotype, but are also relevant to ADHD. We identified two genome-wide significant (p value < 5 × 10⁻⁸) loci and six psychiatry-related genes (Figure 4B and Table 1 and Table S3 and Data S37-S39). One of these loci is located within a cluster of genes relevant for ADHD (ELOVL5, FBXO9, CILK1).47,48 Following the continuous multivariate GWAS, we performed a post-hoc test to determine which of the wearable features within a cluster drives the significant association (Figure S4B and Table S2). We found that individuals with ADHD carrying the CC genotype at rs186003 (chr6:53,320,326) exhibited lower amounts of sedentary time compared to ADHD individuals carrying the AA genotype (Figure 4B). This difference was not observed among the three genotype groups in control individuals, suggesting that the effect of the genotype on the wearable feature is specific to the ADHD condition. Overall, this hierarchical testing strategy (i.e., multivariate GWAS followed by a post-hoc test) allowed us to reduce the multiple-testing correction burden, resulting in increased statistical power.
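One way to read the interaction design described above is as a multivariate linear model fitted per variant. The notation below is an assumption introduced for illustration: the cited method (reference 46) may test these terms without relying on this exact parametric form, and the text does not state whether a main effect of the macrophenotype is included among the covariates.

```latex
% Per-variant model for individual i (illustrative notation, not from the source):
%   y_i : vector of wearable-derived features in one cluster
%   g_i : genotype dosage at the variant (0, 1, or 2)
%   m_i : macrophenotype indicator (0 = control, 1 = ADHD)
%   c_i : covariate vector (e.g., genotype PCs, demographics)
\mathbf{y}_i
  = \boldsymbol{\beta}_0
  + \boldsymbol{\beta}_g \, g_i
  + \boldsymbol{\beta}_{g \times m} \, (g_i \cdot m_i)
  + \mathbf{B}_c^{\top} \mathbf{c}_i
  + \boldsymbol{\varepsilon}_i ,
\qquad
H_0 : \boldsymbol{\beta}_{g \times m} = \mathbf{0}
```

Rejecting the null hypothesis for the interaction coefficient indicates that the genotype's effect on the feature vector differs between ADHD and control individuals, which is the pattern shown for rs186003 and sedentary time.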

Figure 4. Manhattan plots summarizing the results of multivariate and univariate GWAS for ADHD.

A) Left panel: Schematic describing for a given SNP the frequency of healthy controls or individuals with ADHD for each genotype. Right panel: Resulting Manhattan plot from a case-control GWAS on 1,191 individuals from the ABCD cohort. We employed the clinical diagnosis label as the binary univariate response variable for the GWAS (n_ADHD = 137, n_Control = 1,054). No genetic variants passed the genome-wide significance threshold (p value < 5 × 10⁻⁸; blue line). Genetic variants with a suggestive p value (< 10⁻⁵) are represented as green dots. In all panels, proximal genes related to ADHD are highlighted in dark blue, and genes related to other psychiatric disorders are highlighted in pink (evidence obtained from OpenTargets). Brain-related traits associated with genetic variants overlapping the genome-wide significant loci are highlighted in orange. GWAS associations were obtained from the EBI-NHGRI GWAS catalog. A detailed list of genome-wide significant loci for all panels is provided in Table 1, Table S3 and Table S4. In this figure, we only show results related to autosomal chromosomes. (Details in Data S50-S52.)

B) Left panel: Schematic describing, for a given SNP, the relationship between a multivariate set of n wearable-derived features (dependent/response variable) and an interaction term represented by the genotype and the disorder status of the individuals (independent/predictor variable). Right panel: Resulting Manhattan plot using clusters of wearable-derived features as the multivariate response variable in a GWAS that encodes the interaction term genotype:disorder (where disorder is a binary feature such as 0 = Control, 1 = ADHD; gxm). The GWAS was performed on the same set of 1,191 individuals as in panel A. We identified 2 and 174 loci passing the p value thresholds of 5 × 10⁻⁸ and 1 × 10⁻⁵, respectively. Locus chr6:53,240,429–53,356,412 is proximal to genes CILK1, ELOVL5, FBXO9 (highlighted in dark blue), which have been associated with ADHD previously. The inset panel shows that individuals with ADHD exhibit different levels of residualized (i.e., covariate-adjusted) sedentary time (maximum) depending on the genotype at lead variant rs186003 (chr6:53,320,326). In contrast, healthy control individuals show no difference among genotype groups (***: p < 0.001, **: 0.001 ≤ p < 0.01, *: 0.01 ≤ p < 0.05, ns: p ≥ 0.05; two-sided Wilcoxon Rank-Sum test). (Details in Data S38-S39 and Table S2.)

C) Left panel: Schematic showing the relationship between the wearable combination score (dependent/response variable) and genotype (independent/predictor variable). Right panel: Resulting Manhattan plot using the wearable combination scores (trained on classification of individuals with ADHD) as the response variable in a GWAS for ADHD. The GWAS was performed on the same set of 1,191 individuals as in panel A. We identified 10 and 414 loci passing the p value thresholds of 5 × 10⁻⁸ and 1 × 10⁻⁵, respectively. Loci chr1:111,372,165–111,482,359, chr17:7,101,607–7,101,608, and chr17:32,256,997–32,283,356 are proximal to genes ADORA3 (72 Kb), DLG4 (86 Kb) and PSMD11 (174 Kb) (highlighted in dark blue), respectively, which have been previously associated with ADHD.

See also Figure S4.

Table 1.

Results for the 16 genetic loci identified by the continuous univariate and multivariate GWASs.

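| Locus | Chr | Start | End | Lead Variant | Position | P value | Genes | GWAS Method | Phenotype |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 111,372,165 | 111,482,359 | rs114081965 | 111,372,166 | 1.11E-08 | TMIGD3, ADORA3, RAP1A, CHI3L2 | Continuous Univar. | ADHD |
| 2 | 3 | 161,873,055 | 161,927,820 | rs79203233 | 161,909,261 | 3.89E-08 | - | Continuous Univar. | ADHD |
| 3 | 4 | 184,417,766 | 184,424,056 | rs1425551 | 184,421,904 | 2.15E-08 | IRF2, CASP3, PRIMPOL | Continuous Univar. | ADHD |
| 4 | 10 | 121,524,611 | 121,582,200 | rs140794722 | 121,524,612 | 2.31E-09 | FGFR2 | Continuous Univar. | ADHD |
| 5 | 11 | 38,982,793 | 39,384,610 | rs151239852 | 39,273,497 | 3.80E-08 | - | Continuous Univar. | ADHD |
| 6 | 14 | 26,214,683 | 26,216,532 | rs149074469 | 26,216,532 | 2.95E-08 | NOVA1 | Continuous Univar. | ADHD |
| 7 | 14 | 62,903,677 | 63,014,798 | rs143225169 | 63,014,798 | 3.89E-08 | KCNH5, RHOJ | Continuous Univar. | ADHD |
| 8 | 17 | 7,101,607 | 7,101,608 | rs11653054 | 7,101,608 | 8.51E-09 | CLEC10A, DLG4 | Continuous Univar. | ADHD |
| 9 | 17 | 32,256,997 | 32,283,356 | rs6505293 | 32,270,863 | 1.12E-08 | RHBDL3, RHOT1, C17orf75, ZNF207, PSMD11, LRRC37B, CDK5R1, MYO1D | Continuous Univar. | ADHD |
| 10 | 19 | 4,495,610 | 4,495,611 | rs150855276 | 4,495,611 | 2.75E-08 | - | Continuous Univar. | ADHD |
| 11 | 11 | 103,922,953 | 104,046,796 | rs75092661 | 104,046,796 | 1.55E-08 | AMY1C | Continuous Multivar. | ADHD |
| 12 | 6 | 53,240,429 | 53,356,412 | rs186003 | 53,320,326 | 3.46E-08 | GCLC, CILK1, ELOVL5, FBXO9, GCM1 | Continuous Multivar. | ADHD |
| 13 | 6 | 1,102,252 | 1,130,154 | rs742521 | 1,129,893 | 5.20E-09 | - | Continuous Multivar. | Behavioral |
| 14 | 7 | 1,789,321 | 1,791,353 | rs113525298 | 1,791,353 | 5.09E-09 | MAD1L1, ELFN1, PSMG3, MAFK | Continuous Multivar. | Behavioral |
| 15 | 14 | 23,392,601 | 23,418,974 | rs365990 | 23,392,602 | 5.33E-09 | MYH6, CMTM5, IL25, BCL2L2, BCL2L2-PABPN1 | Continuous Multivar. | Behavioral |
| 16 | 16 | 79,283,253 | 79,302,474 | rs8051625 | 79,288,217 | 6.04E-09 | WWOX | Continuous Multivar. | Behavioral |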

For each locus we report the genomic coordinates in human assembly GRCh38, the lead variant rsID with corresponding genomic position and p value, the GWAS method (continuous multivariate or continuous univariate), and the phenotype (ADHD or behavioral traits) associated with the GWAS. Brain- or neuropsychiatry-related genes proximal to the locus are also listed in Tables S3-S5.

In addition to using clusters of wearable-derived features as a multivariate response variable for GWAS, we also conducted another type of GWAS using, as a response variable, the wearable combination scores derived from our AI framework, similar to previous genetic studies leveraging ML-derived scores.49,50 These scores combine wearable-derived features into a single continuous variable that summarizes an individual’s likelihood for ADHD. When using these scores in a continuous univariate GWAS, we identified 10 significant loci and 21 psychiatric or brain-related genes (Figure 4C and Table 1 and Table S4 and Data S40-S41). Three of the identified genes (ADORA3, PSMD11 and DLG4) have been previously associated with ADHD, bolstering the overall functional significance of the results.48,51,52 Furthermore, several of these loci overlap with previously reported GWAS SNPs related to ADHD, neuroticism, sleep disruption and other clinically relevant traits (Figure 4C and Data S42-S43). In comparison to either of the above two GWASs, when performing a traditional case-control (i.e., binary univariate) GWAS for ADHD on the same set of individuals using the binary diagnostic label (presence/absence of disorder) as response variable, we did not identify any significant loci (Figure 1A and Figure 4A). This result is consistent with the higher statistical power of continuous measurements over dichotomized (i.e., binary) traits (Figure S4C; STAR Methods, “Statistical Power of Binary vs. Continuous Traits” section).19,49,53–55
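The power argument can be illustrated with a small simulation that tests the same simulated variant against a continuous trait and against a dichotomized version of that trait. This is a sketch under stated assumptions, not the power analysis in STAR Methods: the sample size, allele frequency, effect size, 10% case fraction, nominal 5% significance level, and the use of a correlation-based test for both traits are all illustrative choices.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(x, y, z_crit=1.96):
    """Two-sided association test at roughly the 5% level (normal approx.)."""
    r = pearson_r(x, y)
    t = r * math.sqrt((len(x) - 2) / (1 - r * r))
    return abs(t) > z_crit


def estimate_power(n=500, maf=0.3, beta=0.15, case_fraction=0.1,
                   n_reps=300, seed=0):
    """Fraction of simulated datasets in which each trait coding is detected."""
    rng = random.Random(seed)
    hits_continuous = hits_binary = 0
    for _ in range(n_reps):
        # Additive genotype: number of minor alleles (0, 1, or 2).
        genotype = [(rng.random() < maf) + (rng.random() < maf)
                    for _ in range(n)]
        trait = [beta * g + rng.gauss(0, 1) for g in genotype]
        # Dichotomize: the top case_fraction of the trait become "cases".
        cutoff = sorted(trait)[int((1 - case_fraction) * n)]
        status = [1 if t >= cutoff else 0 for t in trait]
        hits_continuous += is_significant(genotype, trait)
        hits_binary += is_significant(genotype, status)
    return hits_continuous / n_reps, hits_binary / n_reps


power_continuous, power_binary = estimate_power()
```

Under these settings the continuous coding detects the simulated variant in a larger share of replicates than the dichotomized coding, because thresholding discards the within-group variation that carries part of the genetic signal.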

Employing digital phenotypes to detect genetic associations with behavioral traits

While the GWASs above focus on a particular disorder (ADHD), it is also possible to directly use the full set of wearable-derived features from the pooled set of individuals across all disorders and controls, as a way to represent the continuum of psychiatric disorders and mental states. In fact, these features can collectively capture behavioral patterns by measuring physiological processes and their real-time changes in response to environmental stimuli, and are not restricted to a specific cohort of individuals.56,57 Therefore, we next performed a multivariate GWAS where we regressed the vector of wearable features on the genotype of each genetic variant (Figure 5A and Figure S5A; STAR Methods, “Continuous multivariate GWAS for behavioral traits” section), employing a larger cohort spanning healthy controls and individuals with any psychiatric disorder (n = 2,410).46 Similar to the previous multivariate GWAS, we performed a post-hoc test to identify which features within a particular cluster drive the significant genetic association. In this case, we identified four significant loci and ten genes with a documented role in neurodevelopmental and psychiatric disorders (Figure 5A and Table 1 and Table S2 and Table S5 and Data S44-S48; STAR Methods, “Statistical Significance and Functional Dissection of GWAS” section). Many of these loci overlap with previously identified GWAS SNPs related to heart and brain traits (Figure 5A and Data S49). This aligns with the close association between physiological functions, the central nervous system, and individual behavior.

Figure 5. Exploring the genetic-physiological-psychiatric axis with wearable GWAS.

A) Using the 258 wearable-derived static features as a continuous multivariate response variable, the GWAS was performed by pooling a set of 2,410 individuals (both healthy controls and individuals with any disorder). We identified 4 and 198 loci passing the p value thresholds of 5 × 10^−8 and 1 × 10^−5, respectively. A detailed list of genome-wide significant loci is provided in Table 1 and Table S5. Neuropsychiatric-related genes proximal to the identified loci are highlighted in pink. Brain- and heart-related traits with associated variants overlapping these 4 loci are highlighted in orange.

B) Left panel: rs365990 (chr14:23,392,602, A/G) is located in exon 25 of MYH6 and is associated with changes in wearable-derived heart rate features (multivariate GWAS p value = 5.33E-09). The boxplots show distributions of covariate-adjusted mean and interday coefficient of variation (CV) for heart rate across genotype groups at rs365990 (AA n individuals = 1,228; AG n individuals = 1,509; GG n individuals = 519). p values for each pairwise comparison are also displayed, encoded as follows: ***: p < 0.001, **: 0.001 ≤ p < 0.01, *: 0.01 ≤ p < 0.05, ns: p ≥ 0.05 (two-sided Wilcoxon Rank-Sum test). For visualization purposes, outliers are not shown. Right panel: enrichment, displayed as odds-ratio (log2(OR); y-axis) of the minor allele (G) in individuals with different psychiatric disorders (x-axis) compared to healthy controls. OR estimates and 95% confidence interval (error bar) are displayed. The red horizontal dashed line indicates no enrichment. The G allele is significantly more enriched in individuals with bipolar/psychotic disorder compared to healthy controls (two-sided Fisher test p value: 8.00E-03; FDR-adjusted p value: 7.00E-02).

C) Similar representation for rs113525298 (chr7:1,791,353; AA n individuals = 2,294; AG n individuals = 101; GG n individuals = 15). rs113525298 is located 125 Kb from ELFN1, a gene that encodes for a postsynaptic protein involved in the temporal dynamics of interneuron recruitment.82,83 Elfn1 mutant mice exhibit hyperactivity that is treatable by psychostimulant medication.68,69 The G allele at rs113525298 is associated with increased minimum number of first-out-of-bed minutes and decreased minimum number of total-vigorously-active minutes (multivariate GWAS p value = 5.09E-09), and is significantly more enriched in healthy controls compared to individuals with ADHD (two-sided Fisher test p value: 9.00E-04; FDR-adjusted p value: 6.00E-03). (Details in Data S45-48 and Table S2).

See also Figure S5.

To further investigate the loci identified by the behavioral GWAS, we dissected the variants using a battery of publicly available genomic resources.58,59 Many of these loci overlap either GTEx expression quantitative trait loci (eQTLs) or ENCODE candidate cis-regulatory elements (cCREs), suggesting a link between the biochemical activity of these variants and their functional impact on the macrophenotype (Table S5; STAR Methods, “Statistical Significance and Functional Dissection of GWAS” section). We also explored the impact of these loci beyond behavioral traits and their relationship with clinical psychopathology. For example, behavioral traits significantly associated with a specific genetic variant may correlate with clinical symptoms of a specific psychiatric cohort. Indeed, in some cases we show that the genetic variant in question is also differentially enriched between that specific psychiatric cohort and healthy individuals. For instance, we found the minor allele (G) at rs365990 to be significantly associated with an increase in mean heart rate and a decrease in interday heart rate variation (Figure 5B, left). The variant, missense for MYH6, had been previously linked to atrial fibrillation, ventricular tachycardia and resting heart rate, and the entire locus shows a significant enrichment of chromatin features in heart samples compared to other tissues (Figure S5B).60–63 We also found the same allele to be enriched in the bipolar/psychotic disorder cohort compared to healthy controls (Figure 5B, right). 
This cohort included youth meeting criteria for bipolar or unspecified psychotic spectrum disorder, and such severe pathology is known to be associated with characteristic irregularities in heart activity.64–66 SNP rs365990 is also a GTEx eQTL for the CMTM5 gene (Figures S5C-S5D), which is highly expressed in brain subregions and has been implicated in stress response and childhood adversity, further supporting the relevance of this locus for psychiatric conditions in addition to heart pathophysiology.59,67 In a similar fashion, we explored variant rs113525298. The minor allele at rs113525298 is associated with prolonged periods in bed and shorter vigorously active time during the day, and it appears at a lower frequency in the ADHD cohort compared to healthy individuals (Figure 5C). This suggests a potential protective role of the allele against hyperactivity disorders, further supported by the proximity of the SNP to ELFN1, previously implicated in the pathophysiology of ADHD.68,69

Overall, these results highlight how wearable-derived features can be leveraged as digital phenotypes in GWAS and enable the identification of genetic variants relevant to clinical psychiatry, with significant effects on exhibited behaviors among adolescents.

Discussion

Psychiatric disorders have been traditionally described with diagnostic categories based on retrospective self-report of symptom sets. However, current efforts in the field are increasingly leveraging new technologies to transition from retrospective self-reporting and fixed symptom sets to more dimensional conceptualizations. This aims to capture the complex and heterogeneous nature of psychiatric disorders for more accurate research into their underlying structure.6 One approach to enhancing dimensional models is the use of quantitative phenotypes. Although quantitative phenotypes have been derived from cellular, tissue, and organ levels of information, computational strategies that generate useful quantitative phenotypes in the behavioral domain are currently limited. Wearable biosensors such as smartwatches offer an opportunity to objectively study psychiatric disorders in a non-invasive way by measuring their underlying physiological and behavioral foundations over time.

Towards this end, we used wearable data to generate static and dynamic features that were employed by our AI modeling framework as digital phenotypes to distinguish between adolescents with and without psychiatric disorders. Models utilizing these wearable-derived digital phenotypes performed comparably to those based on more expensive data sources such as fMRI measurements.32,70 To gain critical theoretical insights and inform treatment development efforts, we augmented the modeling framework with interpretability modules, allowing us to pinpoint temporal and functional regions of the time series that were highly correlated with overall diagnostic status.71 These interpretability modules have the potential to facilitate mechanistic studies that offer deeper insight into the underlying complexities of these disorders. For example, our interpretability modules revealed that heart rate time series held high importance in predicting ADHD. This finding aligns with the clinical manifestation of ADHD: affected children are characterized by episodes of heightened arousal that are often incongruent with environmental demands.72 Conversely, the interpretability modules identified sleep intensity and quality as key predictors in our anxiety disorder models, in line with disruptions in sleep patterns and circadian rhythms commonly observed in youth with anxiety disorders.73

Wearable-derived digital phenotypes are not just effective for detecting the presence of psychiatric disorders in individuals, but they also serve as a valuable research tool for understanding the correspondence between behavior patterns and molecular attributes. This comprehensive approach helps to uncover the foundational elements of pathological behavior patterns. In this context, we focused on establishing links with genetics. Specifically, we showed that these digital phenotypes can serve as response variables in GWAS models. Their continuous nature enhances statistical power compared to categorical diagnostic labels.49 Furthermore, we took advantage of the features’ correlated structure to create multivariate response variables and implemented a hierarchical testing strategy that increased the statistical power of our GWAS. This strategy is advantageous because it mitigates the multiple-testing correction burden that arises from independently evaluating numerous features.74 Conversely, from a biological standpoint, these wearable GWASs allowed us to explore triaxial associations encompassing genetic, physiological, and psychiatric factors. Utilizing our framework, we successfully identified a significant association between a missense variant of the MYH6 gene, which encodes the cardiac muscle myosin, and heart rate patterns. 
Heart activity receives complex inputs from the CNS, which implies behavioral influence and, in combination with our GWAS, supports the notion of a gene-behavior-disorder pathway.75 Building on this finding, we discovered enrichment of the same genetic variant among individuals with bipolar/psychotic disorders, psychiatric conditions known to be associated with characteristic irregularities in heart activity.64 While additional research is needed to confirm such associations, our findings resonate with the objectives of the RDoC initiative.6 Specifically, wearable-derived digital phenotypes serve as objective markers of behavior, bridging lower-level biological systems like genetics to broader psychiatric disorders.

Although we report several associations between genotype, digital phenotypes, and psychiatric macrophenotypes, this study did not consider causal relationships between digital phenotypes and psychiatric macrophenotypes. For example, in some cases the digital phenotype could be situated at an intermediate level in the causal chain between genotype and macrophenotype. However, in other cases the psychiatric macrophenotype could lead to secondary changes in the digital phenotype. Moreover, this relationship could involve bidirectional associations or positive feedback loops.45 Finally, other scenarios involving environmental or non-genetic exposures (e.g., medication) could also play a role in the interplay of digital phenotype and macrophenotype (Figure S1). Further experimentation and analyses, such as Mendelian randomization, will be required to establish the structure of causal chains linking genotype, digital phenotype, and macrophenotype.

While we have employed these wearable-derived digital phenotypes in a targeted research context (i.e., to enhance a psychiatric GWAS), their broad applicability makes them promising for other domains of health research. For example, the scores generated by our AI-modeling framework could be used to assess disorder severity, and the genetic variants identified by our wearable GWAS could be employed to construct more comprehensive polygenic risk scores for behavioral and psychiatric disorders. Unlike other diseases (e.g., cancer) where objective biomarkers are common, psychiatry faces a significant barrier in treatment due to the lack of objective and sensitive screening methods.76 Therefore, these physiological and genetic features could be leveraged as objective biomarkers to subtype patients more accurately within diagnostic categories, which in turn could help move towards precision treatment delivery in psychiatry. This approach holds promise in identifying early markers of treatment efficacy, potentially offering a more dynamic assessment of therapeutic impact. Additionally, the integration of interpretability modules provides an avenue to correlate physiological and behavioral characteristics with higher-order pathological traits. Such correlations may reveal the physiological and behavioral foundations of specific psychiatric disorders, ultimately guiding the development of treatment strategies in research settings and personalized application of treatments in clinical practice.

We anticipate that further development of our AI modeling framework, coupled with an expanded array of wearable devices, could transform how psychiatric disorders are measured and understood in both research and clinical settings. This could lead to more nuanced digital phenotypes and open additional avenues for the study of human behavior.

Limitations of the Study

While the study underscores the transformative potential of wearable devices and AI frameworks in illuminating complex behavioral and psychiatric traits, we note a few limitations. We acknowledge the modest sample size of our cohort, particularly in specific disorder groups. However, we anticipate that this will become less constraining in the future, as other consortia like All of Us hold promise for bolstering sample size and diversity.77,78 We also acknowledge the potential influence of external factors, such as medications, on patterns derived from wearable devices. Although these confounding factors are unlikely to have impacted the results of our study, given the limited fraction of individuals in the ABCD cohort who received outpatient mental health care,79,80 we note that future studies should consider how treatments potentially affect digital phenotyping and the interpretation of genetic correlates. Additionally, data derived from consumer-grade wearable devices may contain inaccuracies (e.g., in actigraphy-based sleep monitoring); it is therefore important to take this factor into account when interpreting the presented results. As biosensor and wearable technologies continue to develop, we expect measurement accuracy and validity to improve, expanding the applicability of the presented framework. Finally, the current lack of regulatory recognition of consumer-grade wearable devices as medical devices, in part due to the need for standardization of measurements across various devices, indicates that their application in this context should be interpreted as exploratory. Despite these caveats of imperfect criterion validity, wearable data can still have high predictive utility, as evidenced by the usage of similar consumer-grade wearable devices by other research groups and consortia,27,81 and by our results showing an improvement in predicting individuals with psychiatric disorders and increased power to identify genetic biomarkers. 
Addressing these limitations would further advance our understanding of complex behavioral and psychiatric traits and optimize the usage of wearable technologies in research and clinical practice.

Resource Availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Mark Gerstein (mark@gersteinlab.org).

Materials availability

This study did not generate new unique reagents.

Data and code availability

STAR Methods

EXPERIMENTAL MODELS AND STUDY PARTICIPANT DETAILS

The Adolescent Brain Cognitive Development (ABCD) study is a comprehensive longitudinal project initiated in 2015 with the purpose of characterizing the neural, cognitive, and behavioral aspects of adolescent development. Commissioned by a consortium of U.S. federal agencies, ABCD investigators deeply phenotyped a large and representative sample of children aged 9–14 with plans to track their development into early adulthood. The ABCD dataset incorporated multimodal brain imaging data, substance use history, behavioral and psychological measures, genetic data, and an all-encompassing collection of demographic, physical health and activity, mental health, and environmental information, including data derived from wearable devices.

METHOD DETAILS

Dataset Description (related to “Leveraging the Adolescent Brain Cognitive Development cohort” and “Generating digital phenotypes from wearable-derived data” in the main text, Figure 1, Figure 2, Figure S1, Figure S2, and Table S1)

Clinical Diagnoses

Clinical diagnoses were operationalized using the parent report version of the Kiddie Schedule for Affective Disorders and Schizophrenia (KSADS). The KSADS is a gold standard semi-structured diagnostic interview that is used to establish a broad range of clinical diagnoses in children and adolescents. It has been previously used to define clinical groups in case/control studies conducted with data from the ABCD study.70

Cohort Definitions

We identified several clinical groups of interest in order to evaluate our framework across different forms of psychopathology. The nonclinical comparison cohort was composed of youth who did not meet any current diagnostic criteria for any disorder on their most recent administration of the parent report KSADS. Similar diagnostic categories, based on ICD10 diagnostic codes, were combined to create cohorts with sufficient sample sizes for our modeling framework. Each clinical cohort was composed of the following diagnostic groups, based on the currently reported symptom sets: Anxiety Disorder, Attention-Deficit/Hyperactivity Disorder (ADHD), Obsessive/Compulsive Disorder (OCD), Panic Disorder, Sleep-related Disorders, Bipolar/Psychotic Disorders, Eating Disorders, Depressive Disorders, Post-Traumatic Stress Disorder (PTSD) (Figure 1B, Table S1, and Data S2).

Preprocessing and Quality Control of Wearable Device Data

We commenced by combining, into a single dataframe, data from seven distinct wearable-derived modalities (heart rate, calories, intensity, steps, METs, sleep level, and sleep intensity) collected for 5,339 individuals, resulting in highly sparse data structures (Figure 1C and Figure S1B). We excluded individuals with at least one missing wearable modality, leaving us with 3,538 participants. To address the impact of missing values on further analysis, we implemented a two-stage quality control procedure. In the initial phase, for each data modality we examined all potential time windows for two selected days in each week. Our objective was to balance the maximization of data inclusion with the assurance of its quality. We pinpointed the time window that offered the best alignment, that is, the period which had the highest number of valid measurements across all modalities. This procedure enabled us to determine the optimal time window for downstream analysis, ensuring both a sufficient sample size and a high quality of data. In the next stage, we established a criterion that each day must have at least 60% valid measurements within the identified optimal window for a given individual. We based this decision on thresholding approaches consistent with literature utilizing wearables in health research and with the recommendations of the ABCD wearable working group.95–97 Participants who did not meet this standard were removed from our dataset. We provide a visual representation of the data processing and QC steps in Figure S2A and Data S1 and Data S10-S11.
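The second QC stage (at least 60% valid measurements per day within the selected window) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the data layout (one list of measurements per day, with `None` marking a missing value), the participant IDs, and the function names are all assumptions.

```python
def valid_fraction(day_values):
    """Fraction of non-missing measurements in one day's window (None = missing)."""
    if not day_values:
        return 0.0
    return sum(v is not None for v in day_values) / len(day_values)


def passes_qc(days, min_valid=0.60):
    """True if every day in the window has at least `min_valid` valid measurements."""
    return all(valid_fraction(day) >= min_valid for day in days)


# Hypothetical participants, each with two days of five measurements.
participants = {
    "p1": [[70, None, 72, 71, None], [68, 69, None, 70, 71]],    # 60% and 80% valid
    "p2": [[None, None, 80, None, 81], [79, 80, 81, 82, 83]],    # 40% and 100% valid
}

# Participants failing the criterion on any day are removed.
kept = {pid: days for pid, days in participants.items() if passes_qc(days)}
```

Here `p1` is retained and `p2` is dropped because one of its days falls below the threshold.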

Imputation

Missing values remain in the resulting QC-controlled time windows. To handle this missingness, we devised separate imputation strategies for the categorical and quantitative data modalities. For the categorical data, we introduced a 'Not Recorded' category into the frame for imputation and subsequently applied label encoding. For the quantitative data, we used the 'drift' method from the sktime package (v0.19.1) with its default settings.86 Recognizing that these imputation strategies may not be adept at capturing non-polynomial dynamics, we further included an indicator time series for each data modality:

$$T_{\text{indicator}}(i) = \mathbb{1}\left(T(i) = \mathrm{NA}\right)$$

where $\mathbb{1}(\cdot)$ is the indicator function and $T(i) = \mathrm{NA}$ indicates that the measurement at time step $i$ is missing.

We concatenated the indicator time series with the imputed time series along the channel dimension. The indicator time series serves as a mask that shows where imputations have been made, while the imputed time series contains both the actual and imputed data. By including this additional indicator time series, we are effectively providing the model with the flexibility to learn an adaptive imputation strategy, where the model can learn how to treat imputed data points based on the surrounding, non-imputed data.

The indicator time series are referred to as ‘flag’ channels in the model. Every time series modality generates a corresponding binary ‘*_flag’ channel after imputation. For example, ‘heart_rate_flag’ indicates the imputation status of the heart rate data, where 1 indicates that the data at that time step is imputed whereas 0 denotes an actual recorded measurement. Since data missingness could be related to behavioral patterns, missingness indicators can help in capturing this relationship. Including the *_flag channels allows the model to recognize and weigh imputed values during training, an approach that has enabled recurrent neural networks to effectively model and handle missing data in clinical time series.84
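The imputation and flag-channel construction described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: a simple least-squares linear-trend fill stands in for the sktime 'drift' option (the two are not claimed to be equivalent), `None` marks a missing value, and the function and variable names are hypothetical.

```python
def linear_trend_fill(series):
    """Fill missing values (None) from a least-squares line through the observed points.

    Returns (imputed, flag), where flag[i] is 1 if series[i] was missing, else 0.
    Stand-in for a trend-based imputer; not the sktime implementation.
    """
    flag = [1 if v is None else 0 for v in series]
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    n = len(observed)
    mean_i = sum(i for i, _ in observed) / n
    mean_v = sum(v for _, v in observed) / n
    sxx = sum((i - mean_i) ** 2 for i, _ in observed)
    slope = sum((i - mean_i) * (v - mean_v) for i, v in observed) / sxx if sxx else 0.0
    intercept = mean_v - slope * mean_i
    imputed = [intercept + slope * i if v is None else float(v)
               for i, v in enumerate(series)]
    return imputed, flag


def encode_categorical(series, missing_label="Not Recorded"):
    """Replace missing values with an explicit category, then label-encode."""
    filled = [missing_label if v is None else v for v in series]
    codes = {label: k for k, label in enumerate(sorted(set(filled)))}
    return [codes[v] for v in filled], codes


heart_rate = [60, None, 64, 66, None]
hr, hr_flag = linear_trend_fill(heart_rate)

# Imputed series and its indicator become two channels of the model input.
channels = {"heart_rate": hr, "heart_rate_flag": hr_flag}

sleep_codes, sleep_map = encode_categorical(["deep", None, "light"])
```

The flag channel keeps the information about where values were imputed, so a downstream model can learn to treat those time steps differently.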

Machine Learning Classifier (related to “Generating digital phenotypes from wearable-derived data” and “Classifying psychiatric macrophenotypes using wearable-derived digital phenotypes” in the main text, Figure 2, Figure 3, Figure S2, and Figure S3)

Problem Formulation

We first formulated the phenotype classification as a canonical machine learning task with manually engineered features, which is outlined as follows. Given an input for a set of features, X, machine learning classification (MLC) targets an output value y which represents the macrophenotype of the subject:

$$X \in \mathbb{R}^{N \times d} \rightarrow y$$

Here, $N$ is the number of individuals and $d$ is the number of features. Specifically, we chose the curated XGBoostRegressor model implemented in the xgboost package40 (v1.7.5) as our backbone ML model, i.e.:

$$\text{XGBoostRegressor}(X) \rightarrow y$$

XGBoost (eXtreme Gradient Boosting) has emerged as an effective machine learning framework, noted for its optimized speed, scalability, and robustness.40 As a variant of gradient boosted decision trees, XGBoost is tailored for efficiency and demonstrates consistent performance across diverse machine learning applications. Central to XGBoost is its adeptness at engineering trees which pinpoint and rectify residuals from prior iterations, continually refining model accuracy. In this work, we take advantage of the strengths that XGBoost offers, guided by a carefully crafted set of features.

Feature Engineering

Our feature engineering for the XGBoost model is elaborated below. Specifically, the time-invariant wearable features $X_w$ were primarily derived from summary statistics of the time-series wearable data. We identified seven clusters of time-invariant wearable features from a total of 258 features. We further included curated covariates $X_{\text{cov}}$ as additional features to supplement the time-invariant wearable features. Covariates used for the machine learning model include demographic background (sex, race, age at second-year follow-up, divorced parents, parents’ level of education, parent income, adoption), family history of psychiatric illness (bipolar disorder, schizophrenia, antisocial behavior, nervous breakdown, psychiatric treatment, hospital admission, suicide), cognitive scores (flanker test, picmemory, process speed, reading score, stop reaction time, etc.), and child behavioral checklist (CBCL internalizing and externalizing scores) (Figure 2A–2C, and Data S1-S2 and Data S14). We also considered wear time and sports participation in a subset of our analyses (Data S15-S16). Our complete feature set encompasses both covariates and wearable-derived static features:

$$X = \text{Concat}\left(X_{\text{cov}}, X_{w_1}, X_{w_2}, \ldots, X_{w_7}\right)$$

where Concat() denotes concatenation on the feature dimension. This enabled us to characterize a nuanced interplay of wearable features with individual covariates, substantially accentuating the power of our model.
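As a small illustration of this construction, the sketch below derives two static features from a heart rate series (mean and interday coefficient of variation, the two features shown in Figure 5B) and concatenates them with covariates. The exact feature definitions used in the study are not given in this section, so the formulas, the example values, and the names below are assumptions.

```python
import statistics


def static_features(days):
    """Summary statistics of a multi-day series (one list of values per day).

    Assumed definitions: mean of the daily means, and interday CV computed as
    the population standard deviation of daily means divided by their mean.
    """
    daily_means = [statistics.fmean(day) for day in days]
    overall_mean = statistics.fmean(daily_means)
    interday_cv = statistics.pstdev(daily_means) / overall_mean
    return [overall_mean, interday_cv]


# Hypothetical covariates (e.g., encoded sex and age) and heart rate data.
x_cov = [1.0, 11.5]
hr_days = [[60, 62], [70, 72], [65, 67]]

x_w = static_features(hr_days)
x = x_cov + x_w  # concatenation along the feature dimension
```

In the full feature set, each of the seven clusters would contribute its own block of static features in the same way.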

Clustering of Wearable-derived Static Features

We considered the 258 wearable static features in a subset of 2,410 ABCD individuals with complete genetic, wearable and covariate information. We computed Pearson’s r correlation coefficients between all possible pairs of features and used these correlation values as distance measures to perform hierarchical clustering (R function “hclust” & clustering method “Complete”). We also performed k-means clustering of the correlation matrix by varying the number of clusters from two to twenty (R function “kmeans”, with nstart = 10), and chose an optimal number of seven clusters based on the elbow curve of the total within-cluster sum of squares. A heatmap representation of the seven clusters is shown in Figure 2D, and the list of static features for each cluster is provided in Data S12-S13.

Class Balancing

Imbalanced training labels, where one class substantially outnumbers the other (e.g., 1,737 control individuals versus 216 ADHD individuals), can substantially degrade model performance. To address this issue and ensure a more robust model, we implemented stochastic downsampling of the classes with higher representation in each run of the model. To formalize this, we assume two classes, $A$ and $B$, where $|A|$ and $|B|$ represent the number of instances in each class. Assuming $|A| > |B|$, we calculate the ratio $r$:

$$r = \frac{|B|}{|A|}$$

We then randomly select a subset $A'$ from $A$ such that:

$$|A'| = r \times |A|$$

The downsampled dataset will then consist of $A'$ and $B$:

$$\text{Downsampled Dataset} = A' \cup B$$
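A minimal sketch of this downsampling step is given below, using the class sizes quoted above (1,737 controls and 216 ADHD individuals). The function name, the use of integer IDs, and the seeding scheme are assumptions for illustration.

```python
import random


def downsample(majority, minority, seed=0):
    """Randomly downsample the majority class to the size of the minority class."""
    rng = random.Random(seed)
    r = len(minority) / len(majority)   # ratio r = |B| / |A|
    k = round(r * len(majority))        # target size |A'| = r * |A|
    subset = rng.sample(majority, k)    # sampling without replacement
    return subset + list(minority)      # A' united with B


controls = list(range(1737))            # hypothetical control IDs
adhd = list(range(10000, 10216))        # hypothetical ADHD IDs

balanced = downsample(controls, adhd, seed=0)
```

Using a different seed in each run draws a different control subset, which is what makes the downsampling stochastic across runs.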

Multichannel Time Series Classifier (related to “Classifying psychiatric macrophenotypes using wearable-derived digital phenotypes” in the main text, Figure 2, Figure 3, Figure S2, and Figure S3)

Problem Formulation

We formulate the phenotype classification as a multichannel time series classification problem which is described as follows. Given an input multichannel time series X:

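$$X = \text{Concat}\left(X_w, X_w^{\text{indicator}}, X_{\text{cov}}\right),$$

where $X_w \in \mathbb{R}^{N \times c_w \times L}$, $X_w^{\text{indicator}} \in \{0,1\}^{N \times c_w \times L}$, and $X_{\text{cov}} \in \mathbb{R}^{N \times c_{\text{cov}} \times L}$. Here, $N$ is the number of samples, $c_w$ the number of wearable modalities, $c_{\text{cov}}$ the number of covariate channels, $L$ the number of measurements, $X_w$ the multichannel wearables data, $X_w^{\text{indicator}}$ the multichannel indicator data (see the “Imputation” section), and $X_{\text{cov}}$ the covariates (detailed in the “Covariate Integration” section). The multi-channel time series classification (MCTSC) targets an output value $y$ which represents the macrophenotype of the subject:

$$X \in \mathbb{R}^{N \times C \times L} \rightarrow y$$

where $C = 2 \times c_w + c_{\text{cov}}$ is the number of input time-series channels. We further define a parameterized model which maps $X$ to the output $y$:

$$f_\theta(X) \rightarrow y$$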

where f represents the mapping function, which is parameterized by θ. To optimize θ, we employed cross-entropy loss as the objective function, which is defined as:

$$\mathrm{CE}(y, \hat{y}) := -\sum_{k=1}^{K} y_k \log \hat{y}_k$$

where K denotes the number of classes. We employed a label smoothing regularizer to the ground truth label:

$$y_k^{\text{ls}} = y_k (1 - \alpha) + \frac{\alpha}{K}$$

Here, α is a smoothing parameter (we chose 0.1). This label smoothing technique helps to prevent the model from becoming too confident about the class labels, which could potentially bolster its generalization ability.
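The label-smoothed cross-entropy objective can be written out directly. The sketch below is a plain-Python illustration of smoothing a one-hot label with α = 0.1 and evaluating the cross-entropy against predicted class probabilities; the function names and the example probabilities are assumptions, and a deep learning framework's built-in loss would normally be used in practice.

```python
import math


def smooth_labels(y_onehot, alpha=0.1):
    """Label smoothing: y_k * (1 - alpha) + alpha / K for each of the K classes."""
    num_classes = len(y_onehot)
    return [yk * (1 - alpha) + alpha / num_classes for yk in y_onehot]


def cross_entropy(y, y_hat):
    """Cross-entropy between target distribution y and predicted probabilities y_hat."""
    return -sum(yk * math.log(pk) for yk, pk in zip(y, y_hat))


y = [0.0, 1.0]           # one-hot label for a two-class problem
y_hat = [0.2, 0.8]       # hypothetical predicted probabilities

y_ls = smooth_labels(y)  # approximately [0.05, 0.95]
loss_plain = cross_entropy(y, y_hat)
loss_smoothed = cross_entropy(y_ls, y_hat)
```

Because the smoothed target places a small probability mass on the other class, the loss no longer rewards pushing the predicted probability of the labeled class all the way to 1.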

Covariate Integration

In order to integrate both covariates and time-series data for classification, we adapted the same covariates used for XGBoost feature engineering into a time-series format. Essentially, we transformed these variables into time-invariant sequences, where the value for each covariate remains the same at every time step. The transformed time-series covariates were then merged with wearable sensor data along the channel dimension. This approach allows the model to capture potential interactions between covariates and wearable measures, wherein the model can adjust its weights accordingly if a certain covariate influences the interpretation of the wearable data.
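The covariate-to-channel transformation amounts to repeating each covariate value along the time axis and stacking the result with the wearable channels. A minimal sketch for a single individual follows; the channel layout (a list of channels, each a list of L values) and the names are assumptions.

```python
def tile_covariates(covariates, length):
    """Turn each scalar covariate into a constant sequence of the given length."""
    return [[float(c)] * length for c in covariates]


def merge_channels(wearable, covariates):
    """Stack wearable channels and tiled covariates along the channel dimension."""
    length = len(wearable[0])
    if any(len(channel) != length for channel in wearable):
        raise ValueError("all wearable channels must have the same length")
    return wearable + tile_covariates(covariates, length)


# Hypothetical input: one heart rate channel and its flag channel (L = 3),
# plus two covariates.
wearable = [[60.0, 61.0, 62.0], [0, 0, 1]]
covariates = [1, 11.5]

x = merge_channels(wearable, covariates)  # 4 channels, each of length 3
```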

Xception Encoder

The XceptionTime encoder harnesses the power of one-dimensional convolutional neural networks (1d-CNNs) as its underlying architecture.41 The model is structured with convolutional filters of various sizes, which are sequentially followed by MaxPooling, Batch Normalization, and ReLU activation functions, which form residual connections. Formally:

$$H^{l}_{\text{bottleneck}} = \text{Conv1D}\left(H^{l-1}\right)$$
$$H^{l}_{\text{MaxConvPool}} = \text{Conv1D}\left(\text{MaxPool}\left(H^{l-1}\right)\right)$$
$$\Delta H^{l} = \Big\Vert_{k} \left(\text{XceptionConv}_k\left(H^{l}_{\text{bottleneck}}\right), H^{l}_{\text{MaxConvPool}}\right)$$
$$\Delta H^{l} = \text{BatchNorm}\left(\Delta H^{l}\right)$$
$$\Delta H^{l} = \text{ReLU}\left(\Delta H^{l}\right)$$
$$H^{l} = H^{l-1} + \Delta H^{l}$$

Here, $\text{Conv1D}(\cdot)$ denotes 1d convolution, $\text{XceptionConv}(\cdot)$ represents depthwise separable convolution, $\text{BatchNorm}(\cdot)$ represents Batch Normalization, $\text{ReLU}(\cdot)$ represents the ReLU activation function, and $\Vert$ aggregates feature maps from convolution filters of different sizes. A visual representation of the model can be found in Figure 2 and Figure S2B. In summary, the input feature maps are first projected to a bottleneck feature map where the number of input channels is much larger than the number of output channels. A sequential operation of max pooling and 1d convolution is then performed on the input feature maps to increase the expressivity of the model. The variation in the size of Xception convolutional filters gives rise to multi-level receptive fields, allowing the model to aggregate and process information at different levels of granularity or resolution. Such a property is particularly advantageous when dealing with data from wearable devices, as wearable data often exhibit both local patterns (e.g., minute-by-minute changes) and global trends (e.g., hourly or daily rhythms).

The XceptionTime encoder introduces a modification to the vanilla 1d convolution model by substituting the 1d convolution with a 1d depth-wise separable convolution. The operation can be broken down into two steps:

$$\text{XceptionConv}\left(H^{l-1}\right) := \text{PointwiseConv}\left(\text{DepthwiseConv}\left(H^{l-1}\right)\right)$$

In contrast to the traditional convolution operation, the depth-wise separable convolution first applies a convolutional filter to each channel individually. This is followed by a 1x1 pointwise convolution module, which performs a linear combination of the outputs across channels. This process reduces the computational complexity of the model while still allowing for complex feature extraction. These steps are described in detail below.

Depthwise Convolution:

This applies a single filter to each input channel which can be expressed as:

$$F_c^{d} = H_c * K_c^{d}$$

where $F_c^{d}$ is the output feature map for channel $c$ after the depthwise convolution, $H_c$ is the input feature map for channel $c$, and $K_c^{d}$ is the depthwise filter (or kernel) for channel $c$. $*$ denotes the convolution operation.

Pointwise Convolution:

This operation combines the outputs from the depthwise convolution across channels:

$$F^p = \sum_c Y_c^d \ast K^p$$

where $F^p$ is the output feature map after the pointwise convolution, $Y_c^d$ is the input feature map for channel c, $K^p$ is the pointwise filter, which has a spatial dimension of 1x1 and operates across all channels, and $\sum_c$ is used to denote the aggregation of feature maps from all channels.
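The two steps above can be illustrated with a minimal NumPy sketch. This is not the XceptionTime implementation from the tsai library; the function names, array shapes, and toy values are assumptions chosen for illustration, and padding, strides, and biases are omitted.

```python
import numpy as np

def depthwise_conv1d(H, K_d):
    """Apply one filter per channel. H: (C, L), K_d: (C, k) -> (C, L - k + 1)."""
    C, L = H.shape
    k = K_d.shape[1]
    out = np.empty((C, L - k + 1))
    for c in range(C):
        # np.convolve performs true convolution (the kernel is flipped);
        # deep learning libraries typically compute cross-correlation instead.
        out[c] = np.convolve(H[c], K_d[c], mode="valid")
    return out

def pointwise_conv1d(Y, K_p):
    """Mix channels with a 1x1 filter. Y: (C, L), K_p: (C_out, C) -> (C_out, L)."""
    return K_p @ Y

def xception_conv1d(H, K_d, K_p):
    """Depthwise convolution followed by pointwise convolution."""
    return pointwise_conv1d(depthwise_conv1d(H, K_d), K_p)

# Toy input: 2 channels, 6 time steps (hypothetical values).
H = np.arange(12, dtype=float).reshape(2, 6)
K_d = np.array([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])  # one length-3 filter per channel
K_p = np.array([[1.0, 1.0]])                          # one output channel
F_p = xception_conv1d(H, K_d, K_p)
```

In this sketch the depthwise step uses C × k weights and the pointwise step C_out × C weights, compared with C_out × C × k for a standard convolution, which is the source of the reduced computational cost.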

Model Training and Evaluation (related to “Generating digital phenotypes from wearable-derived data” and “Classifying psychiatric macrophenotypes using wearable-derived digital phenotypes” in the main text, Figure 2, Figure 3, Figure S2, and Figure S3)

Training Details

We split the dataset into a 70% training set and a 30% test set. We ran 10 iterations with different random seeds, and the final results were calculated as the mean of the 10 runs. This helps to mitigate the risk of overfitting on a specific split and provides a more robust estimate of the model performance. We used ADAM98 as the optimizer for training, with 1×10−3 as the initial learning rate. The neural network model was trained on an NVIDIA V100 32GB graphics processing unit using the PyTorch and tsai deep learning libraries.85,99

Wearable Combination Scores

In our study, we computed wearable combination scores $RS \in \mathbb{R}^{N \times |K|}$ by extracting the final layer of the deep learning model, specifically the softmax probability. For the XGBoost model, we leveraged the predict_proba method implemented in the XGBoost library. Specifically:

$$RS(j) := P(y = j \mid x) = \frac{e^{f_j}}{\sum_{k \in K} e^{f_k}}$$

where $f \in \mathbb{R}^{N \times |K|}$ is either the XceptionTime logits in the XceptionTime model, or the sum of outputs from all trees in the XGBoostRegressor model. The softmax function, used in the final layer of the deep learning model, returns probabilities for each category in a multi-class problem that sum up to 1. Similarly, XGBoost's predict_proba method generates class probabilities as output. These scores can serve as a measure of the likelihood associated with each class or outcome. We utilized these scores as the response variable in our subsequent GWAS study (see Methods section “Continuous and binary univariate GWASs for ADHD”). This approach not only bridged the gap between deep learning modeling and GWAS but also significantly enhanced the power of our GWAS.
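As a minimal sketch of the softmax step, the following NumPy function converts a matrix of logits into per-class scores. The function name and the toy logits are assumptions for illustration; subtracting the row maximum is a standard numerical-stability step that leaves the result unchanged.

```python
import numpy as np

def wearable_combination_scores(logits):
    """Row-wise softmax: logits of shape (N, K) -> probabilities of shape (N, K)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical logits for N = 3 individuals and K = 2 classes.
logits = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]])
RS = wearable_combination_scores(logits)
```

Each row of RS sums to 1, so in a two-class setting a single column can be used as a continuous trait.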

Model Interpretability (related to “Interpreting wearable features prioritized by the deep learning model” in the main text, Figure 3 and Figure S3)

Ablation Method for Step and Feature Importance

The ablation method we present was used to measure the importance of features in a dataset. Ablation methods are based on randomly rearranging the values of a feature or a group of features across all subjects in the dataset, and then calculating an importance score based on the decrease in a chosen metric. In our case, we utilized the Area Under the Receiver Operating Characteristic Curve (AUROC) as the metric to calculate this score. The rationale behind this method is that if a feature is important for model predictions, shuffling the values of that feature should disrupt the model’s ability to make accurate predictions, leading to a decrease in the chosen performance metric. The larger the decrease, the more important the feature is considered to be.

The ablation importance score can be applied to calculate both feature importance and step importance. For step importance, the implementation is slightly different. Instead of shuffling individual features, we shuffled the values within selected windows of the time series. The time series was divided into windows of a chosen length (in our case, 1 hour), and these windows were then shuffled across all subjects, allowing us to assess the importance of information at different time steps or periods. If the model performance significantly decreases when the values within a certain time window are shuffled, the information within that time window is important for the model predictions. These analyses are shown in Data S24-S30.
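The feature-importance variant of this procedure can be sketched as follows. This is an illustrative example rather than the code used in the study: the pairwise AUROC helper, the function names, the number of repeats, and the toy model are all assumptions. For step importance, the same loop would shuffle a window of time steps across subjects instead of a single feature column.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def ablation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in AUROC after shuffling each feature across subjects."""
    rng = np.random.default_rng(seed)
    baseline = auroc(y, predict(X))
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-label link
            drops.append(baseline - auroc(y, predict(X_perm)))
        importance[j] = np.mean(drops)
    return importance

def toy_predict(X):
    """Hypothetical model that relies only on feature 0."""
    return X[:, 0]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
scores = ablation_importance(toy_predict, X, y)
```

In this toy setting only feature 0 carries signal, so shuffling it lowers the AUROC while shuffling the other two features leaves it unchanged.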

Grad-CAM

The weighted class activation mapping (CAM) method is a well-established technique for examining how a trained model makes its predictions.43 In the context of time-series data, it can highlight which time steps are particularly influential in the model's decision-making process.

We first computed the gradient of the score for the predicted class y with respect to the feature map of the first layer activations $A^0_{c,i}$ of a convolutional layer. This gradient, denoted as $\partial y / \partial A^0_{c,i}$, provides a measure of how a small change in the activation of the convolutional layer could affect the final prediction of the model. To convert these gradients into a measure of importance for each channel (indexed by c), we employed a global average pooling, which calculates the average of all gradients across the sequence length (indexed by i). This resulted in a set of channel-wise gradient averages, denoted as $\alpha_c$. Mathematically, this is expressed as:

$$\alpha_c := \frac{1}{Z} \sum_{i=1}^{L} \frac{\partial y}{\partial A^0_{c,i}}$$

where Z is a normalization constant, typically the total number of elements in the layer, and L is the length of the sequence.

We next generated the Gradient-weighted Class Activation Mapping (Grad-CAM). This is a visual representation of the importance of each time step for the model's predictions. The Grad-CAM, denoted as $L^{GM}_i$, is defined as:

$$L^{GM}_i = \mathrm{ReLU}\left(\sum_c \alpha_c A^0_{c,i}\right)$$

where the ReLU (Rectified Linear Unit) function is used to ensure that only features with a positive influence on the class of interest result in high activation. Essentially, this means that only the time steps that positively contribute to the model's decision will have high importance scores.

Finally, for each time step, we computed the average Grad-CAM scores across the entire test set. This allowed us to determine which time steps in the input data were most influential in the model predictions.
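Given the activations and their gradients for one sample, the computation above reduces to a few array operations. The sketch below assumes both arrays have already been extracted from the network (for example with framework hooks) and sets the normalization constant Z to the sequence length; the function name and the toy values are illustrative assumptions.

```python
import numpy as np

def grad_cam_1d(activations, gradients):
    """Grad-CAM for one sample.

    activations, gradients: arrays of shape (C, L) holding the layer activations
    and the gradient of the class score with respect to them.
    Returns channel weights alpha of shape (C,) and the map of shape (L,).
    """
    C, L = activations.shape
    Z = L  # assumption: normalize by the sequence length
    alpha = gradients.sum(axis=1) / Z            # global average pooling of gradients
    cam = np.maximum(0.0, alpha @ activations)   # ReLU of the weighted channel sum
    return alpha, cam

# Hypothetical activations and gradients for C = 2 channels and L = 3 time steps.
A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
G = np.array([[0.3, 0.3, 0.3], [-0.6, -0.6, -0.6]])
alpha, cam = grad_cam_1d(A, G)
```

Averaging the resulting maps over all test samples gives the per-time-step importance profile described above.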

Quality Control of Genetic Data (related to “Using wearable-derived features as digital phenotypes in GWAS for ADHD” and “Employing digital phenotypes to detect genetic associations with behavioral traits” in the main text, Figure 1, Figure 4, Figure 5, Figure S4, and Figure S5)

We obtained genotyped and imputed genetic data for 11,099 individuals as part of the ABCD Data Release 3 (https://abcdstudy.org/scientists/data-sharing/). We used the genotyped data to infer population stratification and the imputed data to perform the different GWASs described below (see Methods sections “GWASs for ADHD” and “Continuous Multivariate GWAS for behavioral traits”).

We applied a quality control (QC) protocol on the genotyped data.100 Specifically, we performed the QC steps described in https://github.com/MareesAT/GWA_tutorial/blob/master/1_QC_GWAS.zip (file “1_Main_script_QC_GWAS.txt”) using PLINK.101 Briefly, of the initial set of 516,598 variants, we kept those with a missingness rate across individuals < 0.02 (n = 481,920). Of the initial set of 11,099 individuals, we kept those with a missingness rate across variants < 0.02 (n = 10,660). Next, we considered only variants located on autosomal chromosomes (n = 470,076), those with Minor Allele Frequency (MAF) > 0.01 (n = 427,704), and those that did not deviate from Hardy-Weinberg equilibrium (p value ≥ 10−10; n = 370,002). These variants were pruned to a final set of 156,556 variants (window size = 50; number of variants to shift the window at each step = 5; multiple correlation coefficient 0.2). We computed the heterozygosity rate for each individual using the pruned set of variants and kept individuals with a heterozygosity rate deviating ≤ 3 standard deviations from the mean (n = 10,467). We also used pruned variants to assess cryptic relatedness by identifying groups of individuals with Proportion Identity-By-Descent (pi_hat) > 0.2. For every group of related individuals, we then selected the individual with the lowest variant missingness rate, leaving a total of 8,816 individuals. We used PLINK to perform a Principal Component Analysis (PCA) on the 156,556 pruned genotyped variants from the 8,816 selected individuals. We report the PCA results in Figure 1D, where each individual (dot) is colored based on the ethnicity score group information provided by the ABCD metadata, available for 8,791 individuals (see also Data S31-S34).
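The variant-level missingness and MAF filters can be illustrated on a small genotype matrix. The study performed these steps with PLINK; the NumPy function below is only a simplified sketch under assumed conventions (dosages coded 0/1/2, missing calls as NaN) and does not cover the Hardy-Weinberg, pruning, heterozygosity, or relatedness steps.

```python
import numpy as np

def variant_qc_mask(G, max_missing=0.02, min_maf=0.01):
    """Return a boolean mask of variants passing missingness and MAF filters.

    G: array of shape (individuals, variants) with dosages 0/1/2 and np.nan for missing calls.
    """
    missing_rate = np.isnan(G).mean(axis=0)
    alt_freq = np.nanmean(G, axis=0) / 2.0         # alternate allele frequency
    maf = np.minimum(alt_freq, 1.0 - alt_freq)     # minor allele frequency
    return (missing_rate < max_missing) & (maf > min_maf)

# Hypothetical data: 100 individuals, 3 variants.
G = np.zeros((100, 3))
G[:30, 0] = 1          # variant 0: common, fully genotyped
G[:5, 1] = np.nan      # variant 1: 5% missing calls
G[5:50, 1] = 1
mask = variant_qc_mask(G)  # variant 2 is monomorphic
```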

We filtered the imputed genetic variants for MAF > 0.01 and estimated imputation accuracy $R^2 > 0.3$, and we obtained a final set of 11,954,686 variants for the GWAS analysis (Data S35). We also computed distributions of $R^2$ for all (genotyped and imputed) variants, and of empirical leave-one-out $R^2$ (ER2) for genotyped variants (Data S36).

Covariates included in the GWAS (related to “Using wearable-derived features as digital phenotypes in GWAS for ADHD” and “Employing digital phenotypes to detect genetic associations with behavioral traits” in the main text, Figure 1, Figure 4, Figure 5, Figure S4, and Figure S5)

We considered five different groups of covariates: basic (sex, age at second-year follow-up, first five genotype PCs), behavioral (CBCL internalizing and externalizing scores, DSM internalizing and externalizing scores), family history of psychiatric illness (bipolar disorder, schizophrenia, antisocial behavior, nervous breakdown, psychiatric treatment, hospital admission, suicide), family situation (divorced parents, parents’ level of education, family income, adoption), and other (ACS raked propensity score, DNA extraction batch). 3,579 of the previously selected 8,791 individuals reported complete information for these 24 covariates.

GWASs for ADHD (related to “Using wearable-derived features as digital phenotypes in GWAS for ADHD” in the main text, Figure 4, Table 1, Figure S4, Table S2, Table S3, and Table S4)

For the three GWASs for ADHD described below (continuous multivariate, continuous univariate, and binary univariate), we focused on a subset of 1,191 individuals that were either diagnosed with ADHD (n = 137) or belonged to the non-clinical control group (n = 1,054). We used only the set of basic covariates (sex, age, first five population structure PCs), as these were also the covariates used in previous GWASs for ADHD.14,15

Continuous multivariate GWAS for ADHD

In this case we performed a battery of GWAS runs using each of the 7 clusters of 258 wearable-derived static features as a multivariate response variable (i.e., multivariate digital phenotype; Figure 4B; see Methods section “Clustering of Wearable-derived Static Features”). Since the 258 features are not germane to ADHD, in order to assess the relevance of the significant genetic associations to ADHD, we specifically introduced an interaction term Genotype:Disorder as described in the formula below:

Multivariate Digital Phenotype~Covariates+Disorder+Genotype+Genotype:Disorder

where “Genotype” g corresponds to the genotype group of an individual at a particular genetic variant, and “Disorder” m corresponds to the status of the individual (0 = control individual; 1 = individual with ADHD) (Figure S4A). In this context, a significant interaction (g×m) refers to a genetic effect on the multivariate digital phenotype (d) that differs between individuals with ADHD and those in the control group. Besides using each of the seven clusters of features as a multivariate response variable, we also conducted a GWAS using the 14 heart-related static features as the multivariate response variable (features: InterdayCV, InterdaySD, IntradayCV_mean, IntradayCV_median, IntradayCV_sd, IntradayMean_mean, IntradayMean_median, IntradayMean_sd, IntradaySD_mean, IntradaySD_median, IntradaySD_sd, Mean, Median, STD). Therefore, we ran a total of eight multivariate GWASs (one for heart rate features and one for each of the seven clusters of features).

We used the Multivariate Asymptotic Non-parametric Test of Association R package (MANTA, https://github.com/dgarrimar/manta) to test for association between genetic variants and the multivariate wearable trait.46 We performed all the analyses within a containerized Nextflow88 pipeline, available at https://github.com/dgarrimar/mvgwas-nf. We employed the option --interaction of the mvgwas-nf pipeline to encode the interaction term in the GWAS. Since MANTA is a non-parametric method, normalization of the multivariate wearable trait was not required. We considered variants that reported a genome-wide significant p value (p < 5·10−8) for the Genotype:Disorder interaction term. More specifically, for these variants we required both the ADHD and control groups to have at least ten individuals in at least two of the three genotype groups (in the case of variants located on chromosome X and tested in the male cohort, we required at least ten individuals to be present in both the “0” and “2” genotype groups). Additionally, variants on autosomal chromosomes (or on chromosome X when tested in the female cohort) that reported less than ten individuals in one of the three genotype groups either in the ADHD or control cohorts were tested again for association after excluding the genotype group with less than ten individuals. These variants were reported in the final list of significant loci only if they also reached a Genotype:Disorder p value < 5 ·10−8 in this second association test. After performing the different GWAS runs, we used FUMA for loci definition (reference panel population: “1000G Phase3 ALL”).89 Results related to the continuous multivariate GWAS for ADHD are shown in Figure 4B, Table 1 (GWAS Method: “Continuous multivar.” and Phenotype: “ADHD”), Table S2, Table S3, and Data S37. As MANTA p values do not come from a normal distribution, we employed λX (instead of the commonly used λG) to estimate the genomic inflation factor.102

Previous studies have reported heteroscedasticity as a potential source of p value inflation in GWASs testing SNP-environment interactions compared to the marginal SNP tests.103,104 To rule out this possibility in our analyses, we evaluated the heteroscedastic behavior of the two clusters of wearable-derived static features for which we reported genome-wide significant loci (i.e., Clusters 3 and 5; Table S3 and Data S37). Specifically, we considered all the genetic variants that reported a GWAS association p value < 10−2 for the interaction term Genotype:Disorder. For each of these variants we then performed a permutation test of multivariate homogeneity of variances considering the six Genotype:Disorder groups (i.e., three genotype groups × two disorder categories), using the R package vegan.90 First, for a given variant and for the six Genotype:Disorder groups of individuals, we computed the distances to the group centroids defined in the space of the covariate-adjusted (i.e., residualized) features (R function betadisper). Next, we performed a permutation-based test to evaluate if one or more groups are more variable than the others (R function permutest, n permutations = 10,000). We found that only 2.4% and 0.83% of variants reported a p value < 10−3 (i.e., rejected the null hypothesis of homogeneity of variances) for Clusters 3 and 5, respectively. This suggests that heteroscedasticity is unlikely to significantly bias the association p values for the Genotype:Disorder interaction term in our multivariate GWAS for ADHD.

Since the multivariate GWAS does not indicate which specific features within a given cluster are significantly associated with the genetic variant, we complemented these results with a post-hoc univariate test (Figure S4C). For a given genome-wide significant locus, we considered each feature in the significantly-associated cluster and compared the distributions of the feature’s residualized value among the three genotype groups (two-sided Wilcoxon Rank-Sum test). We performed pairwise comparisons among the three genotype groups independently for individuals with ADHD and control individuals. For each of the two genome-wide significant loci, we provide in Table S2 the final list of features that reported a Benjamini-Hochberg adjusted p value < 0.1 in at least one of the three pairwise comparisons performed in the ADHD cohort (see also Data S38-S39).
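The Benjamini-Hochberg adjustment applied to the post-hoc p values can be sketched in plain Python. The function name and the example p values are hypothetical; the logic follows the standard step-up procedure, in which each sorted p value is scaled by m/rank and monotonicity is enforced from the largest rank downward.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices sorted by p value
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                        # from largest to smallest p value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p values from four pairwise comparisons.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
selected = [p < 0.1 for p in adj]
```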

Continuous and binary univariate GWASs for ADHD

We obtained, for each individual, ten different ADHD wearable combination scores based on the XGBoost and Xception predictive models (see Methods section “Wearable Combination Scores”). Specifically, we used combination scores from the following six models: baseline model using CBCL externalizing score (“CBCL ext.”); baseline model using CBCL internalizing score (“CBCL int.”); XGBoost model using wearable features (“XGB”); XGBoost model using wearable features and CBCL scores (“XGB + CBCL”); Xception model using wearable features (“Xception”); and Xception model using wearable features and CBCL scores (“Xception + CBCL”). For models “XGB”, “XGB + CBCL”, “Xception” and “Xception + CBCL”, we also implemented the “liability-CC” trait methodology.105 This methodology consists of converting the predictive modeling combination score of the cases (i.e., individuals with ADHD) to a value of 1, while keeping the original combination scores for the controls. These four additional types of scores are labeled as “XGB v2”, “XGB + CBCL v2”, “Xception v2” and “Xception + CBCL v2”. We performed ten GWAS runs to test for associations between genetic variants and each of these ten combination scores (continuous univariate GWAS; Figure 4C). For each run, we defined a model that included the combination score ($d_{score}$) as a univariate response variable, and the genotype and covariates as independent variables, as described by the formula below (see also Figure S4A):

Wearable Combination Score~Covariates+Genotype
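The “liability-CC” conversion described above amounts to a one-line transformation of the score vector. The sketch below uses a hypothetical function name and toy values; it only illustrates how the “v2” scores relate to the original ones.

```python
def liability_cc_scores(scores, is_case):
    """Set the score of every case to 1 and keep the model score for controls."""
    return [1.0 if case else score for score, case in zip(scores, is_case)]

# Hypothetical combination scores for three individuals; the second one has ADHD.
v2_scores = liability_cc_scores([0.2, 0.7, 0.4], [False, True, False])
```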

We also performed a GWAS testing for association between genetic variants and ADHD diagnosis, encoded as a binary outcome (ADHD = 1, control = 0; univariate binary GWAS; Figure 4A, Figure S4A), as described by the formula below:

ADHD Diagnosis~Covariates+Genotype

We used PLINK87 to perform both the continuous univariate and the binary univariate GWASs, and the FUMA platform89 for loci definition (reference panel population: “1000G Phase3 ALL”). Results related to the continuous univariate GWAS for ADHD are shown in Figure 4C, Table 1 (GWAS Method: “Continuous univar.”), Data S40-S41, and Table S4.

Statistical Power of Binary vs. Continuous Traits (related to “Using wearable-derived features as digital phenotypes in GWAS for ADHD” in the main text and Figure S4)

To compare the statistical power of genetic association testing using binary and continuous traits, we simulated a cohort of 1,500 individuals. For each individual i, we generated biallelic SNPs with a binomial model (i.e., the genotype at each SNP followed a binomial distribution, with the number of trials equal to 2 and probability of success on each trial equal to a given MAF). We chose the cohort size to approximate the number of individuals (n = 1,191) in the univariate GWAS for ADHD described above (Methods section “Continuous and binary univariate GWASs for ADHD”). For each individual i, we then simulated a continuous trait $C_i$ as the sum of the genotype effect (b) at a given SNP with genotype $x_i$ (0, 1, or 2) and random noise $e_i$:

$$C_i = x_i b + e_i$$

where

$$b \sim U(0, 1)$$
$$e_i \sim N(0, 1)$$

We also simulated a binary trait $B_i$ for each individual i, following

$$B_i = \begin{cases} 1 & \text{if } C_i > \mathrm{median}(C) \\ 0 & \text{otherwise} \end{cases}$$

where C is the vector of simulated continuous traits for the entire cohort.

For a particular genotype effect b, we ran 10,000 simulations. Under this scenario, we estimated the power of the simulated continuous and binary traits as the fraction of significant (i.e. Benjamini-Hochberg adjusted p value < 0.05) linear and logistic regression tests, respectively. We employed linear and logistic regression as implemented in the R functions “lm” (library “stats”) and “glm” (family = binomial; library “MASS”), respectively. Overall, we simulated 50 different values of b across six different MAFs (Figure S4C).
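A compact version of this simulation can be written with NumPy. The sketch below is not the R code used in the study and makes two simplifying substitutions that should be read as assumptions: instead of fitting lm and glm, it uses the large-sample statistic N·r² (which is asymptotically equivalent to the slope test in linear regression and corresponds to the score test for logistic regression with a single predictor), and it applies a nominal threshold instead of a Benjamini-Hochberg adjustment. The function names, the number of simulations, and the chosen b and MAF are illustrative.

```python
import math
import numpy as np

def corr_pvalue(x, y):
    """Two-sided p value from N * r^2, referred to a chi-square distribution with 1 df."""
    r = np.corrcoef(x, y)[0, 1]
    stat = len(x) * r ** 2
    return math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-square(1)

def estimate_power(b, maf, n=1500, n_sim=500, alpha=0.05, seed=0):
    """Fraction of simulations with p < alpha for the continuous and the binary trait."""
    rng = np.random.default_rng(seed)
    hits_cont = 0
    hits_bin = 0
    for _ in range(n_sim):
        x = rng.binomial(2, maf, size=n)          # genotypes 0, 1, or 2
        C = x * b + rng.normal(size=n)            # continuous trait
        B = (C > np.median(C)).astype(float)      # median-dichotomized binary trait
        hits_cont += corr_pvalue(x, C) < alpha
        hits_bin += corr_pvalue(x, B) < alpha
    return hits_cont / n_sim, hits_bin / n_sim

power_cont, power_bin = estimate_power(b=0.15, maf=0.2)
```

Under these assumptions, dichotomizing the trait at the median discards information, so the continuous trait is expected to reach higher power for the same effect size.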

Continuous multivariate GWAS for behavioral traits (related to “Employing digital phenotypes to detect genetic associations with behavioral traits” in the main text, Figure 5, Table 1, Figure S5, Table S2, and Table S5)

This second type of GWAS consists in testing the association between genetic variants and the multivariate digital phenotype, which we consider a proxy for behavioral patterns in the general population. Therefore, for this GWAS we pooled all the individuals with complete genetic, wearable, and covariate data independently of their diagnosis. Since a more heterogeneous group of individuals was employed in this analysis, we included the full set of 24 covariates in the GWAS model (see Methods section “Covariates included in the GWAS”).

As in the case of the multivariate GWAS for ADHD, we conducted a battery of eight GWAS runs using as multivariate digital phenotype either the 14 heart-rate related features (available for 3,256 individuals), or the 7 clusters of 258 static features (available for 2,410 individuals). For each run, we defined a model that included the cluster of wearable features as multivariate response variable (d), and the genotype g and covariates c as independent variables, as described by the formula below (Figure S5A):

Multivariate Digital Phenotype~Covariates+Genotype

We performed the GWAS runs with the containerized Nextflow pipeline https://github.com/dgarrimar/mvgwas-nf requiring for a given genetic variant a minimum number of ten individuals per genotype group. After performing the different GWAS runs, we used FUMA for loci definition (reference panel population: “1000G Phase3 ALL”). These results are shown in Figure 5A, Table 1 (GWAS Method: “Continuous multivar.”, Phenotype: “Behavior”), Data S44, and Table S5. We complemented the results from the multivariate GWAS with post-hoc univariate tests on the four genome-wide significant loci, as described for the continuous multivariate GWAS for ADHD (Figure S4B, Figure 5B-C, Data S45-S48, and Table S2).

Statistical Significance and Functional Dissection of GWAS (related to “Using wearable-derived features as digital phenotypes in GWAS for ADHD” and “Employing digital phenotypes to detect genetic associations with behavioral traits” in the main text, Figure 4, Figure 5, Figure S4, Figure S5, Table S3, Table S4, and Table S5)

Genome-wide vs. Study-wide Significance

We selected the conventional genome-wide significant p value threshold of 5·10−8 to identify significant loci from all GWAS runs. However, in line with previous GWAS studies74, we also considered a study-wide significance threshold to account for the fact that multiple GWAS runs were performed. In our case, the study-wide significance thresholds are 5·10−9 (5·10−8 / 10 GWAS runs) for the continuous univariate GWAS for ADHD, and 6.25 · 10−9 (5·10−8 / 8 GWAS runs) for the continuous multivariate GWASs for ADHD and behavioral traits. Based on these thresholds, one locus from the continuous univariate GWAS for ADHD and all four loci from the multivariate GWAS for behavioral traits would pass the study-wide significance threshold. Similar to other GWASs74, we also considered a suggestive p value threshold of 1·10−5 (Figures 4A–4C, 5A, and Data S40).

Overall, our hierarchical two-step test of association strategy (i.e., a multivariate GWAS followed by post-hoc univariate tests) offers a number of advantages compared to running multiple univariate GWASs on individual features. First, it helps alleviate some of the multiple testing correction burden. If we conducted univariate GWASs for each of the 258 features, the study-wide significance threshold would be 1.94·10−10 (similar to what is reported in a recent study about the gut microbiome that performed univariate GWAS runs on each of 257 metagenomic features).74 This threshold would be at least one order of magnitude more stringent than the current study-wide threshold (6.25·10−9; Figure S4B). Second, the multivariate test also allows for increased association power by leveraging the correlated structure of the wearable features within each cluster, as explained in.46,106
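The thresholds quoted in this section follow from dividing the genome-wide threshold by the number of GWAS runs, as the short calculation below shows. The function and variable names are illustrative.

```python
GENOME_WIDE = 5e-8

def study_wide_threshold(n_runs, genome_wide=GENOME_WIDE):
    """Bonferroni-style correction of the genome-wide threshold for the number of GWAS runs."""
    return genome_wide / n_runs

univariate = study_wide_threshold(10)     # continuous univariate GWAS for ADHD
multivariate = study_wide_threshold(8)    # continuous multivariate GWASs
per_feature = study_wide_threshold(258)   # hypothetical one-GWAS-per-feature design
ratio = multivariate / per_feature        # how much stricter the per-feature threshold is
```

The ratio equals 258 / 8 = 32.25, consistent with the per-feature threshold being more than one order of magnitude stricter.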

Chromosome X

In all our analyses the genotypes of variants on chromosome X are encoded for both males and females on a 0...2 scale, with males encoded as either 0 or 2, and females encoded as either 0, 1, or 2, to account for X chromosome inactivation. This is the chromosome X encoding recommended in107 and also implemented by default in the PLINK2 tool.
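The encoding described above can be written as a small helper. This is only an illustration of the 0/2 versus 0/1/2 convention for variants outside the pseudo-autosomal regions; the function name and input convention (number of alternate alleles carried) are assumptions, and in practice the encoding was handled by the GWAS tools.

```python
def encode_chrx_genotype(alt_allele_count, is_male):
    """Encode a chromosome X genotype on a 0-2 scale.

    Males carry one copy, so 0 or 1 alternate alleles map to 0 or 2.
    Females carry two copies, so 0, 1, or 2 alternate alleles are kept as is.
    """
    if is_male:
        if alt_allele_count not in (0, 1):
            raise ValueError("males carry at most one allele on chromosome X")
        return 2 * alt_allele_count
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("females carry at most two alleles on chromosome X")
    return alt_allele_count

male_carrier = encode_chrx_genotype(1, is_male=True)
female_het = encode_chrx_genotype(1, is_male=False)
```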

We performed a sex-stratified analysis on chromosome X across all four types of GWASs. Specifically, variants located on chromosome X were tested for significant associations independently in the male and female cohorts, as recommended in.108 We employed the same GWAS tools and the same set of covariates as described for autosomal chromosomes in the above sections (“GWASs for ADHD” and “Multivariate GWAS for behavioral traits”), with the exception of sex which was not included as a covariate. Results related to chromosome X GWASs are shown in Data S50-S52.

Neuropsychiatry-related Proximal Genes and eGenes

For each genome-wide significant locus, we retrieved the ten closest genes when considering a window of ± 250 Kb from the center of the locus, using the GENCODE109 human genome annotation version 41. Next, we labeled as “neuropsychiatric-related” those proximal genes that are associated with psychiatric disorders according to OpenTargets (https://platform.opentargets.org/).110 We further intersected our catalog of genome-wide significant loci with previous eQTL catalogs59,111–113 using bedtools intersect (v2.30.0)93, and identified a subset of proximal neuropsychiatric-related genes with eQTLs overlapping our list of loci. We labeled these genes as “neuropsychiatric-related proximal eGenes” (Tables S3-S5).

Chromatin Dissection of locus chr14:23392601–23418974

We first performed an exploratory analysis by intersecting our two lists of significant loci with the ENCODE4 registry of candidate cis-regulatory elements (cCREs) (https://www.encodeproject.org/search/?type=Annotation&encyclopedia_version=current&annotation_type=candidate+Cis-Regulatory+Elements&annotation_type=chromatin+state&annotation_type=representative+DNase+hypersensitivity+sites&status=released&encyclopedia_version=ENCODE+v4) (Tables S3-S5).58 Given the documented role of locus chr14:23392601–23418974 in heart-related traits and diseases, we next evaluated the enrichment of heart-specific epigenetic features (nucleosome positioning, histone modifications, and transcription factor (TF) binding) at this locus. We downloaded peak calling files for DNase-seq, ATAC-seq, ChIP-seq (histone marks & TFs) and Mint-ChIP for histone marks available for human biosamples from the ENCODE portal (https://www.encodeproject.org/metadata/?control_type%21=%2A&status=released&perturbed=false&assay_title=Histone+ChIP-seq&assay_title=TF+ChIP-seq&assay_title=DNase-seq&assay_title=ATAC-seq&assay_title=Mint-ChIP-seq&files.file_type=bigBed+narrowPeak&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens&type=Experiment&files.analyses.status=released&files.preferred_default=true; access date: 09/27/2022).58,114 We then grouped human biosamples based on their “biosample ontology organ slim” (https://www.encodeproject.org/report/?type=Experiment&control_type!=*&status=released&perturbed=false&assay_title=TF+ChIP-seq&assay_title=Histone+ChIP-seq&assay_title=DNase-seq&assay_title=ATAC-seq&assay_title=Mint-ChIPseq&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens&field=biosample_ontology.organ_slims&field=biosample_ontology.cell_slims&field=biosample_ontology.system_slims&field=%40id&field=biosample_ontology.term_name). 
To test the tissue-specific enrichment of chromatin features in a particular organ, we computed the number of times any of the five significant variants at the locus overlapped a peak from experiments in that organ compared to all other organs (two-sided Fisher’s exact test, Benjamini-Hochberg adjusted p value < 0.1) (Figure S5B). For this analysis, we counted only once those overlaps involving variants that are < 100 bp apart.

Exploring the Genetic-behavioral-psychiatric Axis

For each of the two examples shown in Figure 5B-5C (left panels), we evaluated the enrichment of the minor allele (in both cases the G allele) in individuals within a specific psychiatric cohort vs. non-clinical control individuals (two-sided Fisher’s exact test; Figure 5B-5C right panels). Given the reduced number of individuals with GG genotype for SNP rs113525298 (n = 15), the enrichment of the minor allele was tested by merging individuals with AG and GG genotypes. For SNP rs365990, the enrichment was computed only in individuals with GG genotype. For all tests, we required at least one individual to be present in every cell of the 2x2 matrix employed for the Fisher’s exact test (a=n individuals with minor allele AND part of the psychiatric cohort; b=n individuals without minor allele AND part of the psychiatric cohort; c=n individuals with minor allele AND part of the healthy controls; d=n individuals without minor allele AND part of the healthy controls).
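For reference, a two-sided Fisher's exact test on the 2x2 matrix (a, b, c, d) can be computed directly from the hypergeometric distribution. The sketch below is a plain-Python illustration with a hypothetical function name and toy counts; it sums the probabilities of all tables that are no more likely than the observed one, which is the usual two-sided definition.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: a=8, b=2 (psychiatric cohort); c=1, d=5 (controls).
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```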

Intersection of genome-wide significant loci with the GWAS Catalog

To assess the clinical relevance of our GWAS loci, we intersected them with variants from the NHGRI-EBI GWAS catalog (https://www.ebi.ac.uk/gwas/; access date: 05/16/2023). For the loci identified by the GWASs for ADHD (both continuous multivariate and continuous univariate), we considered overlaps with brain- or neuropsychiatric-related GWAS hits (Figure 4B–4C and Data S42). For the loci identified by the multivariate GWAS for behavioral traits, we considered any overlaps with heart-, sleep-, metabolism- or physical activity-related GWAS hits, since our wearable-derived features are mostly related to measurements of heart rate, sleep quality, metabolic intake and physical movement. Additionally, given the presence of individuals with psychiatric disorders in the ABCD cohort, we also considered intersections with brain- or neuropsychiatric-related GWAS hits (Figure 5A and Data S49). We acknowledge that colocalization analysis would be the most appropriate way to compute these intersections, and we performed this analysis for loci identified by the continuous univariate GWAS for ADHD (see Methods section “Colocalization analysis”). However, MANTA does not provide estimates of variant effect sizes that can be directly employed in co-localization analysis. For this reason, we evaluated the strength of these intersections against a null distribution. Specifically, we computed the proportion of GWAS variants associated with a particular trait that overlap our significant loci, and compared it to the proportions observed across 10,000 random sets of genomic loci with the same size and chromosome location. We report the percentile of our GWAS enrichments compared to the null distribution in Data S42 (univariate GWAS for ADHD) and Data S49 (multivariate GWAS for behavioral traits).
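The comparison against a null distribution can be sketched as follows. This is a simplified, single-chromosome illustration with hypothetical function names, coordinates, and chromosome length; random loci keep the size of the observed loci and are placed uniformly along the chromosome, and the number of random sets is reduced for brevity.

```python
import random

def overlap_fraction(variant_positions, loci):
    """Fraction of trait-associated variants that fall inside any locus (inclusive bounds)."""
    hits = sum(any(start <= p <= end for start, end in loci) for p in variant_positions)
    return hits / len(variant_positions)

def null_percentile(variant_positions, loci, chrom_length, n_random=1000, seed=0):
    """Percentile of the observed overlap within overlaps from size-matched random loci."""
    rng = random.Random(seed)
    observed = overlap_fraction(variant_positions, loci)
    null = []
    for _ in range(n_random):
        random_loci = []
        for start, end in loci:
            size = end - start
            new_start = rng.randint(0, chrom_length - size)  # same size, random position
            random_loci.append((new_start, new_start + size))
        null.append(overlap_fraction(variant_positions, random_loci))
    percentile = 100.0 * sum(v <= observed for v in null) / n_random
    return observed, percentile

# Hypothetical example: three of four catalog variants fall inside one 300-bp locus.
variants = [1000, 1050, 1100, 500000]
loci = [(900, 1200)]
observed, percentile = null_percentile(variants, loci, chrom_length=1_000_000)
```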

Colocalization Analysis

We performed colocalization analysis using the R package coloc92 (function coloc.abf, default parameters) on the results obtained from the continuous univariate GWAS for ADHD. Specifically, we focused on two of the seven overlapping brain-related traits with available GWAS summary statistics (Figure 4C and Data S43), and tested the hypothesis of signal co-localization between our ADHD wearable combination scores and these traits at the intersecting loci. Locus chr17:32256997:32283356 reported a posterior probability of 0.99 for signal co-localization with a locus previously associated with chronotype measurement.115 We also tested locus chr7:68219282:68338849 (suggestive association at p value < 10−5) for co-localization with a previously reported locus for ADHD.14 In this case, given that the two traits being tested are the same, we set all three parameters p1, p2 and p12 equal to 1·10−5, and reported a posterior probability of 0.25.

QUANTIFICATION AND STATISTICAL ANALYSIS

Quantitative and statistical methods are described above within the context of individual analyses in the method details section.

Supplementary Material

Table S5. Genome-wide significant loci identified by the continuous multivariate GWAS for behavioral traits, related to Figure 5, Table 1, and STAR Methods “Continuous multivariate GWAS for behavioral traits” & “Statistical Significance and Functional Dissection of GWAS” sections.

Expanded version of Table 1 for the results derived from the continuous multivariate GWAS for behavioral traits using the pooled set of individuals. For each locus we report genomic coordinates in assembly GRCh38, the lead variant rsID with corresponding genomic position and p value, and the cluster of wearable-derived static features used as multivariate response variable (“GWAS run”). Additionally, we annotate whether the locus overlaps an ENCODE4 candidate cis-regulatory element (cCRE), and its proximal brain or neuropsychiatry-related genes and eGenes (i.e., proximal genes with an eQTL overlapping the locus).

Table S3. Genome-wide significant loci identified by the continuous multivariate GWAS for ADHD, related to Figure 4, Table 1, and STAR Methods “GWASs for ADHD” & “Statistical Significance and Functional Dissection of GWAS” sections.

Expanded version of Table 1 for the results derived from the continuous multivariate GWAS for ADHD. For each locus we report genomic coordinates in assembly GRCh38, the lead variant rsID with corresponding genomic position and p value, and the cluster of wearable-derived static features used as multivariate response variable (“GWAS run”). Additionally, we annotate whether the locus overlaps an ENCODE4 candidate cis-regulatory element (cCRE), and its proximal brain or neuropsychiatry-related genes and eGenes (i.e., proximal genes with an eQTL overlapping the locus).

Table S4. Genome-wide significant loci identified by the continuous univariate GWAS for ADHD, related to Figure 4, Table 1, and STAR Methods “GWASs for ADHD” & “Statistical Significance and Functional Dissection of GWAS” sections.

Expanded version of Table 1 for the results derived from the continuous univariate GWAS for ADHD. For each locus we report genomic coordinates in assembly GRCh38, the lead variant rsID with corresponding genomic position and p value, and the wearable combination score used as response variable (“GWAS run”). Additionally, we annotate whether the locus overlaps an ENCODE4 candidate cis-regulatory element (cCRE), and its proximal brain or neuropsychiatry-related genes and eGenes (i.e., proximal genes with an eQTL overlapping the locus).

Table S1. Psychiatric cohorts and sample sizes, related to Figure 1 and STAR Methods “Dataset Description” section.

The nine distinct psychiatric disorder cohorts considered in our study, along with their sample sizes estimated before the QC preprocessing and filtering.

Table S2. Catalogs of wearable-derived static features and genetic variants identified by the wearable GWASs, related to Figure 2, Figure 4, Figure 5, Table 1, Figure S4, Figure S5, Table S3, Table S5, and STAR Methods “Machine Learning Classifier”, “GWASs for ADHD”, and “Continuous Multivariate GWAS for Behavioral Traits” sections.

A) 258 Wearable-derived Static Features. This table shows the static features derived from the Fitbit data. The individual-level features listed in the first column are calculated directly from the data of individual participants based on summarization of individual-level measurements. In contrast, the population-level summary statistics are compiled from data across all cohorts’ participants. It contains the mean, standard deviation (sd), median, minimum (min), maximum (max), range, skewness, kurtosis, and standard error (se) of each feature.

B) ANOVA on wearable features and covariates across different psychiatric disorder groups. This table shows the result of an ANOVA test, focusing on wearable features and covariates across different psychiatric disorder groups. The table shows results that are statistically significant, suggesting the importance of using covariates in our downstream modeling analyses. Only comparisons reporting p value < 0.05 and R-square > 0.05 are shown.

C) Significant features identified by the post-hoc test for the continuous multivariate GWAS hits for ADHD. Specifically, we report results of the post-hoc univariate test conducted on the two significant loci reported in Table S3. For each locus, we list the wearable-derived static features from the corresponding cluster that show significant differences among genotype groups of individuals, considering the genotype of the individuals at the lead variant for the locus (see Table S3). For a description of each feature see Data S8-S9. We performed pairwise comparisons separately for individuals with ADHD and control individuals. For each comparison we report the feature, the two genotype groups used for the comparison, the p value (two-sided Wilcoxon Rank-Sum test), the Benjamini-Hochberg False Discovery Rate, and the disease group (ctrl = control individuals; ADHD = individuals with ADHD). We included features that show an FDR value < 0.1 in at least one of the three ADHD comparisons. See also Data S38-S39.

D) Significant features identified by the post-hoc test for the continuous multivariate GWAS for behavioral traits. Specifically, we report results of the post-hoc univariate test conducted on the four significant loci reported in Table S5. For each locus, we list the wearable-derived static features from the corresponding cluster that show significant differences among genotype groups of individuals, considering the genotype of the individuals at the lead variant for the locus (see Table S5). For a description of each feature see Data S8-S9. For each comparison we report the feature, the two genotype groups used for the comparison, the p value (two-sided Wilcoxon Rank-Sum test), and the Benjamini-Hochberg False Discovery Rate. We included features that showed an FDR value < 0.1 in at least one of the three genotype comparisons. See also Data S45-S48.

Document S1. Data S1-S52.

Figure S1. Overview of the study design, related to Figure 1 and STAR Methods “Dataset Description” section.

A) Schematic describing potential causal relationships linking environment, genotype, macrophenotype (psychiatric diagnosis), and digital phenotype. The direction of the arrows signifies causal direction. Solid lines in the figure denote causal directions explored in this study, and dotted lines represent other possible directions of causality. Pathways with both solid and dotted lines may show bidirectional causal association (e.g. positive feedback loops).

B) Example of raw wearable-derived dynamic data, showcasing time-series profiles of the seven channels collected over a selected period of 96 hours for three individuals of the ABCD cohort.

Figure S2. Processing pipeline and deep learning architecture, related to Figure 2 and STAR Methods “Dataset Description” and “Multichannel Time Series Classifier” sections.

A) Flowchart of data preprocessing, imputation and covariate integration. First, seven modalities of data are combined and transformed into n data frames corresponding to n samples. An iterative examination is performed to find all possible time windows with continuous time-series data free of noise or outliers, and the optimal time window is selected based on statistical power and data quality. Samples with fewer measurements than the defined threshold are filtered out. During data imputation, missing categorical data are marked as ‘not recorded’ and missing quantitative data are imputed with the sktime drift method. To indicate whether the data points at the current time step are imputed, a binary indicator channel, ‘*_flag’, is added to each modality. This channel enables the model to distinguish between imputed and non-imputed data, improving the model’s performance.84 Covariate data are also converted into time-series format as time-invariant values. Finally, the digital signature, imputation indicators and covariates are combined for modeling.
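The pairing of an imputed channel with its ‘*_flag’ indicator can be sketched in a few lines of Python. This is a simplified stand-in rather than the sktime implementation used in the study: missing values (encoded here as `None`, an assumption) are filled from a least-squares linear trend fitted to the observed points, and the function name is hypothetical.

```python
def impute_with_flag(values):
    """Fill missing entries from a linear trend and return (filled, flag).

    values: list of floats with None marking missing time steps.
    flag:   1 where the value was imputed, 0 where it was observed.
    """
    observed = [(t, v) for t, v in enumerate(values) if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed points to fit a trend")

    # Ordinary least-squares line through the observed (time, value) pairs.
    n = len(observed)
    mean_t = sum(t for t, _ in observed) / n
    mean_v = sum(v for _, v in observed) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in observed)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in observed)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t

    filled = [
        v if v is not None else intercept + slope * t
        for t, v in enumerate(values)
    ]
    flag = [0 if v is not None else 1 for v in values]
    return filled, flag
```

For a heart-rate channel, for example, `filled` and `flag` would be stacked as two input channels so that a downstream model can tell observed values from imputed ones.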

B) Deep learning architecture of XceptionTime Encoder. The architecture starts with 1d-CNNs with various sizes of convolutional filters to create multi-level receptive fields. It is followed by MaxPooling to reduce dimensionality and Batch Normalization to normalize the activations. Finally, the ReLU activation functions are applied to add non-linearity.

Figure S3. Model comparisons and loss curves, related to Figure 3 and STAR Methods “Machine Learning Classifier” and “Multichannel Time Series Classifier” sections.

A) Comparison of AUROC performance for predictive models of ADHD. The comparison includes XGBoost model (details in Data S20-S28), MiniRocket, a recurrent neural network (RNN) architecture, and the best performing Xception model. MiniRocket is a non-deep learning algorithm that uses convolutional filters (varying in weights, sizes, strides, dilations, and paddings) along the original dataset to generate deterministic summary statistics of the transformed time-series. The summary statistics are then fed into a logistic regression model.

B) Training and validation loss over 30 epochs of the ADHD Xception model. The close alignment of training and validation loss demonstrates that the model generalizes well to unseen data and has a low risk of overfitting.

Figure S4. Statistical details of the multivariate and univariate GWASs for ADHD, related to Figure 4, Table 1, Table S3, Table S4, and STAR Methods “GWASs for ADHD” and “Statistical Significance and Functional Dissection of GWAS” sections.

A) Schematic outlining the types of association tests implemented in each GWAS. On the left (Case-Control GWAS), we use a traditional framework to test for associations between genetic variants (genotype, g) and ADHD diagnosis (macrophenotype, m), while adjusting for covariates c. Here, m is a binary univariate response variable (presence or absence of ADHD), and g and c serve as independent variables. On the right (Wearable GWAS), we perform two types of association tests. 1) Continuous & Multivariate Response: In this case, the response variable is a vector d of wearable-derived features, which are not directly related to ADHD. To evaluate the relevance of genetic associations to ADHD, we introduce an interaction term g×m, which allows us to assess genetic effects on wearable-derived features specifically in the context of ADHD. Here, the wearable feature vector d serves as the continuous multivariate response, while g, m, g×m, and c serve as the independent variables. 2) Continuous & Univariate Response: In this case, the response variable is the wearable-derived combination score d_score, which represents a non-linear combination of wearable features. This score is generated through our supervised modeling framework aimed at classifying individuals with and without ADHD, thus making d_score implicitly related to m. (Details in Data S37, S40-S43.)

B) Hierarchical Two-step Test of association implemented for the multivariate GWAS. Consider p wearable features (in our case p = 258) measured across n individuals. When testing for genetic associations with each independent feature (Feature-wise Test, left panel), the study-wide significance threshold is 5 × 10^−8/p (i.e., the genome-wide significance threshold divided by the number of independent tests). However, it is possible to reduce this multiple testing correction burden by performing a Hierarchical Two-Step Test (right panel). In step 1, p features are grouped into Q clusters of correlated features, and we test the association between each multivariate cluster of features q and g. The correlated nature of the features within a particular cluster enhances association power. The study-wide significance threshold in this case becomes 5 × 10^−8/Q. Given that Q ≪ p, this strategy translates to an improvement of ~1.5 orders of magnitude in the stringency of the threshold, compared to the Feature-wise Test. Since Step 1 does not specify which individual features within the cluster drive the significant association with the variant, in Step 2 we perform post-hoc univariate tests to independently test each of the r features within the significant cluster and assess differences in a given wearable feature between individuals carrying different genotypes (two-sided Wilcoxon Rank-Sum test with Benjamini-Hochberg FDR correction). (Details in Data S38-S39, S45-S48, and Table S2.)
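The threshold arithmetic and the Step 2 multiple-testing correction can be made concrete with a short Python sketch. The helper names are hypothetical, and the number of clusters Q is left as an input because its value is not stated in this caption; any Q used with these functions is therefore an assumption of the example, not a reported figure.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance level


def study_wide_threshold(n_tests):
    """Genome-wide threshold divided by the number of tests (p or Q)."""
    return GENOME_WIDE_ALPHA / n_tests


def threshold_gain_log10(p_features, q_clusters):
    """Orders of magnitude between the cluster-wise and feature-wise thresholds."""
    return math.log10(p_features / q_clusters)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With p = 258, `study_wide_threshold(258)` gives the feature-wise threshold, and `threshold_gain_log10(258, Q)` shows how the gap between the two thresholds depends on the chosen number of clusters. `benjamini_hochberg` would be applied to the post-hoc rank-sum p values within a significant cluster.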

C) Dot plot showing the power (y axis) of continuous and binary response variables for increasing genotype effects (beta coefficient; x axis). The type of response is color-coded to reflect the binary and continuous nature of macrophenotype and digital phenotype, respectively. Power is defined as the proportion (%) of linear (continuous) and logistic (binary) regression tests with BH-adjusted p value < 0.05 out of 10,000 simulations. We simulated SNP genotypes considering six different Minor Allele Frequency (MAF) values.
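The kind of power comparison summarized in this panel can be sketched as follows. This is an illustrative simulation under several assumptions that differ from the description above: a score (trend) test based on n·r² stands in for fitted linear and logistic regressions, the binary response is obtained by splitting the continuous one at its median, unadjusted p values are compared with a fixed alpha instead of BH-adjusted values, and far fewer than 10,000 replicates are run by default. All names are hypothetical.

```python
import math
import random


def chi2_1df_pvalue(stat):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def trend_test_pvalue(x, y):
    """Score (trend) test: n * r^2 is approximately chi-square(1) under the null."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return chi2_1df_pvalue(n * sxy * sxy / (sxx * syy))


def simulate_power(beta, maf, n=500, n_sim=300, alpha=0.05, seed=0):
    """Return (power with continuous response, power with binary response)."""
    rng = random.Random(seed)
    hits_cont = hits_bin = 0
    for _ in range(n_sim):
        # Additive genotype coding (0, 1 or 2 minor alleles).
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
        # Dichotomize the same response at its median to mimic a case-control label.
        cutoff = sorted(y)[n // 2]
        y_bin = [1 if yi >= cutoff else 0 for yi in y]
        hits_cont += trend_test_pvalue(g, y) < alpha
        hits_bin += trend_test_pvalue(g, y_bin) < alpha
    return hits_cont / n_sim, hits_bin / n_sim
```

Calling `simulate_power` over a grid of `beta` and `maf` values yields one power curve per response type, which is the comparison the dot plot displays.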

Figure S5. Dissecting genetic associations with behavioral traits, related to Figure 5, Table 1, Table S5, and STAR Methods “Continuous multivariate GWAS for behavioral traits” and “Statistical Significance and Functional Dissection of GWAS” sections.

A) Schematic describing the Wearable GWAS for behavioral traits, where the wearable feature vector d serves as the continuous multivariate response, and the genotype g and covariates c serve as the independent variables. (Details in Data S44, S49-S52.)

B) Enrichment of chromatin features at GWAS locus for heart rate features (chr14:23,392,601–23,418,974). Right panel: Barplot showing the number of epigenomics ENCODE experiments (x axis) with peaks at the locus across different organs and tissues (y axis). Only organs / tissues with a Benjamini-Hochberg adjusted p value < 0.1 (two-sided Fisher test) are shown. The type of experiment is color-coded (purple: DNase-seq; blue: ATAC-seq; magenta: Mint ChIP-seq; orange: Histone ChIP-seq). Left panel: Dot plot showing the log2 odds-ratio (OR) of the significance of peak enrichment across different biosamples.

C-D) GTEx expression levels of proximal genes and eGenes associated with locus chr14:23,392,601–23,418,974. Scatterplot showing the expression level (y axis, log2 scale) of proximal genes (upper panel) and GTEx eGenes (lower panel) across 54 GTEx tissues (x axis). In this case we show the two eGenes (CMTM5 and HOMEZ) that have GTEx eQTLs coinciding with four of the five GWAS variants within the locus. (Details in Table S5). Human tissues are color-coded based on the GTEx palette.

Key Resources Table

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Bacterial and virus strains
Biological samples
ABCD Smokescreen Genotyping Array | ABCD Study | abcdstudy.org
Chemicals, peptides, and recombinant proteins
Critical commercial assays
Deposited data
Wearable GWAS summary statistics | This paper | https://doi.org/10.15154/4mbh-8x45
ABCD Wearable Data | ABCD Study | abcdstudy.org
ABCD Clinical Data | ABCD Study | abcdstudy.org
Experimental models: Cell lines
Experimental models: Organisms/strains
Oligonucleotides
Recombinant DNA
Software and algorithms
XceptionTime | Chollet et al.41 | https://github.com/timeseriesAI/tsai/blob/main/tsai/models/XceptionTime.py
XGBoost | Chen et al.40 | https://github.com/timeseriesAI/tsai https://xgboost.readthedocs.io/en/stable/
tsai (v0.3.1) | Oguiza et al.85 | https://github.com/timeseriesAI/tsai
sktime (v0.14.1) | Löning et al.86 | https://www.sktime.net/en/stable/
PLINK (v2.0) | Chang et al.87 | https://www.cog-genomics.org/plink/2.0/
Nextflow (v22.04.0) | Di Tommaso et al.88 | https://www.nextflow.io/
MANTA (v1.0.0) | Garrido-Martín et al.46 | https://github.com/dgarrimar/manta
FUMA (v1.5.2) | Watanabe et al.89 | https://fuma.ctglab.nl/
vegan (v2.6–4) | Oksanen et al.90 | https://cran.r-project.org/web/packages/vegan
MASS (v7.3–60) | Venables and Ripley91 | https://cran.r-project.org/web/packages/MASS/index.html
coloc (v5.2.2) | Giambartolomei et al.92 | https://cran.r-project.org/web/packages/coloc/index.html
bedtools (v2.30.0) | Quinlan and Hall93 | https://bedtools.readthedocs.io/en/latest
CMplot | Yin et al.94 | https://github.com/YinLiLin/CMplot
Wearable modeling and GWAS workflow | This paper | https://github.com/gersteinlab/ABCD
Other

Highlights.

  • Uniform processing of wearable & genomic data and integration with AI modeling and GWAS

  • AI framework uses wearable digital phenotypes to better predict psychiatric disorders

  • Univariate & multivariate digital phenotypes can act as a continuous response for GWAS

  • Wearable GWAS detects a larger number of loci compared to traditional case-control GWAS

Acknowledgements

This study was supported by the Albert L. Williams Professorship funds at Yale University and the National Institute on Alcohol Abuse and Alcoholism (R01AA031959, K23AA026890, and U54AA027989).

Data used in the preparation of this article were obtained from the Adolescent Brain Cognitive Development℠ (ABCD) Study (https://abcdstudy.org), held in the NIMH Data Archive (NDA). This is a multisite, longitudinal study designed to recruit more than 10,000 children age 9–10 and follow them over 10 years into early adulthood. The ABCD Study® is supported by the National Institutes of Health and additional federal partners under award numbers U01DA041048, U01DA050989, U01DA051016, U01DA041022, U01DA051018, U01DA051037, U01DA050987, U01DA041174, U01DA041106, U01DA041117, U01DA041028, U01DA041134, U01DA050988, U01DA051039, U01DA041156, U01DA041025, U01DA041120, U01DA051038, U01DA041148, U01DA041093, U01DA041089, U24DA041123, U24DA041147. A full list of supporters is available at https://abcdstudy.org/federal-partners.html. A listing of participating sites and a complete listing of the study investigators can be found at https://abcdstudy.org/consortium_members/. ABCD consortium investigators designed and implemented the study and/or provided data but did not necessarily participate in the analysis or writing of this report. This manuscript reflects the views of the authors and may not reflect the opinions or views of the NIH or ABCD consortium investigators.

Footnotes

Author Contributions

Conceptualization: JJL, BB, WR, MG

Methodology: JJL, BB, YL, SXL, YG, DGM, TLV, GA, JZ, MJG, WR, MG

Investigation: JJL, BB, YL, SXL, YG

Visualization: JJL, BB, YL, SXL, YG

Funding acquisition: WR, MG

Data curation: TLV, WR, MG

Supervision: WR, MG

Writing – original draft: JJL, BB, WR, MG

Writing – review & editing: JJL, BB, YL, SXL, YG, XX, SKL, MJ, DGM, TLV, GA, JZ, MJG, WR, MG

ADDITIONAL RESOURCES

Additional information regarding the ABCD Study can be found at abcdstudy.org.

Declaration of Interests

The authors declare they have no competing interests.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1. Zablotsky B, Terlizzi EP, and National Center for Health Statistics (U.S.) Mental health treatment among children aged 5–17 years: United States, 2019. NCHS data brief.
  • 2. UNICEF (2021). Impact of COVID-19 on poor mental health in children and young people ‘tip of the iceberg’ – UNICEF. https://www.unicef.org/philippines/press-releases/impact-covid-19-poor-mental-health-children-and-young-people-tip-iceberg-unicef#:~:text=According%20to%20the%20latest%20available,needs%20and%20mental%20health%20funding.
  • 3. CDC (2023). Data and Statistics on Children’s Mental Health. https://www.cdc.gov/childrensmentalhealth/data.html.
  • 4. McGorry PD, and Nelson B. (2019). Transdiagnostic psychiatry: premature closure on a crucial pathway to clinical utility for psychiatric diagnosis. World Psychiatry 18, 359–360. 10.1002/wps.20679.
  • 5. Hartmann JA, McGorry PD, Destree L, Amminger GP, Chanen AM, Davey CG, Ghieh R, Polari A, Ratheesh A, Yuen HP, and Nelson B. (2020). Pluripotential Risk and Clinical Staging: Theoretical Considerations and Preliminary Data From a Transdiagnostic Risk Identification Approach. Front Psychiatry 11, 553578. 10.3389/fpsyt.2020.553578.
  • 6. Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, Sanislow C, and Wang P. (2010). Research domain criteria (RDoC): toward a new classification framework for research on mental disorders. Am J Psychiatry 167, 748–751. 10.1176/appi.ajp.2010.09091379.
  • 7. Burmeister M, McInnis MG, and Zollner S. (2008). Psychiatric genetics: progress amid controversy. Nat Rev Genet 9, 527–540. 10.1038/nrg2381.
  • 8. Grimm O, Kranz TM, and Reif A. (2020). Genetics of ADHD: What Should the Clinician Know? Curr Psychiatry Rep 22, 18. 10.1007/s11920-020-1141-x.
  • 9. Purves KL, Coleman JRI, Meier SM, Rayner C, Davis KAS, Cheesman R, Baekvad-Hansen M, Borglum AD, Wan Cho S, Jurgen Deckert J, et al. (2020). A major role for common genetic variation in anxiety disorders. Mol Psychiatry 25, 3292–3303. 10.1038/s41380-019-0559-1.
  • 10. Andreassen OA, Hindley GFL, Frei O, and Smeland OB (2023). New insights from the last decade of research in psychiatric genetics: discoveries, challenges and clinical implications. World Psychiatry 22, 4–24. 10.1002/wps.21034.
  • 11. Szatmari P, Maziade M, Zwaigenbaum L, Merette C, Roy MA, Joober R, and Palmour R. (2007). Informative phenotypes for genetic studies of psychiatric disorders. Am J Med Genet B Neuropsychiatr Genet 144B, 581–588. 10.1002/ajmg.b.30426.
  • 12. Sanchez-Roige S, and Palmer AA (2020). Emerging phenotyping strategies will advance our understanding of psychiatric genetics. Nat Neurosci 23, 475–480. 10.1038/s41593-020-0609-7.
  • 13. Wendt FR, Pathak GA, Tylee DS, Goswami A, and Polimanti R. (2020). Heterogeneity and Polygenicity in Psychiatric Disorders: A Genome-Wide Perspective. Chronic Stress (Thousand Oaks) 4, 2470547020924844. 10.1177/2470547020924844.
  • 14. Demontis D, Walters GB, Athanasiadis G, Walters R, Therrien K, Nielsen TT, Farajzadeh L, Voloudakis G, Bendl J, Zeng B, et al. (2023). Genome-wide analyses of ADHD identify 27 risk loci, refine the genetic architecture and implicate several cognitive domains. Nat Genet 55, 198–208. 10.1038/s41588-022-01285-8.
  • 15. Demontis D, Walters RK, Martin J, Mattheisen M, Als TD, Agerbo E, Baldursson G, Belliveau R, Bybjerg-Grauholm J, Baekvad-Hansen M, et al. (2019). Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat Genet 51, 63–75. 10.1038/s41588-018-0269-7.
  • 16. Stein MB, Levey DF, Cheng Z, Wendt FR, Harrington K, Pathak GA, Cho K, Quaden R, Radhakrishnan K, Girgenti MJ, et al. (2021). Genome-wide association analyses of post-traumatic stress disorder and its symptom subdomains in the Million Veteran Program. Nat Genet 53, 174–184. 10.1038/s41588-020-00767-x.
  • 17. Schizophrenia Working Group of the Psychiatric Genomics, C. (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427. 10.1038/nature13595.
  • 18. Grove J, Ripke S, Als TD, Mattheisen M, Walters RK, Won H, Pallesen J, Agerbo E, Andreassen OA, Anney R, et al. (2019). Identification of common genetic risk variants for autism spectrum disorder. Nat Genet 51, 431–444. 10.1038/s41588-019-0344-8.
  • 19. Flint J, Timpson N, and Munafo M. (2014). Assessing the utility of intermediate phenotypes for genetic mapping of psychiatric disease. Trends Neurosci 37, 733–741. 10.1016/j.tins.2014.08.007.
  • 20. Leuchter AF, Hunter AM, Krantz DE, and Cook IA (2014). Intermediate phenotypes and biomarkers of treatment outcome in major depressive disorder. Dialogues Clin Neurosci 16, 525–537. 10.31887/DCNS.2014.16.4/aleuchter.
  • 21. Meyer-Lindenberg A, and Weinberger DR (2006). Intermediate phenotypes and genetic mechanisms of psychiatric disorders. Nat Rev Neurosci 7, 818–827. 10.1038/nrn1993.
  • 22. Babu M, Lautman Z, Lin X, Sobota MHB, and Snyder MP (2024). Wearable Devices: Implications for Precision Medicine and the Future of Health Care. Annu Rev Med 75, 401–415. 10.1146/annurev-med-052422-020437.
  • 23. Koppe G, Guloksuz S, Reininghaus U, and Durstewitz D. (2019). Recurrent Neural Networks in Mobile Sampling and Intervention. Schizophr Bull 45, 272–276. 10.1093/schbul/sby171.
  • 24. Gomes N, Pato M, Lourenco AR, and Datia N. (2023). A Survey on Wearable Sensors for Mental Health Monitoring. Sensors (Basel) 23. 10.3390/s23031330.
  • 25. Dunn J, Kidzinski L, Runge R, Witt D, Hicks JL, Schussler-Fiorenza Rose SM, Li X, Bahmani A, Delp SL, Hastie T, and Snyder MP (2021). Wearable sensors enable personalized predictions of clinical laboratory measurements. Nat Med 27, 1105–1112. 10.1038/s41591-021-01339-0.
  • 26. Insel TR (2017). Digital Phenotyping: Technology for a New Science of Behavior. JAMA 318, 1215–1216. 10.1001/jama.2017.11295.
  • 27. Welch V, Wy TJ, Ligezka A, Hassett LC, Croarkin PE, Athreya AP, and Romanowicz M. (2022). Use of Mobile and Wearable Artificial Intelligence in Child and Adolescent Psychiatry: Scoping Review. J Med Internet Res 24, e33560. 10.2196/33560.
  • 28. Damme K, Vargas T, Walther S, Shankman S, and Mittal V. (2023). Physical and Mental Health in Adolescence: Novel Insights from a transdiagnostic examination of FitBit data in the ABCD Study. Res Sq. 10.21203/rs.3.rs-3270112/v1.
  • 29. Nagata JM, Alsamman S, Smith N, Yu J, Ganson KT, Dooley EE, Wing D, Baker FC, and Pettee Gabriel K. (2023). Social epidemiology of Fitbit daily steps in early adolescence. Pediatr Res 94, 1838–1844. 10.1038/s41390-023-02700-4.
  • 30. Cronbach LJ, and Meehl PE (1955). Construct validity in psychological tests. Psychol Bull 52, 281–302. 10.1037/h0040957.
  • 31. Jablensky A. (2016). Psychiatric classifications: validity and utility. World Psychiatry 15, 26–31. 10.1002/wps.20284.
  • 32. Sen B, Borle NC, Greiner R, and Brown MRG (2018). A general prediction model for the detection of ADHD and Autism using structural and functional MRI. PLoS One 13, e0194856. 10.1371/journal.pone.0194856.
  • 33. Kim WP, Kim HJ, Pack SP, Lim JH, Cho CH, and Lee HJ (2023). Machine Learning-Based Prediction of Attention-Deficit/Hyperactivity Disorder and Sleep Problems With Wearable Data in Children. JAMA Netw Open 6, e233502. 10.1001/jamanetworkopen.2023.3502.
  • 34. Volkow ND, Koob GF, Croyle RT, Bianchi DW, Gordon JA, Koroshetz WJ, Perez-Stable EJ, Riley WT, Bloch MH, Conway K, et al. (2018). The conception of the ABCD study: From substance use to a broad NIH collaboration. Dev Cogn Neurosci 32, 4–7. 10.1016/j.dcn.2017.10.002.
  • 35. van Dijk MT, Murphy E, Posner JE, Talati A, and Weissman MM (2021). Association of Multigenerational Family History of Depression With Lifetime Depressive and Other Psychiatric Disorders in Children: Results from the Adolescent Brain Cognitive Development (ABCD) Study. JAMA Psychiatry 78, 778–787. 10.1001/jamapsychiatry.2021.0350.
  • 36. Mishra T, Wang M, Metwally AA, Bogu GK, Brooks AW, Bahmani A, Alavi A, Celli A, Higgs E, Dagan-Rosenfeld O, et al. (2020). Pre-symptomatic detection of COVID-19 from smartwatch data. Nat Biomed Eng 4, 1208–1220. 10.1038/s41551-020-00640-6.
  • 37. Liu J, Spakowicz DJ, Ash GI, Hoyd R, Ahluwalia R, Zhang A, Lou S, Lee D, Zhang J, Presley C, et al. (2021). Bayesian structural time series for biomedical sensor data: A flexible modeling framework for evaluating interventions. PLoS Comput Biol 17, e1009303. 10.1371/journal.pcbi.1009303.
  • 38. Bent B, Wang K, Grzesiak E, Jiang C, Qi Y, Jiang Y, Cho P, Zingler K, Ogbeide FI, Zhao A, et al. (2020). The digital biomarker discovery pipeline: An open-source software platform for the development of digital biomarkers using mHealth and wearables data. J Clin Transl Sci 5, e19. 10.1017/cts.2020.511.
  • 39. Salari N, Ghasemi H, Abdoli N, Rahmani A, Shiri MH, Hashemian AH, Akbari H, and Mohammadi M. (2023). The global prevalence of ADHD in children and adolescents: a systematic review and meta-analysis. Ital J Pediatr 49, 48. 10.1186/s13052-023-01456-1.
  • 40. Chen TQ, and Guestrin C. (2016). XGBoost: A Scalable Tree Boosting System. Kdd'16: Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, 785–794. 10.1145/2939672.2939785.
  • 41.Chollet F. (2017). Xception: Deep Learning with Depthwise Separable Convolutions. Proc Cvpr Ieee, 1800–1807. 10.1109/Cvpr.2017.195. [DOI]
  • 42.Kofler MJ, Soto EF, Fosco WD, Irwin LN, Wells EL, and Sarver DE (2020). Working memory and information processing in ADHD: Evidence for directionality of effects. Neuropsychology 34, 127–143. 10.1037/neu0000598. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, and Batra D. (2017). Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Ieee I Conf Comp Vis, 618–626. 10.1109/Iccv.2017.74. [DOI]
  • 44.Imeraj L, Antrop I, Roeyers H, Swanson J, Deschepper E, Bal S, and Deboutte D. (2012). Time-of-day effects in arousal: disrupted diurnal cortisol profiles in children with ADHD. J Child Psychol Psychiatry 53, 782–789. 10.1111/j.1469-7610.2012.02526.x. [DOI] [PubMed] [Google Scholar]
  • 45.Alvaro PK, Roberts RM, and Harris JK (2013). A Systematic Review Assessing Bidirectionality between Sleep Disturbances, Anxiety, and Depression. Sleep 36, 1059–1068. 10.5665/sleep.2810. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Garrido-Martin D, Calvo M, Reverter F, and Guigo R. (2023). A fast non-parametric test of association for multiple traits. Genome Biol 24, 230. 10.1186/s13059-023-03076-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Henriquez-Henriquez M, Acosta MT, Martinez AF, Velez JI, Lopera F, Pineda D, Palacio JD, Quiroga T, Worgall TS, Deckelbaum RJ, et al. (2020). Mutations in sphingolipid metabolism genes are associated with ADHD. Transl Psychiatry 10, 231. 10.1038/s41398-020-00881-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Karlsson Linner R, Mallard TT, Barr PB, Sanchez-Roige S, Madole JW, Driver MN, Poore HE, de Vlaming R, Grotzinger AD, Tielbeek JJ, et al. (2021). Multivariate analysis of 1.5 million people identifies genetic associations with traits related to self-regulation and addiction. Nat Neurosci 24, 1367–1376. 10.1038/s41593-021-00908-3.
  • 49. Cosentino J, Behsaz B, Alipanahi B, McCaw ZR, Hill D, Schwantes-An TH, Lai D, Carroll A, Hobbs BD, Cho MH, et al. (2023). Inference of chronic obstructive pulmonary disease with deep learning on raw spirograms identifies new genetic loci and improves risk models. Nat Genet 55, 787–795. 10.1038/s41588-023-01372-4.
  • 50. Alipanahi B, Hormozdiari F, Behsaz B, Cosentino J, McCaw ZR, Schorsch E, Sculley D, Dorfman EH, Foster PJ, Peng LH, et al. (2021). Large-scale machine-learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology. Am J Hum Genet 108, 1217–1230. 10.1016/j.ajhg.2021.05.004.
  • 51. Chen YC, Sudre G, Sharp W, Donovan F, Chandrasekharappa SC, Hansen N, Elnitski L, and Shaw P. (2018). Neuroanatomic, epigenetic and genetic differences in monozygotic twins discordant for attention deficit hyperactivity disorder. Mol Psychiatry 23, 683–690. 10.1038/mp.2017.45.
  • 52. Fan Z, Qian Y, Lu Q, Wang Y, Chang S, and Yang L. (2018). DLGAP1 and NMDA receptor-associated postsynaptic density protein genes influence executive function in attention deficit hyperactivity disorder. Brain Behav 8, e00914. 10.1002/brb3.914.
  • 53. Royston P, Altman DG, and Sauerbrei W. (2006). Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med 25, 127–141. 10.1002/sim.2331.
  • 54. Breitling LP, and Brenner H. (2010). Odd odds interactions introduced through dichotomisation of continuous outcomes. J Epidemiol Community Health 64, 300–303. 10.1136/jech.2009.089458.
  • 55. Schmitz S, Adams R, and Walsh C. (2012). The use of continuous data versus binary data in MTC models: a case study in rheumatoid arthritis. BMC Med Res Methodol 12, 167. 10.1186/1471-2288-12-167.
  • 56. Schandry R. (1981). Heart beat perception and emotional experience. Psychophysiology 18, 483–488. 10.1111/j.1469-8986.1981.tb02486.x.
  • 57. Hsueh B, Chen R, Jo Y, Tang D, Raffiee M, Kim YS, Inoue M, Randles S, Ramakrishnan C, Patel S, et al. (2023). Cardiogenic control of affective behavioural state. Nature 615, 292–299. 10.1038/s41586-023-05748-8.
  • 58. ENCODE Project Consortium, Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, Davis CA, Dobin A, et al. (2020). Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710. 10.1038/s41586-020-2493-4.
  • 59. GTEx Consortium (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330. 10.1126/science.aaz1776.
  • 60. Landrum MJ, Chitipiralla S, Brown GR, Chen C, Gu B, Hart J, Hoffman D, Jang W, Kaur K, Liu C, et al. (2020). ClinVar: improvements to accessing data. Nucleic Acids Res 48, D835–D844. 10.1093/nar/gkz972.
  • 61. Fu T, Chen M, Xu L, Gong J, Zheng J, Zhang F, and Ji N. (2021). Association of the MYH6 Gene Polymorphism with the Risk of Atrial Fibrillation and Warfarin Anticoagulation Therapy. Genet Test Mol Biomarkers 25, 590–599. 10.1089/gtmb.2021.0025.
  • 62. Rangaraju A, Krishnan S, Aparna G, Sankaran S, Mannan AU, and Rao BH (2018). Genetic variants in post myocardial infarction patients presenting with electrical storm of unstable ventricular tachycardia. Indian Pacing Electrophysiol J 18, 91–94. 10.1016/j.ipej.2018.01.003.
  • 63. Eijgelsheim M, Newton-Cheh C, Sotoodehnia N, de Bakker PI, Muller M, Morrison AC, Smith AV, Isaacs A, Sanna S, Dorr M, et al. (2010). Genome-wide association analysis identifies multiple loci related to resting heart rate. Hum Mol Genet 19, 3885–3894. 10.1093/hmg/ddq303.
  • 64. Faurholt-Jepsen M, Kessing LV, and Munkholm K. (2017). Heart rate variability in bipolar disorder: A systematic review and meta-analysis. Neurosci Biobehav Rev 73, 68–80. 10.1016/j.neubiorev.2016.12.007.
  • 65. Carr O, de Vos M, and Saunders KEA (2018). Heart rate variability in bipolar disorder and borderline personality disorder: a clinical review. Evid Based Ment Health 21, 23–30. 10.1136/eb-2017-102760.
  • 66. Stautland A, Jakobsen P, Fasmer OB, Osnes B, Torresen J, Nordgreen T, and Oedegaard KJ (2022). Heart rate variability as biomarker for bipolar disorder. medRxiv, 2022.02.14.22269413. 10.1101/2022.02.14.22269413.
  • 67. Dieckmann L, Cole S, and Kumsta R. (2020). Stress genomics revisited: gene co-expression analysis identifies molecular signatures associated with childhood adversity. Transl Psychiatry 10, 34. 10.1038/s41398-020-0730-0.
  • 68. Hesdorffer DC, Ludvigsson P, Olafsson E, Gudmundsson G, Kjartansson O, and Hauser WA (2004). ADHD as a risk factor for incident unprovoked seizures and epilepsy in children. Arch Gen Psychiatry 61, 731–736. 10.1001/archpsyc.61.7.731.
  • 69. Dolan J, and Mitchell KJ (2013). Mutation of Elfn1 in mice causes seizures and hyperactivity. PLoS One 8, e80491. 10.1371/journal.pone.0080491.
  • 70. Wang Z, Zhou X, Gui Y, Liu M, and Lu H. (2023). Multiple measurement analysis of resting-state fMRI for ADHD classification in adolescent brain from the ABCD study. Transl Psychiatry 13, 45. 10.1038/s41398-023-02309-5.
  • 71. Obermeyer Z, Powers B, Vogeli C, and Mullainathan S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453. 10.1126/science.aax2342.
  • 72. Dane AV, Schachar RJ, and Tannock R. (2000). Does actigraphy differentiate ADHD subtypes in a clinical research setting? J Am Acad Child Adolesc Psychiatry 39, 752–760. 10.1097/00004583-200006000-00014.
  • 73. Alfano CA, Ginsburg GS, and Kingery JN (2007). Sleep-related problems among children and adolescents with anxiety disorders. J Am Acad Child Adolesc Psychiatry 46, 224–232. 10.1097/01.chi.0000242233.06011.8e.
  • 74. Kurilshikov A, Medina-Gomez C, Bacigalupe R, Radjabzadeh D, Wang J, Demirkan A, Le Roy CI, Raygoza Garay JA, Finnicum CT, Liu X, et al. (2021). Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat Genet 53, 156–165. 10.1038/s41588-020-00763-1.
  • 75. Beauchaine TP, and Thayer JF (2015). Heart rate variability as a transdiagnostic biomarker of psychopathology. Int J Psychophysiol 98, 338–350. 10.1016/j.ijpsycho.2015.08.004.
  • 76. Singh I, and Rose N. (2009). Biomarkers in psychiatry. Nature 460, 202–207. 10.1038/460202a.
  • 77. Bianchi DW, Brennan PF, Chiang MF, Criswell LA, D'Souza RN, Gibbons GH, Gilman JK, Gordon JA, Green ED, Gregurick S, et al. (2024). The All of Us Research Program is an opportunity to enhance the diversity of US biomedical research. Nat Med 30, 330–333. 10.1038/s41591-023-02744-3.
  • 78. All of Us Research Program Genomics Investigators (2024). Genomic data in the All of Us Research Program. Nature 627, 340–346. 10.1038/s41586-023-06957-x.
  • 79. Olfson M, Wall MM, Wang S, Laje G, and Blanco C. (2023). Treatment of US Children With Attention-Deficit/Hyperactivity Disorder in the Adolescent Brain Cognitive Development Study. JAMA Netw Open 6, e2310999. 10.1001/jamanetworkopen.2023.10999.
  • 80. Duffy KA, Gandhi R, Falke C, Wiglesworth A, Mueller BA, Fiecas MB, Klimes-Dougan B, Luciana M, and Cullen KR (2023). Psychiatric Diagnoses and Treatment in Nine- to Ten-Year-Old Participants in the ABCD Study. JAACAP Open 1, 36–47. 10.1016/j.jaacop.2023.03.001.
  • 81. Master H, Annis J, Huang S, Beckman JA, Ratsimbazafy F, Marginean K, Carroll R, Natarajan K, Harrell FE, Roden DM, et al. (2022). Association of step counts over time with the risk of chronic disease in the All of Us Research Program. Nat Med 28, 2301–2308. 10.1038/s41591-022-02012-w.
  • 82. Girgenti MJ, Wang J, Ji D, Cruz DA, Traumatic Stress Brain Research Group, Stein MB, Gelernter J, Young KA, Huber BR, Williamson DE, et al. (2021). Transcriptomic organization of the human brain in post-traumatic stress disorder. Nat Neurosci 24, 24–33. 10.1038/s41593-020-00748-7.
  • 83. Datta D, Arion D, Roman KM, Volk DW, and Lewis DA (2017). Altered Expression of ARP2/3 Complex Signaling Pathway Genes in Prefrontal Layer 3 Pyramidal Cells in Schizophrenia. Am J Psychiatry 174, 163–171. 10.1176/appi.ajp.2016.16020204.
  • 84. Lipton ZC, Kale DC, and Wetzel R. (2016). Modeling Missing Data in Clinical Time Series with RNNs. arXiv:1606.04130. 10.48550/arXiv.1606.04130.
  • 85. Oguiza I. (2022). tsai - A state-of-the-art deep learning library for time series and sequential data. https://github.com/timeseriesAI/tsai.
  • 86. Löning M, Bagnall A, Ganesh S, Kazakov V, Lines J, and Király FJ (2019). sktime: A Unified Interface for Machine Learning with Time Series. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).
  • 87. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, and Lee JJ (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7. 10.1186/s13742-015-0047-8.
  • 88. Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, and Notredame C. (2017). Nextflow enables reproducible computational workflows. Nat Biotechnol 35, 316–319. 10.1038/nbt.3820.
  • 89. Watanabe K, Taskesen E, van Bochoven A, and Posthuma D. (2017). Functional mapping and annotation of genetic associations with FUMA. Nat Commun 8, 1826. 10.1038/s41467-017-01261-5.
  • 90. Oksanen J, Simpson GL, Blanchet FG, Kindt R, Legendre P, Minchin PR, O'Hara RB, Solymos P, Stevens MHH, Szoecs E, et al. (2024). vegan: Community Ecology Package.
  • 91. Venables WN, and Ripley BD (2002). Modern Applied Statistics with S, Fourth Edition (Springer).
  • 92. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, and Plagnol V. (2014). Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet 10, e1004383. 10.1371/journal.pgen.1004383.
  • 93. Quinlan AR, and Hall IM (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842. 10.1093/bioinformatics/btq033.
  • 94. Yin L, Zhang H, Tang Z, Xu J, Yin D, Zhang Z, Yuan X, Zhu M, Zhao S, Li X, and Liu X. (2021). rMVP: A Memory-efficient, Visualization-enhanced, and Parallel-accelerated Tool for Genome-wide Association Study. Genomics Proteomics Bioinformatics 19, 619–628. 10.1016/j.gpb.2020.10.007.
  • 95. Lederer L, Breton A, Jeong H, Master H, Roghanizad AR, and Dunn J. (2023). The Importance of Data Quality Control in Using Fitbit Device Data From the All of Us Research Program. JMIR Mhealth Uhealth 11, e45103. 10.2196/45103.
  • 96. Chan A, Chan D, Lee H, Ng CC, and Yeo AHL (2022). Reporting adherence, validity and physical activity measures of wearable activity trackers in medical research: A systematic review. Int J Med Inform 160, 104696. 10.1016/j.ijmedinf.2022.104696.
  • 97. Bagot KS, Matthews SA, Mason M, Squeglia LM, Fowler J, Gray K, Herting M, May A, Colrain I, Godino J, et al. (2018). Current, future and potential use of mobile and wearable technologies and social media data in the ABCD study to increase understanding of contributors to child health. Dev Cogn Neurosci 32, 121–129. 10.1016/j.dcn.2018.03.008.
  • 98. Kingma DP, and Ba J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • 99. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin ZM, Gimelshein N, Antiga L, et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv Neur In 32.
  • 100. Marees AT, de Kluiver H, Stringer S, Vorspan F, Curis E, Marie-Claire C, and Derks EM (2018). A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. Int J Methods Psychiatr Res 27, e1608. 10.1002/mpr.1608.
  • 101. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, and Sham PC (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559–575. 10.1086/519795.
  • 102. Manrai AK, Ioannidis JPA, and Patel CJ (2019). Signals Among Signals: Prioritizing Nongenetic Associations in Massive Data Sets. Am J Epidemiol 188, 846–850. 10.1093/aje/kwz031.
  • 103. Almli LM, Duncan R, Feng H, Ghosh D, Binder EB, Bradley B, Ressler KJ, Conneely KN, and Epstein MP (2014). Correcting systematic inflation in genetic association tests that consider interaction effects: application to a genome-wide association study of posttraumatic stress disorder. JAMA Psychiatry 71, 1392–1399. 10.1001/jamapsychiatry.2014.1339.
  • 104. Voorman A, Lumley T, McKnight B, and Rice K. (2011). Behavior of QQ-plots and genomic control in studies of gene-environment interaction. PLoS One 6, e19416. 10.1371/journal.pone.0019416.
  • 105. Yang L, Sadler MC, and Altman RB (2023). Genetic association studies using disease liabilities from deep neural networks. medRxiv. 10.1101/2023.01.18.23284383.
  • 106. Stephens M. (2013). A unified framework for association analysis with multiple related phenotypes. PLoS One 8, e65245. 10.1371/journal.pone.0065245.
  • 107. Wise AL, Gyi L, and Manolio TA (2013). eXclusion: toward integrating the X chromosome in genome-wide association analyses. Am J Hum Genet 92, 643–647. 10.1016/j.ajhg.2013.03.017.
  • 108. Sun L, Wang Z, Lu T, Manolio TA, and Paterson AD (2023). eXclusionarY: 10 years later, where are the sex chromosomes in GWASs? Am J Hum Genet 110, 903–912. 10.1016/j.ajhg.2023.04.009.
  • 109. Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland JE, Mudge JM, Sisu C, Wright JC, Arnan C, Barnes I, et al. (2023). GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res 51, D942–D949. 10.1093/nar/gkac1071.
  • 110. Ochoa D, Hercules A, Carmona M, Suveges D, Baker J, Malangone C, Lopez I, Miranda A, Cruz-Castillo C, Fumis L, et al. (2023). The next-generation Open Targets Platform: reimagined, redesigned, rebuilt. Nucleic Acids Res 51, D1353–D1359. 10.1093/nar/gkac1046.
  • 111. Vosa U, Claringbould A, Westra HJ, Bonder MJ, Deelen P, Zeng B, Kirsten H, Saha A, Kreuzhuber R, Yazar S, et al. (2021). Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet 53, 1300–1310. 10.1038/s41588-021-00913-z.
  • 112. Wang D, Liu S, Warrell J, Won H, Shi X, Navarro FCP, Clarke D, Gu M, Emani P, Yang YT, et al. (2018). Comprehensive functional genomic resource and integrative model for the human brain. Science 362. 10.1126/science.aat8464.
  • 113. Bryois J, Calini D, Macnair W, Foo L, Urich E, Ortmann W, Iglesias VA, Selvaraj S, Nutma E, Marzin M, et al. (2022). Cell-type-specific cis-eQTLs in eight human brain cell types identify novel risk genes for psychiatric and neurological disorders. Nat Neurosci 25, 1104–1112. 10.1038/s41593-022-01128-z.
  • 114. Rozowsky J, Gao J, Borsari B, Yang YT, Galeev T, Gursoy G, Epstein CB, Xiong K, Xu J, Li T, et al. (2023). The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell 186, 1493–1511.e40. 10.1016/j.cell.2023.02.018.
  • 115. Jansen PR, Watanabe K, Stringer S, Skene N, Bryois J, Hammerschlag AR, de Leeuw CA, Benjamins JS, Munoz-Manchado AB, Nagel M, et al. (2019). Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways. Nat Genet 51, 394–403. 10.1038/s41588-018-0333-3.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

6. Table S5. Genome-wide significant loci identified by the continuous multivariate GWAS for behavioral traits, related to Figure 5, Table 1, and STAR Methods “Continuous multivariate GWAS for behavioral traits” & “Statistical Significance and Functional Dissection of GWAS” sections.

Expanded version of Table 1 for the results derived from the continuous multivariate GWAS for behavioral traits using the pooled set of individuals. For each locus we report genomic coordinates in assembly GRCh38, the lead variant rsID with corresponding genomic position and p value, and the cluster of wearable-derived static features used as multivariate response variable (“GWAS run”). Additionally, we annotate whether the locus overlaps an ENCODE4 candidate cis-regulatory element (cCRE), and its proximal brain or neuropsychiatry-related genes and eGenes (i.e., proximal genes with an eQTL overlapping the locus).

4. Table S3. Genome-wide significant loci identified by the continuous multivariate GWAS for ADHD, related to Figure 4, Table 1, and STAR Methods “GWASs for ADHD” & “Statistical Significance and Functional Dissection of GWAS” sections.

Expanded version of Table 1 for the results derived from the continuous multivariate GWAS for ADHD. For each locus we report genomic coordinates in assembly GRCh38, the lead variant rsID with corresponding genomic position and p value, and the cluster of wearable-derived static features used as multivariate response variable (“GWAS run”). Additionally, we annotate whether the locus overlaps an ENCODE4 candidate cis-regulatory element (cCRE), and its proximal brain or neuropsychiatry-related genes and eGenes (i.e., proximal genes with an eQTL overlapping the locus).

5. Table S4. Genome-wide significant loci identified by the continuous univariate GWAS for ADHD, related to Figure 4, Table 1, and STAR Methods “GWASs for ADHD” & “Statistical Significance and Functional Dissection of GWAS” sections.

Expanded version of Table 1 for the results derived from the continuous univariate GWAS for ADHD. For each locus we report genomic coordinates in assembly GRCh38, the lead variant rsID with corresponding genomic position and p value, and the wearable combination score used as response variable (“GWAS run”). Additionally, we annotate whether the locus overlaps an ENCODE4 candidate cis-regulatory element (cCRE), and its proximal brain or neuropsychiatry-related genes and eGenes (i.e., proximal genes with an eQTL overlapping the locus).

2. Table S1. Psychiatric cohorts and sample sizes, related to Figure 1 and STAR Methods “Dataset Description” section.

The nine distinct psychiatric disorder cohorts considered in our study, along with their sample sizes estimated before QC preprocessing and filtering.

3. Table S2. Catalogs of wearable-derived static features and genetic variants identified by the wearable GWASs, related to Figure 2, Figure 4, Figure 5, Table 1, Figure S4, Figure S5, Table S3, Table S5, and STAR Methods “Machine Learning Classifier”, “GWASs for ADHD”, and “Continuous Multivariate GWAS for Behavioral Traits” sections.

A) 258 Wearable-derived Static Features. This table shows the static features derived from the Fitbit data. The individual-level features listed in the first column are calculated directly from each participant's data by summarizing individual-level measurements. In contrast, the population-level summary statistics are compiled from data across the participants of all cohorts. They comprise the mean, standard deviation (sd), median, minimum (min), maximum (max), range, skewness, kurtosis, and standard error (se) of each feature.

B) ANOVA on wearable features and covariates across different psychiatric disorder groups. This table shows the result of an ANOVA test, focusing on wearable features and covariates across different psychiatric disorder groups. The table shows results that are statistically significant, suggesting the importance of using covariates in our downstream modeling analyses. Only comparisons reporting p value < 0.05 and R-square > 0.05 are shown.

C) Significant features identified by the post-hoc test for the continuous multivariate GWAS hits for ADHD. Specifically, we report results of the post-hoc univariate test conducted on the two significant loci reported in Table S3. For each locus, we list the wearable-derived static features from the corresponding cluster that show significant differences among genotype groups of individuals, considering the genotype of the individuals at the lead variant for the locus (see Table S3). For a description of each feature see Data S8-S9. We performed pairwise comparisons separately for individuals with ADHD and control individuals. For each comparison we report the feature, the two genotype groups used for the comparison, the p value (two-sided Wilcoxon Rank-Sum test), the Benjamini-Hochberg False Discovery Rate, and the disease group (ctrl = control individuals; ADHD = individuals with ADHD). We included features that show an FDR value < 0.1 in at least one of the three ADHD comparisons. See also Data S38-S39.

D) Significant features identified by the post-hoc test for the continuous multivariate GWAS for behavioral traits. Specifically, we report results of the post-hoc univariate test conducted on the four significant loci reported in Table S5. For each locus, we list the wearable-derived static features from the corresponding cluster that show significant differences among genotype groups of individuals, considering the genotype of the individuals at the lead variant for the locus (see Table S5). For a description of each feature see Data S8-S9. For each comparison we report the feature, the two genotype groups used for the comparison, the p value (two-sided Wilcoxon Rank-Sum test), and the Benjamini-Hochberg False Discovery Rate. We included features that showed an FDR value < 0.1 in at least one of the three genotype comparisons. See also Data S45-S48.
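The Benjamini-Hochberg False Discovery Rate reported in panels C and D can be computed from a list of p values as in the following minimal sketch. The function name and the example p values are illustrative assumptions, not values or code from the study.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for four genotype-group comparisons.
example_p = [0.01, 0.04, 0.03, 0.005]
example_fdr = benjamini_hochberg(example_p)
# Comparisons kept under the FDR < 0.1 criterion used in the table.
kept = [i for i, q in enumerate(example_fdr) if q < 0.1]
```

With these hypothetical inputs all four comparisons pass the FDR < 0.1 filter; in practice the p values would come from the two-sided Wilcoxon Rank-Sum tests described above.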

1. Document S1. Data S1-S52.

7. Figure S1. Overview of the study design, related to Figure 1 and STAR Methods “Dataset Description” section.

A) Schematic describing potential causal relationships linking environment, genotype, macrophenotype (psychiatric diagnosis), and digital phenotype. The direction of the arrows signifies causal direction. Solid lines in the figure denote causal directions explored in this study, and dotted lines represent other possible directions of causality. Pathways with both solid and dotted lines may show bidirectional causal association (e.g. positive feedback loops).

B) Example of raw wearable-derived dynamic data, showcasing time-series profiles of the seven channels collected over a selected period of 96 hours for three individuals of the ABCD cohort.

Figure S2. Processing pipeline and deep learning architecture, related to Figure 2 and STAR Methods “Dataset Description” and “Multichannel Time Series Classifier” sections.

A) Flowchart of data preprocessing, imputation and covariates integration. First, seven modalities of data are combined and transformed into n data frames corresponding to n samples. An iterative search identifies all possible time windows with continuous time-series data free of noise or outliers, and the optimal time window is selected based on statistical power and data quality. Samples with fewer measurements than the defined threshold are filtered out. During data imputation, missing categorical data are marked as ‘not recorded’ and missing quantitative data are imputed using the sktime drift method. To indicate whether the data point at the current time step is imputed, a binary indicator channel, ‘*_flag’, is added to each modality. This channel enables the model to distinguish between imputed and non-imputed data, improving the model’s performance (ref. 84). Covariate data are also converted into time-series format as time-invariant values. Finally, digital signature, imputation indicator and covariates are combined for modeling.

B) Deep learning architecture of the XceptionTime Encoder. The architecture starts with 1D CNNs with convolutional filters of various sizes to create multi-level receptive fields. These are followed by MaxPooling to reduce dimensionality and Batch Normalization to normalize the activations. Finally, ReLU activation functions are applied to add non-linearity.
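The imputation-indicator idea from panel A can be illustrated with a minimal sketch. It uses simple forward-filling as a stand-in for the drift-based imputation named in the caption, so it is not the authors' method; the function name `impute_with_flag` and the example heart-rate values are hypothetical.

```python
from typing import List, Optional, Tuple


def impute_with_flag(series: List[Optional[float]]) -> Tuple[List[float], List[int]]:
    """Fill gaps in one channel and return a parallel 0/1 imputation flag.

    Forward-fill is used here only as a stand-in imputation rule.
    """
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    last = observed[0]  # leading gaps are back-filled with the first observation
    values: List[float] = []
    flags: List[int] = []
    for v in series:
        if v is None:
            values.append(last)  # imputed time step
            flags.append(1)
        else:
            last = v
            values.append(v)     # observed time step
            flags.append(0)
    return values, flags


# Hypothetical heart-rate channel with missing readings (None).
heart_rate = [72.0, None, None, 80.0, None]
hr_filled, hr_flag = impute_with_flag(heart_rate)
# The model input for this modality would then stack both channels.
hr_channels = [hr_filled, hr_flag]
```

Keeping the flag as a separate channel lets a downstream classifier weigh imputed time steps differently from observed ones, which is the motivation given in the caption.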

Figure S3. Model comparisons and loss curves, related to Figure 3 and STAR Methods “Machine Learning Classifier” and “Multichannel Time Series Classifier” sections.

A) Comparison of AUROC performance for predictive models of ADHD. The comparison includes XGBoost model (details in Data S20-S28), MiniRocket, a recurrent neural network (RNN) architecture, and the best performing Xception model. MiniRocket is a non-deep learning algorithm that uses convolutional filters (varying in weights, sizes, strides, dilations, and paddings) along the original dataset to generate deterministic summary statistics of the transformed time-series. The summary statistics are then fed into a logistic regression model.

B) Training and validation loss over 30 epochs of the ADHD Xception model. The close alignment of training and validation loss demonstrates that the model generalizes well to unseen data and has a low risk of overfitting.

Figure S4. Statistical details of the multivariate and univariate GWASs for ADHD, related to Figure 4, Table 1, Table S3, Table S4, and STAR Methods “GWASs for ADHD” and “Statistical Significance and Functional Dissection of GWAS” sections.

A) Schematic outlining the types of association tests implemented in each GWAS. On the left (Case-Control GWAS), we use a traditional framework to test for associations between genetic variants (genotype, g) and ADHD diagnosis (macrophenotype, m), while adjusting for covariates c. Here, m is a binary univariate response variable (presence or absence of ADHD), and g and c serve as independent variables. On the right (Wearable GWAS), we perform two types of association tests. 1) Continuous & Multivariate Response: In this case, the response variable is a vector d of wearable-derived features, which are not directly related to ADHD. To evaluate the relevance of genetic associations to ADHD, we introduce an interaction term g×m, which allows us to assess genetic effects on wearable-derived features specifically in the context of ADHD. Here, the wearable feature vector d serves as the continuous multivariate response, while g, m, g×m, and c serve as the independent variables. 2) Continuous & Univariate Response: In this case, the response variable is the wearable-derived combination score d_score, which represents a non-linear combination of wearable features. This score is generated through our supervised modeling framework aimed at classifying individuals with and without ADHD, thus making d_score implicitly related to m. (Details in Data S37, S40-S43.)
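The three designs in panel A can be written schematically as regression-style models. This is a sketch only: the coefficient symbols are introduced here for illustration, the logistic link for the case-control model and the covariate adjustment in the univariate score model are assumptions, and the caption does not specify the test statistic actually used for the multivariate response.

```latex
\begin{align*}
% Case-control GWAS: binary macrophenotype m (logistic link assumed)
\operatorname{logit} P(m = 1) &= \beta_0 + \beta_g\, g + \boldsymbol{\gamma}^{\top} \mathbf{c} \\
% Wearable GWAS 1: continuous multivariate response d with interaction g x m
\mathbf{d} &= \mathbf{b}_0 + \mathbf{b}_g\, g + \mathbf{b}_m\, m
            + \mathbf{b}_{g \times m}\, (g \times m) + \mathbf{B}_c\, \mathbf{c} + \boldsymbol{\varepsilon} \\
% Wearable GWAS 2: continuous univariate response d_score (covariates assumed)
d_{\mathrm{score}} &= \alpha_0 + \alpha_g\, g + \boldsymbol{\alpha}_c^{\top} \mathbf{c} + \epsilon
\end{align*}
```

In this notation, the term of interest in the multivariate design is the interaction coefficient on g×m, whereas in the other two designs it is the coefficient on g.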

B) Hierarchical Two-Step Test of association implemented for the multivariate GWAS. Consider p wearable features (in our case p = 258) measured across n individuals. When testing for genetic associations with each independent feature (Feature-wise Test, left panel), the study-wide significance threshold is 5 × 10⁻⁸/p (i.e., the genome-wide significance threshold divided by the number of independent tests). However, it is possible to reduce this multiple testing correction burden by performing a Hierarchical Two-Step Test (right panel). In Step 1, the p features are grouped into Q clusters of correlated features, and we test the association between each multivariate cluster of features q and g. The correlated nature of the features within a particular cluster enhances association power. The study-wide significance threshold in this case becomes 5 × 10⁻⁸/Q. Given that Q ≪ p, this strategy relaxes the threshold by ~1.5 orders of magnitude compared to the Feature-wise Test. Since Step 1 does not specify which individual features within the cluster drive the significant association with the variant, in Step 2 we perform post-hoc univariate tests to independently test each of the r features within the significant cluster and assess differences in a given wearable feature between individuals carrying different genotypes (two-sided Wilcoxon Rank-Sum test with Benjamini-Hochberg FDR correction). (Details in Data S38-S39, S45-S48, and Table S2.)
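The threshold arithmetic in panel B can be checked with a short sketch. The value p = 258 comes from the caption; the number of clusters Q is not stated here, so Q = 8 below is a hypothetical value chosen only because it is consistent with the quoted gain of about 1.5 orders of magnitude.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance level


def study_wide_threshold(n_tests: int, alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Bonferroni-style study-wide threshold for n_tests response variables."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


p = 258  # number of wearable features (from the caption)
Q = 8    # ASSUMPTION: hypothetical number of feature clusters

feature_wise = study_wide_threshold(p)  # Feature-wise Test threshold
cluster_wise = study_wide_threshold(Q)  # Step 1 threshold of the two-step test

# Orders of magnitude by which the Step 1 threshold is relaxed.
gain = math.log10(cluster_wise / feature_wise)
print(f"feature-wise: {feature_wise:.2e}, cluster-wise: {cluster_wise:.2e}, gain: {gain:.2f}")
```

Because the gain equals log10(p/Q), any Q close to p/10^1.5 (roughly 8 for p = 258) reproduces the figure quoted in the caption.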

C) Dot plot showing the power (y axis) of continuous and binary response variables for increasing genotype effects (beta coefficient; x axis). The type of response is color-coded to reflect the binary and continuous nature of macrophenotype and digital phenotype, respectively. Power is defined as the proportion (%) of linear (continuous) and logistic (binary) regression tests with BH-adjusted p value < 0.05 out of 10,000 simulations. We simulated SNP genotypes considering six different Minor Allele Frequency (MAF) values.
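The intuition behind panel C, that a continuous response retains more power than a binary one, can be illustrated with a simplified simulation. This sketch departs from the caption in several labeled ways: the binary phenotype is obtained by a median split of the continuous one, both responses are tested with a Fisher z-transformed correlation test instead of linear and logistic regression, raw p values are used instead of BH-adjusted ones, and the sample size, number of simulations, and seed are arbitrary.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def association_p(x, y):
    """Two-sided p value from the Fisher z approximation (stand-in test)."""
    r = max(min(pearson_r(x, y), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_power(beta, maf, n=500, n_sim=400, alpha=0.05, seed=0):
    """Return (power_continuous, power_binary) for one effect size and MAF."""
    rng = random.Random(seed)
    hits_cont = hits_bin = 0
    for _ in range(n_sim):
        # Additive genotype coded 0/1/2 from two independent allele draws.
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        # Continuous phenotype with unit-variance Gaussian noise.
        d = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
        # ASSUMPTION: binary phenotype defined by a median split of d.
        cutoff = sorted(d)[n // 2]
        m = [1.0 if di >= cutoff else 0.0 for di in d]
        hits_cont += association_p(g, d) < alpha
        hits_bin += association_p(g, m) < alpha
    return hits_cont / n_sim, hits_bin / n_sim


power_cont, power_bin = simulate_power(beta=0.2, maf=0.3)
print(f"continuous: {power_cont:.2f}, binary: {power_bin:.2f}")
```

Repeating the call over a grid of `beta` and `maf` values would give curves analogous in spirit to the dot plot, although the numbers would not match the published simulation.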

Figure S5. Dissecting genetic associations with behavioral traits, related to Figure 5, Table 1, Table S5, and STAR Methods “Continuous multivariate GWAS for behavioral traits” and “Statistical Significance and Functional Dissection of GWAS” sections.

A) Schematic describing the Wearable GWAS for behavioral traits, where the wearable feature vector d serves as the continuous multivariate response, and the genotype g and covariates c serve as the independent variables. (Details in Data S44, S49-S52.)

B) Enrichment of chromatin features at GWAS locus for heart rate features (chr14:23,392,601–23,418,974). Right panel: Barplot showing the number of epigenomics ENCODE experiments (x axis) with peaks at the locus across different organs and tissues (y axis). Only organs / tissues with a Benjamini-Hochberg adjusted p value < 0.1 (two-sided Fisher test) are shown. The type of experiment is color-coded (purple: DNase-seq; blue: ATAC-seq; magenta: Mint ChIP-seq; orange: Histone ChIP-seq). Left panel: Dot plot showing the log2 odds-ratio (OR) of the significance of peak enrichment across different biosamples.

C-D) GTEx expression levels of proximal genes and eGenes associated with locus chr14:23,392,601–23,418,974. Scatterplot showing the expression level (y axis, log2 scale) of proximal genes (upper panel) and GTEx eGenes (lower panel) across 54 GTEx tissues (x axis). In this case we show the two eGenes (CMTM5 and HOMEZ) that have GTEx eQTLs coinciding with four of the five GWAS variants within the locus. (Details in Table S5). Human tissues are color-coded based on the GTEx palette.

Data Availability Statement
