Communications Biology. 2024 Apr 10;7:435. doi: 10.1038/s42003-024-06091-y

Genetic associations of risk behaviours and educational achievement

Michelle Arellano Spano 1,2, Tim T Morris 3, Neil M Davies 4,5,6, Amanda Hughes 1,2
PMCID: PMC11006670  PMID: 38600303

Abstract

Risk behaviours are common in adolescence and persist into adulthood; people who engage in more risk behaviours are more likely to have lower educational attainment. We applied genetic causal inference methods to explore the causal relationship between adolescent risk behaviours and educational achievement. Risk behaviours were phenotypically associated with educational achievement at age 16 after adjusting for confounders (−0.11, 95% CI: −0.13, −0.09). Genomic-based restricted maximum likelihood (GREML) results indicated that both traits were heritable and have a shared genetic architecture (risk h2 = 0.18, 95% CI: −0.11, 0.47; education h2 = 0.60, 95% CI: 0.50, 0.70). Consistent with the phenotypic results, genetic variation associated with risk behaviour was negatively associated with education (rg = −0.51, 95% CI: −1.04, 0.02). Lastly, the bidirectional MR results indicate that educational achievement or a closely related trait is likely to affect risk behaviours PGI (β = −1.04, 95% CI: −1.41, −0.67), but we found little evidence that the genetic variation associated with risk behaviours affected educational achievement (β = 0.00, 95% CI: −0.24, 0.24). The results suggest engagement in risk behaviour may be partly driven by educational achievement or a closely related trait.

Subject terms: Genetics research, Epidemiology


A genetic study explores the relationship between adolescents' engagement in risk behaviours and educational achievement. This study found that risk behaviours were phenotypically associated with educational achievement at age 16, even after adjusting for confounders.

Introduction

Risk behaviours like alcohol use, smoking and physical inactivity are often first engaged in during adolescence and persist into adulthood1. Adolescence is a crucial formative period for an individual’s future well-being; the choices made during this period can have important repercussions later in life2. For example, greater engagement in risk behaviours at a young age is associated with increased risk of injury, substance dependence and lower educational attainment3,4. Evidence suggests that for each additional risk behaviour adolescents partake in, the odds of attaining five A*-C grades (a common marker of enrolment in further education and entry to skilled employment) at age 16 are 23% lower. If causal, risk behaviours in adolescence could, therefore, be a key target for interventions aiming to improve socioeconomic and health outcomes.

Risk behaviours tend to cluster and co-occur within individuals. This clustering can occur for various reasons. First, engagement in one behaviour can lead to engagement in other risk behaviours, in a process known as co-occurrence5. For example, alcohol use can increase the risk of risky sexual behaviours via inhibition mechanisms affecting an individual’s decision-making processes6. This effect, where one behaviour causes another, was also demonstrated by ref. 7, who observed that early substance use was associated with an increased risk of engaging in premature sexual activity in adolescent girls. Second, features of an adolescent’s social and psychological environment, such as peers’ behaviour, can simultaneously influence engagement in multiple risk behaviours (environmental confounding)8. One source of environmental confounding is indirect genetic effects ('dynastic' effects or 'genetic nurture'), which occur when relatives’ heritable traits affect children’s outcomes through environmental pathways. This bias is particularly evident in genetic studies of intergenerational transmission of education. Genetically influenced traits associated with educational achievement in the parents’ generation may lead to environments which promote educational achievement in children9. Such passive gene-environment correlation can affect children’s educational achievement via environmental pathways, alongside any effects due to direct genetic inheritance, inducing confounding through a correlation between genotypes and phenotypes.

The literature has focused on the effect of risk behaviours on various behavioural and social outcomes. These studies report associations between risk behaviours in adolescence and socioeconomic position later in life10, adult aggression11 and continuity of substance misuse12. However, it is unclear whether risk behaviours causally affect educational achievement, whether features of the environment influence both (confounding)13, or whether educational achievement influences risk behaviours (reverse causation)14,15. Genetically informed studies can help overcome these sources of bias and improve our understanding of the causal relationships between education and risk behaviours in adolescents.

This study assessed the bidirectional causal relationships between adolescent risk behaviour and educational achievement. We applied genetic methods to study the genetic architecture of risk behaviours and educational achievement in an English cohort. We implemented bidirectional Mendelian randomisation (MR) to investigate the causal direction of associations between these traits, since a causal effect between education and risk is plausible in either direction. To minimise confounding and reverse causation, we used polygenic indexes (PGIs) to capture liability to risk behaviours and to education.

Results

Sample description

We began with the original ALSPAC sample of 15,645 pregnancies, which was then restricted to those with genetic data and National Pupil Database linkage available. We subsequently excluded participants with consent withdrawals, participants not alive at 1 year, and those with no recorded sex and no socioeconomic information (maternal education and housing tenure). This process yielded a final analytical sample of 7695 participants, of whom 51% were male and 49% female. The phenotypic and MR analyses were carried out using imputed data on these 7695 participants. Of these, 1583 participants had complete information on all risk behaviours and covariates. This complete case sample was used for GREML analyses. Table 1 in the supplementary material shows the differences across the risk behaviour index and covariates between this complete case sample (N = 1583) and the remainder of the original ALSPAC sample (N = 14,062).

Table 1.

Associations of capped GCSE score with the MRB index, based on imputed data (N = 7695)

Capped GCSE score | Model 1 (a) | Model 2 (b) | Model 3 (c) | Model 4 (d)
MRB Index | −0.14 [−0.17, −0.12] | −0.12 [−0.14, −0.10] | −0.12 [−0.13, −0.10] | −0.11 [−0.13, −0.09]

95% confidence intervals in brackets.

(a) Model 1 is unadjusted for any covariates.

(b) Model 2 is adjusted for parental socioeconomic position, maternal education (ref: <O level) and sex (ref: male).

(c) Model 3 is adjusted for parental socioeconomic position, maternal education (ref: <O level), sex (ref: male) and housing tenure (ref: owned).

(d) Model 4 is adjusted for parental socioeconomic position, maternal education (ref: <O level), sex (ref: male), housing tenure (ref: owned) and cognitive ability.

Phenotypic associations of risk behaviours and educational achievement

Table 1 reports results from models where we regress the capped GCSE score on the MRB Index using imputed data. The first column shows the regression results of the capped GCSE score on the MRB Index unadjusted for any covariate. A standard deviation increase in the MRB Index was phenotypically associated with a 0.14 (95% CI: [0.12, 0.17]) standard deviation decrease in capped GCSE score. After adjusting for sex, parental socioeconomic position and maternal education, a standard deviation increase in the MRB Index corresponded to a 0.12 (95% CI: [0.10, 0.14]) standard deviation decrease in capped GCSE score. This finding suggests that engagement in risk behaviour is associated with lower capped GCSE scores net of covariates. Likewise, results for the fully adjusted binary outcome model suggested the odds of obtaining five or more A*-C GCSEs were 19% (95% CI: [16%, 23%]) lower per standard deviation increase in the MRB Index (see Supplementary Table 6).
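As a reading aid, a percentage reduction in odds can be restated as an odds ratio (OR) below one. The conversion below is a sketch that assumes the 19% figure comes from a logistic-regression odds ratio; the text above reports only the percentage, so the OR values are derived here rather than quoted.

```latex
\mathrm{OR} = 1 - \frac{19}{100} = 0.81,
\qquad
\text{95\% CI: } \left[\,1 - 0.23,\; 1 - 0.16\,\right] = [0.77,\ 0.84],
\qquad
\ln(0.81) \approx -0.21
```

On this reading, a one standard deviation increase in the MRB Index multiplies the odds of obtaining five or more A*-C GCSEs by roughly 0.81.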

Genotypic associations of risk behaviours and educational achievement

The univariate GREML models show associations between the phenotypes of interest and the genotypic data (Table 2). We observed a SNP heritability of educational achievement of 0.60 (95% CI: [0.50, 0.70]) for the capped GCSE score (continuous measure). The estimated heritability of the MRB Index was lower at 0.18 (95% CI: [−0.11, 0.47]), and the confidence interval crossed the null. These results suggest that considerable variation in the educational achievement measures can be explained by common genetic variation and provide weaker evidence that some variation in the risk behaviour index can be explained by common genetic variation.

Table 2.

GCTA estimates. h2: univariate heritability, rg: genetic correlation

Univariate estimates | n | h2 (a) | SE | 95% CI
MRB index | 2171 | 0.18 | 0.15 | −0.11, 0.47
Capped GCSE score (b) | 6646 | 0.60 | 0.05 | 0.50, 0.70

Bivariate estimates | n | rg (c) | SE | 95% CI
Capped GCSE score: MRB index | 4409 | −0.51 | 0.27 | −1.04, 0.02

(a) h2 shows the univariate heritability of each item.

(b) Capped GCSE is a continuous measure of educational achievement.

(c) rg: genetic correlation.

The bivariate GREML models show a strong negative genetic correlation between the MRB index and educational achievement of −0.51 (95%CI: [−1.04, 0.02]) for the capped GCSE score. This result suggests considerable genetic overlap between these traits and that genetic variation associated with risk behaviours is also associated with lower educational achievement.
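The confidence intervals in Table 2 can be reproduced from the reported estimates and standard errors. The sketch below assumes a normal-approximation interval (estimate ± 1.96 × SE); the function name `wald_ci` is illustrative, and the text does not state how the intervals were actually computed.

```python
def wald_ci(estimate, se, z=1.96):
    """Return (lower, upper) bounds of a normal-approximation 95% CI."""
    return estimate - z * se, estimate + z * se

# Estimates and standard errors as reported in Table 2.
table2 = {
    "h2 MRB index": (0.18, 0.15),
    "h2 capped GCSE score": (0.60, 0.05),
    "rg capped GCSE score: MRB index": (-0.51, 0.27),
}

intervals = {name: wald_ci(est, se) for name, (est, se) in table2.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name}: [{lower:.2f}, {upper:.2f}]")
```

Under this assumption the computed bounds agree with the table to two decimal places, which also shows why the intervals for the MRB index heritability and the genetic correlation include zero: their standard errors are large relative to the estimates.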

Bidirectional Mendelian randomisation

Figure 1 shows associations between the genetically instrumented MRB index and capped GCSE points score of young people. There was little evidence of an impact of the genetically instrumented MRB Index (F-statistic = 3.44) on capped GCSE score when adjusted for sex and principal components of ancestry (β̂ = −0.06, 95% CI: [−0.27, 0.15]), or when additionally adjusted for the maternal risk PGI (β̂ = 0.00, 95% CI: [−0.24, 0.24]). The results for the binary outcome were similar; there was little evidence that risk behaviours influenced educational achievement adjusted for maternal risk PGI (β̂ = −0.02, 95% CI: [−0.14, 0.10]) (Supplementary Fig. 1).

Fig. 1. Association between the young person’s genetically instrumented MRB Index and their educational achievement.

Error bars represent the 95% confidence intervals.

Figure 2 shows the association between the genetically instrumented capped GCSE score and the MRB index of young people. There was a negative association between genetically instrumented education (F-statistic = 725.58) and the MRB index, both when adjusting for sex and principal components of ancestry (β̂ = −0.75, 95% CI: [−0.97, −0.54]) and when additionally adjusting for the mother’s education PGI (β̂ = −1.04, 95% CI: [−1.41, −0.67]). Changes in the estimate with adjustment for the mother’s education PGI were similar for the binary outcome (Supplementary Fig. 2).

Fig. 2. Association between young people’s genetically instrumented educational achievement (capped GCSE score, standardised) and their MRB index.

Error bars represent the 95% confidence intervals.
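To illustrate the logic of the MR estimates and first-stage F-statistics reported in this section, the sketch below simulates a single-instrument analysis. Everything in it is an assumption made for illustration: the data are simulated (not ALSPAC), the variable names `pgi`, `exposure`, `outcome` and `true_effect` are hypothetical, no covariates are adjusted for, and the simple ratio (Wald) estimator is used because this section does not specify the estimator behind the reported results.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data: pgi is the instrument, u is an unobserved confounder
# that affects both the exposure and the outcome.
pgi = rng.standard_normal(n)
u = rng.standard_normal(n)
true_effect = -0.5
exposure = 0.3 * pgi + u + rng.standard_normal(n)
outcome = true_effect * exposure + u + rng.standard_normal(n)

def slope(x, y):
    """OLS slope of y on x (model with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Naive regression of outcome on exposure is biased by the confounder u.
ols_estimate = slope(exposure, outcome)

# Ratio estimator: instrument-outcome slope over instrument-exposure slope.
first_stage = slope(pgi, exposure)
mr_estimate = slope(pgi, outcome) / first_stage

# First-stage F-statistic for one instrument: F = r^2 (n - 2) / (1 - r^2).
r2 = np.corrcoef(pgi, exposure)[0, 1] ** 2
f_stat = r2 * (n - 2) / (1 - r2)

print(f"OLS: {ols_estimate:.2f}, MR: {mr_estimate:.2f}, F: {f_stat:.1f}")
```

In this simulation the instrument is strong, so the ratio estimate recovers the simulated effect while the naive regression does not. A weak instrument, such as one with an F-statistic in the low single digits, would give a much less precise estimate, which is the situation described above for the risk behaviour PGI.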

Discussion

In a cohort of adolescents, an index of multiple risk behaviours was phenotypically associated with educational achievement at 16 after adjustment for confounders. Genetic analysis using GREML indicated that both traits were heritable and shared genetic architecture, with considerable genetic overlap between the two traits. Consistent with the results of phenotypic models, genetic variation associated with risk behaviours was negatively associated with educational achievement. Furthermore, bidirectional MR suggested that educational achievement affects risk behaviours and that engagement in risk behaviours may be partly driven by an individual’s educational achievement or a closely related trait. In contrast, we found little evidence that genetic variation associated with engagement in risk behaviours causally affected educational achievement, but these estimates were less precise.

A possible explanation for these results is familial factors, such as indirect genetic effects of parents on their children. Indirect genetic effects can occur when the parents’ genetic variants affect the offspring through environmental mechanisms (i.e. not via direct genetic transmission). For example, ref. 16 found that parents’ non-transmitted polygenic indexes were associated with the educational achievement of their children 29.9% as strongly (p = 1.6 × 10^−14) as parents’ transmitted polygenic indexes17. This is consistent with results from Howe et al.’s (2022) within-sibship GWAS, which suggested that population-based estimates of the associations of genetic variants with educational attainment and with phenotypes such as BMI and smoking may be inflated by indirect genetic effects. However, adjusting our analysis for mothers’ polygenic indexes only modestly attenuated the effects. Additional data are needed to investigate how indirect genetic effects influence these relationships in genotyped mother–father–child trios18.

The MRB index had a negative phenotypic association with educational achievement for both achievement measures. We showed a decrease in the capped GCSE score of 0.14 SD (95% CI: [−0.17, −0.12]) per SD higher engagement in risky behaviours; these results were slightly attenuated when controlled for confounders. After adjustment for sex, parental socioeconomic position and maternal education, the decrease in the capped GCSE score was 0.12 SD (95% CI: [−0.14, −0.10]), and in the fully adjusted model it was 0.11 SD (95% CI: [−0.13, −0.09]) (Table 1). Similar results were observed when exploring the association between the MRB index and the probability of gaining five A*-C grades at GCSE, including in English and Mathematics. These results are consistent with previous results based on the ALSPAC cohort, where multiple risk behaviours were negatively associated with educational achievement, presenting a reduction in test scores of 6.31 points (95% CI: [−7.03, −5.58])4.

Our estimates of the heritability of educational achievement are in line with those reported by previous studies. Among many others, ref. 19 estimated heritability for educational outcomes of 0.21 for GCSE Mathematics, 0.15 for GCSE English and 0.17 for GCSE Science. Likewise, ref. 20 estimated heritability of reading performance of 0.38 in a genetic study using the Western Reserve Reading Project data in Ohio, USA. Krapohl and Plomin21 estimated heritability of educational attainment of 0.31 in their study of socioeconomic position and offspring education. Our results from bivariate GREML also indicate that engagement in risk behaviours had a strong negative genetic association with educational outcomes at 16 years, with a genetic correlation of −0.51 (95%CI: −1.04, 0.02) for our capped GCSE score and −0.82 (95%CI: −1.68, 0.04) for attaining 5 or more A*-C grades in Mathematics and English.

Our MR results provided little evidence that risk behaviours affected educational achievement (β̂ = −0.06, 95% CI: [−0.27, 0.15]), with or without adjustment for the maternal risk PGI. In contrast, there was evidence of a causal effect of educational attainment on engagement in risk behaviours (β̂ = −0.75, 95% CI: [−0.97, −0.54]). This difference may be because the MR estimate of the effect of education on risk behaviours was considerably more precise, reflecting an educational attainment PGI which was a much stronger instrument than the PGI for risk behaviours.

The risk behaviour literature shows that the risk behaviours that we considered frequently co-occur and tend to cluster during adolescence22,23. Existing studies investigating clustered risk behaviours focus only on small subsets of behaviours, such as alcohol use and smoking24, failing to account for behaviours such as self-harm and criminal or delinquent behaviour. We consider a wider range of clustered risk behaviours that allows us to capture risk associations with education more comprehensively. While we had insufficient power to draw firm conclusions about the effects of risk behaviours on educational attainment, our results do imply that educational achievement, or a closely related trait, affects risk behaviours. This supports current literature indicating that universal school-based interventions to improve students’ outcomes may have reduced the rates of risk behaviours25. Findings therefore suggest that these interventions could improve student outcomes and lessen the burden on public health services whilst reducing adolescent risk behaviours.

However, there are some limitations to our analysis. Missing data on risk behaviours and confounders reduced power (especially for GCTA analysis, which did not use imputed data) and may have introduced bias. Likewise, although the multiple risk behaviour index comprised a wide range of behaviours, by assigning each risk behaviour the same weight, we assumed that all risk behaviours contribute equally to associations with educational achievement. Horizontal pleiotropy might also have affected our results if genetic variants for educational attainment also affect other traits influencing risk behaviour. This is challenging to investigate further, as most pleiotropy-robust methods require GWAS summary data rather than the individual-level data used in this study. Future work could, however, employ multivariate Mendelian Randomisation26 to study the direct effect of risk behaviour and educational achievement27. The lack of genetic data on fathers meant we could not adjust for paternal genotype, and indirect genetic effects involving fathers might have influenced our results. However, controlling for maternal genotype only modestly attenuated associations, suggesting that indirect genetic effects were unlikely to explain our findings fully. Nevertheless, assessment of these relationships using well-powered familial analysis, like M-GCTA26, and bigger samples could shed more light on passive environmental confounding or indirect genetic effects, leading to a better understanding of causation. Furthermore, some of the risk behaviours were measured via questionnaires, which may have introduced recall and desirability biases, where participants might have underreported behaviours perceived as socially undesirable. Future work could investigate whether some risk behaviours are more closely linked to education than others. Our study only investigated the association of these phenotypes with common genetic variation, and future studies could investigate the impact of rare genetic variation.

In summary, we explored the genetic architecture of risk behaviour engagement and educational achievement and the bidirectional causal effects between these traits. We found evidence that higher educational achievement, or a closely related trait, is likely to reduce risk behaviours. However, we found little evidence that risk behaviours affected educational achievement, although statistical power was limited. Our results add to existing evidence that educational achievement may be an effective intervention target for risky behaviours.

Methods

Study participants

The Avon Longitudinal Study of Parents and Children (ALSPAC) is a prospective birth cohort based in the Bristol and Avon area in the UK. ALSPAC invited pregnant women to participate if they were residents in the area and had expected delivery dates from 1st April 1991 to 31st December 1992. From 14,541 pregnancies initially enrolled, 13,988 children were alive at 1 year of age. When the oldest children were approximately seven, the study attempted to include eligible cases who did not originally participate in the study. The total sample size for analyses using any data collected after the age of seven is 15,447 pregnancies, resulting in 15,658 foetuses. Of these, 14,901 children were alive at one year of age. Details of the enrolment phases are provided elsewhere28–30. Consent for biological samples was collected per the Human Tissue Act (2004) (for full information on ALSPAC ethical approval, please see: http://www.bristol.ac.uk/alspac/researchers/research-ethics/). Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Ethical approval for the study was obtained from the ALSPAC Law and Ethics Committee and local research ethics committees (NHS Haydock REC: 10/H1010/70). This study has been pre-registered with ALSPAC under proposal number B3557. Completion of individual questionnaires was taken as consent for the data from that questionnaire, with additional written permission from parents for the use of clinic data. At age 16, young people and their parents gave written informed consent for the use of the young person’s genetic information. At age 18, study children were sent ‘fair processing’ materials describing ALSPAC’s intended use of their health and administrative records. They were given clear means to consent or object via a written form. Education data were not extracted for participants who objected or were not sent fair processing materials28,31. ALSPAC has a lower share of ethnic minority participants than the UK population but was otherwise broadly representative at baseline29. All ethical regulations relevant to human research participants were followed.

Sample

Attrition and patterns of missingness across variables reduced the complete case analytical sample from 15,645 participants to 1583 (Fig. 3). Due to attrition, a substantial number of participants originally included in ALSPAC did not have genetic data or linkage to the National Pupil Database (NPD) (N = 7657). We further excluded consent withdrawals (n = 11), participants not alive at 1 year (n = 5), participants with no sex information (n = 135) and participants with no socioeconomic information available (maternal education (n = 115) and housing tenure (n = 27)). Thus, we restricted the analytic sample to the 7695 participants alive at 1 year with genetic and NPD data, sex, and socioeconomic information from infancy (maternal educational qualifications and housing tenure) who had not withdrawn consent. Within this sample, missing data in the remaining variables (risk behaviours and other covariates) were imputed. We performed multiple imputation by chained equations32, with 50 imputed datasets created. We used the imputed datasets for phenotypic analyses and bidirectional MR. GREML analyses used the complete case sample of participants with genetic information and complete data on all exposures, outcomes, and covariates (N = 1583). We carried out the phenotypic analysis and MR in both the complete case sample and imputed datasets; we present results on the imputed sample in the main manuscript, with complete case analyses given in the supplementary material (Supplementary Tables 3–7, 12–17 and Supplementary Figs. 3, 4). For the imputation model, we included marital status, mother’s smoking status, maternal education, housing tenure and parental social class as auxiliary variables. We used logistic regression to impute the risk behaviours, linear and truncated regression for continuous variables and ordered logistic regression to impute categorical variables. Multiple imputation resulted in an imputed sample size of 7695.

Fig. 3. STROBE diagram.

The diagram describes the selection of the complete case sample and the imputed sample.
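The sample flow described in this section can be checked with simple arithmetic. The sketch below uses only the counts reported in the text and assumes the exclusions were applied sequentially to non-overlapping groups of participants; the dictionary labels are shorthand, not the authors' terms.

```python
# Counts as reported in the Sample section.
original_sample = 15_645
exclusions = {
    "no genetic data or NPD linkage": 7_657,
    "consent withdrawn": 11,
    "not alive at 1 year": 5,
    "no sex information": 135,
    "no maternal education": 115,
    "no housing tenure": 27,
}

# Analytic (imputed) sample after all exclusions.
analytic_sample = original_sample - sum(exclusions.values())

# Complete case sample and the remainder of the original sample.
complete_cases = 1_583
remainder = original_sample - complete_cases

print(analytic_sample)  # 7695
print(remainder)        # 14062
```

Both results match the figures quoted in the text: 7695 participants in the imputed analytic sample, and 14,062 participants in the original sample outside the complete case sample.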

Genotyping

ALSPAC children were genotyped using the Illumina HumanHap550 platform, and standard quality control procedures were applied. Individuals were then excluded based on sex mismatch, minimal or excessive heterozygosity, disproportionate individual missingness (>3%) and insufficient sample replication (IBD <0.8). During genetic quality control, individuals with non-European ancestry were removed, as is often done in genetic studies, to minimise bias introduced by ancestral population stratification. SNPs with a minor allele frequency of <1%, a call rate of <95% or evidence of Hardy-Weinberg disequilibrium (p < 5 × 10^−7) were removed. Cryptic relatedness was measured as the proportion of identity by descent (IBD >0.1). Imputation was performed using IMPUTE v2.2.2 to the Haplotype Reference Consortium (HRC) panel, and SNPs with poor imputation quality (infoscore <0.08) were removed.
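The SNP-level filters in this paragraph amount to three threshold checks. The sketch below is illustrative only: the `SnpStats` class, the `passes_qc` function and the example SNPs are hypothetical, and the Hardy-Weinberg threshold is assumed to be p < 5 × 10^−7 (the exponent is an interpretation of the extracted text). In practice such filters are applied by genetics software rather than hand-written code.

```python
from dataclasses import dataclass

@dataclass
class SnpStats:
    name: str
    maf: float        # minor allele frequency
    call_rate: float  # proportion of non-missing genotype calls
    hwe_p: float      # p-value from a Hardy-Weinberg equilibrium test

def passes_qc(snp, min_maf=0.01, min_call_rate=0.95, min_hwe_p=5e-7):
    """Keep a SNP only if it clears all three thresholds."""
    return (snp.maf >= min_maf
            and snp.call_rate >= min_call_rate
            and snp.hwe_p >= min_hwe_p)

# Hypothetical SNPs, each failing at most one filter.
snps = [
    SnpStats("snp_common", maf=0.20, call_rate=0.99, hwe_p=0.40),
    SnpStats("snp_rare", maf=0.005, call_rate=0.99, hwe_p=0.40),
    SnpStats("snp_missing", maf=0.20, call_rate=0.90, hwe_p=0.40),
    SnpStats("snp_hwe", maf=0.20, call_rate=0.99, hwe_p=1e-9),
]
kept = [s.name for s in snps if passes_qc(s)]
print(kept)  # ['snp_common']
```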

Measures

Multiple risk behaviours (MRBs) at age 16

An index of multiple risk behaviours (MRBs) was derived from two main data collections during the participants’ adolescence: a self-completed questionnaire issued during a clinic assessment at age 15 and a self-completed postal questionnaire at age 16. We coded 13 risk behaviours into binary format (no = 0; yes = 1) following ref. 4 and then calculated an MRB index as the total number of risk behaviours each participant had engaged in. The underlying risk behaviours that we used to construct the risk behaviour index were largely already dichotomised (6 out of the 13 risk behaviours). None of the underlying risk behaviours were continuous. This risk behaviour index has been previously used in the literature4,10,33. We further carried out extra analysis to check the consistency of the index; these results are available in the supplementary information (Supplementary Tables 8–11). We tested the internal consistency of the index based on Cronbach’s alpha and Pearson’s correlations, and also carried out a factor analysis. The results based on an updated index excluding the two items with the lowest item-test correlation, and using the first factor as the exposure, did not alter conclusions (Supplementary Tables 14–17 and Supplementary Figs. 5–8).
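For readers unfamiliar with the internal consistency check mentioned above, the sketch below computes Cronbach's alpha from its standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name `cronbach_alpha` and the toy data are illustrative assumptions; the authors' own software and settings are not described in this section.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_participants, n_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Toy data: five hypothetical participants, four binary items.
toy = np.array([
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
])
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
```

Higher values indicate that the items vary together, which is the property one would want from binary indicators that are summed into a single index.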

The study website contains details of available data through a searchable data dictionary and variable search tool: http://www.bristol.ac.uk/alspac/researchers/our-data/.

The risk behaviours included in the index were:

Physical inactivity: Participant has typically exercised <5 times per week over the past year.

TV viewing: Participant spent three or more hours watching TV on average daily across the week.

Car passenger risk: The participant had been a car passenger at least once in their lifetime where the driver (1) had consumed alcohol, (2) did not have a valid licence, or (3) the participant chose not to wear a seat belt last time travelling in a car, van, or taxi.

Scooter risk: Participant reported that they had last ridden a scooter within the previous four weeks and had not used a helmet on the most recent occasion.

Cycle helmet use: Participant reported that they had last ridden a bicycle within the previous four weeks and had not used a helmet on the most recent occasion.

Illicit drug use/solvent use: In the year since their 15th birthday, the participant had been a regular user (used more than five times) of one or more illicit drugs (excluding cannabis), including amphetamines, ecstasy, lysergic acid diethylamide (LSD), cocaine and ketamine, or of inhalants including aerosols, gas, solvents, and poppers.

Cannabis use: Participants who reported using cannabis ‘sometime, but less often than once a week’ or more regular use were classified as occasional users.

Regular tobacco use: Participant has ever smoked and is regularly smoking at least one cigarette per week.

Hazardous alcohol consumption: In the past year, participants had scored eight or more on the Alcohol Use Disorders Identification Test (AUDIT), indicating hazardous alcohol consumption.

Self-harm: Participant said they had purposely hurt themselves in some way in their lifetime.

Penetrative sex before the age of 16: Participant reported having had penetrative sex in the preceding year and that they were under 16 at the time.

Unprotected sex: Participant engaged in penetrative sex without using contraception on the last occasion they had had sex in the past year.

Criminal and delinquent behaviour: Participant reported that at least once in the past year, they had undertaken at least one of the following: carried a weapon; physically hurt someone on purpose; stolen something; sold illicit substances to another person; damaged property belonging to someone else either by using graffiti, setting fire to it, or destroying or damaging it in another fashion; subjected someone to verbal or physical racial abuse; or been rude/rowdy in a public place.

As each of the risk behaviours can be represented as a binary indicator (see Table 3 for descriptives of individual risk behaviours), we can denote the variable measuring engagement in risk behaviour j for each individual i by the binary indicator as follows:

$$w_{ij} = \begin{cases} 1 & \text{if individual } i \text{ engages in risk behaviour } j, \\ 0 & \text{otherwise} \end{cases}$$

Table 3.

Descriptive statistics for educational achievement measures and MRB index in the imputed sample and individual multiple risk behaviours in the pre-imputed set

Continuous variables | N | Mean | SD | Min | Max
Capped GCSE score (a) | 7695 | 331.19 | 90.16 | −0.52 | 590.89
5 or more A*-C GCSEs including English and Maths (b) | 7695 | 0.57 | 0.49 | 0 | 1
MRB index | 7695 | 3.48 | 2.18 | 0 | 12.08

Binary variables: multiple risk behaviours (MRBs) | N (c) | % Engaging
Physical inactivity | 3556 | 74%
TV viewing | 3584 | 21%
Car passenger risk | 3547 | 30%
Scooter risk | 3497 | 20%
Cycle helmet use | 3227 | 24%
Illicit drug use/solvent use | 3512 | 8%
Cannabis use | 3578 | 10%
Regular tobacco use | 3579 | 12%
Hazardous alcohol consumption | 3399 | 36%
Self-harm | 3582 | 19%
Penetrative sex before the age of 16 | 3933 | 17%
Unprotected sex | 3933 | 3%
Criminal and delinquent behaviours | 4017 | 47%

(a) Capped GCSE is a continuous measure of educational achievement.

(b) Achieved 5 or more is a binary measure of educational achievement.

(c) Pre-imputation sample analysis was restricted to unrelated ALSPAC participants with genetic data and linked GCSE records, alive at 1 year, who had not withdrawn consent and who had complete sex information and sufficient maternal socioeconomic information. Missing data were imputed using multiple imputation by chained equations.

Since we are looking at the overall engagement across a range of risk behaviours rather than individual effects of each, we then create a new single variable called the multiple risk behaviour index (MRBI), defined for each individual i as the sum of all behaviours, as follows:

$$\mathrm{MRBI}_i = \sum_{j=1}^{13} w_{ij}$$

The new regressor $\mathrm{MRBI}_i$ is our exposure of interest, summarised in Table 3.
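The index is simply a row sum over the 13 binary indicators. The sketch below shows this for three hypothetical participants; the array `w` and its values are made up for illustration and do not come from the study data.

```python
import numpy as np

# Hypothetical indicators w[i, j] for 3 participants and the 13 behaviours
# (1 = engages in behaviour j, 0 = does not).
w = np.array([
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
])
assert w.shape[1] == 13

# The index for each participant is the sum of their 13 indicators.
mrbi = w.sum(axis=1)
print(mrbi)  # [1 5 0]
```

Because every behaviour contributes one point, the index treats all behaviours as equally weighted, whatever their individual prevalence.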

Educational achievement

Information on educational achievement was obtained via record linkage to the National Pupil Database (NPD). Managed by the Department for Education in England, the NPD includes data collected from school students and higher education students from 2 to 21 years. This dataset comprises the most complete and accurate record of compulsory educational achievement available in England. Educational measures were based on participants’ General Certificate of Secondary Education (GCSE) qualifications, which are taken during educational Key Stage 4 when pupils are aged between 14 and 16 years old. At the time, Key Stage 4 marked the end of compulsory education in England. For this analysis, we used two measures of achievement. The first was the capped GCSE score, a continuous measure which sums the student’s eight best grades to obtain a measure of overall achievement commonly used in educational research. Individual GCSE qualifications in each subject contribute 58 points for an A* through to 16 points for a G and 0 for a U (ungraded). Our second measure of educational achievement was a binary indicator of whether participants achieved five or more A*-C grades at GCSE. We used this as it is the qualification requirement for entry to many post-16 education and training courses.
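The two achievement measures can be sketched as follows. The text gives only the endpoints of the points scale (A* = 58, G = 16, U = 0), so the intermediate values below assume equal six-point steps between adjacent grades. The function names and the example pupil are hypothetical, and the binary measure here counts any five A*-C grades without checking subjects such as English and Mathematics.

```python
# Points per grade; intermediate values assume equal 6-point steps
# between the endpoints stated in the text (A* = 58, G = 16, U = 0).
GRADE_POINTS = {"A*": 58, "A": 52, "B": 46, "C": 40,
                "D": 34, "E": 28, "F": 22, "G": 16, "U": 0}

def capped_gcse_score(grades, cap=8):
    """Sum the points of the `cap` best grades."""
    points = sorted((GRADE_POINTS[g] for g in grades), reverse=True)
    return sum(points[:cap])

def five_or_more_a_star_to_c(grades):
    """True if at least five grades are C or better."""
    return sum(GRADE_POINTS[g] >= GRADE_POINTS["C"] for g in grades) >= 5

# Hypothetical pupil with ten GCSEs; only the best eight count.
grades = ["A", "A", "B", "B", "B", "C", "C", "C", "D", "E"]
print(capped_gcse_score(grades))         # 362
print(five_or_more_a_star_to_c(grades))  # True
```

Capping at eight subjects means pupils who sit more GCSEs are not automatically scored higher than pupils who sit fewer.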

Polygenic indexes (PGI)

We used the largest existing genome-wide association studies (GWAS) to identify single-nucleotide polymorphisms (SNPs) associated with risk behaviours34 and educational achievement35. After sub-setting GWAS results for both phenotypes to SNPs that were available in ALSPAC, we used the MRInstruments R package to identify SNPs which were independently associated (at $p < 5 \times 10^{-8}$) with the phenotypes (clumping parameters: $R^2 = 0.01$, 10,000 kb). This resulted in 303 SNPs associated with risk behaviour and 3952 SNPs associated with educational achievement. PGIs based on these SNPs were then derived in PLINK 1.9 by summing trait-increasing alleles. SNPs were weighted by each allele’s regression coefficient from the GWAS so that genetic variants with greater effect contributed more to the scores. Finally, scores were standardised for analysis. The children’s educational achievement PGI explained 9.83% of the variation in the capped GCSE score (continuous outcome), while the children’s risk behaviour PGI explained 0.05% of the variation in the MRB index. The mother’s educational achievement PGI explained 6.94% of children’s capped GCSE scores, and the mothers' risk behaviour PGI explained 0.16% of the variation in children’s risk behaviours.
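The scoring step described above (a weighted sum of trait-increasing alleles, followed by standardisation) can be sketched in Python as follows. This illustrates the arithmetic only and is not the PLINK implementation; the weights, dosages and function names are hypothetical, and standardisation here uses the sample standard deviation, which is an assumption.

```python
import statistics
from typing import List, Sequence


def polygenic_index(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of trait-increasing allele counts for one person.

    dosages: number of trait-increasing alleles (0, 1 or 2) per SNP.
    weights: GWAS regression coefficient for each SNP, in the same order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardise(scores: Sequence[float]) -> List[float]:
    """Rescale scores to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


# Hypothetical GWAS weights for three SNPs and dosages for four people.
gwas_weights = [0.02, 0.01, 0.03]
cohort_dosages = [[2, 1, 0], [1, 1, 1], [0, 2, 2], [2, 2, 1]]

raw_scores = [polygenic_index(d, gwas_weights) for d in cohort_dosages]
pgi = standardise(raw_scores)
```

Because the weights scale each SNP's contribution, a variant with a larger GWAS coefficient moves the score more than one with a smaller coefficient.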

Statistical analysis

In order to explore the association between the MRB index and educational achievement, we carried out three types of analyses. First, we examined phenotypic associations between the MRB index and the continuous and binary measures for educational achievement in the ALSPAC cohort. Secondly, to explore the genetic underpinnings of engagement in risk behaviour and educational achievement, we performed univariate GREML to estimate the heritability of both traits, and bivariate GREML to explore the genetic correlation of these behaviours. GREML analysis was carried out in the complete case sample, as GREML cannot be readily performed using multiply imputed phenotype data. Third, given the possible confounding bias which can affect estimates based on observational data, we used bidirectional MR analyses to estimate causal associations between the MRB index and educational measures in our imputed datasets. Below we expand on these analytical methods.

Phenotypic associations

We used linear and logistic regression to estimate the association of the MRB Index with capped GCSE score (continuous outcome) and with gaining five or more GCSE grades A*-C (binary outcome). Base models adjusted for the young person’s sex. Since other factors may confound the association between educational achievement and the number of risk behaviours, we also estimated these associations adjusted for the following potential socioeconomic confounders: parental social class, maternal education, and housing tenure at the time of the child’s birth. Lastly, we estimated a third set of associations adjusted for the child’s cognitive ability. Table 4 shows the summary statistics for these variables in the imputed sample (see Supplementary Tables 1, 2 for the complete case sample).

Table 4.

Summary statistics of parental social class, housing tenure, maternal education, and participants’ sex

Variables N %
Parental social class (N = 7695)
  Professional 1113 14.5
  Managerial and technical 3316 43.4
  Skilled non-manual 1929 25.2
  Skilled manual 923 12
  Partially unskilled 365 4.74
  Unskilled 48 0.63
Housing tenure (N = 7695)
  Owned 6127 79.6
  Council rented 782 10.2
  Privately rented 786 10.2
Maternal education (N = 7695)
  <O level 1943 25.3
  O level 2674 34.8
  A level 1897 24.7
  Degree 1182 15.4
Participant’s Sex (N = 7695) 
  Female 3945 51.3
  Male 3750 48.7

Genotypic associations

We used genomic-based restricted maximum likelihood (GREML) to examine the genetic overlap between the MRB Index and educational achievement. These models were fitted using Genome-wide Complex Trait Analysis (GCTA)36. GCTA uses GREML to estimate the proportion of phenotypic variance that can be statistically explained by all measured genome-wide single-nucleotide polymorphisms (SNPs), known as the SNP-based heritability. GCTA estimates heritability by comparing the genetic similarity of unrelated individuals to their phenotypic similarity. Unrelated participants (defined as more distantly related than second cousins) were identified using Genetic Relatedness Matrices (GRMs)36. If a phenotype can be (in part) explained by genetic variation, then we would expect more genetically similar individuals to be more phenotypically similar37. We first estimated univariate models to test the SNP heritability of the educational outcomes and MRB index, specified as:

$$y = X\beta + g + \varepsilon$$

where $y$ is the phenotype, $X$ is a series of covariates, $g$ is a normally distributed random effect with variance $\sigma_g^2$ and $\varepsilon$ is a residual error with variance $\sigma_\varepsilon^2$. The SNP-based heritability can then be estimated as the proportion of total phenotypic variance that is attributable to genetic variance:

$$h^2_{\mathrm{SNP}} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_\varepsilon^2}$$

To control for differences between ancestral populations in allele distributions which could potentially bias the estimate, the first 20 principal components of inferred population structure were included in the analyses as covariates.
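To make the two ingredients of this approach concrete, the Python sketch below computes one off-diagonal entry of a genetic relatedness matrix, using the standardised-genotype definition from the GCTA paper36, and the heritability ratio from two variance components. It illustrates the formulas only and does not perform REML estimation. Allele frequencies are passed in rather than estimated, and all function names and numeric values are hypothetical.

```python
from typing import Sequence


def grm_entry(x_j: Sequence[int], x_k: Sequence[int],
              freqs: Sequence[float]) -> float:
    """Off-diagonal genetic relatedness between individuals j and k.

    x_j, x_k: reference-allele counts (0, 1 or 2) at each SNP.
    freqs: reference-allele frequency p_i at each SNP (0 < p_i < 1).
    Each SNP contributes (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i));
    the entry is the average over SNPs.
    """
    terms = [
        (a - 2 * p) * (b - 2 * p) / (2 * p * (1 - p))
        for a, b, p in zip(x_j, x_k, freqs)
    ]
    return sum(terms) / len(terms)


def snp_heritability(var_g: float, var_e: float) -> float:
    """Proportion of phenotypic variance attributable to genetic variance."""
    return var_g / (var_g + var_e)


# Hypothetical genotypes at three SNPs, each with allele frequency 0.5.
relatedness = grm_entry([0, 1, 2], [2, 1, 0], [0.5, 0.5, 0.5])

# Hypothetical variance components.
h2_example = snp_heritability(var_g=0.6, var_e=0.4)
```

In this toy example the two individuals carry opposite genotypes at two of the three SNPs, so their estimated relatedness is negative, meaning they are less similar than two randomly chosen members of the sample.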

We estimated genetic correlations between the MRB Index and both measures of educational achievement using bivariate GCTA38. Genetic correlations allow us to quantify the overlap in SNPs associated with multiple phenotypes. Specifically for this study, the genetic correlation shows the proportion of the phenotypic correlation between the MRB index and education that is explained by common variation. Genetic correlations are estimated as:

$$r_g = \frac{\mathrm{cov}_g(A,B)}{\sqrt{\mathrm{var}_g(A)\,\mathrm{var}_g(B)}}$$

where $r_g$ is the genetic correlation between phenotypes A and B, $\mathrm{var}_g(A)$ is the genetic variance of phenotype A, and $\mathrm{cov}_g(A,B)$ is the genetic covariance between phenotypes A and B. Genetic correlations reflect common genetic architecture, where two phenotypes are influenced by the same SNPs. GCTA does not support GREML using multiply imputed phenotype data, so these analyses were performed in the subset of the analytic sample who had complete phenotypic information (N = 1735).
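As a small worked example of the genetic correlation formula, the Python function below combines a genetic covariance with two genetic variances. The inputs are hypothetical values chosen for easy arithmetic and are not estimates from this study.

```python
import math


def genetic_correlation(cov_g: float, var_g_a: float, var_g_b: float) -> float:
    """Genetic covariance scaled by the geometric mean of the genetic variances."""
    if var_g_a <= 0 or var_g_b <= 0:
        raise ValueError("genetic variances must be positive")
    return cov_g / math.sqrt(var_g_a * var_g_b)


# Hypothetical variance components: sqrt(0.25 * 0.64) = 0.4, so r_g = -0.2 / 0.4.
rg_example = genetic_correlation(cov_g=-0.2, var_g_a=0.25, var_g_b=0.64)
```

The requirement for positive genetic variances matters in practice: when either estimate is close to zero, the ratio becomes unstable.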

Bidirectional Mendelian randomisation (MR)

Mendelian randomisation (MR) is a statistical method which can evaluate causal effects between purported exposures and outcomes in observational data by using genetic variants as instrumental variables for exposures. MR relies on the random assortment of alleles from parents to children which occurs during gamete formation and conception39. Since the genetic variants associated with the exposure do not change in response to a person’s health or environmental circumstances, associations between exposure-associated genetic variants and the outcome are not affected by classical confounding or reverse causation, which often affect estimates from observational studies40. For MR estimates to be valid, the genetic instruments must meet three assumptions: (1) relevance: the instrument must be associated with the exposure; (2) independence: there must be nothing that causes both the instrument and the outcome; and (3) exclusion: the association of the instrument and the outcome must be entirely mediated via the exposure41. We tested the first assumption using partial F-statistics.
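A partial F-statistic compares a first-stage regression of the exposure on the covariates alone (restricted model) with one that also includes the instrument (full model). The Python sketch below implements the standard formula from the two residual sums of squares. The function name and the numeric inputs are hypothetical and are not values from this study.

```python
def partial_f_statistic(rss_restricted: float, rss_full: float,
                        n_instruments: int, n_obs: int,
                        n_params_full: int) -> float:
    """Partial F-statistic for the instruments in a first-stage regression.

    rss_restricted: residual sum of squares with covariates only.
    rss_full: residual sum of squares with covariates plus instruments.
    n_instruments: number of instruments added in the full model.
    n_obs: number of observations.
    n_params_full: number of estimated coefficients in the full model,
        including the intercept.
    """
    df_resid = n_obs - n_params_full
    if df_resid <= 0 or n_instruments <= 0:
        raise ValueError("degrees of freedom must be positive")
    numerator = (rss_restricted - rss_full) / n_instruments
    denominator = rss_full / df_resid
    return numerator / denominator


# Hypothetical example: one instrument, 102 observations, and two
# coefficients (intercept plus instrument) in the full model.
f_example = partial_f_statistic(rss_restricted=200.0, rss_full=100.0,
                                n_instruments=1, n_obs=102, n_params_full=2)
```

Larger values indicate a stronger instrument. A value above 10 is a commonly used rule of thumb for adequate strength, although it is a convention rather than a strict threshold.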

For educational achievement and risk behaviours, a causal effect in either direction is plausible, so we used bidirectional MR. Bidirectional MR is an extension of standard MR analysis which attempts to distinguish whether the exposure is a cause of the outcome, a consequence of the outcome, or whether there is a true bidirectional causal effect between them (Fig. 4)42.

Fig. 4. Directed acyclic graph of a bidirectional MR presenting the relationship between the MRB Index and educational achievement.

Panel A depicts the relationship between the MRB index and educational achievement, while Panel B illustrates the bidirectional association between educational achievement and the MRB index. PGI refers to polygenic index and MRBI stands for multiple risk behaviour index.

First, we used MR to estimate the effect of educational achievement on risk behaviours. We used a two-stage least squares instrumental variable model (Stata’s ivreg2) with the risk behaviours index as the outcome and instrumented educational achievement using a polygenic index of SNPs previously associated with years of schooling35. Next, we used MR to estimate the effect of risk behaviours on educational achievement by reversing the outcome and exposure. In this second analysis, the capped GCSE points score was the outcome, and we instrumented the risk behaviours index using a polygenic index of SNPs previously associated with risk-taking behaviour34. For each outcome, two sets of models were run: one which adjusted for the young person’s sex and their first 20 principal components of ancestry, and one which also adjusted for factors associated with maternal genotype by including the mother’s polygenic index. Likewise, for the binary outcome of obtaining five or more A*-C GCSEs, we used a two-stage least squares instrumental variable model, and again instrumented the risk behaviours index using a polygenic index of SNPs previously associated with risk-taking behaviour.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

Peer Review File (556KB, pdf)
Supplementary material (705KB, pdf)
Reporting Summary (2MB, pdf)

Acknowledgements

We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The UK Medical Research Council and Wellcome (Grant ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors, who will serve as guarantors for the contents of this paper. A comprehensive list of grant funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). This research is funded by The Medical Research Council (MRC). TTM is funded by the Economic and Social Research Council (ESRC) [ES/W013142/1]. The University of Bristol supports the MRC Integrative Epidemiology Unit [MC_UU_12013/1, MC_UU_12013/9, MC_UU_00011/1]. GWAS data were generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author contributions

M.A.S. performed the analysis and wrote the manuscript. T.T.M., N.M.D. and A.H. conceptualised the study and made critical revisions to the manuscript. All authors have approved this manuscript.

Peer review

Peer review information

This manuscript has been previously reviewed in another Nature Portfolio journal. Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Luke R. Grinham. A peer review file is available.

Data availability

The informed consent obtained from ALSPAC participants does not allow the data to be made freely available through any third-party maintained public repository. Data used for this submission can be made available on request to the ALSPAC Executive. The ALSPAC data management plan describes in detail the policy regarding data sharing, which is through a system of managed open access. Full instructions for applying for data access can be found here: http://www.bristol.ac.uk/alspac/researchers/access/. The GWAS summary statistics for both risk behaviours and educational attainment used in the analyses are available through the Social Science Genetic Association Consortium (SSGAC) website: https://www.thessgac.org/.

Code availability

All the code used to clean and analyse the data for this study is available at https://github.com/MichelleSpano/Risk-behaviours

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s42003-024-06091-y.

References

  • 1. Teh, C. H. et al. Clustering of lifestyle risk behaviours and its determinants among school-going adolescents in a middle-income country: a cross-sectional study. BMC Public Health 19, 1177 (2019).
  • 2. Viner RM, et al. Adolescence and the social determinants of health. Lancet. 2012;379:1641–1652. doi: 10.1016/S0140-6736(12)60149-4.
  • 3. Strawbridge RJ, et al. Genome-wide analysis of self-reported risk-taking behaviour and cross-disorder genetic correlations in the UK Biobank cohort. Transl. Psychiatry. 2018;8:1–11. doi: 10.1038/s41398-017-0079-1.
  • 4. Wright C, Kipping R, Hickman M, Campbell R, Heron J. Effect of multiple risk behaviours in adolescence on educational attainment at age 16 years: a UK birth cohort study. BMJ Open. 2018;8:e020182. doi: 10.1136/bmjopen-2017-020182.
  • 5. Meader N, et al. A systematic review on the clustering and co-occurrence of multiple risk behaviours. BMC Public Health. 2016;16:1–9. doi: 10.1186/s12889-016-3373-6.
  • 6. Brown JL, Gause NK, Northern N. The association between alcohol and sexual risk behaviors among college students: a review. Curr. Addict. Rep. 2016;3:349. doi: 10.1007/s40429-016-0125-8.
  • 7. Bellis MA, et al. Sexual uses of alcohol and drugs and the associated health risks: a cross sectional study of young people in nine European cities. BMC Public Health. 2008;8:155. doi: 10.1186/1471-2458-8-155.
  • 8. Alamian A, Paradis G. Individual and social determinants of multiple chronic disease behavioral risk factors among youth. BMC Public Health. 2012;12:224. doi: 10.1186/1471-2458-12-224.
  • 9. Havdahl, A. et al. Intergenerational effects of parental educational attainment on parenting and childhood educational outcomes: evidence from MoBa using within-family Mendelian randomization. Preprint at medRxiv 10.1101/2023.02.22.23285699 (2023).
  • 10. Kipping RR, Smith M, Heron J, Hickman M, Campbell R. Multiple risk behaviour in adolescence and socio-economic status: findings from a UK birth cohort. Eur. J. Public Health. 2015;25:44–49. doi: 10.1093/eurpub/cku078.
  • 11. Huesmann LR, Dubow EF, Boxer P. Continuity of aggression from childhood to early adulthood as a predictor of life outcomes: implications for the adolescent-limited and life-course-persistent models. Aggress. Behav. 2009;35:136–149. doi: 10.1002/ab.20300.
  • 12. Mirza KAH, Mirza S. Adolescent substance misuse. Psychiatry. 2008;7:357–362. doi: 10.1016/j.mppsy.2008.05.011.
  • 13. Fujiwara T, Kawachi I. Is education causally related to better health? A twin fixed-effect study in the USA. Int. J. Epidemiol. 2009;38:1310–1322. doi: 10.1093/ije/dyp226.
  • 14. Viinikainen J, et al. Does better education mitigate risky health behavior? A mendelian randomization study. Econ. Hum. Biol. 2022;46:101134. doi: 10.1016/j.ehb.2022.101134.
  • 15. Li H, et al. Can intelligence affect alcohol-, smoking-, and physical activity-related behaviors? A Mendelian randomization study. J. Intell. 2023;11:29. doi: 10.3390/jintelligence11020029.
  • 16. Kong, A. et al. The nature of nurture: effects of parental genotypes. Science https://www.science.org (2018).
  • 17. Morris TT, Davies NM, Hemani G, Smith GD. Population phenomena inflate genetic associations of complex social traits. Sci. Adv. 2020;6:eaay0328. doi: 10.1126/sciadv.aay0328.
  • 18. Shore, J. & Janssen, I. Adolescents’ engagement in multiple risk behaviours is associated with concussion. Inj. Epidemiol. 7, 6 (2020).
  • 19. Rimfeld K, Kovas Y, Dale PS, Plomin R. Pleiotropy across academic subjects at the end of compulsory education. Sci. Rep. 2015;5:1–12. doi: 10.1038/srep11713.
  • 20. Petrill SA, et al. Genetic and environmental influences on the growth of early reading skills. J. Child Psychol. Psychiatry. 2010;51:660–667. doi: 10.1111/j.1469-7610.2009.02204.x.
  • 21. Krapohl E, Plomin R. Genetic link between family socioeconomic status and children’s educational achievement estimated from genome-wide SNPs. Mol. Psychiatry. 2016;21:437–443. doi: 10.1038/mp.2015.2.
  • 22. Akasaki M, Ploubidis GB, Dodgeon B, Bonell CP. The clustering of risk behaviours in adolescence and health consequences in middle age. J. Adolesc. 2019;77:188–197. doi: 10.1016/j.adolescence.2019.11.003.
  • 23. Hair EC, Park MJ, Ling TJ, Moore KA. Risky behaviors in late adolescence: co-occurrence, predictors, and consequences. J. Adolesc. Health. 2009;45:253–261. doi: 10.1016/j.jadohealth.2009.02.009.
  • 24. Bannink R, Broeren S, Heydelberg J, van’t Klooster E, Raat H. Depressive symptoms and clustering of risk behaviours among adolescents and young adults attending vocational education: a cross-sectional study. BMC Public Health. 2015;15:396. doi: 10.1186/s12889-015-1692-7.
  • 25. MacArthur, G. et al. Individual-, family-, and school-level interventions targeting multiple risk behaviours in young people. Cochrane Database Syst. Rev. 2018, CD009927 (2018).
  • 26. Sanderson E, Davey Smith G, Windmeijer F, Bowden J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int. J. Epidemiol. 2019;48:713–727. doi: 10.1093/ije/dyy262.
  • 27. Davies NM, et al. Multivariable two-sample Mendelian randomization estimates of the effects of intelligence and education on health. Elife. 2019;8:e43990. doi: 10.7554/eLife.43990.
  • 28. Boyd A, et al. Professionally designed information materials and telephone reminders improved consent response rates: evidence from an RCT nested within a cohort study. J. Clin. Epidemiol. 2015;68:877–887. doi: 10.1016/j.jclinepi.2015.03.014.
  • 29. Fraser A, et al. Cohort profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int. J. Epidemiol. 2013;42:97–110. doi: 10.1093/ije/dys066.
  • 30. Northstone, K. et al. The Avon Longitudinal Study of Parents and Children (ALSPAC): an update on the enrolled sample of index children in 2019. Wellcome Open Res. 10.12688/wellcomeopenres.15132.1 (2019).
  • 31. Teyhan, A., Boyd, A., Wijedasa, D. & MacLeod, J. Early life adversity, contact with children’s social care services and educational outcomes at age 16 years: UK birth cohort study with linkage to national administrative records. BMJ Open 9, e030213 (2019).
  • 32. White IR, Royston P, Wood AM. Multiple imputation using chained equations: Issues and guidance for practice. Stat. Med. 2011;30:377–399. doi: 10.1002/sim.4067.
  • 33. Campbell R, et al. Multiple risk behaviour in adolescence is associated with substantial adverse health and social outcomes in early adulthood: findings from a prospective birth cohort study. Prev. Med. 2020;138:106157. doi: 10.1016/j.ypmed.2020.106157.
  • 34. Karlsson Linnér R, et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat. Genet. 2019;51:245–257. doi: 10.1038/s41588-018-0309-3.
  • 35. Okbay A, et al. Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat. Genet. 2022;54:437–449. doi: 10.1038/s41588-022-01016-z.
  • 36. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
  • 37. Deary, I. J. et al. Genetic contributions to stability and change in intelligence from childhood to old age. Nature 10.1038/nature10781 (2012).
  • 38. Lee SH, Yang J, Goddard ME, Visscher PM, Wray NR. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012;28:2540–2542. doi: 10.1093/bioinformatics/bts474.
  • 39. Smith GD, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 2003;32:1–22. doi: 10.1093/ije/dyg070.
  • 40. Sanderson E, Davey Smith G, Bowden J, Munafò MR. Mendelian randomisation analysis of the effect of educational attainment and cognitive ability on smoking behaviour. Nat. Commun. 2019;10:2949. doi: 10.1038/s41467-019-10679-y.
  • 41. Brumpton B, et al. Avoiding dynastic, assortative mating, and population stratification biases in Mendelian randomization through within-family analyses. Nat. Commun. 2020;11:3519. doi: 10.1038/s41467-020-17117-4.
  • 42. Davies NM, Holmes MV, Davey Smith G. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ. 2018;362:k601. doi: 10.1136/bmj.k601.

Articles from Communications Biology are provided here courtesy of Nature Publishing Group