Nature Communications. 2022 Feb 8;13:751. doi: 10.1038/s41467-021-27942-w

Genomic epidemiology of SARS-CoV-2 in a UK university identifies dynamics of transmission

Dinesh Aggarwal 1,2,3,4,✉,#, Ben Warne 1,3,5,#, Aminu S Jahun 6, William L Hamilton 1,3,4, Thomas Fieldman 1,3, Louis du Plessis 7, Verity Hill 8, Beth Blane 1, Emmeline Watkins 9, Elizabeth Wright 9, Grant Hall 6, Catherine Ludden 1,2, Richard Myers 2, Myra Hosmillo 3,6, Yasmin Chaudhry 6, Malte L Pinckert 6, Iliana Georgana 6, Rhys Izuagbe 6, Danielle Leek 1, Olisaeloka Nsonwu 2, Gareth J Hughes 2, Simon Packer 2, Andrew J Page 10, Marina Metaxaki 1, Stewart Fuller 1, Gillian Weale 11, Jon Holgate 12, Christopher A Brown 13,14; The Cambridge Covid-19 testing Centre; University of Cambridge Asymptomatic COVID-19 Screening Programme Consortium; The COVID-19 Genomics UK (COG-UK) Consortium, Rob Howes 13, Duncan McFarlane 15, Gordon Dougan 1,5, Oliver G Pybus 7, Daniela De Angelis 2,16, Patrick H Maxwell 1,3, Sharon J Peacock 1,3, Michael P Weekes 3,17, Chris Illingworth 16,18,19, Ewan M Harrison 1,2,4,20, Nicholas J Matheson 1,3,5,21, Ian G Goodfellow 6
PMCID: PMC8826310  PMID: 35136068

Abstract

Understanding SARS-CoV-2 transmission in higher education settings is important to limit spread between students, and into at-risk populations. In this study, we sequenced 482 SARS-CoV-2 isolates from the University of Cambridge from 5 October to 6 December 2020. We perform a detailed phylogenetic comparison with 972 isolates from the surrounding community, complemented with epidemiological and contact tracing data, to determine transmission dynamics. We observe limited viral introductions into the university; the majority of student cases were linked to a single genetic cluster, likely following social gatherings at a venue outside the university. We identify considerable onward transmission associated with student accommodation and courses; this was effectively contained using local infection control measures and following a national lockdown. Transmission clusters were largely segregated within the university or the community. Our study highlights key determinants of SARS-CoV-2 transmission and effective interventions in a higher education setting that will inform public health policy during pandemics.

Subject terms: Viral epidemiology, Viral transmission, Viral genetics, SARS-CoV-2, Epidemiology


In this study, Aggarwal and colleagues perform prospective sequencing of SARS-CoV-2 isolates derived from asymptomatic student screening and symptomatic testing of students and staff at the University of Cambridge. They identify important factors that contributed to within university transmission and onward spread into the wider community.

Introduction

The SARS-CoV-2 pandemic has caused substantial morbidity and mortality globally1,2. Universities have been considered conduits for transmission due to extensive social networks of young adults, many of whom live communally, and in-person teaching of large groups3. Outbreaks of SARS-CoV-2 have been observed in a number of higher education institutions, but the drivers for transmission in these settings are poorly understood4. It is speculated that infection dynamics are dependent on transmission chains involving student courses, residence, study year and social networks5. Understanding these dynamics is essential in order to devise effective infection control measures while minimising disruption to teaching, research and the mental health of students and staff6. Furthermore, while university students are less likely to develop severe COVID-19 disease, there is concern that university outbreaks could seed infections in more vulnerable populations, including staff, the local community, and upon returning home to older relatives7. Identifying possible sources of cross-transmission is therefore vital.

Although SARS-CoV-2 genome sequencing has clear utility to identify virus emergence and cryptic transmission8,9, no large-scale genomic studies in university settings have been conducted. The United Kingdom has an extensive community genomics surveillance programme through COG-UK10 which complements traditional contact tracing approaches by providing understanding of circulating viral populations.

We report the results of a genomic epidemiology study of SARS-CoV-2 across a complete term at the University of Cambridge (UoC). Importantly, these findings are from a study period prior to the established circulation of variants of concern and the availability of vaccination, with therefore fewer confounding factors. From 5 October to 6 December 2020, the UoC ran PCR-based symptomatic testing for all staff and students, and offered asymptomatic screening to 15,500 students living in university-managed accommodation. We therefore provide a unique study of SARS-CoV-2 infection that encompasses pre-symptomatic and asymptomatic students11. Positive samples from the UoC were sequenced and compared with systematic surveillance SARS-CoV-2 sequences from the local community. The results were analysed in conjunction with epidemiological data derived from the screening programme and national contact tracing. Overall, we describe introductions of SARS-CoV-2 into a higher education setting, the dynamics of transmission both within the university and between the university and the surrounding community, and the impact of local and national measures to control the spread of SARS-CoV-2 infections.

Results

In total, 972 SARS-CoV-2 cases were identified among university students and staff over the course of term (5 October to 6 December 2020). High-quality genomes were generated from 446/778 (57.3%) positive cases from the university testing programme, from 107/266 (40.2%) cases identified through the Healthcare worker (HCW) screening programme (95 HCWs, 8 students, 4 university staff) and 104 patients identified by hospital testing (71 SARS-CoV-2 positive patients from Cambridge University Hospitals (CUH) and 33 from other medical facilities in Cambridgeshire). A further 797 local cases identified by community testing during the study period were present within the COG-UK dataset, of which 17 were identified as students, 7 as university staff and 26 as HCWs (Fig. 1). Of all identified SARS-CoV-2 cases from Cambridgeshire (university and community) during this period, 8.0% were sequenced (Supplementary Fig. 1).

Fig. 1. Study cohort and available genome sequences.

*Includes 14 students identified through ad hoc asymptomatic screening conducted as part of an outbreak investigation by the University of Cambridge in conjunction with local public health authorities, responding to increased rates of infection in a block of student accommodation (described in further detail in cluster 2 below). **Includes two students associated with a single sequenced pooled sample (see supplementary methods). CUH Cambridge University Hospitals.

SARS-CoV-2 lineages and transmission clusters

Over the 9-week term, 62 Pango lineages were identified across the university and community (Fig. 2a, c). In the university, 23 Pango lineages were identified, and 438/482 (90.9%) cases were from just 4 lineages (B.1.160.7, B.1.177, B.1.36, B.1.177.16), all of which were detected by the second week of term. Twelve lineages were only observed after the second week of term and accounted for 6.9% of cases. By comparison, 57 lineages were identified in the local community over the same 9-week period. Viral genomes containing mutations in the spike protein that have been linked to decreased sensitivity to antibody-mediated immunity or to altered viral transmission were observed in the university population: three sequences from the B.1.258 lineage containing the N439K mutation and ∆H69/∆V70; two cases of the B.1.1.7/Alpha variant and its associated mutations12; and 88 cases of B.1.177 with the A222V mutation13. Of these, Pango lineage B.1.1.7 is most reliably associated with increased transmission14; both cases of B.1.1.7 were amongst postgraduate students with no epidemiological links, occurred during national lockdown, and failed to transmit further within the university.

Fig. 2. Genomic diversity of SARS-CoV-2 in the university and community.

a Maximum likelihood tree showing that the majority of lineages from university isolates were distinct from community isolates. The node leaves (branch tips) show case location and global PANGO lineage is illustrated in the vertical bar. b Time-scaled coalescent tree including university members and local community isolates from study period with visible segregation between the two groups. College affiliation is shown for university members in the second set of vertical columns, highlighting the ‘top nine’ colleges by cluster 1 prevalence. c Epidemic curves demonstrating a steeper decline in SARS-CoV-2 cases in the University of Cambridge (i) compared to the local community (ii), with associated lineages. Only cases with available genomes are included. University term ran from the week commencing October 5 to the week commencing November 30. The light blue shaded area reflects a 4-week national lockdown in the UK, which was associated with a large fall in COVID-19 cases in University students. Specific lineages highlighted are the four largest lineages within the University (minimum 20 cases over the study period) and the community (minimum 50 cases over the study period). For (i), weekly individual case ascertainment for staff and students testing positive for SARS-CoV-2 through both symptomatic and asymptomatic testing pathways provided at the University of Cambridge is indicated. For (ii), weekly cases with genomes available from the local community are shown. Source data are provided as a Source Data file.

In total, 198 putative transmission clusters were defined by CIVET (https://github.com/artic-network/civet). Only 8/36 clusters with university cases contained five or more university members (range 6–337), which together represented 91.3% of all university cases, signifying that the majority of introductions into UoC did not cause ongoing transmission. To further investigate the largest of these, cluster 1 described below, we identified groups of identical samples (0 SNP differences), which produced 19 additional clusters (a total of 34 clusters with ≥2 university cases) for further analysis.
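The grouping of identical samples (0 SNP differences) described above can be illustrated with a minimal Python sketch. This is not the CIVET workflow or the authors' pipeline: the function names, the choice to ignore ambiguous bases, and the single-linkage grouping are all assumptions made for illustration, and the input is assumed to be a set of pre-aligned consensus sequences of equal length.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count sites where two aligned sequences differ.

    Assumption: only unambiguous bases (A, C, G, T) are compared, so
    N, gaps and IUPAC ambiguity codes never count as a difference.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in valid and y in valid and x != y
    )


def zero_snp_groups(seqs: dict[str, str]) -> list[set[str]]:
    """Single-linkage groups of samples at 0 SNP distance (singletons dropped)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) == 0:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) >= 2]
```

Because ambiguous sites are skipped, a distance of zero is not strictly transitive (a low-coverage genome can match two genomes that differ from each other), which is why the sketch uses single linkage; a real analysis would also need a genome coverage threshold.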

Determinants of viral spread across the university

To determine transmission dynamics following introduction into the university, we performed a detailed investigation of the largest genomic cluster (Cluster 1), which accounted for 337/484 (69.6%) sequenced university cases (Fig. 3). This was widely dispersed across the university by the middle of term, affecting students from 29/31 colleges, 28 undergraduate courses and 208 households in university accommodation alone (Fig. 4).

Fig. 3. Emergence and transmission of SARS-CoV-2 in a large university cluster.

a Time-scaled phylogenetic tree of largest university cluster (cluster 1) derived from the BDSKY model implemented in BEAST 2.6 (Fig. 5). The left-sided heatmap is coloured by case location, and the right-sided heatmap is coloured by student college affiliation, highlighting the top nine colleges by cluster 1 prevalence. Cluster 1 was widely dispersed across the university with limited transmission into the community. b Frequency of Lineage B.1.160.7 (to which cluster 1 belongs) in each region of the UK and the University of Cambridge. Regions are defined as ‘Nomenclature of territorial units for statistics’ (NUTS) regions, where the UK has 9 regions. Lineage B.1.160.7 was first sequenced in Wales, and then in the neighbouring South West of England, before becoming prevalent within the University of Cambridge. The lineage remained infrequently detected in the community populating the wider surrounding region (Cambridgeshire, East Anglia, Bedfordshire and Hertfordshire, and Essex, making up East of England) throughout the university term. c A continuous transmission chain of SARS-CoV-2 infections in cluster 1 commenced with a single introduction. Relationships between individuals in cluster 1 were calculated within A2B-COVID. Colours denote potential transmission events from the donor (vertical axis) to the recipient (horizontal axis) that are consistent with transmission12 or which are borderline possibilities (yellow). The plot shows that the data are consistent with a continuous transmission chain of SARS-CoV-2 infections in cluster 1 occurring via a single introduction; there are multiple potential networks of transmission events between these individuals for which each event would be consistent with a statistical model of direct transmission. We note that individuals in this plot are ordered by the date of the first positive COVID test. Source data are provided as a Source Data file.

Fig. 4. Demographics of Cluster 1 across the first university term.

a Cumulative number of colleges involved in the cluster. Cases in this cluster were spread across a number of colleges early in the university term. b Frequency of cases involved in the cluster by year of study. c Frequency of cases involved in the cluster by course type. Source data are provided as a Source Data file.

Cluster 1 was classified as belonging to Pango lineage B.1.160.7. No mutations previously noted to be associated with increased transmissibility were observed in this lineage compared to other genomes in the study. Interrogation of the entire COG-UK dataset of samples from 2020 showed that this lineage was first identified in the UK on 4 October 2020, in Wales, before becoming predominantly sampled in the UoC (Fig. 3b). The B.1.160.7 lineage was not identified in the local community until term week 3 (19–25 October 2020). This was supported by the median estimate of the time to the most recent common ancestor of cluster 1 and its most closely related cluster from Cambridgeshire community isolates, which was 165 days (C.I. 127–207) prior to the start of term (6 October 2020). Together, these results suggest the university cases were introduced from outside Cambridgeshire. In an additional analysis with A2B-COVID15, which uses genomic data alongside timing of infection data to evaluate the plausibility of transmission between individuals, we showed that these sequences were consistent with a single introduction into the university (Fig. 3c).

National and university contact tracing data were used to identify the initial source of dispersion of this cluster. Ten students from the first two weeks of term reported visiting the same nightclub (venue A). Nine individuals either had an isolate from cluster 1 or (in the event that their sample did not yield a high-quality sequence) were household contacts of an individual with a sequenced cluster 1 isolate. No information was available for one student (Supplementary Fig. 5).

Transmission of cluster 1 was sustained from the first week of term until a national lockdown was enforced on 5 November 2020. Students testing positive in the two weeks around lockdown reported common exposure events predominantly linked to nightclub venues (25/59 (42.4%) of exposures external to the university reported by 48 students). Venue A, identified above as the possible source of dispersion of this cluster at the start of term, was also the most common venue identified in the two weeks around lockdown (n = 16). 9/16 cases had sequences in cluster 1, and a further five individuals (where no sequence was available) were household contacts of sequenced cases in cluster 1 (Supplementary Fig. 6).

To determine the impact of lockdown and other control measures within the university, a birth-death skyline model16 was used to measure changes in the effective reproduction number (Re) within cluster 1. The model indicated an initial Re at the start of term that was slightly larger than 1, albeit with wide uncertainty (median 1.14; 95% HPD: 0.27–2.21 on 5 October). Over the next 2 weeks Re continued to rise (median 1.52; 95% HPD 0.94–2.22 on 15 October), followed by a gradual decline over the subsequent 2 weeks (Fig. 5a). There was a rise immediately prior to the start of lockdown (median 1.55; 95% HPD 1.25–1.86 on 5 November), followed by a steep decrease thereafter (median 0.23; 95% HPD 0.07–0.41 on 19 November) (Fig. 5a), consistent with declining absolute numbers of SARS-CoV-2 infections seen during this time (Fig. 2c). The model estimated the median effective infectious period for individuals in the cluster at 3.03 days (95% HPD: 2.44–3.59 days) (Fig. 5b). As the model does not explicitly incorporate an incubation period and assumes that individuals cannot transmit after being sampled, the effective infectious period represents the mean time from infection until testing positive and assumes perfect infection control measures thereafter. Estimates of Re and the effective infectious period are robust to model parameterisations (Supplementary Figs. 8–10). Sampling proportion estimates largely overlap with empirical estimates based on the number of positive cases that were sequenced during each week (Fig. 5c). Although sampling proportion estimates are sensitive to the prior specifications, Re estimates are unaffected (Supplementary Fig. 11).
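To make the link between Re and the effective infectious period concrete, the short Python sketch below converts the two reported quantities into the underlying rates of a birth-death model. It assumes the standard birth-death skyline parameterisation, in which Re = λ/δ, where λ is the transmission rate, δ is the rate of becoming uninfectious (through recovery or sampling), and the mean infectious period is 1/δ. The function name is hypothetical, and combining posterior medians in this way is purely illustrative; a proper calculation would use the joint posterior samples from BEAST.

```python
def bdsky_rates(re: float, infectious_period_days: float) -> tuple[float, float]:
    """Return (transmission rate, becoming-uninfectious rate), both per day.

    Assumes Re = lambda / delta and mean infectious period = 1 / delta.
    """
    if infectious_period_days <= 0:
        raise ValueError("infectious period must be positive")
    delta = 1.0 / infectious_period_days  # becoming-uninfectious rate
    lam = re * delta                      # transmission rate implied by Re
    return lam, delta


# Reported posterior medians: Re just before and two weeks into lockdown,
# with an effective infectious period of 3.03 days.
for label, re_median in [("5 November", 1.55), ("19 November", 0.23)]:
    lam, delta = bdsky_rates(re_median, 3.03)
    print(f"{label}: lambda = {lam:.3f}/day, delta = {delta:.3f}/day")
```

Under these assumptions the fall in Re across lockdown corresponds to a proportional fall in the per-day transmission rate, because a single infectious period is estimated for the whole term.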

Fig. 5. Effective reproduction number and infectious period of SARS-CoV-2 from a dominant university cluster.

A 20-epoch birth-death skyline model shows the effect of local infection control measures and the national lockdown on the effective reproduction number (Re), and estimates of the mean effective infectious period as 3.03 (95% HPD = 2.44-3.59) days. a Re posterior estimates (dark shading = 50% HPD; light shading = 95% HPD). The dotted line indicates the start of term and the light blue shaded area the 4-week national lockdown in the UK, which was associated with a large fall in COVID-19 cases in University students. The red dashed line indicates Re= 1. b Effective infectious period posterior estimates (shaded region = 95% HPD; dashed line = median). c Weekly sampling proportion posterior estimates (dark shading = 50% HPD; light shading = 95% HPD). The red dashed line indicates the empirical sampling proportion estimates for each week in term (number of sequenced genomes from all University clusters divided by the number of positive tests among University staff and students). Source data are provided as a Source Data file.

Transmission within university households

There was evidence of transmission of SARS-CoV-2 in student accommodation in 18/34 university clusters. In cluster 1, 169/337 (50.1%) students had a virus genome sequence identical to at least one other student living in the same or neighbouring household (sub-clusters within 0 SNPs ranging between 2 and 11 students).

The largest cluster associated with transmission in accommodation was cluster 2 (lineage B.1.36). By term week 3, this cluster involved 30 students, of whom 24 (80%) lived in the same accommodation block in College A and 4 lived in two separate households in the same college (Supplementary Fig. 12). Interventions from the university, supported by local public health authorities, included isolation of all households in the main accommodation block and individual screening offered to all students. Half of all cases in this cluster were diagnosed by asymptomatic screening. No further genomically related isolates were identified after term week 3, indicating a successful intervention and cessation of transmission.

To quantify the importance of household transmission, a Reed-Frost Chain Binomial Model was employed to estimate the household attack rate. Using A2B-COVID15, we identified 265 households in which the data were consistent with only 1 introduction of SARS-CoV-2. The per household contact probability that an infected person passed on the virus to an uninfected individual within the same household was estimated at 7.8% (95% C.I. 6.9–8.7%).
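A Reed-Frost chain binomial model of this kind can be sketched in a few lines of Python. The sketch below is a generic textbook formulation, not the authors' implementation: it assumes one introduction per household, that every other household member starts susceptible, and that in each generation a susceptible escapes infection from each current infective independently with probability 1 − p. The function names, the grid-search fit and the example household data are hypothetical.

```python
from math import comb, log


def final_size_pmf(n_susceptible: int, p: float) -> list[float]:
    """P(k secondary cases), k = 0..n_susceptible, after one introduction."""
    q = 1.0 - p  # chance a susceptible escapes one infective
    pmf = [0.0] * (n_susceptible + 1)
    # State: (susceptibles left, infectives in current generation) -> probability
    states = {(n_susceptible, 1): 1.0}
    while states:
        nxt: dict[tuple[int, int], float] = {}
        for (s, i), prob in states.items():
            if s == 0 or i == 0:  # chain has ended
                pmf[n_susceptible - s] += prob
                continue
            p_inf = 1.0 - q ** i  # infected by at least one current infective
            for k in range(s + 1):
                w = comb(s, k) * p_inf ** k * (1.0 - p_inf) ** (s - k)
                nxt[(s - k, k)] = nxt.get((s - k, k), 0.0) + prob * w
        states = nxt
    return pmf


def fit_p(households: list[tuple[int, int]], grid: int = 1000) -> float:
    """Grid-search maximum-likelihood estimate of p.

    Each household is (susceptible contacts, observed secondary cases).
    """
    def loglik(p: float) -> float:
        return sum(log(final_size_pmf(n, p)[k]) for n, k in households)

    return max((j / grid for j in range(1, grid)), key=loglik)


# Hypothetical data: (contacts, secondary cases) for five households.
example_households = [(3, 0), (3, 1), (4, 0), (2, 0), (5, 1)]
print(f"estimated per-contact probability: {fit_p(example_households):.3f}")
```

The final-size distribution is used rather than a simple ratio of cases to contacts because later generations of infection within a household depend on how many members were infected earlier in the chain.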

Further genomic clusters where transmission between household members was implicated are outlined in Supplementary Table 1. They follow similar patterns, with groups of cases confined to a single college not leading to sustained transmission.

Other transmission routes among university members

In addition to household transmission, there was evidence of viral spread between students in the same course and year of study in 14/34 genomic clusters, with the highest proportion being students in their first year of study. In cluster 1, 203/337 (60.2%) students had an identical isolate to at least one other student studying the same course in the same year (cluster size range 2–14 students). Statistical modelling using data from cluster 1 across the term showed a bias towards infections being observed in first year students (p-value = 0.002) (Supplementary Fig. 13, model details in Supplementary Methods). Two further small clusters comprised postgraduate students working in the same university department. However, we were not able to determine the probable location of transmission in most cases: there is considerable overlap between course and household clusters, and complex social and study networks exist between students (illustrated in Supplementary Table 1, for example in clusters 3, 4 and 10). Of note, 23/34 clusters with 2 or more genomically linked cases in the dataset contained at least one university member who could not be epidemiologically linked with any other case in their cluster.

The number of SARS-CoV-2 sequences from university staff members was limited in comparison to students (n = 30). There was evidence of transmission between staff members working in the same department, college or ancillary role in four genomic clusters. Two clusters contained staff members who shared the same household. There were eight clusters involving both university staff and students. However, epidemiological associations between these two groups could only be identified in one cluster: a shared household between a student and staff member working in separate university departments.

Transmission between the university and local community

We next sought to address the degree of transmission between the university and the local community. Two distinct phylogenetic approaches, shown in Fig. 2, demonstrate segregation of the majority of community and university cases into separate clusters and therefore a lack of substantial cross-transmission. 29/198 (14.6%) transmission clusters contained both university and community cases. Only six clusters contained five or more university cases and included three or more community cases.

To identify transmission clusters involving university and hospital (patient and healthcare worker) cases, we ran CIVET (https://github.com/artic-network/civet) separately with these cases for a focused phylogenetic analysis of this setting. Associations were identified between the university and hospital settings, with 17 clusters involving both university members and either patients or staff. Cluster 1 (69.6% of student cases) contained only 1 patient and 1 healthcare worker, neither of whom had an identifiable epidemiological link to students. The remaining 16 clusters comprised 133 individuals, including 26 patients, 55 hospital staff or their family members and 52 university members (including 18 staff and 15 clinical medical students). The second-largest cluster of university members (n = 21 university and hospital cases) included nine medical students, five healthcare workers and two patients. Phylogenetically, the medical students and one of the healthcare workers were closely linked (Supplementary Fig. 14) and analysis of these cases with A2B-COVID15 confirmed the plausibility of transmission. All 9 medical students were on clinical rotations at the time of diagnosis of the index case; 7/9 lived in neighbouring households in the same college and the remaining two were named contacts of the index student. Plausible transmission events between this group and the other cluster members were refuted using A2B-COVID (Supplementary Fig. 14).

To further investigate epidemiological associations in clusters involving university members and the local community, 1243/1455 of the cases sequenced over the sampling period were linked to national contact tracing data (excluding hospital cases). 219 (17.6%) cases reported 127 common exposure events. Cluster 1, representing 69.6% of cases within the university, included only 17/976 (1.7%) community cases; only one community case had a common exposure with a university student, dining at the same restaurant. No epidemiological links were identified in any other genomic cluster. Transmission suspected in 19 epidemiologically linked clusters defined by common exposures was refuted by phylogenetic variation (i.e. the cases were assigned to separate transmission clusters as defined by CIVET).

Discussion

We report the first comprehensive and integrated epidemiological and genomic analysis of SARS-CoV-2 transmission in a higher education setting. Following a limited number of introductions, the majority of cases were linked to a single genetic cluster, which was likely to have dispersed across the university following multiple social gatherings at a nightclub. There was considerable transmission associated with student accommodation and student courses, but minimal evidence of transmission within departments, or between students and staff. We observe that the great majority of transmissions occurred either within the university or within the local community. Finally, we present evidence demonstrating the efficacy of university measures and national lockdown in reducing COVID-19 cases.

Nearly 70% of all university cases belonged to one genetic cluster (cluster 1), introduced into the UoC by the arrival of students and likely forming a single transmission chain. A nightclub was implicated as an important transmission event at the start of term and again prior to lockdown. This corroborates previous studies identifying such venues as a risk factor for substantial SARS-CoV-2 transmission17,18. We urge a cautious approach to the access of such venues during a SARS-CoV-2 pandemic, particularly in the context of a young susceptible student population.

Our data suggest a substantial change in case numbers and the effective reproduction number over the course of the term. This likely reflects a combination of changes in student behaviour and effective interventions to reduce transmission. Overall, we note that incidence and the effective reproduction number within the university are lower than in other higher education settings and the general UK young adult population during the study period19. We highlight a limited number of introductions and low lineage diversity in the university compared to the surrounding community. While the natural extinction of lineages is relatively common20, multiple genetically diverse clusters may be expected given the congregation of students from across the globe (international students make up 35% of students in college accommodation)11. The lack of diversity may reflect the impact of robust and widely implemented university infection control measures maintained throughout the term; full details are provided in the Supplementary Materials, and the measures included social distancing, mask wearing and quarantine of international students at the beginning of term.

There was an initial rise in cases over the first two weeks, coinciding with the first week of term and university Freshers week. This is known to be a period of more intense social mixing between students in venues both inside and outside university premises. Between term weeks three and five there was a fall in the effective reproduction number, which coincides with both a reduction in social mixing and the identification of, and subsequent university measures to control, transmission events identified in college residences. In multiple clusters, transmission in student households was successfully interrupted through a combination of measures provided by the university, including rapid case identification through asymptomatic screening, readily available symptomatic testing, contact tracing and comprehensive support provided by colleges for cases and their contacts while in isolation. Further details, including the specific measures used to control cluster 2, an outbreak associated with a large accommodation block described above, are provided in the Supplementary Materials. Although we have demonstrated that transmission between students in the same accommodation block is an important factor in the spread of SARS-CoV-2, we report a lower secondary household attack rate (7.8%) than that identified in domestic households (16.6–21.1%) and a lower-than-expected effective infectious period (3.0 days)21.

University measures may have been less successful in controlling transmission in settings outside colleges. There was a rise in the effective reproduction number coinciding with the announcement of a national lockdown on 31 October, to begin on 5 November 2020. This announcement prior to implementation of a major socially restrictive public health measure, alongside existing Halloween festivities, may have led to increased levels of behaviour associated with a higher risk of transmission. This supports either reducing the time from announcement to implementation of socially restrictive measures, or the need for a targeted public health campaign to limit high-risk activities where this is not possible. In addition, having identified considerable transmission between students on the same course, we suggest that further mitigation of viral spread may be obtained by implementing shared student accommodation based on university courses.

The national lockdown dramatically reduced case numbers within the university, at a faster rate than the local community, demonstrating high levels of compliance from our study population with an effective control strategy. Contemporary studies conducted elsewhere in the UK have demonstrated that adherence to COVID-19 prevention measures, such as national lockdown, is mixed22. Although young age is a risk factor for poor adherence, other risk factors are less common within the university population, such as having a dependent child in the household, financial hardship and working in a key sector. Although no direct incentives were provided to students, the expectation of individuals to adhere to rules was communicated widely in both national and university media. We also believe that the key to the successful implementation of lockdown was the additional support provided by the collegiate university, ranging from the practical provision of food and drink through to the pastoral and community support provided by established networks of staff, tutors and student representatives.

Finally, we observed limited transmission between the university and the local community. The largest university cluster, accounting for the majority of student infections, was largely phylogenetically distinct from community cases. Further, epidemiological evidence describing common exposures for community and university cases was sparse. However, clinical medical students were disproportionately represented within community clusters. This is an important epidemiological link between secondary care and the university; we highlight this group as being at-risk for both acquisition and transmission of SARS-CoV-2 and medical students should therefore be prioritised for interventions such as vaccination.

A combination of contact tracing and genomics was instrumental to understanding transmission within the university and with its surrounding population; notably in refuting transmission within epidemiologically linked clusters. We advocate for a combined genomic epidemiological approach to inform outbreak investigations as used in other settings8,23.

This study has a number of limitations. Incomplete sampling and subsequent sequence filtering in both the university and community should be considered when interpreting transmission; the asymptomatic and active case ascertainment in this study should mitigate this discrepancy. The lower community case ascertainment may result in unobserved transmission chains (such as those when assessing the introduction of Pango lineage B.1.160.7 into the university). Further, epidemiological links are dependent on self-reporting and therefore some data will be missing; whilst a lack of epidemiological association between groups in clusters is important and reassuring (such as between staff and students), it does not confirm a lack of transmission. We highlight shared student courses as a risk factor for transmission; this does not take into account the setting of transmission, i.e., during educational or social activities. Finally, the UoC is distinct in its collegiate structure with limited integration with the community; any generalisation of conclusions should be tempered by the study setting.

We present the first comprehensive integrated epidemiological and genomic evaluation of transmission of SARS-CoV-2 within a university. The insights gained will inform public policy regarding infection control measures in higher education settings. We find containment of transmission in student accommodation necessary to mitigate onward propagation. We highlight the importance of targeted public health measures towards nightclub venues to limit transmission. Critically, these findings are likely to be informative for future pandemic preparedness.

Methods

Ethics

The COG-UK study protocol was approved by the Public Health England Research Ethics Governance Group (reference: R&D NR0195). Public Health England affiliated authors had access to identifiable Cambridgeshire community case data. These data were processed under Regulation 3 of The Health Service (Control of Patient Information) Regulations 2002, which permits the processing of confidential patient information for communicable disease and other risks to public health; as such, individual patient consent is not required. Other authors only had access to anonymised or summarised data. Ethical approval for the UoC asymptomatic COVID-19 screening programme was granted by the UoC Human Biology Research Ethics Committee (HBREC.2020.35) with informed consent gained from participants.

Study setting

The UoC has ~23,000 students and 12,600 staff. The university is divided into 31 colleges and 150 departments, faculties and other institutions. Students belong to a college community, as well as being members of the university and an academic faculty/department. Colleges provide residential accommodation for approximately two-thirds of students, either on campuses or in off-site housing, and offer social and sports activities, pastoral and academic support for each individual24. All colleges have membership from students across multiple courses. The university is based in the City of Cambridge (which has an estimated population of 123,900; ref. 25), in the county of Cambridgeshire (estimated population 855,796 people in 2019; ref. 26) in the East of England.

Participants and samples

Samples were derived from university symptomatic testing and asymptomatic COVID-19 screening programmes between 5 October 2020 and 6 December 2020, covering the full term. Testing for all symptomatic students and staff was available on weekdays. The asymptomatic screening programme has been described in detail elsewhere11. In brief, screening was offered on a voluntary basis to all students residing in accommodation owned or managed by a college or the Cambridge Theological Federation. In total, 15,561 students were eligible to participate. To optimise testing efficiency, multiple swabs were pooled into the same tube of viral transport medium at the time of sample collection. Testing pools varied in size from 1 to 10 students, with each devised to include one or more student households as far as possible11. In this study, households are defined as individuals who share a kitchen, bathroom and/or lounge facilities. The members of any pool testing positive were re-tested using individual confirmatory PCR tests to confirm the result and identify the positive subject(s) (see Supplementary Methods for further details, including infection prevention and control measures). Only samples from individuals who were confirmed positive upon re-testing were used for sequencing.

SARS-CoV-2 strains circulating in the local community were identified from the COG-UK dataset for Cambridgeshire. These data were derived from local community samples from non-hospitalised, symptomatic individuals, who requested a free diagnostic test via national community testing. Other samples were derived from patients treated at three Cambridgeshire hospital trusts: Cambridge University Hospitals NHS Foundation Trust (a teaching hospital providing secondary care services for Cambridge and the surrounding area as well as tertiary referral services for the East of England and surge capacity for COVID-19); Royal Papworth Hospital NHS Foundation Trust (specialist heart and lung hospital, also providing surge capacity for COVID-19); Cambridgeshire and Peterborough NHS Foundation Trust (provider of community, mental health and learning disability services in Cambridgeshire). Hospital samples were obtained from both asymptomatic screening and those exhibiting COVID-19 symptoms. Finally, samples were derived from the asymptomatic HCW programme at Cambridge University Hospitals27.

Sequencing

Positive samples from UoC testing with a PCR cycle threshold value ≤33 were selected and sequenced using the GridION platform (Oxford Nanopore). All Cambridgeshire samples sequenced between 24th September and 21st December 2020 were included to overlap with the university term. Samples from the local Cambridgeshire community and hospital cases (described above) were collected as part of national SARS-CoV-2 testing, and sequenced at one of seventeen COG-UK sequencing sites (further details in Supplementary Methods). The samples were prepared using either the ARTIC28 or veSeq29 protocols, and were sequenced using Illumina or Oxford Nanopore platforms. Genomic data were filtered to exclude sequences with >5% Ns and those of spuriously low file sizes (<29 KB). Genomes were aligned with minimap2 (ref. 30) to the Wuhan Hu-1 reference genome (MN908947.3), collected December 2019. All samples were processed through COVID-CLIMB pipelines31,32. Protocols are available at https://github.com/COG-UK.
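The >5% N filter described above can be sketched in a few lines. This is an illustrative example only, not the COVID-CLIMB pipeline code; the function names and toy sequences are hypothetical, and the file-size criterion (<29 KB) is not implemented here.

```python
def fraction_n(sequence: str) -> float:
    """Return the fraction of bases in a consensus sequence called as N."""
    if not sequence:
        return 1.0  # treat an empty sequence as entirely missing
    return sequence.upper().count("N") / len(sequence)


def passes_n_filter(sequence: str, max_n_fraction: float = 0.05) -> bool:
    """Keep sequences with at most 5% Ns (i.e. exclude those with >5% Ns)."""
    return fraction_n(sequence) <= max_n_fraction


# Hypothetical toy sequences, far shorter than a real ~30 kb genome.
toy = {
    "sample_A": "ACGT" * 25,             # no Ns
    "sample_B": "ACGT" * 24 + "NNNN",    # 4 Ns in 100 bases
    "sample_C": "ACGT" * 20 + "N" * 20,  # 20 Ns in 100 bases
}
kept = [name for name, seq in toy.items() if passes_n_filter(seq)]
print(kept)
```

Only sample_C exceeds the threshold in this toy set and is dropped.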

Phylogenetic analysis

Maximum likelihood phylogenetic trees were estimated using IQ-TREE (version 2.1.2 COVID-edition)33 and rooted using Wuhan Hu-1 (MN908947.3) as an outgroup. Trees were constructed using the GTR + Γ substitution model34, as determined by ModelFinder35. Branch support statistics were generated using the ultrafast bootstrap method36. TempEst37 was used to explore the temporal signal in the data. Trees were visualised, explored, and labelled with associated metadata using Microreact38 to identify epidemiological links supported by the genomic data. Specified mutations were identified using type_variants (https://github.com/cov-ert/type_variants). Possible transmission clusters were defined by extracting phylogenetic neighbourhoods identified using the CIVET tool (version 2.1.0) on 11 January 2021 (https://github.com/artic-network/civet). In selected clusters, further evaluation was conducted using A2B-COVID (ref. 15). A2B-COVID evaluates data from individuals in a pairwise manner. Using viral genome sequences from two individuals, alongside data describing the timing of infection, it evaluates whether or not these data are consistent with a hypothesis that SARS-CoV-2 was transmitted directly from one individual to the other; data from each pair are described as being either consistent, borderline, or unlikely to have been observed given this hypothesis. Where indicated, collapsed nodes from trees generated from CIVET were inspected to visualise data in the context of the COG-UK national database (https://www.cogconsortium.uk/). For further evaluation of transmission in the largest cluster identified by CIVET, pairwise SNP differences between sequences were determined using snp-dists (https://github.com/tseemann/snp-dists/releases/tag/v0.7.0).
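To make the idea of pairwise SNP differences concrete, the following minimal sketch counts differing sites between aligned sequences. It is not the tool used in the study; the function names and toy alignment are hypothetical, and the choice to skip positions carrying N, gaps or ambiguity codes is an assumption made for this illustration.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both sequences carry an unambiguous
    base (A, C, G or T) and the two bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_snp_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return SNP distances for every unordered pair of sequence names."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment (8 sites); real genomes are ~30 kb.
toy_alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ANGTTCGA",
}
distances = pairwise_snp_matrix(toy_alignment)
for pair, dist in distances.items():
    print(pair, dist)
```

Within a single cluster sampled over a few weeks, most pairs would be expected to differ at very few sites, so small distances alone cannot establish direct transmission.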

Lineages

Global Pango Lineages39 were assigned to each genome using Pangolin (https://github.com/cov-lineages/pangolin/releases/tag/v2.1.6) with analyses performed on COVID-CLIMB32 (further details in Supplementary Methods).

Molecular clock and phylodynamic analyses

BEAST v1.10.4 (ref. 40) was used to perform a time-scaled phylogenetic analysis using an exponential growth coalescent tree prior and a GTR + Γ substitution model including all university and community high-quality genomes from the study period. As there was a lack of clear temporal signal in our dataset due to the relatively short time period analysed, the substitution rate was fixed to 8 × 10⁻⁴ substitutions per site per year (s/s/y) under a strict clock model in line with previous SARS-CoV-2 analyses13,41–44. Two chains of 100 million iterations were run independently to ensure convergence to the correct posterior distribution. Convergence was assessed using Tracer45, and 10% of states were removed to account for burn-in. Finally, a maximum clade credibility (MCC) tree was generated using TreeAnnotator.
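A short worked calculation may help to show why the temporal signal is weak over a single term. The sketch below converts the fixed clock rate into an expected number of substitutions per genome, assuming the 29,903-nucleotide length of the Wuhan Hu-1 reference (MN908947.3) and the sampling window of 5 October to 6 December 2020. The function name is hypothetical and the calculation is illustrative rather than part of the published analysis.

```python
from datetime import date

CLOCK_RATE = 8e-4        # substitutions per site per year, as fixed in the analysis
GENOME_LENGTH = 29_903   # length of Wuhan Hu-1 (MN908947.3) in nucleotides


def expected_substitutions(days: float,
                           rate: float = CLOCK_RATE,
                           length: int = GENOME_LENGTH) -> float:
    """Expected substitutions along one lineage over a given number of days
    under a strict clock."""
    return rate * length * days / 365.0


study_days = (date(2020, 12, 6) - date(2020, 10, 5)).days
per_year = expected_substitutions(365)
per_study = expected_substitutions(study_days)

print(f"Sampling window: {study_days} days")
print(f"Expected substitutions per genome per year: {per_year:.1f}")
print(f"Expected substitutions per genome over the window: {per_study:.1f}")
```

Under these assumptions only a handful of substitutions are expected along any one lineage during the study window, which is consistent with the lack of clear temporal signal noted above.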

To estimate the effective reproduction number (Re) and infectious period of SARS-CoV-2 over the term, a dominant clade (representing 69.6% of all university genomes) was selected and all community genome sequences that cluster with it incorporated, resulting in a total of 354 genomes. A Bayesian birth-death skyline (BDSKY) model16 was employed using BEAST v2.6 (ref. 46). A GTR + Γ substitution model was used along with a strict clock model, placing a lognormal prior with mean 8 × 10⁻⁴ s/s/y (in real space) and standard deviation 0.1 on the clock rate. A lognormal prior with mean 0 and standard deviation 1 was placed on Re and a Beta prior with α = 5 and β = 5 was placed on the sampling proportion. Re was parameterised into 20 epochs, equidistantly spaced between the origin time and the most recent sequence collection date. The sampling proportion was fixed to 0 before the first week of term and estimated for each week thereafter. The rate at which infected patients become non-infectious was assumed to be constant and a lognormal prior with mean 48.7 years⁻¹ (in real space) and standard deviation 0.25 was placed on it, resulting in a prior mean effective infectious period between ~5 and ~15 days. To test the robustness of the posterior estimates different parameterisations were used for Re and the sampling proportion, and the sampling proportion prior was varied. Further details are provided in the Supplementary Methods. To test the robustness of posterior estimates to the clock rate prior all analyses were repeated using a lognormal prior with mean 1 × 10⁻³ s/s/y (in real space) and standard deviation 0.1 on the clock rate. Finally, to test the assumption of a strict clock model, analyses were repeated using an uncorrelated lognormally distributed relaxed clock model47. In these analyses the 95% HPD interval of the coefficient of variation of the clock rate did not exclude 0, indicating poor support for a relaxed clock model in this dataset. Furthermore, estimates of the BDSKY model parameters did not differ significantly from estimates under a strict clock model. Therefore, we only show results under a strict clock model. For all models three chains of 200 million iterations were run independently. Convergence was assessed using the R-package coda48, and 10% of states were removed to account for burn-in. MCC trees were generated using TreeAnnotator.
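The relationship between the prior on the becoming-non-infectious rate and the implied infectious period can be sketched as follows. The example assumes that a lognormal prior with mean M "in real space" and standard deviation S corresponds to a log-scale mean of ln(M) − S²/2, and that the infectious period is the reciprocal of the rate; the function names are hypothetical. The exact bounds depend on the parameterisation and on which interval is reported, so this sketch is not expected to reproduce the range quoted above exactly.

```python
import math
from statistics import NormalDist

DAYS_PER_YEAR = 365.0
MEAN_RATE = 48.7   # prior mean of the becoming-non-infectious rate (per year)
SIGMA = 0.25       # prior standard deviation on the log scale


def lognormal_mu(mean_real_space: float, sigma: float) -> float:
    """Log-scale mean of a lognormal whose real-space mean is given."""
    return math.log(mean_real_space) - sigma ** 2 / 2.0


def rate_quantile(prob: float, mean_real_space: float, sigma: float) -> float:
    """Quantile of the lognormal prior on the rate (per year)."""
    z = NormalDist().inv_cdf(prob)
    return math.exp(lognormal_mu(mean_real_space, sigma) + z * sigma)


def period_days(rate_per_year: float) -> float:
    """Infectious period in days implied by a rate per year."""
    return DAYS_PER_YEAR / rate_per_year


period_at_mean_rate = period_days(MEAN_RATE)
# A high rate gives a short period, so the quantiles swap when inverted.
shortest = period_days(rate_quantile(0.975, MEAN_RATE, SIGMA))
longest = period_days(rate_quantile(0.025, MEAN_RATE, SIGMA))

print(f"Period at the prior mean rate: {period_at_mean_rate:.1f} days")
print(f"Central 95% prior interval: {shortest:.1f} to {longest:.1f} days")
```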

Household attack rates

A2B-COVID (ref. 15) was used to exclude households for which the sequence and epidemiological data were inconsistent with a single viral introduction to the household. A chain binomial model was then used to estimate the probability that an infected person transmitted the virus to an uninfected person within the same household (further details in Supplementary Methods).
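The model used in the study is specified in the Supplementary Methods and is not reproduced here. As a generic illustration of the chain binomial idea only, the sketch below simulates a Reed-Frost-style process in which, in each generation, every susceptible household member independently escapes infection from each currently infectious member with probability 1 − p. The function name, parameter values and household size are hypothetical.

```python
import random


def simulate_household(size: int, p: float, rng: random.Random,
                       n_index: int = 1) -> int:
    """Simulate one household outbreak and return the total number infected.

    size    -- number of people in the household
    p       -- per-pair transmission probability per generation
    n_index -- number of initially infected members (single introduction = 1)
    """
    susceptible = size - n_index
    infectious = n_index
    total_infected = n_index
    while infectious > 0 and susceptible > 0:
        escape = (1.0 - p) ** infectious  # probability of avoiding all infectious members
        new_cases = sum(1 for _ in range(susceptible) if rng.random() >= escape)
        susceptible -= new_cases
        total_infected += new_cases
        infectious = new_cases            # only the newest cases transmit next
    return total_infected


rng = random.Random(2020)
household_size = 6
final_sizes = [simulate_household(household_size, 0.2, rng) for _ in range(10_000)]
mean_secondary = sum(n - 1 for n in final_sizes) / len(final_sizes)
print(f"Mean secondary cases per household: {mean_secondary:.2f}")
print(f"Mean secondary attack rate: {mean_secondary / (household_size - 1):.2f}")
```

Fitting such a model to observed household final sizes, rather than simulating from it, is what yields an estimate of p; that inference step is not shown here.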

Epidemiological data

University student demographic data were derived from the UoC student electronic record system CamSIS, and household structure and membership data from the UoC asymptomatic screening programme. To identify university affiliated cases (students and staff) and hospital staff accessing the national SARS-CoV-2 testing service, Second Generation Surveillance System (SGSS) data and contact-tracing data provided by NHS Test and Trace (T&T) were interrogated. Epidemiologically linked common exposures for students, university staff and the local community were identified through T&T data. Common exposures were defined by T&T as locations or events that two or more people testing positive for COVID-19 visited in the same two-to-seven-day period before symptom onset or positive test. Additional contact tracing information was also provided by the UoC COVID helpdesk. These data were compared with observed phylogenetic clusters to determine potential sources of transmission and the extent of transmission between the university and community.

Epidemiological data from UoC were initially compiled in Microsoft Azure SQL and Excel 2013 (Microsoft) and analysed in Stata 14.2 (College Station, TX, USA). Further data manipulation, statistical analysis and figure generation were undertaken with RStudio (version 1.3.1093) using R (version 4.0.2). Network diagrams were produced with the R package igraph (v1.2.6).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Peer Review File (488.2KB, pdf)
41467_2021_27942_MOESM3_ESM.pdf (66.9KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (54KB, xlsx)
Supplementary Data 2 (9.7KB, xlsx)
Reporting Summary (4MB, pdf)

Acknowledgements

Authors A.S.J., W.H. and T.F., and authors L.d.P. and V.H. contributed equally. We thank members of the COVID-19 Genomics Consortium UK and NHS Test and Trace contact tracers for their contributions to generating data used in this study. We thank the Sanger Covid Team for assisting with Samples and Logistics. We are grateful to all students and staff at the University of Cambridge who have contributed to the COVID-19 response during Michaelmas Term. We are grateful to all staff members of the Cambridge COVID-19 Testing Centre for generating qPCR data. D.A. is a Wellcome Clinical PhD Fellow and gratefully supported by the Wellcome Trust (Grant number: 222903/Z/21/Z). B.W. receives funding from the University of Cambridge and the National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre (BRC) at the Cambridge University Hospitals NHS Foundation Trust. I.G. is a Wellcome Senior Fellow and is supported by the Wellcome Trust (Grant number: 207498/Z/17/Z and 206298/B/17/Z). E.M.H. is supported by a UK Research and Innovation (UKRI) Fellowship: MR/S00291X/1. C.J.R.I. acknowledges Medical Research Council (MRC) funding (ref: MC_UU_00002/11). N.J.M. is supported by the MRC (CSF MR/P008801/1) and NHSBT (WPA15-02). A.J.P. gratefully acknowledges the support of the Biotechnology and Biological Sciences Research Council (BBSRC); their research was funded by the BBSRC Institute Strategic Programme Microbes in the Food Chain BB/R012504/1 and its constituent project BBS/E/F/000PR10352, also Quadram Institute Bioscience BBSRC funded Core Capability Grant (project number BB/CCG1860/1). L.d.P. and O.G.P. were supported by the Oxford Martin School. This research was supported by the NIHR Cambridge BRC. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. 
The COVID-19 Genomics UK Consortium is supported by funding from the MRC part of UK Research & Innovation (UKRI), the National Institute of Health Research and Genome Research Limited, operating as the Wellcome Sanger Institute. The Cambridge Covid-19 testing Centre is funded by the Department of Health and Social Care, UK Government. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. For the purpose of Open Access, the author has applied a CC-BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Source data

Source Data (74.9KB, xlsx)

Author contributions

All authors read the manuscript and consented to its publication. Where a = conceptualization; b = methodology; c = software; d = validation; e = formal analysis; f = investigation; g = resources; h = data curation; i = writing—original draft preparation; j = writing—review and editing; k = visualization; l = supervision; m = project administration; n = funding acquisition, D.A. contributed a, b, c, d, e, f, h, i, j, k, m; B.B. contributed f, g, h, j, m; C.B. contributed f, g, h; D.D.A. contributed j, l; G.D. contributed l; L.d.P. contributed b, c, e, g, h, j, k; T.F. contributed a, b, d, e, f, h, j; S.F. contributed f, g, h; I.G. contributed e, f, h; Y.C. contributed e, f, h; I.G.G. contributed a, b, f, g, j, l, m, n; Gr.H. contributed e, f, h; W.L.H. contributed a, b, c, e, j, k; E.M.H. contributed a, b, g, j, l, m, n; V.H. contributed b, c, e, j; J.H. contributed f, g, h; M.H. contributed e, f, h; R.H. contributed b, d, g, h; Ga.H. contributed f, h; R.I. contributed e, f, h; C.I. contributed b, c, e, h, j, k, l; A.J. contributed b, e, f, h, j; D.L. contributed f; C.L. contributed a, g, j, n; D.M. contributed g, l, m; N.J.M. contributed a, b, j, l, m, n; P.H.M. contributed j, l, m, n; M.M. contributed f, g, h; R.M. contributed j, l, m; O.N. contributed f, h; S.P. contributed b, d, g, h; A.J.P. contributed l; S.J.P. contributed a, b, g, j, l, m, n; M.L.P. contributed e, f, h; O.G.P. contributed j, l; B.W. contributed a, b, d, e, f, h, j, k, m; Em.W. contributed a, j; G.W. contributed f, g, h; M.W. contributed j, m, n; El.W. contributed f, h, j; Cambridge Covid-19 Testing Centre contributed b, d, g, h; University of Cambridge Asymptomatic COVID-19 Screening Programme Consortium contributed b, d, g, h; The COVID-19 Genomics UK (COG-UK) Consortium contributed b, d, g, h.

Peer review

Peer review information

Nature Communications thanks Sébastien Calvignac-Spencer, Joep de Ligt and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Data availability

The assembled/consensus genomes generated in this study have been deposited in the GISAID49 database and raw reads are available from the European Nucleotide Archive (ENA)50 under accession PRJEB37886. Pooled sample sequence raw reads and assembled sequences are deposited in the NCBI Sequence Read Archive Database (SRA; https://www.ncbi.nlm.nih.gov/sra) under the BioProject accession number PRJNA779279.

ENA and GenBank accession codes for individual sequences used in this study are available in supplementary materials (Supplementary Data 1 and 2). All genomes, phylogenetic trees and basic metadata are available from the COG-UK consortium website (https://www.cogconsortium.uk/data). Limited public metadata, analysis files, and processed genomic data for this work are available from GitHub at https://github.com/COG-UK/camb-uni-phylo/ (10.5281/zenodo.5643354; ref. 51), which also contains a list of ENA and GenBank study sequence accession numbers for this study. Extended metadata52 are under restricted access for confidentiality reasons and in line with study ethics; requests for access should be directed to the corresponding authors and, specifically for Public Health England data, to the Public Health England office of data release (https://www.gov.uk/government/publications/accessing-public-health-england-data/about-the-phe-odr-and-accessing-data), with an estimated turnaround time of 60 working days. Processed metadata generated for figures in this study are provided in the Source Data file. Source data are provided with this paper.

Code availability

Custom code used in this analysis is available at https://github.com/COG-UK/camb-uni-phylo/. Please direct further queries to the corresponding authors.

Competing interests

R.H. is an employee of AstraZeneca AB. The remaining authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Dinesh Aggarwal, Ben Warne.

These authors jointly supervised this work: Michael P. Weekes, Chris Illingworth, Ewan M. Harrison, Nicholas J. Matheson, Ian G. Goodfellow.

Lists of authors and their affiliations appear at the end of the paper.

Contributor Information

Dinesh Aggarwal, Email: dinesh.aggarwal@nhs.net.

Ewan M. Harrison, Email: eh6@sanger.ac.uk

Nicholas J. Matheson, Email: njm25@cam.ac.uk

Ian G. Goodfellow, Email: ig299@cam.ac.uk

The Cambridge Covid-19 testing Centre:

Alexandra Orton, Julie Douthwaite, Steve Rees, Christopher Brown, Roger Clark, Daniel R. Jones, Fred Kuenzi, Jennifer Rankin, and Ian Waddell

University of Cambridge Asymptomatic COVID-19 Screening Programme Consortium:

Patrick Maxwell, Nicholas Matheson, Chris Abell, Vickie Braithwaite, Craig Brierley, Jon Crowcroft, Aastha Dahal, Kathryn Faulkner, Michael Glover, Ian Goodfellow, Jane Greatorex, Laura James, Paul Lehner, Ian Leslie, Kathleen Liddell, Ben Margolis, Sally Morgan, Linda Sheridan, Sally Valletta, Anna Vignoles, Martin Vinnell, Mark Wills, Sarah Hilborne, Sarah Berry, Mahin Bagheri Kahkeshi, Dawn Hancock, Jennifer Winster, Jessica Enright, Richard Samworth, Vijay Samtani, Gabriela Ahmadi-Assalemi, Tom Feather, Robin Goodall, Steve Hoensch, Dean Johnson, Martin Hunt, Nick Mathieson, Katya Nikitina, Zara Sheldrake, Martin Keen, Aris Sato, David Connor, Jonathan Tolhurst, Jack Williman, Victoria Hollamby, Sinead Jordan, Tania Fatseas, Peter Taylor, Christine Georgiou, Michelle Caspersz, Claire McNulty, Richard Davies, Rebecca Clarke, Darius Danaei, Rory Dyer, Rob Glew, Oliver Lambson, Karen Gibbs, Barbara Mozdzen, Gabor Raub, Asako Radecki, Phil White, Robert Hughes, Lucie Gransden, Matt Ceaser, Robert Sing, Karl Wilson, Ajith Parlikad, Maharshi Dhada, Tom Ridgman, Diane Mungovan, Steve Matthews, Paul Searle, John Mills, Andy Neely, Robert Henderson, Edna Murphy, Matthew Russell, Anthony Freeling, Steve Poppitt, Jo Tynan, James Knapton, Filippo Marchetti, Daniela De Angelis, Theresa Feltwell, Nazreen F. Hadjirin, William L. Hamilton, Aminu Jahun, Malte Pinckert, Ashley Shaw, Afzal Chaudhry, Nicholas M. Brown, Lenette Mactavous, Sophie Hannan, Aleksandra Hosaja, Clare Leong, Jo Wright, Natalie Quinnell, Chris Workman, Mark Ferris, Giles Wright, and Elizabeth Wright

The COVID-19 Genomics UK (COG-UK) Consortium:

Dinesh Aggarwal, Ellena Brooks, Alessandro M. Carabelli, Carol M. Churcher, Katerina Galai, Sophia T. Girgis, Ravi K. Gupta, Catherine Ludden, Georgina M. McManus, Sophie Palmer, Sharon J. Peacock, Kim S. Smith, Elias Allara, David Bibby, Chloe Bishop, Andrew Bosworth, Daniel Bradshaw, Vicki Chalker, Meera Chand, Gavin Dabrera, Nicholas Ellaby, Eileen Gallagher, Natalie Groves, Ian Harrison, Hassan Hartman, Richard Hopes, Jonathan Hubb, Stephanie Hutchings, Angie Lackenby, Juan Ledesma, David Lee, Nikos Manesis, Carmen Manso, Tamyo Mbisa, Shahjahan Miah, Peter Muir, Husam Osman, Vineet Patel, Clare Pearson, Steven Platt, Hannah M. Pymont, Mary Ramsay, Esther Robinson, Ulf Schaefer, Alicia Thornton, Katherine A. Twohig, Ian B. Vipond, David Williams, William L. Hamilton, Louise Aigrain, Alex Alderton, Roberto Amato, Cristina V. Ariani, Jeff Barrett, Andrew R. Bassett, Mathew A. Beale, Charlotte Beaver, Katherine L. Bellis, Emma Betteridge, James Bonfield, Iraad F. Bronner, Michael H. S. Chapman, John Danesh, Robert Davies, Matthew J. Dorman, Eleanor Drury, Jillian Durham, Ben W. Farr, Luke Foulser, Sonia Goncalves, Scott Goodwin, Marina Gourtovaia, David K. Jackson, Keith James, Dorota Jamrozy, Ian Johnston, Leanne Kane, Sally Kay, Jon-Paul Keatley, Dominic Kwiatkowski, Cordelia F. Langford, Mara Lawniczak, Stefanie V. Lensing, Steven Leonard, Laura Letchford, Kevin Lewis, Jennifier Liddle, Rich Livett, Stephanie Lo, Alex Makunin, Inigo Martincorena, Shane McCarthy, Samantha McGuigan, Robin J. Moll, Rachel Nelson, Karen Oliver, Steve Palmer, Naomi R. Park, Minal Patel, Liam Prestwood, Christoph Puethe, Michael A. Quail, Diana Rajan, Shavanthi Rajatileka, Nicholas M. Redshaw, Carol Scott, Lesley Shirley, John Sillitoe, Scott A. J. Thurston, Gerry Tonkin-Hill, Jaime M. Tovar-Corona, Danni Weldon, Andrew Whitwham, Myra Hosmillo, Stephen W. Attwood, Louis du Plessis, Marina Escalera Zamudio, Sarah Francois, Bernardo Gutierrez, Moritz U. G. 
Kraemer, Jayna Raghwani, Tetyana I. Vasylyeva, Alex E. Zarebski, Nabil-Fareed Alikhan, Alp Aydin, David J. Baker, Leonardo de Oliveira Martins, Gemma L. Kay, Thanh Le-Viet, Alison E. Mather, Lizzie Meadows, Justin O’Grady, Steven Rudder, Alexander J. Trotter, Chris J. Illingworth, Chris Jackson, Elihu Aranday-Cortes, Patawee Asamaphan, Alice Broos, Stephen N. Carmichael, Ana da Silva Filipe, Joseph Hughes, Natasha G. Jesudason, Natasha Johnson, Kathy K. Li, Daniel Mair, Jenna Nichols, Seema Nickbakhsh, Marc O. Niebel, Kyriaki Nomikou, Richard J. Orton, David L. Robertson, Rajiv N. Shah, James G. Shepherd, Joshua B. Singer, Igor Starinskij, Emma C. Thomson, Lily Tong, Sreenu Vattipally, Amy Ash, Cherian Koshy, Nick Cortes, Stephen Kidd, Jessica Lynch, Nathan Moore, Matilde Mori, Emma Wise, Tanya Curran, Derek J. Fairley, James P. McKenna, Helen Adams, David Bonsall, Christophe Fraser, Tanya Golubchik, Benjamin J. Cogger, Mohammed O. Hassan-Ibrahim, Cassandra S. Malone, Nicola Reynolds, Michelle Wantoch, Safiah Afifi, Robert Beer, Michaela John, Joshua Maksimovic, Kathryn McCluggage, Sian Morgan, Karla Spellman, Catherine Bresner, Thomas R. Connor, William Fuller, Martyn Guest, Huw Gulliver, Christine Kitchen, Angela Marchbank, Ian Merrick, Robert Munn, Anna Price, Joel Southgate, Trudy Workman, Amita Patel, Luke B. Snell, Rahul Batra, Themoula Charalampous, Jonathan Edgeworth, Gaia Nebbia, Angela H. Beckett, Samuel C. Robson, David M. Aanensen, Khalil Abudahab, Mirko Menegazzo, Ben E. W. Taylor, Anthony P. Underwood, Corin A. Yeats, Louise Berry, Tim Boswell, Gemma Clark, Vicki M. Fleming, Hannah C. Howson-Wells, Carl Jones, Amelia Joseph, Manjinder Khakh, Michelle M. Lister, Wendy Smith, Iona Willingham, Paul Bird, Karlie Fallon, Thomas Helmer, Christopher Holmes, Julian Tang, Victoria Blakey, Sharon Campbell, Veena Raviprakash, Nicola Sheriff, Lesley-Anne Williams, Matthew Carlile, Johnny Debebe, Nadine Holmes, Matthew W. 
Loose, Christopher Moore, Fei Sang, Victoria Wright, Francesc Coll, Gilberto Betancor, Adrian W. Signell, Harry D. Wilson, Thomas Davis, Sahar Eldirdiri, Anita Kenyon, M. Estee Torok, Hannah Lowe, Samuel Moses, Luke Bedford, Jonathan Moore, Susanne Stonehouse, Ali R. Awan, Chloe L. Fisher, John BoYes, Laura Atkinson, Judith Breuer, Julianne R. Brown, Kathryn A. Harris, Jack C. D. Lee, Divya Shah, Nathaniel Storey, Flavia Flaviani, Adela Alcolea-Medina, Gabrielle Vernet, Rebecca Williams, Michael R. Chapman, Wendy Chatterton, Judith Heaney, Lisa J. Levett, Monika Pusok, Li Xu-McCrae, Matthew Bashton, Darren L. Smith, Gregory R. Young, Frances Bolt, Alison Cox, Alison Holmes, Pinglawathee Madona, Siddharth Mookerjee, James Price, Paul A. Randell, Olivia Boyd, Fabricia F. Nascimento, Lily Geidelberg, Rob Johnson, David Jorgensen, Manon Ragonnet-Cronin, Aileen Rowan, Igor Siveroni, Graham P. Taylor, Erik M. Volz, Katherine L. Smollett, Nicholas J. Loman, Claire McMurray, Alan McNally, Sam Nicholls, Radoslaw Poplawski, Joshua Quick, Will Rowe, Joanne Stockton, Rocio T. Martinez Nunez, Cassie Breen, Angela Cowell, Jenifer Mason, Elaine O’Toole, Trevor I. Robinson, Joanne Watts, Graciela Sluga, Shazaad S. Y. Ahmad, Ryan P. George, Nicholas W. Machin, Fenella Halstead, Wendy Hogsden, Venkat Sivaprakasam, Holli Carden, Antony D. Hale, Katherine L. Harper, Louissa R. Macfarlane-Smith, Shirelle Burton-Fanning, Jennifer Collins, Gary Eltringham, Brendan AI. Payne, Yusri Taha, Sheila Waugh, Sarah O’Brien, Steven Rushton, Rachel Blacow, Amanda Bradley, Alasdair Maclean, Guy Mollett, Rebecca Dewar, Martin P. McHugh, Kate E. Templeton, Elizabeth Wastenge, Lindsay Coupland, Samir Dervisevic, Emma J. Meader, Rachael Stanley, Louise Smith, Edward Barton, Clive Graham, Debra Padgett, Garren Scott, Jane Greenaway, Emma Swindells, Clare M. McCann, Andrew Nelson, Wen C. Yew, Monique Andersson, Derrick Crook, David Eyre, Anita Justice, Timothy Peto, Nichola Duckworth, Tim J. 
Sloan, Sarah Walsh, Kelly Bicknell, Anoop J. Chauhan, Scott Elliott, Sharon Glaysher, Robert Impey, Allyson Lloyd, Sarah Wyllie, Nick Levene, Lynn Monaghan, Declan T. Bradley, Tim Wyatt, Martin D. Curran, Surendra Parmar, Matthew T. G. Holden, Sharif Shaaban, Alexander Adams, Hibo Asad, Alec Birchley, Matthew Bull, Jason Coombes, Sally Corden, Simon Cottrell, Noel Craine, Michelle Cronin, Alisha Davies, Elen De Lacy, Fatima Downing, Sue Edwards, Johnathan M. Evans, Laia Fina, Amy Gaskin, Bree Gatica-Wilcox, Laura Gifford, Lauren Gilbert, Lee Graham, David Heyburn, Ember Hilvers, Robin Howe, Hannah Jones, Rachel Jones, Sophie Jones, Sara Kumziene-SummerhaYes, Caoimhe McKerr, Catherine Moore, Mari Morgan, Nicole Pacchiarini, Malorie Perry, Amy Plimmer, Sara Rey, Giri Shankar, Sarah Taylor, Joanne Watkins, Chris Williams, Anna Casey, Liz Ratcliffe, Erwan Acheson, Zoltan Molnar, David A. Simpson, Thomas Thompson, Cressida Auckland, Sian Ellard, Christopher R. Jones, Bridget A. Knight, Jane A. H. Masoli, Tanzina Haque, Jennifer Hart, Dianne Irish-Tavares, Tabitha W. Mahungu, Eric Witele, Ashok Dadrah, Melisa L. Fenton, Tranprit Saluja, Amanda Symmonds, Yann Bourgeois, Garry P. Scarlett, Kate Cook, Hannah Dent, Christopher Fearn, Salman Goudarzi, Katie F. Loveson, Hannah Paul, Cariad Evans, Kate Johnson, David G. Partridge, Mohammad Raza, Paul Baker, Stephen Bonner, Sarah Essex, Steven Liggett, Ronan A. Lyons, Adhyana I. K. Mahanama, Kordo Saeed, Buddhini Samaraweera, Siona Silveira, Eleri Wilson-Davies, P. Emanuela, Nadua Bayzid, Marius Cotic, Leah Ensell, John A. Hartley, Riaz Jannoo, Angeliki Karamani, Mark Kristiansen, Helen L. Lowe, Sunando Roy, Adam P. Westhorpe, Rachel J. Williams, Charlotte A. Williams, Sarah Jeremiah, Jacqui A. 
Prieto, Lisa Berry, Dimitris Grammatopoulos, Katie Jones, Sarojini Pandey, Andrew Beggs, Alex Richter, Fiona Ashcroft, Angus Best, Liam Crawford, Nicola Cumley, Megan Mayhew, Oliver Megram, Jeremy Mirza, Emma Moles-Garcia, Benita Percival, Giselda Bucca, Andrew R. Hesketh, Colin P. Smith, Rose K. Davidson, Carlos E. Balcazar, Michael D. Gallagher, Áine O’Toole, Andrew Rambaut, Stefan Rooke, Thomas D. Stanton, Thomas Williams, Kathleen A. Williamson, Claire M. Bewshea, Audrey Farbos, James W. Harrison, Aaron R. Jeffries, Robin Manley, Stephen L. Michell, Michelle L. Michelsen, Christine M. Sambles, David J. Studholme, Ben Temperton, Joanna Warwick-Dugdale, Alistair C. Darby, Richard Eccles, Matthew Gemmell, Richard Gregory, Sam T. Haldenby, Julian A. Hiscox, Margaret Hughes, Miren Iturriza-Gomara, Kathryn A. Jackson, Anita O. Lucaci, Charlotte Nelson, Steve Paterson, Lucille Rainbow, Lance Turtle, Edith E. Vamos, Hermione J. Webster, Mark Whitehead, Claudia Wierzbicki, Adrienn Angyal, Rebecca Brown, Thushan I. de Silva, Timothy M. Freeman, Marta Gallis, Luke R. Green, Danielle C. Groves, Alexander J. Keeley, Benjamin B. Lindsey, Stavroula F. Louka, Matthew D. Parker, Paul J. Parsons, Nikki Smith, Rachel M. Tucker, Dennis Wang, Max Whiteley, Matthew Wyles, Peijun Zhang, Mohammad T. Alam, Laura Baxter, Hannah E. Bridgewater, Paul E. Brown, Jeffrey K. J. Cheng, Chrystala Constantinidou, Lucy R. Frost, Sascha Ott, Richard Stark, Grace Taylor-Joyce, Meera Unnikrishnan, Alberto C. Cerda, Tammy V. Merrill, Rebekah E. Wilson, Jonathan Ball, Joseph G. Chappell, Patrick C. McClure, Theocharis Tsoleridis, David Buck, Mariateresa de Cesare, Angie Green, George MacIntyre-Cockett, John A. Todd, Amy Trebes, Rory N. Gunson, Claire Cormie, Joana Dias, Sally Forrest, Harmeet K. Gill, Ellen E. Higginson, Leanne M. Kermack, Mailis Maes, Chris Ruis, Sushmita Sridhar, and Jamie Young

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-021-27942-w.

References

  • 1. Zhou F, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395:1054–1062. doi: 10.1016/S0140-6736(20)30566-3.
  • 2. Yang J, et al. Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis. Int. J. Infect. Dis. 2020;94:91–95. doi: 10.1016/j.ijid.2020.03.017.
  • 3. Vusirikala, A. et al. Seroprevalence of SARS-CoV-2 Antibodies in University Students: cross-sectional study, December 2020, England. (2021).
  • 4. Yamey G, Walensky RP. Covid-19: re-opening universities is high risk. BMJ. 2020;370:m3365. doi: 10.1136/bmj.m3365.
  • 5. Children’s Task and Finish Group. Risks associated with the reopening of education settings in September. 2021 (2020).
  • 6. Sahu P. Closure of universities due to coronavirus disease 2019 (COVID-19): impact on education and mental health of students and academic staff. Cureus. 2020;12:e7541. doi: 10.7759/cureus.7541.
  • 7. Task and Finish Group on Higher Education/Further Education. Principles for managing SARS-CoV-2 transmission associated with higher education - 3 September 2020. Vol. 2021 (2020).
  • 8. Meredith LW, et al. Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: a prospective genomic surveillance study. Lancet Infect. Dis. 2020;20:1263–1271. doi: 10.1016/S1473-3099(20)30562-4.
  • 9. Moreno GK, et al. Severe acute respiratory syndrome coronavirus 2 transmission in intercollegiate athletics not fully mitigated with daily antigen testing. Clin. Infect. Dis. 2021;73:S45–S53. doi: 10.1093/cid/ciab343.
  • 10. The COVID-19 Genomics UK (COG-UK) consortium. An integrated national scale SARS-CoV-2 genomic surveillance network. Lancet Microbe. 2020;1:e99–e100. doi: 10.1016/S2666-5247(20)30054-9.
  • 11. Warne, B. et al. Feasibility and efficacy of mass testing for SARS-CoV-2 in a UK university using swab pooling and PCR. Preprint at https://www.researchsquare.com/article/rs-520626/v1 (2021).
  • 12. Rambaut, A. et al., on behalf of COVID-19 Genomics Consortium UK (CoG-UK). Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. Vol. 2021 (2020).
  • 13. Hodcroft, E. B. et al. Emergence and spread of a SARS-CoV-2 variant through Europe in the summer of 2020. Preprint at https://www.medrxiv.org/content/10.1101/2020.10.25.20219063v3 (2020).
  • 14. Volz E, et al. Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature. 2021;593:266–269. doi: 10.1038/s41586-021-03470-x.
  • 15. Illingworth, C. J. R. et al. A2B-COVID: a method for evaluating potential SARS-CoV-2 transmission events. Preprint at https://www.medrxiv.org/content/10.1101/2020.10.26.20219642v2 (2020).
  • 16. Stadler T, Kuhnert D, Bonhoeffer S, Drummond AJ. Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proc. Natl Acad. Sci. USA. 2013;110:228–233. doi: 10.1073/pnas.1207965110.
  • 17. Muller N, et al. Severe acute respiratory syndrome coronavirus 2 outbreak related to a Nightclub, Germany, 2020. Emerg. Infect. Dis. 2020;27:645–648. doi: 10.3201/eid2702.204443.
  • 18. Choi, H., Cho, W., Kim, M. H. & Hur, J. Y. Public health emergency and crisis management: case study of SARS-CoV-2 outbreak. Int. J. Environ. Res. Public Health 17, 3984 (2020).
  • 19. Children’s Task and Finish Group. Children’s Task and Finish Group: Paper on higher education settings. Vol. 2021 (2021).
  • 20. du Plessis L, et al. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK. Science. 2021;371:708–712. doi: 10.1126/science.abf2946.
  • 21. Li R, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science. 2020;368:489–493. doi: 10.1126/science.abb3221.
  • 22. Smith LE, et al. Adherence to the test, trace, and isolate system in the UK: results from 37 nationally representative surveys. BMJ. 2021;372:n608. doi: 10.1136/bmj.n608.
  • 23. Seemann T, et al. Tracking the COVID-19 pandemic in Australia using genomics. Nat. Commun. 2020;11:4376. doi: 10.1038/s41467-020-18314-x.
  • 24. University of Cambridge. How the University and Colleges work. https://www.cam.ac.uk/about-the-university/how-the-university-and-colleges-work (2013).
  • 25. Office for National Statistics. 2011 ONS Census. https://www.cambridge.gov.uk/media/1170/census-2011-cambridge-data.pdf (2011).
  • 26. Office for National Statistics. Cambridgeshire and Peterborough Population Overview Report. https://cambridgeshireinsight.org.uk/population/ (2019).
  • 27. Rivett, L. et al. Screening of healthcare workers for SARS-CoV-2 highlights the role of asymptomatic carriage in COVID-19 transmission. Elife 9, e58728 (2020).
  • 28. Quick, J. nCoV-2019 sequencing protocol v3 (LoCost) V.3. Vol. 2021 (2020).
  • 29. Bonsall, D. et al. A comprehensive genomics solution for HIV surveillance and clinical monitoring in low-income settings. J. Clin. Microbiol. 58, e00382 (2020).
  • 30. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
  • 31. Nicholls SM, et al. CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance. Genome Biol. 2021;22:196. doi: 10.1186/s13059-021-02395-y.
  • 32. Connor TR, et al. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community. Microb. Genom. 2016;2:e000086. doi: 10.1099/mgen.0.000086.
  • 33. Minh BQ, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 2020;37:1530–1534. doi: 10.1093/molbev/msaa015.
  • 34. Tavare S, et al. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math. Life Sci. 1986;17:57–86.
  • 35. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods. 2017;14:587–589. doi: 10.1038/nmeth.4285.
  • 36. Minh BQ, Nguyen MA, von Haeseler A. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 2013;30:1188–1195. doi: 10.1093/molbev/mst024.
  • 37. Rambaut A, Lam TT, Max Carvalho L, Pybus OG. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2016;2:vew007. doi: 10.1093/ve/vew007.
  • 38. Argimon S, et al. Microreact: visualizing and sharing data for genomic epidemiology and phylogeography. Microb. Genom. 2016;2:e000093. doi: 10.1099/mgen.0.000093.
  • 39. Rambaut A, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 2020;5:1403–1407. doi: 10.1038/s41564-020-0770-5.
  • 40. Suchard MA, et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4:vey016. doi: 10.1093/ve/vey016.
  • 41. Ghafari, M. et al. Purifying selection determines the short-term time dependency of evolutionary rates in SARS-CoV-2 and pH1N1 influenza. Mol. Biol. Evol. msac009, doi: 10.1093/molbev/msac009 (2022).
  • 42. Vaughan, T. G., Sciré, J., Nadeau, S. A. & Stadler, T. Estimates of outbreak-specific SARS-CoV-2 epidemiological parameters from genomic data. Preprint at https://www.medrxiv.org/content/10.1101/2020.09.12.20193284v1 (2020).
  • 43. Nadeau, S. A., Vaughan, T. G., Scire, J., Huisman, J. S. & Stadler, T. The origin and early spread of SARS-CoV-2 in Europe. Proc. Natl. Acad. Sci. USA 118, e2012008118 (2021).
  • 44. Geoghegan JL, et al. Genomic epidemiology reveals transmission patterns and dynamics of SARS-CoV-2 in Aotearoa New Zealand. Nat. Commun. 2020;11:6351. doi: 10.1038/s41467-020-20235-8.
  • 45. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior summarization in Bayesian phylogenetics using tracer 1.7. Syst. Biol. 2018;67:901–904. doi: 10.1093/sysbio/syy032.
  • 46. Bouckaert R, et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 2019;15:e1006650. doi: 10.1371/journal.pcbi.1006650.
  • 47. Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006;4:e88. doi: 10.1371/journal.pbio.0040088.
  • 48. Plummer M, Best N, Cowles K, Vines K. CODA: convergence diagnosis and output analysis for MCMC. R News. 2006;6:7–11.
  • 49. Shu, Y. & McCauley, J. GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 22, 30494 (2017).
  • 50. Toribio AL, et al. European Nucleotide Archive in 2016. Nucleic Acids Res. 2017;45:D32–D36. doi: 10.1093/nar/gkw1106.
  • 51. Aggarwal, D. et al. Genomic epidemiology of SARS-CoV-2 in a UK university identifies dynamics of transmission. GitHub, doi: 10.5281/zenodo.5643354 (2021).
  • 52. Griffiths, E. J. T. et al. The PHA4GE SARS-CoV-2 contextual data specification for open genomic epidemiology. Preprint at https://www.preprints.org/manuscript/202008.0220/v1 (2020).

Associated Data

Supplementary Materials

Peer Review File (488.2KB, pdf)
41467_2021_27942_MOESM3_ESM.pdf (66.9KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (54KB, xlsx)
Supplementary Data 2 (9.7KB, xlsx)
Reporting Summary (4MB, pdf)

Data Availability Statement

The assembled/consensus genomes generated in this study have been deposited in the GISAID database (ref. 49), and raw reads are available from the European Nucleotide Archive (ENA; ref. 50) under accession PRJEB37886. Pooled sample sequence raw reads and assembled sequences are deposited in the NCBI Sequence Read Archive database (SRA; https://www.ncbi.nlm.nih.gov/sra) under the BioProject accession number PRJNA779279.

ENA and GenBank accession codes for individual sequences used in this study are available in the supplementary materials (Supplementary Data 1 and 2). All genomes, phylogenetic trees and basic metadata are available from the COG-UK consortium website (https://www.cogconsortium.uk/data). Limited public metadata, analysis files, and processed genomic data for this work are available from GitHub at https://github.com/COG-UK/camb-uni-phylo/ (doi: 10.5281/zenodo.5643354; ref. 51), which also contains a list of ENA and GenBank sequence accession numbers for this study. Extended metadata (ref. 52) are under restricted access for confidentiality reasons and in line with study ethics. Requests for access should be directed to the corresponding authors; requests specifically for Public Health England data should be directed to the Public Health England office of data release (https://www.gov.uk/government/publications/accessing-public-health-england-data/about-the-phe-odr-and-accessing-data), which has an estimated turnaround time of 60 working days. Processed metadata generated for figures in this study are provided in the Source Data file. Source data are provided with this paper.

Custom code used in this analysis is available at https://github.com/COG-UK/camb-uni-phylo/. Please direct further queries to the corresponding authors.


Articles from Nature Communications are provided here courtesy of Nature Publishing Group
