Version Changes
Revised. Amendments from Version 1
The text of the article has been edited in response to reviewer comments. For example:
In the abstract and elsewhere in the article: we have changed the term “symbiosis” to the plural “symbioses” where appropriate.
In the “Genomics of symbiosis” section, we have added a reference as recommended.
We have rephrased slightly and added a reference for the text concerning “create biodiversity hotspots which house upwards of 25% of all described ocean species”.
In the legend for Figure 1 we have added an explanation for using red and green fonts to indicate the taxa with primary plastids that subsequently spread to other taxa.
In response to the comment that, in the section “The ASG project will transform symbiosis research”, the third paragraph (starting with “The hub partners…”) needs elaboration, we have added a hyperlink to a description of the hub partners to clarify the intention here.
In Table 1 we have replaced the genus name “Symbiodinium” with the family name Symbiodiniaceae.
We have rephrased and provided references for the text “Many of the fish that throng around coral reefs are open spawners, …”.
Also in this paragraph, we have corrected the word provides to ‘provide’ in “Much like a healthy reef, our hope is that the high-quality genomes we produce will generate the chatter that attracts new researchers and provides a foundation for growth of fundamental …”
Abstract
We present the Aquatic Symbiosis Genomics Project, a global collaboration to generate high quality genome sequences for a wide range of eukaryotes and their microbial symbionts. Launched under the Symbiosis in Aquatic Systems Initiative of the Gordon and Betty Moore Foundation, the ASG Project brings together researchers from across the globe who hope to use these reference genomes to augment and extend their analyses of the dynamics, mechanisms and environmental importance of symbioses. Applying large-scale, high-throughput sequencing and assembly technologies, the ASG collaboration will assemble and annotate the genomes of 500 symbiotic organisms – both the “hosts” and the microbial symbionts with which they associate. These data will be released openly to benefit all who work on symbioses, from conservation geneticists to those interested in the origin of the eukaryotic cell.
Keywords: Symbiosis, Marine, Freshwater, Genome Sequencing, Collaboration, Open Science
Disclaimer
The views expressed in this article are those of the author(s). Publication in Wellcome Open Research does not imply endorsement by Wellcome.
The genomics of symbiosis
Symbiosis, the living together of distinct organisms (Archibald, 2014; Oulhen et al., 2016), describes a spectrum of relationships from mutualistic to parasitic, and from obligate to temporary. Symbiosis has been and is fundamental to the evolution of life on Earth, from the deep origins of the eukaryotic cell and photosynthetic eukaryotes, through to the recent emergence of new partnerships. The power of symbiosis arises from the ability of the joint organism to draw from the independent, billion-year evolutionary histories of both partners. Symbiosis is a fact of life – it has arisen many, many times and new symbioses are constantly evolving (Figure 1). In this era of rapid climate change and biodiversity loss, many keystone symbiotic systems are threatened, and their loss imperils the ecosystems they support.
Figure 1. The phylogenetic diversity of eukaryotic symbioses.
Symbiotic taxa, and Aquatic Symbiosis Genomics target species, are found across the diversity of the eukaryotic Tree of Life.
Taxa highlighted with blue boxes include ASG targets. Within the tree, the small cartoons indicate the major event of plastid acquisition through symbiosis with a cyanobacterium (in the Archaeplastida; blue cell engulfed) and the several events of secondary and tertiary plastid acquisition in other lineages. The taxa containing primary plastids are shown in green and red. Illustration by John Archibald and Mark Blaxter.
Well-known mutualist symbioses permit colonisation of otherwise inaccessible habitats, are critical to ecosystem functioning, and support marine and freshwater diversity. For example, coral reefs, built through a photosymbiotic association between cnidarians and dinoflagellate algae (LaJeunesse et al., 2018; Weis, 2019), create biodiversity hotspots which house upwards of 25% of all described ocean species (Fisher et al., 2015). The dominant animals colonising deep-sea hydrothermal vents are nutritionally dependent on chemosymbiotic associations with bacteria (Roeselers & Newton, 2012), allowing them to thrive in the food-limited dark ocean. For these symbioses, the biological fitness consequences are largely understood, but in many less well-known symbioses, such as those between sponges and their bacterial collaborators, or partnerships in the diverse world of single-celled eukaryotes, the basis of the relationship is not known in any detail.
The aquatic symbiosis genomics project will transform symbiosis research
The Gordon and Betty Moore Foundation has created a major funding initiative focused on investigating the biology of symbiosis in marine and freshwater ecosystems (see Symbiosis in Aquatic Systems Initiative). To support this global initiative, the Aquatic Symbiosis Genomics project (ASG; see Aquatic Symbiosis Genomics Project – Wellcome Sanger Institute) plans to generate high-quality genome sequences from a wide range of symbiotic systems. Our focus is on symbioses involving at least one eukaryotic partner, and where there is likely to be co-evolving interplay between the species involved.
Like a symbiotic organism, the ASG project is more than the simple sum of its parts. ASG will merge the decades of ecological, evolutionary, taxonomic, and experimental expertise of researchers from diverse backgrounds with the decades of genomics experience of the Wellcome Sanger Institute. ASG works on a hub and spokes model, where communities of researchers nucleated on specific questions and/or species systems have come together as hubs to propose sets of taxa for sequencing (Table 1). These (currently) total ~450 distinct symbiotic organisms from the open ocean, the deep sea, coastal, littoral, and freshwater ecosystems, which are expected to include over 1000 nominal species of hosts and symbionts. The ASG target list includes species representing many phyla of animals, protists, algae and fungi, and encompasses ancient and recently-evolved partnerships.
Table 1. Aquatic symbiosis genomics project hubs.
| Lead researcher * | Project title (short) | Major taxa represented: hosts | Major taxa represented: symbionts |
|---|---|---|---|
| Archibald | New symbioses in single-celled eukaryotes | Amoebozoa, Dinophyceae, Diplonemea (Euglenozoa), Haptophyta, Ochrophyta | Bacteria, Kinetoplastea, Ochrophyta |
| Beinart, Petersen, Sigwart | Molluscan symbioses | Mollusca | Arthropoda, Bacteria, Chlorophyta, Cnidaria, Dinophyceae, Platyhelminthes, Florideophyceae |
| Dawson, Sutherland, Thompson | Pelagic symbioses | Acoela, Ctenophora, Cnidaria, Tunicata | Bacteria, Chlorophyta, Dinophyceae, other Alveolata |
| Hentschel | Sponge symbioses | Porifera | Bacteria, Archaea, Viruses, Symbiodiniaceae (Dinophyceae) and others |
| Keeling | Symbiosis in ciliates | Ciliophora | Archaea, Bacteria, Chlorophyta, Ciliophora, Dinophyceae |
| Lopez | Metazoan photosymbioses | Acoela, Cnidaria, Mollusca, Porifera, Tunicata | Bacteria, Chlorophyta, Cnidaria, Dinophyceae, Haptophyta, Myzozoa |
| Martín-Durán | Annelid chemosymbioses | Annelida | Bacteria, Archaea |
| Simakov | Cephalopod symbioses | Mollusca | Bacteria, Archaea |
| Sweet | Coral symbioses | Cnidaria | Symbiodiniaceae (Dinophyceae) |
| Talbot | Marine lichens | Fungi | Bacteria, Chlorophyta, ascomycete Fungi, Ochrophyta |
* see author list for affiliations.
The hub partners have defined the major scientific questions they wish to explore, and will source and identify specimens that will deliver answers. ASG follows an ethical code of sampling practice, avoiding overcollection and respecting local and international laws and protocols, especially as ASG will be sampling from endangered ecosystems and in some cases endangered species. The project participants are fully committed to the Nagoya Protocol on Access and Benefit Sharing of the Convention on Biological Diversity, and only samples for which express permission has been obtained will be sourced and sequenced. Samples may come from the wild, from mesocosms and aquaria, from explant lab cultures or from culture collections.
Genome sequencing and assembly will be delivered by the Tree of Life programme at the Sanger Institute using pipelines being developed for the Darwin Tree of Life and other major biodiversity genomics projects. Genomes will be assembled, annotated and released openly through the European Bioinformatics Institute (EMBL-EBI).
Sequencing symbionts: from sample to openly accessible genome assembly
Each ASG Hub (Table 1) has defined a set of taxa that it will sample for sequencing. We will sequence from single eukaryotic host specimens or clonal cultures rather than bulk samples whenever possible. While this can limit the mass of DNA and RNA available for sequencing, it has the very strong benefit of reducing allelic sequence complexity and enabling assembly. Importantly, we do not require that the symbiotic partners are separated before sequencing, as we will separate the host and symbiont genomes bioinformatically during assembly (Challis et al., 2020).
Each sample is formally identified and associated with rich metadata describing its collection location and other environmental features. We collate and validate these metadata through the COPO biodiversity data brokering system. Samples are shipped to the Sanger Institute for long DNA and RNA extraction and sequencing, with particular focus on low-input methods. We are generating a combination of long-read and long-range genomic data. For long reads we primarily use the Pacific Biosciences Sequel IIe circular consensus sequencing approach to generate high fidelity (HiFi) reads in the 15 to 20 kilobase range, and include Oxford Nanopore Technologies long reads where needed. For long-range data we use chromatin conformation capture sequencing (known as Hi-C). These long-range data generate important information that links sequences within chromosomes and organelles in the multi-kilobase to megabase range and will allow us to disentangle genomes from different species. The joint transcriptome of the symbioses will be sampled using RNA-Seq, both on Illumina short-read and Pacific Biosciences long-read platforms.
We have strong expectations about what we should find in the sequence data, and what we should be assembling, but biology is full of exceptions and surprises and organisms taken from the wild are frequently found in association with other cobionts. Each symbiosis contains a community of genomes that can be viewed as a low complexity metagenome: the “host” genome and the genomes of its organelles (mitochondrion and in some cases plastid), the symbiont genome (which if it is eukaryotic contains one or more organellar genomes) and the genomes of other commensals and cobionts. We separate data into presumed organismal and organellar subsets and assemble each independently. First we identify taxonomically informative marker loci, such as small subunit ribosomal RNAs (organellar 12S, prokaryotic 16S and eukaryotic 18S), cytochrome oxidase I genes, and ribulose-1,5-bisphosphate carboxylase-oxygenase genes, in the HiFi reads and primary assembly. These tell us which taxa are likely to be present and thus which genomes we should expect to assemble. To separate the data we use intrinsic features (GC and tetranucleotide composition, read coverage, coding capacity), sequence similarity to known genomes, and Hi-C linkage information. Binning contigs and their constituent reads into distinct subsets facilitates complete assembly of each organismal and organellar genome (Challis et al., 2020; Kumar & Blaxter, 2011). We aim to automate this cobiont identification and binning process, as it will be of utility in analyses of all Tree of Life genomes: many specimens harbour parasitic and other cobionts. Given 25- to 30-fold genome coverage in HiFi reads for each symbiont partner, we expect to generate primary assemblies with contig N50s in the multi-megabase range. The Hi-C data are used to scaffold these contigs into near-chromosomal pseudomolecules.
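Two of the intrinsic features named above, GC content and tetranucleotide composition, can be computed directly from contig sequence. The following is a minimal illustrative sketch, not the project's pipeline: the function names and the toy sequences are hypothetical, and it simplifies by counting tetramers on one strand only, whereas binning tools often merge reverse-complement tetramers.

```python
from collections import Counter
from itertools import product
from typing import Dict

# All 256 possible tetranucleotides over the unambiguous DNA alphabet.
ALL_TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def tetranucleotide_profile(seq: str) -> Dict[str, float]:
    """Relative frequency of each tetramer, using overlapping windows.

    Windows containing ambiguous bases (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = {k: counts.get(k, 0) for k in ALL_TETRAMERS}
    total = sum(valid.values())
    return {k: (v / total if total else 0.0) for k, v in valid.items()}


def profile_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Manhattan distance between two tetranucleotide profiles (0 to 2)."""
    return sum(abs(p[k] - q[k]) for k in ALL_TETRAMERS)


if __name__ == "__main__":
    # Hypothetical toy contigs: one AT-rich, one GC-rich.
    at_rich = "ATATTAATTTAAATATTTAA" * 5
    gc_rich = "GCGGCCGCGCCGGGCGCCGC" * 5
    print("GC fractions:", gc_fraction(at_rich), gc_fraction(gc_rich))
    print("Profile distance:",
          profile_distance(tetranucleotide_profile(at_rich),
                           tetranucleotide_profile(gc_rich)))
```

Contigs from the same genome tend to have similar values for these features, so large differences between contigs are one hint that they may derive from different organisms. In practice such signals are combined with read coverage, sequence similarity and Hi-C linkage, as described above.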
For each symbiotic system we will then curate the assemblies to improve accuracy (Howe et al., 2021) with particular attention to correct scaffolding of nuclear chromosomes and circularisation of organellar and prokaryotic genomes, and identification of remaining complex and unresolvable repetitive regions (such as ribosomal RNA and centromeric repeats). We aim to achieve or exceed the latest Earth BioGenome Project (Lewin et al., 2018) assembly standards. Curated assemblies and all raw data will be submitted to the European Nucleotide Archive (ENA) (Harrison et al., 2021) and from there to the rest of the International Nucleotide Sequence Database Consortium for immediate open release. The genomes will be annotated using the RNA-Seq transcriptomic data binned by species, and the annotations released openly. We have developed an ASG-specific data portal that collates all of the data generated by the project and promotes analysis. The Aquatic Symbiosis Genomics project relies on engagement and support from the whole of the Tree of Life production genomics team and of many colleagues who are participants in the ten Hubs. Each symbiotic system will be the subject of an open access publication, a Genome Note, that credits the full team that generated the assemblies, from collectors to annotators (Threlfall & Blaxter, 2021).
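Contiguity targets such as the contig N50 mentioned for primary assemblies are simple to compute from a list of contig lengths. The sketch below uses the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name and the example lengths are hypothetical and are given only for illustration.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest and summed until the
    running total reaches at least half of the total length; the length
    of the contig that crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        return 0
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # unreachable for non-empty input; kept for completeness


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs.
    toy_contigs = [6_000_000, 5_000_000, 4_000_000, 3_000_000, 2_000_000]
    print("Contig N50 (bp):", n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness; this is one reason the curation steps described above are needed alongside contiguity metrics.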
Building an aquatic symbiosis genomics community
The ASG project aims to generate a lasting resource in terms of the ~1000 genomes involved in ~500 symbiotic systems. To ensure this resource results in a flourishing ecosystem of postgenomic research, we are building community and expertise through a parallel programme of training and mentoring in genomics and bioinformatics. In collaboration with Wellcome Connecting Science and The Carpentries, the ASG project will deliver intensive and extensive collaborative training and investigative informatic analysis of symbiont genomes, to build collective genomics and bioinformatics capacity in the symbiosis community. Training will include core informatics, coding, and reproducible science, as well as deeper analytical dives into co-evolving genomes, detailed genome annotation, and prediction of the metabolic underpinnings of symbiotic cooperation.
Just as reefs built by corals and their symbiotic algae allow an exuberant and diverse ecology to thrive, the ASG project will build a lasting genomic foundation for flourishing and diverse analyses of symbioses. Many bony coral reef fish species have a pelagic early life history, their larvae spending their first weeks in the open ocean (Leis & McCormick, 2002). These may be recruited back to the reef because they can hear and smell it: the chatter generated by a healthy reef attracts, recruits, and builds the reef community (Gordon et al., 2019). Much like a healthy reef, our hope is that the high-quality genomes we produce will generate the chatter that attracts new researchers and provide a foundation for growth of fundamental research on the nature of symbiosis and conservation of habitats where symbioses abound.
Acknowledgements
We thank Jonathan Threlfall for assistance with manuscript editing. This research was funded in part by Wellcome [grant 206194]. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Funding Statement
This work was supported by Wellcome [206194] and the Gordon and Betty Moore Foundation [GBMF8897].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
[version 2; peer review: 5 approved, 1 approved with reservations]
Data availability
No data are associated with this article. ASG data will be released openly in the European Nucleotide Archive.
References
- Archibald J: One plus one equals one: symbiosis and the evolution of complex life. Oxford University Press, USA. 2014.
- Challis R, Richards E, Rajan J, et al.: BlobToolKit - interactive quality assessment of genome assemblies. G3 (Bethesda). 2020;10(4):1361–74. 10.1534/g3.119.400908
- Fisher R, O’Leary RA, Low-Choy S, et al.: Species richness on coral reefs and the pursuit of convergent global estimates. Curr Biol. 2015;25(4):500–505. 10.1016/j.cub.2014.12.022
- Gordon TAC, Radford AN, Davidson IK, et al.: Acoustic enrichment can enhance fish community development on degraded coral reef habitat. Nat Commun. 2019;10(1):5414. 10.1038/s41467-019-13186-2
- Harrison PW, Ahamed A, Aslam R, et al.: The European Nucleotide Archive in 2020. Nucleic Acids Res. 2021;49(D1):D82–85. 10.1093/nar/gkaa1028
- Howe K, Chow W, Collins J, et al.: Significantly improving the quality of genome assemblies through curation. GigaScience. 2021;10(1):giaa153. 10.1093/gigascience/giaa153
- Kumar S, Blaxter ML: Simultaneous genome sequencing of symbionts and their hosts. Symbiosis. 2011;55(3):119–26. 10.1007/s13199-012-0154-6
- LaJeunesse TC, Parkinson JE, Gabrielson PW, et al.: Systematic revision of Symbiodiniaceae highlights the antiquity and diversity of coral endosymbionts. Curr Biol. 2018;28(16):2570–2580.e6. 10.1016/j.cub.2018.07.008
- Leis JM, McCormick MI: The biology, behavior, and ecology of the pelagic, larval stage of coral reef fishes. In: Coral Reef Fishes. Elsevier, 2002;171–199. 10.1016/B978-012615185-5/50011-6
- Lewin HA, Robinson GE, Kress WJ, et al.: Earth biogenome project: sequencing life for the future of life. Proc Natl Acad Sci U S A. 2018;115(17):4325–33. 10.1073/pnas.1720115115
- Oulhen N, Schulz BJ, Carrier TJ: English translation of Heinrich Anton de Bary’s 1878 speech, ‘Die Erscheinung Der Symbiose’ (‘De La Symbiose’). Symbiosis. 2016;69:131–139. 10.1007/s13199-016-0409-8
- Roeselers G, Newton ILG: On the evolutionary ecology of symbioses between chemosynthetic bacteria and bivalves. Appl Microbiol Biotechnol. 2012;94(1):1–10. 10.1007/s00253-011-3819-9
- Threlfall J, Blaxter M: Launching the Tree of Life Gateway [version 1; peer review: not peer reviewed]. Wellcome Open Res. 2021;6:125. 10.12688/wellcomeopenres.16913.1
- Weis VM: Cell biology of coral symbiosis: foundational study can inform solutions to the coral reef crisis. Integr Comp Biol. 2019;59(4):845–55. 10.1093/icb/icz067

