Abstract
Sex chromosomes are critical elements of sexual reproduction in many animal and plant taxa; however, they show incredible diversity and rapid turnover even within clades. Here, using a chromosome-level assembly generated with long-read sequencing, we report the first evidence for genetic sex determination in cephalopods. We have uncovered a sex chromosome in the California two-spot octopus (Octopus bimaculoides) in which males and females show ZZ and ZO karyotypes, respectively. We show that the octopus Z chromosome is an evolutionary outlier with respect to divergence and repetitive element content as compared to other chromosomes and that it is present in all coleoid cephalopods that we have examined. Our results suggest that the cephalopod Z chromosome originated between 455 and 248 million years ago and has been conserved to the present, making it among the oldest conserved animal sex chromosomes known.
Introduction
Octopuses, squids, and cuttlefishes – the coleoid cephalopods – are a remarkable branch in the tree of life, whose members exhibit a repertoire of sophisticated behaviors (1). As a clade, coleoid cephalopods harbor an incredible variety of novel traits including the largest and most complex nervous system in the invertebrate world, independently derived camera-type eyes, and rapid adaptive camouflage abilities (2,3). The burst of evolutionary novelty that distinguishes cephalopods is even more striking when put into a phylogenetic context; cephalopods are a deeply diverged lineage that last shared a common ancestor with other extant molluscan lineages in the Cambrian period roughly 550 million years ago (4).
Here, using PacBio long-read sequencing of genomic DNA and IsoSeq full-length mRNA sequencing, we provide a novel chromosome-scale reference genome and annotation for a female California two-spot octopus, O. bimaculoides. Strikingly, our assembly reveals evidence for a hemizygous chromosome, the first evidence of genetic sex determination in cephalopods. We use our assembly and annotation in combination with existing genomic information from other cephalopods to create the first whole-genome alignments from this group and demonstrate that the sex chromosome is of an ancient, unique origin, dating to between 455 and 248 million years ago, and has been conserved to the present day in both the octopus and squid lineages.
A chromosome-level assembly reveals a hemizygous Z chromosome in female octopus
The California two-spot octopus was the first cephalopod to have its genome sequenced, in 2015 (5), and the assembly was subsequently placed into scaffolds in 2022 (6). Although these resources have been tremendously valuable for cephalopod research, the assembly still contains numerous gaps and many gene annotations remain fragmented due to the highly repetitive nature of the genome. To sequence through the long, repetitive stretches of the O. bimaculoides genome, we re-sequenced a single female individual with PacBio’s long, high-fidelity (HiFi) sequencing and used chromosomal conformation capture (Hi-C) to place scaffolds into chromosomes. After scaffolding, the total genome assembly consists of 2.3 Gb with 30 chromosomal scaffolds representing the expected N = 30 karyotype of Octopus. A comparison of assembly statistics to existing O. bimaculoides assemblies is shown in Table S1. Generally speaking, our new assembly is more complete than previous assemblies, with a contig N50 of 0.86 Mb and a scaffold N50 of 101.05 Mb. The new assembly reduced the number of scaffolds in the assembly from 370,699 (6) to 583, thus increasing the average scaffold length by roughly three orders of magnitude.
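For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it can be computed from a list of contig or scaffold lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly; only the two scaffold counts (370,699 and 583) come from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths in bp (hypothetical, not from the assembly).
toy_scaffolds = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_scaffolds)

# Fold reduction in scaffold count, using the two counts reported in the text.
scaffold_fold_reduction = 370_699 / 583
```

With a roughly constant total assembly size, the fold reduction in scaffold count (about 636-fold) approximates the fold increase in mean scaffold length.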
Strikingly, our Hi-C contact map showed evidence for one chromosome, chromosome 17, having reduced coverage in comparison to other chromosomes in our assembly from a female individual (Figs. 1A & S1). As the original reference assembly from (6) was from a male individual we were able to compare coverage in that assembly, which showed no difference in coverage between chromosome 17 and any other chromosome. Using short read Illumina data that we generated from unrelated female and male O. bimaculoides individuals (N = 2 of each sex; Table S2) we confirmed that females of this species are hemizygous for chromosome 17 whereas males are diploid, so hereafter we refer to chromosome 17 as Z (Figs. 1B & S2).
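The coverage comparison described above can be sketched as follows. This is an illustrative outline rather than the pipeline used in the study: it assumes that mean read depth per scaffold has already been computed for one female and one male library, and the function names, thresholds, and depth values are all hypothetical.

```python
from statistics import median


def normalize(depths):
    """Divide each scaffold's mean depth by the library-wide median depth."""
    med = median(depths.values())
    return {name: depth / med for name, depth in depths.items()}


def hemizygous_candidates(female, male, low=0.4, high=0.6):
    """Return scaffolds whose female/male normalized-depth ratio is near 0.5,
    the value expected if females carry one copy and males carry two."""
    f_norm, m_norm = normalize(female), normalize(male)
    hits = []
    for name in f_norm:
        ratio = f_norm[name] / m_norm[name]
        if low <= ratio <= high:
            hits.append(name)
    return sorted(hits)


# Hypothetical mean depths per scaffold (not data from the study).
female_depth = {"chr1": 30.0, "chr2": 31.0, "chr17": 15.5, "chr18": 29.0, "chr19": 30.5}
male_depth = {"chr1": 40.0, "chr2": 41.0, "chr17": 40.5, "chr18": 39.0, "chr19": 40.0}
candidates = hemizygous_candidates(female_depth, male_depth)
```

Normalizing each library by its own median makes the ratio insensitive to differences in sequencing depth between libraries, so only scaffolds with a sex-specific copy number difference should depart from a ratio of 1.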
Figure 1:
Sequencing data showing half coverage at chromosome 17 of O. bimaculoides. A. Hi-C contact map from our assembly of a female O. bimaculoides. Chromosome 17, our putative Z chromosome, is highlighted as a clear outlier in coverage. B. Normalized read depth of male and female whole-genome Illumina short read data. Purple points are from chromosome 17, whereas every other scaffold is shaded in orange. Scatterplot shows male vs. female normalized coverage, while the density estimates on the margins are from female (top) and male (side) separately.
Since the O. bimaculoides male genotype is clearly ZZ and the female is hemizygous at Z, we next turned our attention towards identification of a potential W chromosome limited to females. To do so we looked for scaffolds that were only present in the female-derived sequence libraries and absent from the male-derived libraries and assemblies. We found no such candidates. This suggests that females are ZO and males ZZ, perhaps indicating the evolutionary loss of the W chromosome after substantial degradation (7,8). Indeed ZZ/ZO and XX/XO sex determination systems have been described in other groups including Lepidopterans (9) and plants (8).
Whole genome alignment among cephalopods shows the Z chromosome is an outlier
To determine if the Z chromosome was unique to O. bimaculoides or more broadly distributed among cephalopods, we next created the first whole-genome multiple alignment among existing cephalopod genomes using the Progressive Cactus / Comparative Genomics toolkit pipeline (10) (see Supplemental Methods for details). This alignment compared three other octopus genomes (Hapalochlaena maculosa, Octopus minor, and Octopus sinensis), three squid genomes (Architeuthis dux, Euprymna scolopes, and Sepia pharaonis), and the outgroup to coleoid cephalopods, the chambered nautilus (Nautilus pompilius) (Figure 2A). Our alignment enables a host of analyses, including studies of sequence divergence and synteny, and we were particularly interested to compare patterns of evolution on autosomes versus the Z chromosome.
Figure 2:
A. Cephalopod phylogeny used in this study. Phylogeny inferred using one-to-one protein orthologs identified by Orthofinder. Branch lengths are in units of amino acid substitutions per site. Divergence dates are from (14,15). B. Windowed pairwise divergence calculated between O. bimaculoides and O. sinensis (top panel) and between O. bimaculoides and E. scolopes (bottom panel). For these analyses, 1 Mb non-overlapping windows were computed from our multiple alignment, extracted with O. bimaculoides as the reference. Chromosome 17, our presumptive Z chromosome, is highlighted in purple. C. Genomic repeats associated with each leaf taxon. Repeats show percent of genome composed of DNA transposons, LINE, LTR, and SINE elements for each species. D. LINE content across genomes of three species. A single outlier with respect to LINE density is observed in each species: chromosome 17 in O. bimaculoides, chromosome 20 in O. sinensis, and chromosome 43 in E. scolopes, highlighted in purple.
The Z chromosome is a clear outlier in these comparisons (Fig. 2). Looking at 1 megabase windowed divergence between O. bimaculoides and O. sinensis revealed that the Z chromosome is evolving significantly slower than autosomes (Mann–Whitney U test, p = 0.0011). The same pattern holds true for divergence between O. bimaculoides and the bobtail squid Euprymna scolopes (Mann–Whitney U test, p < 0.0001; Fig. 2B). This result mirrors what is seen in primate autosomes in comparison to the X chromosome (11,12) and is likely due to purging of recessive, mildly deleterious mutations from the hemizygous chromosome.
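The two steps behind this comparison, computing divergence in non-overlapping windows and then testing Z-linked windows against autosomal windows, can be sketched as below. This is a simplified illustration under stated assumptions: the input is a pair of already-aligned sequences, the Mann–Whitney U test uses the normal approximation with a tie correction (reasonable only for moderately large samples), and all function names and values are hypothetical rather than taken from the study.

```python
import math


def windowed_divergence(ref, other, window):
    """Fraction of mismatched sites in non-overlapping windows of an aligned
    sequence pair; columns containing a gap ('-') or 'N' are skipped."""
    out = []
    for start in range(0, len(ref), window):
        diffs = sites = 0
        for a, b in zip(ref[start:start + window], other[start:start + window]):
            if a in "-N" or b in "-N":
                continue
            sites += 1
            diffs += a != b
        if sites:
            out.append(diffs / sites)
    return out


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation with a
    tie correction and no continuity correction. Returns (U for x, p)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Toy alignment with 4 bp windows standing in for 1 Mb windows.
toy_windows = windowed_divergence("ACGTACGT", "ACGAACGT", 4)

# Hypothetical per-window divergence values (not data from the study).
z_windows = [0.10, 0.11, 0.09, 0.12, 0.10, 0.08]
autosome_windows = [0.15, 0.16, 0.14, 0.17, 0.15, 0.18, 0.16, 0.14]
u_stat, p_value = mann_whitney_u(z_windows, autosome_windows)
```

A rank-based test is a natural choice here because windowed divergence values are bounded and often skewed, so no normality assumption is placed on the windows themselves.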
Repetitive element characteristics of the Z chromosome are unique
Our whole-genome alignment allows us to examine repetitive element evolution in a phylogenetic framework. Figures 2A & 2C show a phylogenetic tree representing the relationship among these cephalopods along with the percent of the genome occupied by each of four classes of repetitive element: DNA transposons, LINEs, LTRs, and SINEs. It is clear from this comparison that the lineage leading to octopuses underwent a dramatic increase in the number of SINE elements, with SINEs taking up approximately 4.64% of the genome sequence in O. bimaculoides. The SINE expansion in Octopus genomes has been noted previously with sparser comparisons (5); however, when seen in a phylogenetic light (Fig. 2), this pattern is abundantly clear.
While a historical expansion of SINE elements is a striking feature of this genome, we also found an abundance of LINE elements, which composed a higher proportion of the genome than SINE elements, at 12.73%. Two clades of LINE elements, R2/R4/NeSL and RTE/Bov-B made up the bulk of this, with 3.10% and 4.59% of the genomic sequence, respectively (Fig. 2C).
Our chromosome-level assembly allows us to explore whether there is genomic heterogeneity in the accumulation of transposable elements (TEs). Visualization of the TE contributions to individual chromosomes (Fig. 2D) suggests that while most chromosomes have little variation in their proportions of TEs, the Z chromosome is again a notable outlier with respect to LINE elements, harboring LINEs at approximately twice the density of other chromosome arms (Mann–Whitney U test; p < 0.0001; Fig. 2D).
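One way to obtain a per-chromosome LINE density of the kind compared here is to merge overlapping LINE annotations and divide the covered bases by chromosome length. The sketch below assumes repeat annotations are available as (chromosome, start, end, class) tuples with half-open coordinates and class labels beginning with "LINE"; the function names, coordinates, and labels are illustrative assumptions, not the study's actual annotation.

```python
def covered_bases(intervals):
    """Total bases covered by half-open (start, end) intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def line_density(repeats, chrom_lengths):
    """Fraction of each chromosome annotated as LINE."""
    by_chrom = {}
    for chrom, start, end, rep_class in repeats:
        if rep_class.startswith("LINE"):
            by_chrom.setdefault(chrom, []).append((start, end))
    return {
        chrom: covered_bases(by_chrom.get(chrom, [])) / length
        for chrom, length in chrom_lengths.items()
    }


# Hypothetical annotations and chromosome lengths (not data from the study).
toy_lengths = {"chr1": 1000, "chr17": 1000}
toy_repeats = [
    ("chr1", 0, 100, "LINE/RTE-BovB"),
    ("chr1", 50, 120, "LINE/R2"),
    ("chr1", 200, 300, "SINE/tRNA"),
    ("chr17", 0, 150, "LINE/RTE-BovB"),
    ("chr17", 400, 490, "LINE/R2"),
]
density = line_density(toy_repeats, toy_lengths)
```

Merging before summing avoids counting a base twice when two LINE annotations overlap, which would otherwise inflate density on repeat-rich chromosomes.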
LINE elements are a signature of cephalopod sex chromosomes
We hypothesized that this abundance of LINEs could be a clear signature of the cephalopod Z chromosome in other genomic assemblies. Looking at the landscape of TEs in the high-quality, chromosome-level assemblies of O. sinensis and E. scolopes yielded a striking pattern: a single chromosome in each of those assemblies (chr20 and chr43, respectively) is an outlier for LINE elements, perhaps suggesting the presence of an orthologous Z chromosome (Fig. 2D).
As in O. bimaculoides, each of these chromosomes harbors a significantly greater density of LINEs in comparison to other chromosomes/putative autosomes in the genome (Mann–Whitney U tests; p < 0.0001 and p = 0.00023 for O. sinensis and E. scolopes, respectively). Upon inspection of the much more distant nautilus genome, no such LINE element outlier chromosome could be identified (Fig. S3). Thus evidence from LINE element enrichment suggests that this is a unique feature of the Z chromosome and that the Z chromosome may have originated before the split of the squid and octopus lineages.
Syntenic relationships of genes on the Z chromosome are conserved
We hypothesized that the chromosomes we identified in O. sinensis and E. scolopes might be orthologous to the O. bimaculoides Z chromosome on the basis of LINE element content. To test this hypothesis, we compared synteny among chromosomes using gene-level annotations as revealed by the GENESPACE software package (13), which uses orthologous gene identification to define conserved synteny blocks. As can be seen in Figures 3A and 3B, gene-based synteny confirms our hypothesis: the O. bimaculoides Z chromosome has a large block of synteny conserved on both chr20 in O. sinensis and chr43 in E. scolopes. Dotplots showing finer resolution pairwise alignments are shown in Figure S4.
Figure 3:
A. Riparian synteny plot generated using orthogroups between the chromosomes of squid and octopus. Ribbons connected to the O. bimaculoides Z chromosome are highlighted in purple. B. Riparian plot demonstrating gene-based synteny between octopus and squid sex chromosomes reveals conservation of the Z chromosome among O. bimaculoides, O. sinensis, and E. scolopes. C. Normalized coverage of short read libraries mapped to representative chromosomes in E. scolopes. Chromosome 43 is the only chromosome with such variation in coverage. SRA identifiers for each library are shown in the legend. Data were generated by (16).
A single, ancient origin of the cephalopod Z chromosome
Our evidence, when taken together, demonstrates that the O. bimaculoides Z chromosome is a genomic outlier with clear homology at the gene level and with respect to repeat content to chr20 in O. sinensis and to chr43 in E. scolopes. While the divergence time between O. bimaculoides and O. sinensis is relatively modest at approximately 34 million years (14), divergence between squids and octopus is much older, with an approximate divergence time between O. bimaculoides and E. scolopes of 248 – 339 million years (15). It was thus imperative to examine whether the orthologous chromosome in E. scolopes was functioning as a Z chromosome in that lineage. While single-animal sequence libraries from sexed E. scolopes were not available, we examined coverage of a set of Illumina short read experiments derived from single, unsexed embryos from (16). These results again confirmed our hypothesis: while coverage among chromosomes was nearly uniform, chr43 stands out as an outlier with multiple distinct coverage classes, suggesting females are hemizygous as observed in O. bimaculoides (Figs. 3C, S6, and S7).
Having established strong support for homology of the Z chromosome among squid and octopus, we were lastly interested in examining what genes might be shared in orthologous regions of the Z chromosome among lineages. We found 19 unique protein-coding loci shared among these three taxa that are housed in this region of the genome. We ran blastp homology searches of these genes against human proteins and found that 16 of 19 had strong hits (Table S3). Using publicly available summaries from GeneCards (17), we report that all 16 show mRNA expression in human reproductive tissues, and 15 of these 16 show protein expression in human reproductive tissues. A particularly notable hit is the protein obimac0008950, which shows strong orthology to the human Sperm Associated Antigen 9 (SPAG9; E-value = 0.0). Thus we believe that the genes retained on the Z chromosome may represent an ancient set of proteins essential to animal reproduction and/or gametogenesis.
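The logic of using unsexed libraries can be illustrated with a short sketch: if one sex is hemizygous, the depth of the candidate chromosome relative to the autosomes should cluster near 0.5 in some individuals and near 1.0 in others. The function name, tolerance, library labels, and ratios below are hypothetical and are not the SRA identifiers or values from the study.

```python
def classify_by_coverage(ratios, tolerance=0.15):
    """Label each library by the depth of the candidate chromosome relative to
    the autosomal median: 'one_copy' near 0.5, 'two_copy' near 1.0, and
    'ambiguous' otherwise."""
    calls = {}
    for library, ratio in ratios.items():
        if abs(ratio - 0.5) <= tolerance:
            calls[library] = "one_copy"
        elif abs(ratio - 1.0) <= tolerance:
            calls[library] = "two_copy"
        else:
            calls[library] = "ambiguous"
    return calls


# Hypothetical chr43 / autosome depth ratios for unsexed embryos.
toy_ratios = {
    "embryo_A": 0.52,
    "embryo_B": 0.98,
    "embryo_C": 0.47,
    "embryo_D": 1.03,
    "embryo_E": 0.75,
}
calls = classify_by_coverage(toy_ratios)
n_classes = len({c for c in calls.values() if c != "ambiguous"})
```

Under a ZZ/ZO system, one-copy libraries would correspond to putative females and two-copy libraries to putative males, although coverage alone cannot confirm the sex of an individual.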
Our results provide the first glimpse of sex determination in coleoid cephalopods, a phenomenon which until now has remained a mystery. The clear presence of a hemizygous Z chromosome in females suggests a ZZ/ZO sex determination system that had previously been missed in cytological comparisons, likely as a result of the lack of heterogamy. Further, our results suggest that the cephalopod Z chromosome evolved once in the lineage leading to the common ancestor of squids and octopuses and has thus been conserved for between 455 and 248 million years of evolution. This is an astoundingly long time for a sex chromosome to be preserved (7). A few other ancient sex chromosomes have been described previously from liverworts (~430 Mya; (18)) and mosses (~300 Mya; (19)), and there is some evidence the insect X chromosome is quite ancient (~450 Mya; (20)), although it is not well conserved and shows rapid evolution in some clades. Thus, in context, the cephalopod Z chromosome may be the oldest conserved animal sex chromosome yet described.
Acknowledgements
We thank Caroline Albertin and Matthew Birk for providing samples, Mara Lawniczak for her input on the project and comments on the manuscript, and John Postlethwait, Peter Ralph, Melissa Toups, Graham Coop, and members of the Kern-Ralph colab for their input and comments on this manuscript. Sequencing and sample preparation were performed by the University of Oregon Genomics & Cell Characterization Core Facility. GCC was funded by NSF GRFP 1842486. ADK was funded in part by NIH awards R35148253 and R01HG010774. CMN was funded in part by NIH award R01NS118466. CMN and ACM were funded in part by a University of Oregon Renee James Seed Grant.
References and Notes
- 1. Hanlon R. T., Messenger J. B., Cephalopod Behaviour (Cambridge University Press, 2018), second edn.
- 2. Young J. Z., The anatomy of the nervous system of Octopus vulgaris (Clarendon Press, 1971).
- 3. Hanlon R., Current biology 17, R400 (2007).
- 4. Ponder W., Lindberg D. R., Phylogeny and Evolution of the Mollusca (Univ of California Press, 2008).
- 5. Albertin C. B., et al., Nature 524, 220 (2015).
- 6. Albertin C. B., et al., Nature communications 13, 2427 (2022).
- 7. Charlesworth D., Charlesworth B., Marais G., Heredity 95, 118 (2005).
- 8. Charlesworth D., Journal of experimental botany 64, 405 (2013).
- 9. Johnson N. A., Lachance J., Annals of the New York Academy of Sciences 1256, E1 (2012).
- 10. Armstrong J., et al., Nature 587, 246 (2020).
- 11. Hobolth A., Christensen O. F., Mailund T., Schierup M. H., PLoS genetics 3, e7 (2007).
- 12. Scally A., et al., Nature 483, 169 (2012).
- 13. Lovell J. T., et al., Elife 11, e78526 (2022).
- 14. Jiang D., et al., BMC biology 20, 1 (2022).
- 15. Tanner A. R., et al., Proceedings of the Royal Society B: Biological Sciences 284, 20162818 (2017).
- 16. Schmidbaur H., et al., Nature communications 13, 2172 (2022).
- 17. Stelzer G., et al., Current protocols in bioinformatics 54, 1 (2016).
- 18. Iwasaki M., et al., Current Biology 31, 5522 (2021).
- 19. Carey S. B., et al., Science Advances 7, eabh2488 (2021).
- 20. Toups M. A., Vicoso B., Evolution 77, 2504 (2023).
- 21. Songco-Casey J. O., et al., Current Biology 32, 5031 (2022).
- 22. Marçais G., Kingsford C., Bioinformatics 27, 764 (2011).
- 23. Ranallo-Benavidez T. R., Jaron K. S., Schatz M. C., Nature communications 11, 1432 (2020).
- 24. Cheng H., Concepcion G. T., Feng X., Zhang H., Li H., Nature methods 18, 170 (2021).
- 25. Guan D., et al., Bioinformatics 36, 2896 (2020).
- 26. Durand N., et al., Cell Systems 3, 95 (2016).
- 27. Dudchenko O., et al., Science 356, 92 (2017).
- 28. Flynn J. M., et al., Proceedings of the National Academy of Sciences 117, 9451 (2020).
- 29. Chen N., Current protocols in bioinformatics 5, 4 (2004).
- 30. Bruna T., Hoff K. J., Lomsadze A., Stanke M., Borodovsky M., NAR genomics and bioinformatics 3, lqaa108 (2021).
- 31. Tardaguila M., et al., Genome research 28, 396 (2018).
- 32. Hubisz M. J., Pollard K. S., Siepel A., Briefings in bioinformatics 12, 41 (2011).
- 33. Vasimuddin M., Misra S., Li H., Aluru S., 2019 IEEE international parallel and distributed processing symposium (IPDPS) (IEEE, 2019), pp. 314–324.
- 34. Li H., Bioinformatics 34, 3094 (2018).
- 35. Quinlan A. R., Hall I. M., Bioinformatics 26, 841 (2010).
- 36. Mayakonda A., Lin D.-C., Assenov Y., Plass C., Koeffler H. P., Genome research 28, 1747 (2018).
- 37. Paradis E., Schliep K., Bioinformatics 35, 526 (2019).
- 38. Zhang Y., et al., Nature Ecology & Evolution 5, 927 (2021).
- 39. Albertin C. B., et al., Nature communications 13, 2427 (2022).
- 40. Song W., et al., Frontiers in Marine Science 8, 639670 (2021).
- 41. Da Fonseca R. R., et al., GigaScience 9, giz152 (2020).
- 42. Kim B., Kang S., Ahn D., et al., GigaScience Database 10, 100503 (2018).
- 43. Whitelaw B. L., et al., GigaScience 9, giaa120 (2020).
- 44. Li F., et al., Molecular ecology resources 20, 1572 (2020).