Summary
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation1,2. We present a fully annotated, high-quality bonobo genome assembly, constructed without guidance from reference genomes, by applying a multiplatform genomics approach. We generate a bonobo genome assembly where >98% of genes are completely annotated and 99% of the gaps are closed, including resolution of about half of the segmental duplications and almost all full-length mobile elements. We compare the bonobo genome to other apes1,3-5 and identify >5,571 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure, or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting (ILS), estimating that ~5.1% of the human genome is closer to chimpanzee/bonobo and >36.5% of the genome shows ILS if we consider a deeper phylogeny including gorilla and orangutan. Here, we show that 26% of human–chimpanzee/human–bonobo ILS segments are nonrandomly distributed and that genes within these clustered segments show significantly accelerated amino acid evolution when compared to the rest of the genome.
Bonobo or pygmy chimpanzee (Pan paniscus) and the common chimpanzee (Pan troglodytes) are among the most recently diverged ape species (1.7 million years ago)1,2. Both species represent the closest living species to humans and, therefore, offer the potential to pinpoint genetic changes that are also unique to humans. The first bonobo genome, generated from short-read whole-genome sequence1, produced a genome assembly (panpan1.1) with >108,000 gaps where the vast majority of segmental duplications (SDs) were not incorporated and few structural variants (SVs) were identified (Supplementary Table 1). As a result of the lower accuracy of early next-generation sequencing technology6 and the fragmentary nature of the original chimpanzee genome5, large fractions of ape genomes could not be compared1,3-5 and gene models were often incomplete7,8. In the last few years, long-read genome sequencing technologies have significantly enhanced our ability to generate contiguous, high-quality genomes where most genes and common repeat elements are fully annotated9. Here, we apply a multi-platform approach to produce a highly contiguous, accurate bonobo reference genome. Our analysis highlights the extent and rapidity at which hominid genomes can differ and provides new insights with respect to ILS and its relevance to gene evolution and the genetic relationship among living hominids.
Sequence and assembly.
We sequenced DNA from a female bonobo (Mhudiblu, Pan paniscus) to 74-fold sequence coverage using the long-read PacBio RS II platform (Supplementary Tables 2 and 3, Supplementary Fig. 1). We generated a 3.0 Gbp assembly (contig N50 of 16.58 Mbp; Supplementary Table 4) and constructed a chromosomal-level AGP (a golden path) assembly (Mhudiblu_PPA_v0) using Bionano optical maps and a FISH BAC clone-order framework (Fig. 1)10. The Mhudiblu_PPA_v0 assembly assigns 74 Mbp of new sequence to chromosomes, closing 99.5% of the original 108,390 gaps (Supplementary Table 5). This assembly has been annotated by NCBI and is available on the UCSC Genome Browser (panPan3; see Data availability and Extended Data Fig. 1 for the assembly workflow schema). We estimate the sequence accuracy of the bonobo assembly to be 99.97-99.99% (Supplementary Table 6). The overall nucleotide divergence between chimpanzee and bonobo based on these new long-read assemblies is 0.421±0.086% for autosomes and 0.311±0.060% for the X chromosome (Supplementary Table 7). Using these new assemblies, we genotyped 27 previously sequenced ape genomes, resulting in slight adjustments in mean effective population sizes for the apes (Extended Data Fig. 2).
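The contig N50 quoted above is a standard contiguity statistic: the contig length at which the cumulative sum of contigs, sorted from longest to shortest, first reaches half of the total assembled bases. A minimal Python sketch of that definition follows; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the assembly pipeline used in this study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths


# Hypothetical contig lengths (bp), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70 (80 + 70 = 150, half of 300)
```

Because N50 depends only on the length distribution, it says nothing about correctness; this is why the text reports sequence accuracy separately.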
Gene annotation.
We predict 22,366 full-length protein-coding genes and 9,066 noncoding genes using the NCBI Eukaryotic Genome Annotation Pipeline. We also generated 857,000 full-length bonobo cDNAs (Supplementary Table 8) and applied the Comparative Annotation Toolkit (CAT)11 to identify 20,478 protein-coding and 36,880 noncoding bonobo gene models; 99.5% of the protein-encoding models show no frameshift errors12 and 38.4% of protein-coding isoforms are now more complete. We identify 119 genes that have potential frameshifting indels disrupting the primary isoform relative to the human reference (GRCh38) (Supplementary Table 9). A total of 206 and 1,576 protein-coding genes belong to gene families that have contracted or expanded, respectively, in bonobo when compared to human (Supplementary Tables 10-12). We identify 65 putatively novel exons with support from full-length cDNA (Supplementary Tables 13-14), such as the novel protein-coding exon in ANAPC2 found in bonobo but not in chimpanzee (Supplementary Fig. 2). Using other ape genomes13,14 and a genome-wide analysis from 20 bonobo and chimpanzee samples, we identified genes showing an excess of amino acid replacement, balancing selection, and potential selective sweeps (Tajima’s D)15. Most of the genes showing selective sweeps in bonobo (DIRC2, GULP1, ERC2; Supplementary Tables 15-18) or chimpanzee (KIAA040, TM4SF4, FOXP2; Supplementary Tables 19-22) are novel.
Mobile element insertions (MEIs).
The number of full-length (retrotransposition competent), lineage-specific L1 elements in the bonobo genome (415 L1Pt) is similar to chimpanzee (383 L1Pt) and 15-25% greater than human (330 L1HS) (Supplementary Fig. 3-5). An analysis of Alu repeats leads to a refined subfamily classification and we find that the number of bonobo-specific elements is nearly identical to that of chimpanzee (n = 1,492). Pan lineages, thus, show among the lowest rates of Alu insertions when compared to humans (where the rate has doubled) and the rhesus genome (~10-fold increased rate) (Extended Data Fig. 3). While bonobo shows reduced single-nucleotide variant genetic diversity16 when compared to chimpanzee, we find that bonobo SVA elements are more copy number polymorphic (45%) (Extended Data Fig. 3) when compared to chimpanzee (35%; p < 6.5 × 10−4). In contrast, the chimpanzee-specific endogenous retrovirus (PtERV1) shows an indistinguishably low rate of polymorphism in both species (9%), suggesting relatively little activity since Pan divergence (Supplementary Data).
Segmental duplications (SDs).
We identify 87.4 Mbp of SDs (≥1 kbp and ≥90% identity) (Supplementary Fig. 6-7; Supplementary Table 23 and Extended Data Fig. 3), most of which was previously unassembled. SDs are interspersed with an excess of large (≥10 kbp) intrachromosomal duplications consistent with the burst of SDs that occurred at the root of the hominid lineage17. Despite the ~6-fold improvement, the largest and most identical duplications were still not initially resolved (~84 Mbp). Using the Segmental Duplication Assembler (SDA) algorithm18,19, we successfully resolved an additional 56 Mbp (Supplementary Table 24) and used these to identify recent gene family expansions (Supplementary Tables 25-31 and Extended Data Fig. 4). We show, for example, that the Eukaryotic Translation Initiation Factor 4 Subunit A3 (EIF4A3) gene family has expanded in both chimpanzee and bonobo. There is evidence that five out of the six paralogs are expressed and encode a full-length open reading frame (ORF; Fig. 2 and Extended Data Fig. 5). We estimate that the initial EIF4A3 gene duplication occurred in the ancestral lineage approximately 2.9 million years ago. It then subsequently expanded and experienced gene conversion events independently in the chimpanzee and bonobo lineages creating five and six copies of the EIF4A3 gene family, respectively. Interestingly, some of the gene conversion signals correspond to a set of specific amino-acid changes in the basic ancestral structure that are now common to only chimpanzee and bonobo (Fig. 2 and Extended Data Fig. 5).
Structural variation and gene disruption.
As part of the assembly curation, we validated nine larger inversions that distinguish human and bonobo karyotypes and created a FISH-based chromosomal backbone (Fig. 1a and b) and used Strand-seq to assign orphan contigs to chromosomes (36 Mbp) (Mhudiblu_PPA_v1; Supplementary Tables 32-38). We identify 17 fixed inversions differentiating bonobo from chimpanzee of which 11 are bonobo specific (Supplementary Table 39) and 22 regions that likely represent bonobo inversion polymorphisms (Supplementary Table 40). Moreover, we assign 38 fixed inversions occurring in the common Pan ancestor (Supplementary Table 39). We annotated and validated the breakpoint intervals of each tested inversion (Supplementary Table 41) and found SDs or LINEs at the breakpoints of inversions in 82% and 86% of cases, respectively (Supplementary Table 40). We also compared the bonobo genome to that of human, chimpanzee, and gorilla to identify smaller (>50 bp) deletions and insertions. We classify 15,786 insertions and 7,082 deletions as bonobo-specific and genotyped these in a population of ape samples16,20 to identify 3,604 fixed insertions and 1,965 fixed deletions of which only a small fraction (2.66% or 148/5,571) intersect genic functional elements (Supplementary Tables 42-45).
Bonobo-specific events deleting ENCODE regulatory elements (n = 381), for example, are enriched in membrane-associated genes with extracellular domains while chimpanzee-specific events (n = 185) are associated with cadherin-related genes (Supplementary Table 46). Deletions (n = 1,040) shared between chimpanzee and bonobo show an enrichment for the loss of putative regulatory elements associated with post-synaptic genes (3.32 enrichment; p = 1.2 × 10−7) and pleckstrin homology-like domains (6.15 enrichment; p = 1.20 × 10−9). We validate 110 events that disrupt protein-coding genes by generating high-fidelity genomic sequencing for each of the ape reference genomes and restricting to those events that could be genotyped in a population of genomes (Supplementary Data). As expected, many fixed gene-loss events occurred in genes tolerant to mutation, redundant duplicated genes, or genes where the event simply altered the structure of the protein. For example, we validate a 25.7 kbp gene loss of one of the keratin-associated genes (KRTAP19-16) associated with hair production in the ancestral lineage of chimpanzee and bonobo (Supplementary Fig. 8). In the bonobo lineage, we identify five fixed SVs affecting protein-coding genes (Supplementary Table 47), only two of which completely ablate the gene. For example, LYPD8, which encodes a secreted protein that prevents gram-negative bacteria invasion of colonic epithelium, has been totally deleted by a 24.3 kbp bonobo-specific deletion. Similarly, SAMD9 (SAMD Family Member 9) is a fixed gene loss in bonobo as a result of a 41.46 kbp bonobo-specific deletion. The other three bonobo-specific fixed SV events in protein-coding regions all maintain the ORF, including a 49-amino acid deletion of ADAR1, a gene critical for RNA editing and implicated in human disease (Extended Data Fig. 6)21-23.
Incomplete lineage sorting (ILS).
The higher quality and more contiguous nature of the bonobo genome provide an opportunity to generate a higher-resolution ILS map. Compared to the original bonobo assembly where only ~800 Mbp (27%) could be analyzed, it is now possible to align ~76% of the genome in a four-way ape genome alignment (2,357 Mbp within 10 kbp windows; Supplementary Table 48) due to long-read genome assemblies14. We performed a genome-wide phylogenetic window-based analysis to systematically identify regions inconsistent with the species tree and classified these as human–bonobo and human–chimpanzee ILS topologies (Fig. 3). We predict that 5.07% of the human genome is closer to chimpanzee/bonobo (Table 1); 2.52% of the human genome is more closely related to the bonobo genome than to the chimpanzee genome (human–bonobo (H-B) ILS segments), while 2.55% of the human genome is more closely related to the chimpanzee genome than to the bonobo genome (human–chimpanzee (H-C) ILS segments) (Fig. 3a). This proportion of ILS nearly doubles earlier estimates (3.1%)1 (Supplementary Table 1). Consistent with previous observations1, the largest ILS segments are biased (1.8-fold) to intergenic regions, depleted for genes (>35%) and are particularly enriched in L1 content. Of note, the distribution of ILS segments is highly nonrandom based on simulation experiments. We specifically measured the distance between ILS segments (see below) and identified a subset (~26%) of sites that are significantly more clustered than expected by chance (Fig. 3b).
Table 1.
Window size | Number of ILS segments: (G,((B,H),C)) | Number of ILS segments: (G,((H,C),B)) | Percentage of ILS: (G,((B,H),C)) | Percentage of ILS: (G,((H,C),B)) | Total ILS* | GC* | Intergenic/Intragenic | Alu* | L1* | Exon* |
---|---|---|---|---|---|---|---|---|---|---|
20 kb | 218 | 218 | 0.19 | 0.19 | 0.38 | 37.7 | 1.79 | 6.37 | 31.44 | 0.49 |
10 kb | 1,143 | 1,138 | 0.49 | 0.48 | 0.97 | 38.39 | 1.73 | 7.35 | 27.08 | 0.47 |
5 kb | 4,314 | 4,373 | 0.91 | 0.92 | 1.83 | 38.95 | 1.64 | 7.85 | 24.67 | 0.58 |
2 kb | 18,218 | 18,334 | 1.52 | 1.53 | 3.05 | 39.58 | 1.49 | 8.71 | 21.51 | 0.72 |
1 kb | 46,584 | 46,938 | 2.06 | 2.07 | 4.13 | 40.06 | 1.37 | 9.8 | 19.85 | 0.8 |
500 bp | 102,197 | 103,338 | 2.52 | 2.55 | 5.07 | 40.54 | 1.33 | 11.24 | 18.66 | 0.75 |
Genome average | | | | | | 40.89 | 1.15 | 10.17 | 17.42 | 1.17 |
G, gorilla; B, bonobo; H, human; C, chimpanzee. (G,((B,H),C)) and (G,((H,C),B)) represent two different ILS topologies. The GC, Intergenic/Intragenic, Alu, L1 and Exon columns are genomic properties of the ILS segments.
*Content in (%); GC, Alu, L1 and Exon contents are based on the GRCh38 genome.
We focused specifically on protein-coding exons based on human RefSeq annotation24 and identified 1,446 exons mapping to ILS regions (713 exons to an H-B and 733 exons to an H-C topology; Supplementary Table 49). As a whole, genes corresponding to these ILS exons are significantly enriched in both glycoprotein function (p = 1.30 × 10−14 for H-B and p = 5.60 × 10−11 for H-C) and calcium-binding epidermal growth factor (EGF) domain function (p = 4.40 × 10−12 for H-B and p = 9.40 × 10−7 for H-C) (Supplementary Table 50). We considered multiple occurrences in the same gene and identified 84 genes with at least two exons under ILS (Supplementary Table 51) with some enrichment in photoreceptor activity (p = 1.6 × 10−4, Supplementary Table 51 and Supplementary Fig. 9) as well as EGF-like (p = 1.9 × 10−6) and transmembrane (p = 2.4 × 10−3) function. Overall, we observe a significant excess of amino acid replacement (dN/dS) for all 1,446 ILS exons when compared to non-ILS exons, consistent with the action of either relaxed selection or positive selection (Fig. 3c and Extended Data Fig. 7). Exons mapping to the clustered ILS segments show greater dN/dS than exons in the non-clustered ILS segments, suggesting that they are contributing disproportionately to accelerated amino acid evolution in the hominid genome.
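One generic way to ask whether a set of exons has a higher mean dN/dS than a comparison set is a one-sided permutation test on the difference in means. The sketch below illustrates that idea only; the statistical test actually used for Fig. 3c is not specified in this section, and the function name `permutation_pvalue` and all dN/dS values shown are hypothetical.

```python
import random
from typing import Sequence


def permutation_pvalue(group_a: Sequence[float], group_b: Sequence[float],
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation p-value for mean(group_a) > mean(group_b).

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed
    one, with the usual +1 correction so that it is never exactly zero.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    rng = random.Random(seed)
    k = len(group_a)

    def mean_diff(values):
        return sum(values[:k]) / k - sum(values[k:]) / (len(values) - k)

    pooled = list(group_a) + list(group_b)
    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_diff(pooled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-exon dN/dS values, for illustration only.
ils_exons = [0.42, 0.38, 0.51, 0.33, 0.47, 0.40]
non_ils_exons = [0.21, 0.18, 0.25, 0.30, 0.15, 0.22, 0.27, 0.19]
print(permutation_pvalue(ils_exons, non_ils_exons))
```

A permutation test makes no distributional assumption, which suits ratio statistics such as dN/dS that are typically skewed; it does not, by itself, distinguish relaxed from positive selection.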
We extended the ILS analysis (Supplementary Data) across 15 million years of hominid evolution by inclusion of orangutan and gorilla genome data. As expected, ILS estimates for the human genome increase to >36.5% (Supplementary Table 52, Extended Data Fig. 7), similar to (albeit still greater than) earlier estimates3,14. We measured the inter-ILS distance and observed a consistent nonrandom pattern of clustered ILS for these deeper topologies, with more ancient ILS showing an even greater proportion of clustered sites (Fig. 3). Once again, we observe a significantly elevated mean dN/dS in clustered H-C and H-B (p < 2.2 × 10−16, mean = 0.366) as well as clustered Orangutan-Human and Orangutan-Gorilla-Human topologies (p < 2.2 × 10−16, mean = 0.316) when compared to the null distribution (Supplementary Fig. 10). A GO analysis25 of the genes intersecting these combined data confirms the most significant signals for immunity (e.g., glycoprotein (p = 1.3 × 10−25), immunoglobulin-like fold/FN3 (p = 2.4 × 10−20)), but also genes related to EGF signaling (p = 1.4 × 10−18), solute transporter function (e.g., transmembrane region (p = 1.3 × 10−25)), and specifically calcium transport (p = 3.7 × 10−8) (Supplementary Table 53). While ILS regions, in general, show single-nucleotide polymorphism diversity patterns consistent with balancing selection, it is noteworthy that both clustered and non-clustered ILS exons show a significant excess of polymorphic gene-disruptive events consistent with the action of relaxed or balancing selection (Supplementary Fig. 11). An examination of these gene-rich clustered ILS regions reveals a complex pattern of diverse ILS topologies, suggesting deep coalescence operating across specific regions of the human genome, as has been reported for the major histocompatibility complex (Extended Data Fig. 8).
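The notion of segments being "more clustered than expected by chance" can be illustrated with a simple null model: compare the observed median distance between consecutive segments with the medians obtained when the same number of segments is placed uniformly at random on the chromosome. The sketch below assumes that null model purely for illustration; the simulation design used in this study is described in the Supplementary Data and may differ, and the function names, chromosome length and positions are hypothetical.

```python
import random
from statistics import median
from typing import List, Sequence


def inter_segment_distances(starts: Sequence[int]) -> List[int]:
    """Distances between consecutive segment start positions."""
    ordered = sorted(starts)
    return [right - left for left, right in zip(ordered, ordered[1:])]


def clustering_pvalue(starts: Sequence[int], chrom_length: int,
                      n_sim: int = 2000, seed: int = 0) -> float:
    """Empirical p-value that segments are closer together than random.

    Null model (an assumption of this sketch): the same number of segment
    starts drawn uniformly along a chromosome of length chrom_length.
    """
    if len(starts) < 2:
        raise ValueError("need at least two segments")
    rng = random.Random(seed)
    observed = median(inter_segment_distances(starts))
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(chrom_length) for _ in starts]
        if median(inter_segment_distances(simulated)) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical segment starts forming two tight clusters on a 10-Mbp chromosome.
clustered = ([1_000_000 + 500 * i for i in range(10)]
             + [5_000_000 + 500 * i for i in range(10)])
print(clustering_pvalue(clustered, chrom_length=10_000_000))
```

A uniform null ignores alignability and local mutation rate, so a realistic analysis would restrict random placement to the portion of the genome that could actually be analyzed.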
Discussion
High-quality hominid genomes are a critical resource for understanding the genetic differences that make us human as well as the diversification of the Pan lineage over the last two million years of evolution. The bonobo represents one of the last of the great ape genomes to be sequenced using long-read sequencing technology. Its sequence will facilitate more systematic genetic comparisons between human, chimpanzee, gorilla, and orangutan without the limitations of technological differences in sequencing and assembly of the original reference1,3-5,14. As a result, we now predict that a significantly greater fraction (~5.1%) of the human genome is closer to chimpanzee/bonobo when compared to previous studies (3.1%)1,3. We estimate that >36.5% of the hominid genome shows ILS if we consider a deeper phylogeny including gorilla and orangutan. Remarkably, 26% of the ILS regions are clustered and exons underlying these clustered ILS signals show elevated rates of amino acid replacement. These findings support a previous study in gorilla that showed a subtler correlation where genes with higher dN/dS values are enriched in ILS segments3. That study, however, explained the observation as a result of stronger purifying selection in non-ILS sites or background selection reducing the effective population size and, as a result, depleting ILS. Our genome-wide exon analyses specifically show that only a subset of clustered ILS exons are driving this effect and that these genes are enriched for glycoprotein and EGF-like calcium signaling functions due to the action of either relaxed or positive selection for genes in these pathways (Supplementary Information).
METHODS SUMMARY
We sequenced and assembled the genome of a single female bonobo (Mhudiblu a.k.a. Mhudibluy obtained from the San Diego Zoo ISIS 601152 born 15 April 2001 and later transferred to the Wuppertal Zoo in Germany where she was referred to as Muhdeblu) using long-read PacBio RS II sequencing chemistry and the Falcon genome assembler. The assembly was error corrected using Quiver26, Pilon27, and an in-house FreeBayes-based28 indel correction pipeline optimized to improve CLR assemblies14. We also generated Illumina WGS data using the Illumina TruSeq PCR-Free library preparation kit. Genome assembly contigs were ordered and oriented into scaffolds using Bionano optical maps (Supplementary Table 54) (HybridScaffolds suite, Bionano Genomics Saphyr platform) and four-color FISH of 324 BAC clones. Cell lines from chimpanzee, bonobo, gorilla and orangutan were obtained from Coriell (S006007) or from a collection developed by Professor Mariano Rocchi; no approval from ethics committees was required for use of these established lines. We assigned each contig/scaffold into unique groups corresponding to individual chromosomal homologues using SaaRclust29,30 while applying Strand-seq to detect inversions, assign orphan contigs and orient contigs31,32. To estimate genome-wide sequence accuracy, we applied Merqury33 using Illumina WGS data. We also generated a bonobo large-insert BAC library (VMRC74) and selected at random 17 clones for complete PacBio insert sequencing34. CAT11 was utilized for genome annotation using human GENCODE V33 and RNA-seq data. We also generated >850,000 full-length non-chimeric transcripts from Iso-Seq data generated from iPSC and derived neuronal progenitor cell lines35 from bonobo sample AG05253 and we searched for gene structures split over multiple contigs (Supplementary Table 55). Repeat content of the assembled genome was analyzed using RepeatMasker (RepeatMasker-Open-4.1.0) and the Dfam3 repeat library.
We assigned lineage-specific Alu and full-length LINE, SVA_D and PtERV elements to subfamilies by applying COSEG (www.repeatmasker.org/COSEGDownload.html) to determine the lineage-specific subfamily composition. For cross-species analysis of MEIs, we performed liftOver based on chains built from the Cactus whole-genome alignments generated during CAT annotation. For cross-assembly analyses of bonobo MEI insertions and a specific subset of other analyses (Supplementary Data), we used bowtie2 to map MEI flanking sequences between genomes. We estimated the duplication content in the bonobo assembly by applying the whole-genome analysis comparison (WGAC) method36 and targeted collapsed duplications for assembly using SDA19. Insertions and deletions were detected in bonobo, chimpanzee, and gorilla using PBSV, Sniffles37 and Smartie-sv14 and genotyped using Paragraph38 against a panel of 27 Illumina WGS genomes. We searched for evidence of ILS among the chimpanzee, gorilla, and human lineages by applying Prank (v.140110) to construct multiple sequence alignments and using the ete3 module to identify segments under ILS (Supplementary Table 56). For consistency, NCBI reference genome nomenclature has been used throughout the manuscript and corresponds to the following UCSC IDs (NCBI/UCSC): panpan1.1/panPan2, Mhudiblu_PPA_v0/panPan3, Clint_PTRv2/panTro6, Kamilah_GGO_v0/gorGor6, Susie_PABv2/ponAbe3, GRCh38/hg38 (methods details are in the Supplementary Data).
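Consensus accuracy of the kind estimated with Merqury is commonly expressed as a Phred-scaled quality value (QV), where QV = −10 log10(per-base error rate). The sketch below applies that standard relation to the 99.97-99.99% accuracy range quoted in the main text; it is a unit conversion only, does not reproduce Merqury's k-mer-based estimate, and the function names are illustrative.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (e.g. 0.9999) to a Phred-scaled QV."""
    error_rate = 1.0 - accuracy
    if not 0.0 < error_rate < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10.0 * math.log10(error_rate)


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled QV back to per-base accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)


# Accuracy range reported for the bonobo assembly (99.97-99.99%).
for accuracy in (0.9997, 0.9999):
    print(f"{accuracy:.4%} accuracy ~ QV {accuracy_to_qv(accuracy):.1f}")
```

On this scale the reported range corresponds to roughly QV 35 to QV 40, that is, about one error per 3,000 to 10,000 bases.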
Acknowledgements:
We thank Lucia Carbone and Oliver Ryder for providing the Mhudiblu bonobo cell line; Rusty Gage and Carol Marchetto for preparation of RNA sequence and access to bonobo iPSC lines; Arang Rhie for assistance with Merqury analysis; and Tonia Brown for manuscript proofreading and editing. The silhouettes of the bonobo, gorilla, and human in figure 3 and in the extended data figure 2 are downloaded from phylopic.org under the Public Domain Dedication 1.0 license. The silhouette of the chimpanzee in figure 3 and in the extended data figure 2 is downloaded from phylopic.org under the Creative Commons Attribution 3.0 Unported license (https://creativecommons.org/licenses/by/3.0/, credit to T. Michael Keesey and Tony Hisgett). This work was supported, in part, by the National Institutes of Health (NIH) grants HG002385 and 1U24HG009081 to E.E.E.; Futuro in Ricerca 2010-RBFR103CE3 to M.V.; R01 GM59290 to M.A.B.; 2U41HG007234 to M.D. and B.P.; U01HG010961, U41HG010972, R01HG010485, U01HL137183, and 5U54HG007990 to B.P.; Arian Smit's NHGRI grant RO1 HG002939 to J.M.S.; 5T32HG008345-04 to B.P; R01HG010329-01 to S.R.S; and European Regional Development Fund 2014-2020.4.01.16-0030 to F.M. E.E.E. is an investigator of the Howard Hughes Medical Institute. The work of Jinna Hoffman and Françoise Thibaud-Nissen was supported by the Intramural Research Program of the National Library of Medicine, National Institutes of Health. P.H. is supported by an NIH Pathway to Independence Award (NHGRI, K99HG011041).
Footnotes
Code availability. Custom scripts used in this study are available at GitHub (https://github.com/EichlerLab and https://github.com/MaoYafei).
Competing interests: J.G.U. is an employee of Pacific Biosciences, Inc. All the other coauthors declare no competing financial and non-financial interests.
Data availability.
The Mhudiblu_PPA_v0 (GCA_013052645.1), Mhudiblu_PPA_v1 (GCA_013052645.2) and Mhudiblu_PPA_v2 (GCA_013052645.3) assemblies are deposited at NCBI with BioProject accession PRJNA526933. The raw PacBio CLR, StrandSeq, Illumina, and Iso-Seq data of bonobo are deposited at NCBI with SRA accession SRP188441. The BioNano map of bonobo Mhudiblu is deposited at NCBI with BioProject accession PRJNA526933. The raw PacBio HiFi data of bonobo Mhudiblu and gorilla Kamilah are deposited at NCBI with accession SRP301932 in BioProject accession PRJNA691628. The BACs used in this study are listed in Supplementary Table 57 at NCBI with BioProject accession PRJNA634395.
References
- 1. Prüfer K et al. The bonobo genome compared with the chimpanzee and human genomes. Nature 486, 527–531, doi: 10.1038/nature11128 (2012).
- 2. Takemoto H, Kawamoto Y & Furuichi T How did bonobos come to range south of the Congo river? Reconsideration of the divergence of Pan paniscus from other Pan populations. Evol Anthropol 24, 170–184, doi: 10.1002/evan.21456 (2015).
- 3. Scally A et al. Insights into hominid evolution from the gorilla genome sequence. Nature 483, 169–175, doi: 10.1038/nature10842 (2012).
- 4. Locke DP et al. Comparative and demographic analysis of orang-utan genomes. Nature 469, 529–533, doi: 10.1038/nature09687 (2011).
- 5. Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87, doi: 10.1038/nature04072 (2005).
- 6. Luo C, Tsementzi D, Kyrpides N, Read T & Konstantinidis KT Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample. PLoS One 7, e30087, doi: 10.1371/journal.pone.0030087 (2012).
- 7. Prado-Martinez J et al. Great ape genetic diversity and population history. Nature 499, 471–475, doi: 10.1038/nature12228 (2013).
- 8. Sudmant PH et al. Global diversity, population stratification, and selection of human copy-number variation. Science 349, aab3761, doi: 10.1126/science.aab3761 (2015).
- 9. Logsdon GA, Vollger MR & Eichler EE Long-read human genome sequencing and its applications. Nat Rev Genet, doi: 10.1038/s41576-020-0236-x (2020).
- 10. Ventura M et al. Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee. Genome Res 21, 1640–1649, doi: 10.1101/gr.124461.111 (2011).
- 11. Fiddes IT et al. Comparative Annotation Toolkit (CAT)-simultaneous clade and personal genome annotation. Genome Res 28, 1029–1038, doi: 10.1101/gr.233460.117 (2018).
- 12. Stanke M, Diekhans M, Baertsch R & Haussler D Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644, doi: 10.1093/bioinformatics/btn013 (2008).
- 13. Gordon D et al. Long-read sequence assembly of the gorilla genome. Science 352, aae0344, doi: 10.1126/science.aae0344 (2016).
- 14. Kronenberg ZN et al. High-resolution comparative analysis of great ape genomes. Science 360, doi: 10.1126/science.aar6343 (2018).
- 15. Pavlidis P & Alachiotis N A survey of methods and tools to detect recent and strong positive selection. J Biol Res (Thessalon) 24, 7, doi: 10.1186/s40709-017-0064-0 (2017).
- 16. de Manuel M et al. Chimpanzee genomic diversity reveals ancient admixture with bonobos. Science 354, 477–481, doi: 10.1126/science.aag2602 (2016).
- 17. Marques-Bonet T et al. A burst of segmental duplications in the genome of the African great ape ancestor. Nature 457, 877–881 (2009).
- 18. Vollger MR et al. Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads. Ann Hum Genet, doi: 10.1111/ahg.12364 (2019).
- 19. Vollger MR et al. Long-read sequence and assembly of segmental duplications. Nat Methods 16, 88–94, doi: 10.1038/s41592-018-0236-3 (2019).
- 20. Sudmant PH et al. Evolution and diversity of copy number variation in the great ape lineage. Genome Res 23, 1373–1382, doi: 10.1101/gr.158543.113 (2013).
- 21. Rice GI et al. Mutations in ADAR1 cause Aicardi-Goutières syndrome associated with a type I interferon signature. Nat Genet 44, 1243–1248, doi: 10.1038/ng.2414 (2012).
- 22. Savva YA, Rieder LE & Reenan RA The ADAR protein family. Genome Biol 13, 252, doi: 10.1186/gb-2012-13-12-252 (2012).
- 23. Gallo A, Vukic D, Michalík D, O'Connell MA & Keegan LP ADAR RNA editing in human disease; more to it than meets the I. Hum Genet 136, 1265–1278, doi: 10.1007/s00439-017-1837-0 (2017).
- 24. O'Leary NA et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44, D733–745, doi: 10.1093/nar/gkv1189 (2016).
- 25. Huang DW, Sherman BT & Lempicki RA Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44–57, doi: 10.1038/nprot.2008.211 (2009).
- 26. Chin CS et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10, 563–569, doi: 10.1038/nmeth.2474 (2013).
- 27. Walker BJ et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963, doi: 10.1371/journal.pone.0112963 (2014).
- 28. Haplotype-based variant detection from short-read sequencing (arXiv, 2012).
- 29. Ghareghani M et al. Strand-seq enables reliable separation of long reads by chromosome via expectation maximization. Bioinformatics 34, i115–i123, doi: 10.1093/bioinformatics/bty290 (2018).
- 30. Porubsky D et al. A fully phased accurate assembly of an individual human genome. bioRxiv, 855049, doi: 10.1101/855049 (2019).
- 31. Falconer E et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat Methods 9, 1107–1112, doi: 10.1038/nmeth.2206 (2012).
- 32. Sanders AD et al. Characterizing polymorphic inversions in human genomes by single-cell sequencing. Genome Res 26, 1575–1587, doi: 10.1101/gr.201160.115 (2016).
- 33. Rhie A, Walenz BP, Koren S & Phillippy AM Merqury: reference-free quality and phasing assessment for genome assemblies. bioRxiv, 2020.2003.2015.992941, doi: 10.1101/2020.03.15.992941 (2020).
- 34. Huddleston J et al. Reconstructing complex regions of genomes using long-read sequencing technology. Genome Res 24, 688–696, doi: 10.1101/gr.168450.113 (2014).
- 35. Marchetto MC et al. Species-specific maturation profiles of human, chimpanzee and bonobo neural cells. Elife 8, doi: 10.7554/eLife.37527 (2019).
- 36. Bailey JA, Yavor AM, Massa HF, Trask BJ & Eichler EE Segmental duplications: organization and impact within the current human genome project assembly. Genome Res 11, 1005–1017, doi: 10.1101/gr.187101 (2001).
- 37. Sedlazeck FJ et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods 15, 461–468, doi: 10.1038/s41592-018-0001-7 (2018).
- 38. Chen S et al. Paragraph: a graph-based structural variant genotyper for short-read sequence data. Genome Biol 20, 291, doi: 10.1186/s13059-019-1909-7 (2019).