Abstract
We have reviewed recent progress and the remaining challenges in vector-omics. We have highlighted several technologies and applications that facilitate novel biological insights beyond achieving a reference-quality genome assembly. Among other topics, we have discussed the applications of chromatin conformation capture, chromatin accessibility assays, optical mapping, full-length RNA sequencing, single-cell RNA analysis, proteomics, and population genomics. We anticipate a great expansion in vector-omics research, not only in its application to a broad range of species, but also in its ability to uncover novel genetic elements and tackle previously inaccessible regions of the genome. It is our hope that continued innovation in device portability, cost reduction, and informatics support will in the foreseeable future bring vector-omics to every vector laboratory and field station in the world, which will have an unparalleled impact on basic research and the control of vector-borne infectious diseases.
Introduction
Rapid advancements in -omics technologies, methods to characterize or quantify a collection of biological molecules, have enabled high-quality genome assemblies and new strategies for biological discovery. In this review, we focus on recent advances and future perspectives in -omics research of disease vectors. We describe the evolution and impact of sequencing technologies and evaluate the potential of some recently developed -omics tools. We highlight new methods or applications that facilitate novel biological insights beyond a high-quality genome assembly. Conventional transcriptomics is not a major focus of this review; however, we discuss recent advances in single-cell RNA-seq and long-read RNA-seq.
1. Brief history of sequencing technologies and the progression of vector genomics
The landmark publication of the genome of the African malaria mosquito Anopheles gambiae in 2002 [1], assembled using a Sanger-based whole-genome shotgun sequencing approach, set in motion the rapidly growing field of vector-omics. Early on, the high cost of Sanger sequencing resulted in assemblies with very short contig N50s from a limited number of vector species. Innovations in next-generation sequencing (NGS) platforms such as 454 and Illumina (Solexa) made it possible to generate millions of short DNA sequences in a cost-effective and high-throughput manner. NGS technologies have been used to sequence dozens of new vector species (vectorbase.org), including the genomes of 16 Anopheles species [2]. In addition, NGS technologies have enabled population genomic projects such as the Ag1000G project [3]. Illumina short-read sequencing coupled with large-insert mate-pair scaffolding has been the workhorse of vector genome projects. However, most assemblies produced using NGS methods comprise numerous short contigs. This shortcoming is largely due to the difficulty of using short reads to assemble repetitive sequences throughout the genome. The recent introduction of single-molecule long-read sequencing and new scaffolding technologies has made a large impact on genomics as a whole and has helped to improve genome assemblies of several important vector species [4–8]. In particular, the most recent AaegL5 genome assembly of the yellow fever and dengue fever mosquito Aedes aegypti serves as a gold standard for future vector genome projects, as it provides a reference-quality chromosome assembly further enhanced by extensive additional -omics resources [4].
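Because contig and scaffold N50 values are used throughout this review as a measure of assembly contiguity, a minimal sketch of how the statistic is commonly computed may be helpful. The function name and the contig lengths below are illustrative assumptions and are not taken from any cited assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (in kb), for illustration only.
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(contig_lengths))  # prints 70
```

In this toy example the two longest contigs (80 + 70) already hold half of the 300 kb total, so the N50 is 70 kb. A fragmented short-read assembly and a long-read assembly of the same genome can differ in this statistic by orders of magnitude.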
2. Current practices and considerations for vector genome assembly
Several recently published assemblies comprise chromosome-length scaffolds and have provided new benchmarks for producing reference-quality assemblies. To achieve such quality, appropriate sequencing technologies, assembly algorithms, and genome scaffolding methods need to be considered (Table 2).
Table 2.
Evaluating the pros and cons of selected sequencing and mapping technologies.
Technology | Description and General Applications | Advantages | Disadvantages | Read Length (average/max) |
---|---|---|---|---|
PacBio Sequencing | Long-read sequencing technology used for de novo genome assembly, isoform identification, and detecting epigenetic modifications. | Direct sequencing of long DNA molecules enables detection of epigenetic modifications. Protocol variations enable identification of splice variants and scRNA-seq. Abundance of compatible analysis resources. PacBio has been tested on a variety of vector species, and a low-input protocol was used to assemble a mosquito genome. Multiple-pass sequencing can improve consensus error rates. | Expensive sequencing instrument, requiring most labs to send their libraries to a core facility. High cost per Mb compared to Illumina short reads. High error rate, but this can be improved with multiple-pass libraries. Read length is limited by the DNA polymerase. | ~15 kb/150 kb |
Oxford Nanopore Technologies | Long-read sequencing technology used for de novo genome assembly, isoform identification, and detecting epigenetic modifications. | Direct sequencing of long DNA molecules enables detection of epigenetic modifications. Can sequence transcripts as full-length cDNA or by direct RNA sequencing for nucleotide modification detection. Read length is limited only by the DNA isolation and the sequencing library. Sequencing devices are portable and scalable. Real-time analysis is possible for rapid workflows. Several insect tissues have been sequenced, showing promise for the vector field. Continuing community-guided improvements to chemistry. | High error rate, but this can be corrected by overlap consensus. Sensitive to long homopolymers. Requires hands-on training to operate. Limited protocols for vectors. Frequent updates to kits and the user interface make comparisons difficult and make it challenging to keep pace. | ~15 kb/2 Mb* (in theory, read length is only limited by the DNA input) |
10x Genomics Linked Long Reads | Linked read sequencing technology used for de novo assembly and scaffolding, structural variant identification, and haplotype phasing. | Moderately low cost per Mb. Uses high accuracy short reads to obtain long-range information. Has been used for structural variant detection. Can use extremely low input DNA. | Reads are not true long reads and incapable of detecting epigenetic modifications. Subject to similar sequencing biases as Illumina. | Up to 150kb (dependent on workflow) |
Bionano Optical Mapping | Optical maps produced by incorporation and visualization of fluorescent labels at nick sites used for hybrid genome scaffolding, haplotype phasing, and structural variant detection. | Capable of producing chromosome-scale assemblies and phasing haplotypes. Bionano can produce top-notch genome assemblies. Bionano was used to resolve the structure of a highly repetitive M Locus region in a mosquito genome. | Requires high-quality genomic DNA. Expensive instrument. Unique expertise needed for data analysis. | 300kb/>2Mb |
HiC Sequencing | Short reads linked by crosslinking interacting fragments within the chromatin. Used for genome scaffolding and phasing. | Can span repetitive centromeric regions. Low sequencing error rates. Used to produce the most recent chromosome-scale AaegL5 genome assembly and has been tested for several vector species. | High starting input requirements. Technique requires some training and optimization for tissue crosslinking. Subject to similar sequencing biases as Illumina. | 150kb/1Mb |
scRNA-sequencing (10x Genomics) | RNA-seq at single-cell resolution. Available protocols allow for scRNA-sequencing of thousands of cells at one time. Can be used to study vector-pathogen interactions at single-cell resolution. | Utilizes innovative microfluidics and library construction to track individual molecules and their cell of origin. Compatibility with other technologies (e.g., single-cell long-read sequencing for splice isoform detection). | Low number of reads per cell. High noise in sequencing data. Requires accurate cell counts and high-quality cell preparations. Requires purchasing the Chromium Controller for cell separation. Requires hands-on training to get started. | 150bp/full-length cDNA |
ChIP-seq | Sequences immunoprecipitation-enriched antigen-bound DNA fragments. Can detect nucleosome position and histone modification patterns, genome-wide expression changes, and transcription factor binding sites. | Can effectively be used to study heterochromatin. Can be used for genome-wide surveys when detecting Histone PTMs (using Dm antibodies if epitopes are conserved). ChIP-seq can be used to target pathways and understand transcription factor binding and pathway regulation. Has been used in several mosquito species. | Requires antibodies for successful enrichment and assay is highly dependent on antibody quality. Some experiments may require high sequencing depth if enrichment is low. | 150bp |
ATAC-seq | Assesses chromatin accessibility using a transposase-tagmented library. ATAC-seq is useful for mapping expressed regions such as transcription start sites and transcribed sequences. | Produces genome-wide accessibility information that can be used to infer transcriptional activity. Useful for understanding global regulatory responses under varied conditions and has been used in mosquitoes. Low tissue requirements and rapid/simple transposase library construction. | High mitochondrial background. Random adapter incorporation results in a fraction of incompatible ends for fragment amplification. Loss of information at transcription factor binding regions due to cleavage. | 150bp |
2.1. Comparisons of PacBio and Oxford Nanopore for genome assembly
Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are two major platforms that offer single-molecule long-read sequencing, which is also known as Third-Generation Sequencing (TGS). Direct sequencing of template molecules without PCR amplification is possible using TGS technologies, which facilitates longer read length. The first TGS platform to become widely popular was the single-molecule real-time (SMRT) sequencing technology from PacBio [9]. ONT is a more recent TGS technology that determines the sequence of a single strand of DNA as it passes through a protein nanopore embedded in a membrane [10]. Different bases in a DNA strand cause unique changes in the current as the DNA passes through, generating a sequencing signal without the need for a DNA polymerase. Although both platforms suffer from significantly higher error rates than NGS, both PacBio and ONT can produce reads that are tens of kb in length. In theory, since strands of DNA are sequenced by simply passing through a protein nanopore, ONT read length is only limited by the length of the DNA template. In practice, many researchers have generated ultra-long reads, hundreds of kb in size, with the longest ONT read reported at >2Mb [11]. These long and ultra-long reads help produce more contiguous genome assemblies [12].
PacBio sequencing has been used to produce reference-quality genome assemblies for Aedes aegypti [4], Anopheles funestus [5], Anopheles coluzzii [6], and Aedes albopictus [7]. Notably, PacBio was used to produce the previously mentioned AaegL5 reference assembly. The AaegL5 assembly sets a new standard for vector genome assembly, overcoming the large genome size and high repeat content to produce chromosome-length scaffolds. PacBio was also used to generate sufficient sequence data from a single An. coluzzii mosquito to obtain a genome assembly. Recent improvements to PacBio’s Circular Consensus Sequencing (CCS) involve sequencing the same molecule through repeated passes to produce consensus HiFi reads that have high per-base accuracy [13]. This improvement gives users high read accuracy (up to 99.8%) while maintaining a relatively long average read length (13.5 kb), which improves the accuracy and continuity of genome assemblies [14]. However, longer reads are still needed to span long stretches of repeats [15].
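The intuition behind multiple-pass consensus can be illustrated with a deliberately simplified sketch: if errors in each pass are independent, a per-position majority vote across passes recovers the correct base. The example below assumes that passes are already aligned, are of equal length, and contain substitution errors only, which is not how production CCS software operates; the function name and sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-position majority vote over pre-aligned, equal-length passes.

    Simplifying assumption: substitution errors only, no indels.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must have equal length in this sketch")
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same molecule; three carry one error each.
passes = ["ACGTACGT", "ACGAACGT", "ACGTACCT", "TCGTACGT", "ACGTACGT"]
print(majority_consensus(passes))  # prints ACGTACGT
```

Each erroneous base is outvoted by the four passes that agree at that position, which is why consensus accuracy improves as the number of passes grows.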
Recently, a reference-quality Anopheles albimanus assembly was produced by ONT sequencing [8]. This assembly includes much of the genomic ‘dark matter’ such as the repeat-rich centromeres, rDNA clusters and a novel telomeric sequence assembled at the ends of every chromosome. ONT offers multiple sequencing platforms (Flongle, MinION, GridION, PromethION) which allows for simpler scalability and makes it accessible to a broad range of laboratories beyond core facilities [16]. The affordability and portability of some ONT platforms are attractive features when considering future vector sequencing projects.
2.2. Examples of long read assembly algorithms
Error-prone long reads are difficult to assemble using algorithms designed for high-accuracy NGS-based sequences. To overcome this, researchers developed new assembly algorithms (Canu and FALCON) that implement an overlap-based read correction step prior to assembly, which resulted in highly accurate and contiguous genome reconstruction [17, 18]. These programs tend to be time-consuming and computationally demanding. Several new software packages have been developed to overcome such limitations. For example, a genome assembler named Flye, which is based on a modified de Bruijn graph assembly, runs faster while maintaining comparable assembly quality [19]. Although assemblies produced using only long TGS reads are highly contiguous and overcome major challenges posed by repeat-rich regions, they may still contain errors that can hinder further research efforts [20]. For this reason, high-accuracy short reads are often included for additional error correction or polishing [21]. Other assemblers not mentioned in this review have been evaluated elsewhere [22].
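To make the graph-based idea concrete, the following is a minimal sketch of a textbook de Bruijn graph built from error-free toy reads. It is not the algorithm implemented in Flye or any other assembler named above; the function names, reads, and value of k are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a path from `start` while every node has exactly one successor."""
    path, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


# Three overlapping, error-free toy reads from a 12-bp made-up sequence.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = de_bruijn_graph(reads, k=4)
print(walk_unambiguous(graph, "ATG"))  # prints ATGGCGTGCAAT
```

In real genomes, repeats longer than k create nodes with several successors, and the walk stops there. This is one way to see why repeat-rich vector genomes fragment into many short contigs when only short reads are available.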
2.3. Advancements in genome scaffolding technologies and algorithms
Although long-read sequences have significantly improved contig sizes, other technologies are still needed to join these contigs into longer scaffolds. Recent publications of several reference-quality arthropod vector assemblies demonstrated the value of high-throughput chromatin conformation capture (Hi-C) sequencing and Bionano optical mapping for anchoring contigs onto chromosome-length scaffolds [4, 5, 7, 8, 23]. Hi-C measures the frequency of contact within the nucleus between two sequenced loci. Thus, it can be used to understand nuclear organization and to delineate the relative proximity between sequences in order to build chromosome-length scaffolds [24]. In mosquitoes, Hi-C data were first used to link contigs from previous short-read assemblies into chromosome-length scaffolds for the genomes of Aedes aegypti and Culex quinquefasciatus, producing new reference assemblies (AaegL4 and CpipJ3, respectively) [23]. Hi-C data have also been used to scaffold contigs from PacBio long reads for Anopheles funestus (AfunF3) [5], Aedes aegypti (AaegL5) [4], and Aedes albopictus (AalbF2) [7]. The Hi-C-scaffolded AaegL5 genome was validated by Bionano optical mapping, which further resolved the structure of the highly repetitive M locus and identified several gaps [4]. Bionano optical mapping, another genome scaffolding method, uses either a restriction enzyme or, more recently, a nicking enzyme to label DNA molecules, which are then optically detected and aligned to produce large genomic maps. These maps anchor in silico-digested reference sequences to produce highly contiguous ‘superscaffolds’; they can also identify large structural variations and assist in genome assembly phasing.
When assembling the Anopheles albimanus genome, Bionano was used as the primary scaffolding method, supplemented by Hi-C in repeat-rich regions that are devoid of Bionano labels [8]. Chromosomal in situ hybridization of DNA probes predates Hi-C and Bionano and continues to be used to order, orient, correct, or validate genomic scaffolds for multiple Anopheles species [25–27] and Aedes aegypti [4]. In addition, gene synteny-based computational methods allowed researchers to perform evolutionary superscaffolding of 21 Anopheles mosquito assemblies to produce new consensus sets of scaffold adjacencies [26].
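The use of Hi-C contact frequency as a proxy for proximity can be sketched in a few lines: read pairs whose two ends map to different contigs are counted, and the contig sharing the most links with a given contig is proposed as its neighbor. This is a toy illustration under strong assumptions (no normalization for contig length or restriction-site density, no orientation inference); the contig names and read pairs are made up, and the functions are hypothetical rather than part of any published scaffolder.

```python
from collections import Counter


def contact_counts(read_pairs):
    """Count Hi-C read pairs that link two different contigs."""
    counts = Counter()
    for contig_a, contig_b in read_pairs:
        if contig_a != contig_b:  # within-contig pairs carry no joining signal
            counts[frozenset((contig_a, contig_b))] += 1
    return counts


def best_neighbor(counts, contig):
    """Return the contig sharing the most links with `contig`, or None."""
    candidates = {}
    for pair, n in counts.items():
        if contig in pair:
            (other,) = pair - {contig}
            candidates[other] = n
    return max(candidates, key=candidates.get) if candidates else None


# Made-up mapping results: each tuple gives the contigs hit by the two read ends.
pairs = ([("c1", "c2")] * 5 + [("c2", "c3")] * 3
         + [("c1", "c3")] + [("c1", "c1")] * 10)
counts = contact_counts(pairs)
print(best_neighbor(counts, "c1"))  # prints c2
print(best_neighbor(counts, "c3"))  # prints c2
```

Here the link counts suggest the order c1, c2, c3. Practical scaffolders work from normalized contact matrices and also infer contig orientation, which this sketch omits.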
2.4. Towards resolving haplotypes and achieving haploid assemblies
High rates of polymorphism in vector species, which can be as much as 15 times those of vertebrate genomes, can greatly complicate assembly efforts [28]. Additionally, the small physical size of many vector species limits the amount of genomic DNA that can be harvested from an individual, which necessitates the use of multiple individuals in order to obtain enough DNA for long-read sequencing [28]. Assembling complex diploid genomes with high allelic heterozygosity can result in artificially duplicated haplotigs. To overcome these obstacles, researchers often employ one or more strategies, including inbreeding to limit genetic diversity, using a modified ‘haploidify’ algorithm to ignore heterozygous positions during assembly [2], or purging haplotigs based on coverage information [7, 29]. As mentioned earlier, a recent modification to PacBio library preparation enabled the sequencing and assembly of the An. coluzzii genome from a single mosquito [6]. Some protocols for ONT suggest that library preparation may require less than 500 ng of DNA, which could also enable sequencing of individuals.
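The coverage-based reasoning can be sketched as follows: when both alleles of a region are assembled separately, each resulting contig attracts roughly half of the reads that a collapsed region would, so contigs near half of the main coverage peak are candidate haplotigs. The function name, tolerance, and coverage values below are illustrative assumptions; the sketch only flags candidates and is not the procedure of any cited tool.

```python
def flag_candidate_haplotigs(contig_coverage, diploid_peak, tolerance=0.25):
    """Return names of contigs whose mean read depth is near half the diploid peak.

    `diploid_peak` is the read depth of collapsed (both alleles merged) regions.
    `tolerance` is the accepted relative deviation from half of that depth.
    """
    half = diploid_peak / 2
    low, high = half * (1 - tolerance), half * (1 + tolerance)
    return sorted(name for name, cov in contig_coverage.items()
                  if low <= cov <= high)


# Made-up mean read depths per contig.
coverage = {"ctgA": 60, "ctgB": 31, "ctgC": 28, "ctgD": 58, "ctgE": 5}
print(flag_candidate_haplotigs(coverage, diploid_peak=60))  # prints ['ctgB', 'ctgC']
```

Flagged contigs would still need to be checked for sequence similarity to another contig before removal, since low coverage alone can also reflect sex-linked sequence or contamination.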
Current vector assemblies are not true haploid assemblies, as the scaffolds often represent a mosaic of fragments derived from a mixture of maternal and paternal chromosomes. A few methods have been used to produce true haploid genome assemblies for diploid organisms. One such method is trio binning, which uses parental reads to separate the offspring's reads into paternal and maternal sets. Two haploid assemblies are then produced from the separated reads: a paternally derived assembly and a maternally derived assembly. These haploid genome assemblies are so far the most accurate reflection of the diploid genome. Trio binning has recently been combined with the Canu assembler (Trio-Canu) to separate long reads and assemble highly contiguous maternal and paternal haploid genomes for a cattle and a cattle-yak hybrid [30, 31].
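The principle of separating offspring reads with parental data can be sketched using parent-specific k-mers: k-mers seen in only one parent mark the haplotype of any offspring read that contains them. The sketch below is a simplification that ignores reverse complements and sequencing errors; the function names, sequences, and value of k are hypothetical and do not reproduce Trio-Canu.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_seqs, paternal_seqs, k):
    """Return (maternal-only, paternal-only) k-mer sets."""
    mat = set().union(*(kmers(s, k) for s in maternal_seqs))
    pat = set().union(*(kmers(s, k) for s in paternal_seqs))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring read to the parent whose private k-mers it carries."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"


# Made-up parental sequences that differ at a single position.
mat_only, pat_only = parent_specific_kmers(["ACGTACGTTAGC"], ["ACGTACGATAGC"], k=5)
print(bin_read("GTACGTTAG", mat_only, pat_only, k=5))  # prints maternal
print(bin_read("TACGATAG", mat_only, pat_only, k=5))   # prints paternal
print(bin_read("ACGTAC", mat_only, pat_only, k=5))     # prints unassigned
```

Long reads are well suited to this approach because a longer read is more likely to span at least one heterozygous site and therefore carry a parent-specific k-mer. The high heterozygosity of many vector species, a complication for conventional assembly, becomes an advantage here.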
3. Beyond genome assembly
3.1. Long read sequencing enables isoform annotation and the detection of modified bases
Long-read sequencing platforms such as ONT and PacBio are also being used for applications other than genome assembly. PacBio’s Isoform Sequencing (IsoSeq) and ONT’s cDNA and direct-RNA technologies can sequence full-length transcripts and enable the discovery of novel gene products, isoforms, and noncoding RNAs. PacBio’s IsoSeq has already been used in Anopheles stephensi, providing updates to nearly 5,000 gene models, identifying more than 1,700 alternative splice isoforms, and discovering six trans-splicing events [32]. ONT MinION cDNA sequencing coupled with multiplexed PCR enrichment was used to detect the Zika virus in the field, which represents a simple and cost-effective method of viral pathogen surveillance [33]. Since the ONT platforms directly sequence extracted DNA or RNA, they can also detect modified bases (e.g., N6-methyladenosine in RNA and 5-methylcytosine at CpG sites in DNA) that are useful for epigenetic studies [12]. Exploiting these characteristics of ONT sequencing, researchers have already created methylomes of murine embryonic placenta [34].
3.2. Improving genome annotation using proteomics
Using RNA-sequencing to assemble the transcriptome and annotate genes has been the gold standard for providing high-quality annotations of genomes. With the growing accessibility of protein sequencing (mass spectrometry), deep proteome profiling and untargeted proteome-focused genome annotation can help resolve small open reading frames or protein isoforms. This complementary approach often corrects gene models, identifies novel peptides, and provides annotations for genes missed by other methods. Recently, using a combination of proteomic and transcriptomic data, improvements were made to the gene annotation of two An. stephensi assemblies (AsteI2 and AsteS1) [35].
3.3. 3D Genome architecture and chromosomal territories
The nuclear architecture includes several aspects: chromosomal territories, intra-/inter-chromosomal contacts, and attachments of chromosomes with the nuclear envelope [36, 37]. Microscopy studies revealed that chromosomes are organized into non-randomly positioned territories; the relative positioning of the territories with respect to the nuclear periphery is important for gene regulation [37]. In Drosophila, down-regulated genes of interphase chromosomes [38] and gene-poor heterochromatic regions of polytene chromosomes predominantly occupy the nuclear periphery [39, 40]. However, similar studies on disease vectors are virtually lacking. One study identified a correlation between reorganization in the 3D position and expression of three genes, actin, ferritin, and Hsp 70, in live intact Biomphalaria glabrata snails upon exposure to Schistosoma mansoni miracidia, a parasitic worm that causes intestinal schistosomiasis (Bilharzia) in humans and other mammals [41]. A recent study employed multicolored oligopaints to quantitatively study chromosome territories in salivary gland cells and ovarian nurse cells of malaria mosquitoes, Anopheles gambiae, An. coluzzii, and An. merus. The study found that the percentage of chromosome regions located at the nuclear periphery was typically higher, while the number of inter-chromosomal contacts was lower, in salivary gland cells than in ovarian nurse cells demonstrating a significant inverse relationship between these two features of the nuclear architecture [42]. In addition to microscopy, cross-linking Hi-C experiments [43–46] have been used to infer intra- and inter- chromosomal interactions in cell nuclei. Hi-C studies have led to the identification of topologically associating domains (TADs), chromatin structures that favor short-range interactions [47]. TADs are considered as structural and functional units of the genome as they are delimited by sharp boundaries [48]. 
Identification and characterization of TADs in Drosophila [45, 49] suggests that similar structures will be found in other insects, including disease vectors.
3.4. Identification of chromosomal rearrangement
Inversions are the most common chromosomal rearrangements in malaria mosquitoes. The polytene chromosome complement of Anopheles females consists of the X chromosome and four autosomal arms: 2R, 2L, 3R, and 3L. An. arabiensis, An. coluzzii and An. gambiae, the three species that have a continent-wide distribution in arid sub-Saharan Africa, are the only species within the An. gambiae complex that have a highly polymorphic chromosome 2. Mosquito species with little or no chromosomal polymorphism tend to occupy smaller and wetter geographic regions [50, 51]. A recent study applied a Hi-C-based sequencing approach to molecularly map the breakpoints of inversions 2Rbc and 2Rd in An. coluzzii and found that two breakpoints are reused in two separate inversions [52]. This approach could be applied to discover inversions and map their breakpoints in other disease vectors, such as those lacking well-developed polytene chromosomes. Another study used 10x Genomics linked reads to detect structural variants in Ae. aegypti. Linked reads obtained from two individuals (one male and one female) were used to detect more than 100 insertion/deletion mutations and 29 inversions and translocations [4]. Since 10x Genomics linked-read sequencing requires minimal starting material and can produce long-range phasing information, it will also be a useful approach for detecting long-range variants in other vector species. Taking advantage of the long read length, ONT sequencing was used to detect well-characterized inversions, deletions, and translocations involved in pancreatic cancer [53], demonstrating the potential of TGS as another option for structural variant detection.
3.5. Chromatin accessibility and DNA binding
Several techniques are available to capture protein-DNA interactions and chromatin conformation and accessibility at a whole-genome scale, which allows for useful inferences regarding transcription. One such technology, known as Formaldehyde-Assisted Isolation of Regulatory Elements followed by sequencing (FAIRE-seq), selectively enriches for open chromatin regions that are unbound by nucleosomes [54]. FAIRE-seq recently facilitated the discovery of putative Ae. aegypti cis-regulatory elements that drive tissue-specific gene expression, a much-needed genetic resource for non-model vector species [55]. Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) is another technology that selectively enriches for protein-bound DNA fragments for sequencing. The protein for analysis may be a specific transcription factor or a modified histone, as long as a specific antibody is available. ChIP-seq was used in a genome-wide analysis of histone modifications and chromatin changes in response to Plasmodium infection in Anopheles gambiae [56]. Another exciting technology, the Assay for Transposase-Accessible Chromatin followed by sequencing (ATAC-seq), assesses chromatin accessibility and nucleosome positioning in a manner similar to FAIRE-seq [57]. Recently, it was used to assist annotation of the AaegL5 assembly by mapping open chromatin and regions near transcription start sites [4].
3.6. Vector biology at the single cell
Advancements in microfluidics technologies have led to the development of sequencing strategies that resolve gene expression profiles at the resolution of a single cell. These technologies are enabling the identification of distinct cell types [58] and are powerful tools for studying cellular variation and development [59]. Single-cell (sc)RNA-seq has enabled analysis of the transcriptional programming of blood cells from Anopheles gambiae hemolymph [60]. The development of gel bead emulsions for scalable, massively parallel transcriptional profiling [61] and the release of the 10x Genomics Chromium single-cell platform improved the performance of single-cell technologies [62]. These advancements have already aided a recent investigation of the transcriptional kinetics during human embryonic development [63]. Single-cell RNA-sequencing will be a powerful tool for studying the development of various arthropod vector species and for understanding the intricacies of cellular responses to pathogens.
3.7. Targeted sequencing approaches
Vector genomes such as that of Aedes aegypti are highly repetitive and are gigabases in size. Sequencing targeted regions of the genome can be more cost-effective and efficient than whole-genome sequencing (WGS) approaches for identifying relevant genomic markers. One approach, known as Exome-seq, employs high-throughput sequencing of enriched protein-coding exon libraries and has been used to map a locus responsible for Brugia malayi resistance in a Kenyan Ae. aegypti population [64]. It was also used to identify differences between populations of Ae. aegypti [65]. Another targeted sequencing approach makes use of the RNA-directed Cas9 endonuclease and ONT sequencing to target regions between selected Cas9 cut sites [66]. This technology has been used in conjunction with Bionano mapping to detect large structural variations and represents a cost-effective method because it does not require whole-genome sequencing [67]. This method has the potential to be adapted to surveil identified structural variations relevant to vectorial capacity, such as the Anopheles gambiae 2La inversion [68], or to close assembly gaps such as those identified in the M locus of Aedes aegypti [4].
3.8. Population genomics
Population genomics serves as a reliable approach to understand the variation within and between populations of a given species and, in the context of disease vectors, can provide insights into traits such as vectorial capacity. When combined with epidemiological reports and geographical context, population genomic analyses based on high-throughput SNP calling can be used to infer histories of migration of vector species and the introduction of vector-borne diseases. Recent examples include the study of the worldwide voyages of Ae. aegypti from Africa [69] and geo-genetic mapping of An. gambiae populations originating from West Africa [70]. In addition, studies of regional population structure can help predict future migration patterns and inform effective control measures. Illumina sequencing of 39 individuals collected from various Ae. aegypti populations in central California revealed three introductions from divergent source populations [71]. The availability of the Ae. aegypti AaegL5 assembly has greatly facilitated efforts to map quantitative trait loci (QTL) [4]. For example, a restriction site-associated DNA sequencing (RAD-seq) analysis identified two QTLs on chromosome 2 that are associated with dissemination of dengue virus from the mosquito midgut. Analysis of SNPs in a population in which pyrethroid resistance and susceptibility coexist revealed three synonymous mutations in a voltage-gated sodium channel known to be a target of pyrethroids. These analyses further demonstrate the value of accurate assemblies in improving the accuracy and resolution of QTL analysis.
In 2017, the first phase of the An. gambiae 1000 Genomes Project (Ag1000G) was published [3]. Whole-genome Illumina sequencing of 765 wild specimens of An. gambiae and An. coluzzii from 15 different locations identified more than 50 million SNPs. The discovery of high genetic diversity, a polymorphic allele every 2.2 bases of the accessible genome, implies that insecticide-resistance alleles may naturally exist in Anopheles and that mosquito populations can quickly adapt to new insecticides. The study also explored the accessible coding genome for CRISPR/Cas9 target sites and found targets in 11,625 of 12,502 protein-coding genes. However, only 5,474 genes retained at least one target after excluding sites with nucleotide variation in any of the 765 genomes. This knowledge will inform future genetic control strategies, such as RNA-directed Cas9 gene drives. In addition, understanding local population structures in isolated regions will help identify possible field-testing sites for control releases [72].
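The filtering of Cas9 target sites against known variation can be illustrated with a small sketch. It assumes the commonly used SpCas9 target definition (a 20-nt protospacer followed by an NGG PAM), scans the forward strand only, and discards any target that overlaps a variant position. The function names, toy sequence, and variant coordinates are hypothetical and do not reproduce the Ag1000G analysis.

```python
import re


def find_targets(seq, protospacer_len=20):
    """Return (start, end) of protospacer + NGG PAM windows on the forward strand.

    Coordinates are 0-based and end-exclusive; overlapping sites are reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d}[ACGT]GG))" % protospacer_len)
    return [(m.start(1), m.end(1)) for m in pattern.finditer(seq)]


def invariant_targets(seq, variant_positions, protospacer_len=20):
    """Keep only targets that contain no known variant position."""
    variants = set(variant_positions)
    return [(start, end)
            for start, end in find_targets(seq, protospacer_len)
            if not any(pos in variants for pos in range(start, end))]


# Toy sequence with exactly one target site spanning positions 0 to 22.
seq = "A" * 20 + "TGG" + "C" * 5
print(find_targets(seq))             # prints [(0, 23)]
print(invariant_targets(seq, [5]))   # prints []
print(invariant_targets(seq, [25]))  # prints [(0, 23)]
```

A variant anywhere within the protospacer or PAM removes the site, which illustrates how dense natural variation can sharply reduce the number of genes that retain a usable target.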
4. Future Perspectives
Here we have reviewed recent progress in vector-omics and highlighted several technologies that we believe will contribute to a greater understanding of the basic biology of disease vectors and facilitate the development of novel vector control strategies. We anticipate that the quantum leap in contiguity, accuracy, and completeness of the Ae. aegypti genome assembly [4] will be extended to a large number of vector species, well beyond the few intensively studied mosquito species. Such a qualitative leap and quantitative expansion of genome assemblies will strengthen and broaden the genomic foundation on which vector biology research will thrive. These improvements will also dramatically increase the accessibility of heterochromatic or repeat-rich regions, including the male-determining locus, Y chromosomes, centromeres, and telomeres, which in turn will shed light on the function and evolution of some of the “dark matter” of vector genomes. We also anticipate a broader application of temporal- and tissue-specific RNA-seq, long-read/full-length cDNA/RNA-seq, and whole-genome chromatin conformation analysis. These datasets will facilitate gene expression profile analysis and significantly improve annotation, especially regarding isoforms, long non-coding RNAs, small regulatory RNAs, and other novel RNA species. The increased accessibility of heterochromatic sequences and the availability of methods to systematically study modifications to DNA, RNA, and histones will also facilitate epigenetic investigations, which are largely missing in vector-omics. Proteomics, which is important for studying protein isoforms and post-translational modifications, may see broader application as cost decreases and sensitivity increases.
Similarly, single-cell RNA-seq, a technique that affords single-cell resolution in the analysis of vector-pathogen interactions, will see wider adoption as costs fall further and protocols accumulate for various tissue types and vector species. The tremendous payoff of the Ag1000G project [3] has inspired and will continue to inspire future population genomics studies of vectors, which will provide the critical link between laboratory-based genomic research and natural populations. Finally, it is our hope that continued innovation in device portability, cost reduction, and informatics support will in the foreseeable future bring genomics to every vector laboratory and field station in the world, which will have an unparalleled impact on basic research and the control of vector-borne infectious diseases.
Table 1.
Vector Genome Assemblies and Improvements since 2016.
Species | Assembly | Genome size (Gbp) | Identifying characteristics and technologies used | Contig N50 (kbp) | Scaffold N50 (Mbp) | Publication DOI | GenBank assembly accession/BioSample submission |
---|---|---|---|---|---|---|---|
Aedes aegypti | AaegL5 | 1.28 | Consortium assembly using a combination of PacBio, Hi-C, and Illumina sequencing technologies, with Bionano optical mapping to resolve the highly repetitive M locus. 10X Genomics linked reads were used for variant detection and ATAC-seq was used to map chromatin accessibility. | 11.758 Mbp | 410 | 10.1038/s41586-018-0692-z | GCA_002204515.1 |
Aedes aegypti (Improvement) | AaegL4 | 1.15 | AaegL2 contigs were scaffolded using Hi-C to improve the contiguity and accuracy of the existing assembly generated with Illumina. | 82.6 | 404 (footnote: consisted of 3 chromosomes and smaller scaffolds; N50 of smaller scaffolds: 65 kbp) | 10.1126/science.aal3327 | GCA_000004015.2/SAMN06546148 |
Aedes albopictus | AalbF2 | 1.190–1.275 | De novo assembly using PacBio long-read sequencing and Hi-C scaffolding. Purge Haplotigs was used to eliminate redundancy and reduce the draft assembly size by nearly half. | 1.185 Mbp | 55.7 | 10.1101/2020.02.28.969527 | GCA_006496715.1 |
Culex quinquefasciatus (Improvement) | CpipJ3 | 0.54 | Published alongside the AaegL4 improvement. The existing CpipJ2 contigs were scaffolded with Hi-C to improve contiguity and accuracy. | 28.5 | 191 (footnote: consisted of 3 chromosomes and smaller scaffolds; N50 of smaller scaffolds: 45 kbp) | 10.1126/science.aal3327 | GCA_000004015.2/SAMN06546149 |
Anopheles albimanus | AalbS3 | 0.173 | End-to-end de novo assembly using Oxford Nanopore long-read sequencing, with Bionano and Hi-C for scaffolding. This assembly comprises three full-length chromosomes and includes previously inaccessible centromeres and rDNA clusters as separate scaffolds, as well as novel telomeric sequences at the ends of every chromosome. | 13.7 Mbp | 51 (footnote: consisted of three full-length chromosomal scaffolds plus rDNA and centromeres) | 10.1101/2020.04.17.047084 | PRJNA622927 |
Anopheles albimanus (Improvement) | AalbS2 | 0.173 | A high-resolution physical map was published for the An. albimanus assembly produced as part of the 16 Anopheles Project, correcting misassemblies and representing the most complete physical map (98.2%) for any mosquito species. | N/A | 18.1 (physical maps spanned entire chromosome arms) | 10.1534/g3.116.034959, 10.1126/science.1258522 | GCA_000349125.2 |
Anopheles atroparvus (Improvement) | AatrE3 | 0.224 | A high-coverage physical map of the genome assembly (89.6%) was produced using FISH of probes designed based on the AatrE1 assembly. | N/A | 9.2 | 10.1186/s12864-018-4663-4, 10.1126/science.1258522 | GCA_000473505.1 |
Anopheles funestus | AfunF3 | 0.211 | Chromosome-scale assembly produced with PacBio long reads. A deduplication strategy resulted in a haploid genome assembly. | 632 | 93.811 | 10.1093/gigascience/giz063 | RCWQ00000000.1 |
Anopheles coluzzii | AcolN1 | 0.251 | Highly contiguous genome assembly produced with PacBio long reads from a single mosquito (only two haplotypes present). | 3.47 Mbp | N/A | 10.3390/genes10010062 | GCA_004136515.2 |
Ixodes scapularis | ISE6 | 2.8 | The genome of the ISE6 cell line, which is commonly used for clinical testing, was assembled using PacBio long-read sequencing. | 269 | N/A | 10.12688/f1000research.13635.1 | GCA_002892825.1 |
Ixodes scapularis | IscaW1 | 1.8 | The very first tick vector genome assembly, produced using Sanger sequencing. | 2.9 | 0.051 | 10.1038/ncomms10507 | ABJB010000000 |
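
The contig and scaffold N50 values in Table 1 summarize contiguity: the N50 is the length L such that sequences of length L or longer together contain at least half of all assembled bases. The following minimal sketch illustrates this standard definition; the function name and the toy lengths are illustrative and are not drawn from any assembly in the table.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (made-up lengths in kbp): the total is 300, and the two
# longest sequences (80 + 70 = 150) already cover half of it.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the N50 depends only on the length distribution, it says nothing about correctness; a misassembly that joins unrelated contigs raises the N50, which is why the table also notes physical mapping and Hi-C validation.
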
Highlights.
- New sequencing and scaffolding technologies have facilitated the assembly of highly contiguous chromosome-length vector reference genomes.
- Annotations of vector genomes are improved by full-length RNA-seq, chromatin accessibility assays, expansive transcriptome sequencing, and proteomics.
- These new advancements facilitate novel biological insights through the discovery of previously inaccessible genetic elements and provide a solid foundation for population genomics.
- Novel sequencing platforms present exciting and affordable opportunities for on-site field studies and open new avenues for epigenetic analysis.
Acknowledgement
This work is supported by NIH grants AI123338, AI121284, AI133571, AI099528, AI135298, and the Virginia Agricultural Experiment Station. AC is supported by a fellowship from the Robert Wood Johnson Foundation.
References
- 1. Holt RA, et al., The Genome Sequence of the Malaria Mosquito Anopheles gambiae. Science, 2002. 298(5591): p. 129–149.
- 2. Neafsey DE, et al., Mosquito genomics. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science, 2015. 347(6217): p. 1258522.
- 3. Anopheles gambiae 1000 Genomes Consortium, et al., Genetic diversity of the African malaria vector Anopheles gambiae. Nature, 2017. 552(7683): p. 96–100. The first large-scale study of genomic variation in the major African malaria vector determined unprecedented levels of nucleotide diversity in natural populations. Dramatic historical fluctuations in effective population size have been inferred from the genomes of individual mosquitoes. High levels of polymorphism imply, as the authors suggest, that insecticide-resistance alleles may pre-exist in natural populations and that there is also a higher-than-expected likelihood of resistance to CRISPR/Cas9-mediated genome editing.
- 4. Matthews BJ, et al., Improved reference genome of Aedes aegypti informs arbovirus vector control. Nature, 2018. 563(7732): p. 501–507. Overcoming the challenges of a large genome size and high repeat content, the highly contiguous chromosome-scale assembly was generated by integrating PacBio long reads, Illumina short reads, and Hi-C ligation sequencing. The authors also resolved the structure of the male-determining locus, a first for any insect. The authors provided a comprehensive genome annotation combining extensive RNA-seq and ATAC-seq. Genome-wide analysis uncovered structural variations and several QTLs relevant to vector competence. This paper serves as a gold standard for future vector genome projects and an invaluable resource to facilitate the development of novel mosquito control strategies.
- 5. Ghurye J, et al., A chromosome-scale assembly of the major African malaria vector Anopheles funestus. Gigascience, 2019. 8(6).
- 6. Kingan SB, et al., A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing. Genes (Basel), 2019. 10(1). Assembling complex diploid genomes with high allelic heterozygosity can result in artificially duplicated haplotigs. To overcome these obstacles, the authors modified PacBio library preparations, which enabled the sequencing and assembly of the genome of An. coluzzii using a single mosquito.
- 7. Palatini U, et al., Improved reference genome of the arboviral vector Aedes albopictus. bioRxiv, 2020: p. 2020.02.28.969527.
- 8. Compton A, et al., The beginning of the end: a chromosomal assembly of the New World malaria mosquito ends with a novel telomere. bioRxiv, 2020: p. 2020.04.17.047084.
- 9. Eid J, et al., Real-time DNA sequencing from single polymerase molecules. Science, 2009. 323(5910): p. 133–8.
- 10. Jain M, et al., The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol, 2016. 17(1): p. 239.
- 11. Payne A, et al., BulkVis: a graphical viewer for Oxford nanopore bulk FAST5 files. Bioinformatics, 2019. 35(13): p. 2193–2198.
- 12. Lu H, Giordano F, and Ning Z, Oxford Nanopore MinION Sequencing and Genome Assembly. Genomics Proteomics Bioinformatics, 2016. 14(5): p. 265–279.
- 13. Wenger AM, et al., Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology, 2019. 37(10): p. 1155–1162.
- 14. Nurk S, et al., HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. bioRxiv, 2020: p. 2020.03.14.992248.
- 15. Lang D, et al., Comparison of the two up-to-date sequencing technologies for genome assembly: HiFi reads of Pacbio Sequel II system and ultralong reads of Oxford Nanopore. bioRxiv, 2020: p. 2020.02.13.948489.
- 16. Lee YG, et al., Constructing a Reference Genome in a Single Lab: The Possibility to Use Oxford Nanopore Technology. Plants (Basel), 2019. 8(8).
- 17. Koren S, et al., Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research, 2017.
- 18. Chin C-S, et al., Phased diploid genome assembly with single-molecule real-time sequencing. Nature Methods, 2016. 13(12): p. 1050–1054.
- 19. Kolmogorov M, et al., Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology, 2019. 37(5): p. 540–546.
- 20. Watson M and Warr A, Errors in long-read assemblies can critically affect protein prediction. Nature Biotechnology, 2019. 37(2): p. 124–126.
- 21. Walker BJ, et al., Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLOS ONE, 2014. 9(11): p. e112963.
- 22. Jayakumar V and Sakakibara Y, Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data. Briefings in Bioinformatics, 2019. 20(3): p. 866–876.
- 23. Dudchenko O, et al., De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science, 2017. 356(6333): p. 92–95.
- 24. Burton JN, et al., Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nature Biotechnology, 2013. 31: p. 1119.
- 25. Artemov GN, et al., The Physical Genome Mapping of Anopheles albimanus Corrected Scaffold Misassemblies and Identified Interarm Rearrangements in Genus Anopheles. G3 (Bethesda), 2017. 7(1): p. 155–164.
- 26. Waterhouse RM, et al., Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies. BMC Biology, 2020. 18(1): p. 1. The study presents complementary approaches to improving contiguity of genome assemblies based on evolutionary superscaffolding and chromosome anchoring. Gene synteny-based computational methods allowed researchers to perform evolutionary superscaffolding of 21 Anopheles mosquito assemblies to produce new consensus sets of scaffold adjacencies. In addition, chromosome anchoring was done for selected mosquito species using physical mapping of DNA probes.
- 27. Artemov GN, et al., Partial-arm translocations in evolution of malaria mosquitoes revealed by high-coverage physical mapping of the Anopheles atroparvus genome. BMC Genomics, 2018. 19(1): p. 278.
- 28. Richards S, Arthropod Genome Sequencing and Assembly Strategies, in Insect Genomics: Methods and Protocols, Brown SJ and Pfrender ME, Editors. 2019, Springer New York: New York, NY. p. 1–14.
- 29. Roach MJ, Schmidt SA, and Borneman AR, Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics, 2018. 19(1): p. 460.
- 30. Koren S, et al., De novo assembly of haplotype-resolved genomes with trio binning. Nature Biotechnology, 2018. 36(12): p. 1174–1182. Authors used short but accurate Illumina reads from both parents to separate the long PacBio reads from an F1 hybrid into paternal and maternal reads. They used this to produce reference-quality haplotype genome assemblies for both the paternal and maternal breed.
- 31. Rice ES, et al., Chromosome-length haplotigs for yak and cattle from trio binning assembly of an F1 hybrid. bioRxiv, 2019: p. 737171.
- 32. Jiang X, et al., Single molecule RNA sequencing uncovers trans-splicing and improves annotations in Anopheles stephensi. Insect Molecular Biology, 2017. 26(3): p. 298–307. Single molecule long reads were applied to obtain full-length transcript sequences and improve genome annotation. The availability of full-length transcripts helped identify splice isoforms and reveal an interesting trans-splicing phenomenon.
- 33. Quick J, et al., Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nature Protocols, 2017. 12: p. 1261. Authors provide a streamlined protocol for rapid identification of viral genomes using a portable MinION sequencer. The authors highlight how these sensitive and cost-effective methods could be adopted for on-site disease surveillance and clinical diagnosis.
- 34. Gigante S, et al., Using long-read sequencing to detect imprinted DNA methylation. Nucleic Acids Research, 2019. 47(8): p. e46.
- 35. Prasad TSK, et al., Integrating transcriptomic and proteomic data for accurate assembly and annotation of genomes. Genome Research, 2017. 27(1): p. 133–144. The authors integrate proteomic and RNA-seq data to improve genome annotation. They illuminate the potential of combining these complementary omics tools to correct gene models and append genes missed completely due to incomplete assembly or erroneous predictions.
- 36. Deng W and Blobel GA, Manipulating nuclear architecture. Curr Opin Genet Dev, 2014. 25: p. 1–7.
- 37. Cremer T, Cremer C, and Lichter P, Recollections of a scientific journey published in human genetics: from chromosome territories to interphase cytogenetics and comparative genome hybridization. Hum Genet, 2014. 133(4): p. 403–16.
- 38. Pickersgill H, et al., Characterization of the Drosophila melanogaster genome at the nuclear lamina. Nat Genet, 2006. 38(9): p. 1005–14.
- 39. Hochstrasser M, et al., Spatial organization of chromosomes in the salivary gland nuclei of Drosophila melanogaster. J Cell Biol, 1986. 102(1): p. 112–23.
- 40. Kinney NA, Sharakhov IV, and Onufriev AV, Investigation of the chromosome regions with significant affinity for the nuclear envelope in fruit fly--a model based approach. PLoS One, 2014. 9(3): p. e91943.
- 41. Arican-Goktas HD, et al., Differential spatial repositioning of activated genes in Biomphalaria glabrata snails infected with Schistosoma mansoni. PLoS Negl Trop Dis, 2014. 8(9): p. e3013.
- 42. George P, et al., Three-dimensional Organization of Polytene Chromosomes in Somatic and Germline Tissues of Malaria Mosquitoes. Cells, 2020. 9(2): p. 339.
- 43. Lieberman-Aiden E, et al., Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 2009. 326(5950): p. 289–93.
- 44. Sexton T, et al., Three-dimensional folding and functional organization principles of the Drosophila genome. Cell, 2012. 148(3): p. 458–72.
- 45. Ulianov SV, et al., Active chromatin and transcription play a key role in chromosome partitioning into topologically associating domains. Genome Res, 2016. 26(1): p. 70–84.
- 46. Battulin N, et al., Comparison of the three-dimensional organization of sperm and fibroblast genomes using the Hi-C approach. Genome Biol, 2015. 16: p. 77.
- 47. Dixon JR, et al., Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 2012. 485(7398): p. 376–80.
- 48. Ong CT and Corces VG, CTCF: an architectural protein bridging genome topology and function. Nat Rev Genet, 2014. 15(4): p. 234–46.
- 49. Eagen KP, Hartl TA, and Kornberg RD, Stable Chromosome Condensation Revealed by Chromosome Conformation Capture. Cell, 2015. 163(4): p. 934–46.
- 50. Coluzzi M, et al., A polytene chromosome analysis of the Anopheles gambiae species complex. Science, 2002. 298(5597): p. 1415–8.
- 51. Reidenbach KR, et al., Cuticular differences associated with aridity acclimation in African malaria vectors carrying alternative arrangements of inversion 2La. Parasit Vectors, 2014. 7: p. 176.
- 52. Corbett-Detig RB, et al., Fine-Mapping Complex Inversion Breakpoints and Investigating Somatic Pairing in the Anopheles gambiae Species Complex Using Proximity-Ligation Sequencing. Genetics, 2019.
- 53. Norris AL, et al., Nanopore sequencing detects structural variants in cancer. Cancer Biology & Therapy, 2016. 17(3): p. 246–253.
- 54. Giresi PG, et al., FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Research, 2007. 17(6): p. 877–885.
- 55. Mysore K, Li P, and Duman-Scheel M, Identification of Aedes aegypti cis-regulatory elements that promote gene expression in olfactory receptor neurons of distantly related dipteran insects. Parasites & Vectors, 2018. 11(1): p. 406.
- 56. Ruiz JL, et al., Chromatin changes in Anopheles gambiae induced by Plasmodium falciparum infection. Epigenetics & Chromatin, 2019. 12(1): p. 5.
- 57. Buenrostro JD, et al., Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods, 2013. 10(12): p. 1213–1218.
- 58. Buettner F, et al., Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nature Biotechnology, 2015. 33: p. 155.
- 59. Svensson V, et al., Power analysis of single-cell RNA-sequencing experiments. Nature Methods, 2017. 14: p. 381.
- 60. Severo MS, et al., Unbiased classification of mosquito blood cells by single-cell genomics and high-content imaging. Proceedings of the National Academy of Sciences, 2018. 115(32): p. E7568–E7577.
- 61. Zheng GXY, et al., Massively parallel digital transcriptional profiling of single cells. Nature Communications, 2017. 8(1): p. 14049.
- 62. Liu S and Trapnell C, Single-cell transcriptome sequencing: recent advances and remaining challenges [version 1; peer review: 2 approved]. F1000Research, 2016. 5(182).
- 63. La Manno G, et al., RNA velocity of single cells. Nature, 2018. 560(7719): p. 494–498.
- 64. Juneja P, et al., Exome and Transcriptome Sequencing of Aedes aegypti Identifies a Locus That Confers Resistance to Brugia malayi and Alters the Immune Response. PLOS Pathogens, 2015. 11(3): p. e1004765.
- 65. Dickson LB, et al., Exon-Enriched Libraries Reveal Large Genic Differences Between Aedes aegypti from Senegal, West Africa, and Populations Outside Africa. G3 (Bethesda, Md.), 2017. 7(2): p. 571–582.
- 66. Gabrieli T, et al., Selective nanopore sequencing of human BRCA1 by Cas9-assisted targeting of chromosome segments (CATCH). Nucleic Acids Research, 2018. 46(14): p. e87.
- 67. Gabrieli T, et al., Cas9-Assisted Targeting of CHromosome segments (CATCH) for targeted nanopore sequencing and optical genome mapping. bioRxiv, 2017: p. 110163.
- 68. Sharakhov IV, et al., Breakpoint structure reveals the unique origin of an interspecific chromosomal inversion (2La) in the Anopheles gambiae complex. Proceedings of the National Academy of Sciences, 2006. 103(16): p. 6258–6262.
- 69. Powell JR, Gloria-Soria A, and Kotsakiozi P, Recent History of Aedes aegypti: Vector Genomics and Epidemiology Records. BioScience, 2018. 68(11): p. 854–860.
- 70. Schmidt H, et al., Transcontinental dispersal of Anopheles gambiae occurred from West African origin via serial founder events. Communications Biology, 2019. 2: p. 473.
- 71. Lee Y, et al., Genome-wide divergence among invasive populations of Aedes aegypti in California. BMC Genomics, 2019. 20(1): p. 204.
- 72. Bergey CM, et al., Assessing connectivity despite high diversity in island populations of the malaria mosquito Anopheles gambiae. bioRxiv, 2018: p. 430702.