G3: Genes | Genomes | Genetics. 2024 Dec 24;15(2):jkae264. doi: 10.1093/g3journal/jkae264

The genome of the solitary bee Tetrapedia diversipes (Hymenoptera, Apidae)

Priscila K F Santos 1,2, Natalia de Souza Araujo 3, Elaine Françoso 4, John H Werren 5, Karen M Kapheim 6, Maria Cristina Arias 7,2
Editor: K Vogel
PMCID: PMC11797046  PMID: 39718247

Abstract

Tetrapedia diversipes is a Neotropical solitary bee commonly found in trap-nests, known for its morphological adaptations for floral oil collection and prepupal diapause during the cold and dry season. Here, we present the genome assembly of T. diversipes (332 Mbp), comprising 2,575 scaffolds, with 15,028 predicted protein-coding genes. Repetitive elements constitute 38.68% of the genome, notably Class II transposable elements. An investigation into lateral gene transfers identified a low frequency (0.037%) of nuclear copies of mitochondrial DNA and 18 candidate regions from bacterial origins. Furthermore, the annotation of 3 scaffolds reveals the presence of the Wolbachia endosymbiont genome, confirming the infection by 2 strains in T. diversipes populations. This genome contributes valuable insights into Neotropical bee genomics, offering a resource for comparative studies and enhancing our understanding of the molecular basis of solitary bee adaptations and interactions.

Keywords: Neotropical, genome sequencing, de novo assembly, oil-collecting bee, Tetrapedia diversipes

Introduction

Tetrapedia diversipes is a solitary bee from the Neotropical region. Females construct their nests in preexisting cavities, such as abandoned beetle holes (Alves-dos-Santos et al. 2002). Because females are philopatric, nests are commonly reused by subsequent generations, resulting in a high abundance of this species in trap-nests (Cordeiro et al. 2019; Santos et al. 2020). Consequently, it serves as an excellent biological model because it is easy to sample and maintain.

Both males and females of this species have morphological adaptations for collecting floral oils (Alves-dos-Santos et al. 2002) (Fig. 1). While males likely use these oils for mating purposes (Cappellari et al. 2012), females use them in nest construction, mixed with soil, and for provisioning larvae, mixed with pollen (Alves-dos-Santos et al. 2002). In addition to its unique adaptations for floral oil collection, T. diversipes exhibits other remarkable biological traits. These include entering diapause during the colder and drier months in the subtropical region of Brazil, sharing a common ancestor with the sub-social lineages of Xylocopinae, and engaging in a host–parasite relationship with the cleptoparasitic bee Coelioxoides waltheriae (Alves-dos-Santos et al. 2002; Bossert et al. 2018; Santos et al. 2018).

Fig. 1.

Male specimen of T. diversipes collected from trap-nests maintained at the Bee Laboratory (USP, São Paulo, Brazil). Details of the posterior and anterior legs are highlighted at the top and bottom right, respectively. Both legs contain adaptations for oil collection (Alves-dos-Santos et al. 2002). Photograph by Alexandre Zuntini, used with permission.

Despite advances in sequencing technologies and the growing number of genomes from nonmodel organisms in databases, bee species from the Neotropical region remain significantly underrepresented. Although 60% of social bee species are endemic to the Neotropics (Noll et al. 2018), only 8 social bee species (Bombus dahlbomii, Euglossa dilemma, Frieseomelitta varia, Megalopta genalis, Melipona beecheii, Melipona bicolor, Melipona quadrifasciata, and Tetragonisca angustula) and 1 solitary species (Eufriesea mexicana) from this region are included among the 112 bee genomes available on GenBank (accessed on 2024 September 20).

Here, we present the genome of T. diversipes and its key characteristics. The genome spans 332 Mbp, distributed across 2,575 scaffolds, with 38.68% consisting of repetitive elements. Evidence suggesting possible lateral gene transfer (LGT) events from mitochondrial DNA and endosymbionts was also detected. A total of 15,028 protein-coding genes were predicted through annotation. This genome will be a valuable resource for future comparative and evolutionary studies in bee research.

Materials and methods

Sample collection and sequencing

Tetrapedia diversipes larvae and adults were collected from trap-nests maintained at the Bee Laboratory [University of São Paulo (USP), São Paulo, Brazil]. To select males among the larvae, 10 individuals were genotyped at 16 microsatellite loci following standardized conditions (Arias et al. 2016); individuals showing a single allele at every locus (hemizygous) were taken to be males. Additionally, the mitochondrial gene cytochrome c oxidase I was sequenced to distinguish T. diversipes larvae from those of its cleptoparasite, C. waltheriae. For Illumina sequencing, DNA was isolated from the whole larval body following a phenol–chloroform protocol (Bonasio et al. 2010; Standage et al. 2016). DNA was quantified using a Qubit 2.0 fluorometer system (Invitrogen), and DNA quality was verified by agarose gel electrophoresis. The samples with the highest DNA integrity and quantity were sent to Macrogen (South Korea) for paired-end library construction and sequencing. For 100-bp short-read sequencing, 3 different libraries were constructed. One male larva was used for the TruSeq DNA PCR-Free (350 bp insert) library, and 2 male larvae were used for each of the Nextera Mate Pair (Gel-Plus) libraries with 3- and 8-kb insert sizes. All libraries were sequenced using 1 lane of the Illumina HiSeq2500 platform. For long-read sequencing, DNA was obtained from the entire body, excluding the abdomen, of 2 newly emerged adult males, using the same DNA extraction protocol. One 20-kb SMRTbell template library was constructed and sequenced on 3 SMRT cells of the PacBio Sequel system. A summary is presented in Supplementary Table 1.

Data preprocessing and genome assembly

FastQC-v0.11.6 (Andrews 2010) was used to check the quality of the Illumina reads. Illumina reads shorter than 36 bp or with a Phred score below 20 (shotgun library) or 15 (mate-pair libraries) were removed using Trimmomatic-0.36 (Bolger et al. 2014). PacBio subreads, which are output free of adapters, were corrected through overlapping using Canu-v1.8 (Koren et al. 2017). Short and long reads were combined in a hybrid de novo genome assembly using MaSuRCA-v3.2.6 (Zimin et al. 2013) with default parameters. Assembly statistics (total number of scaffolds, contigs, largest scaffold, N50, total length, and N's per 100 kb) and quality were assessed using QUAST-v5.2.0 (Gurevich et al. 2013), and completeness was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO)-v5.3.2 (Simao et al. 2015) with the Hymenoptera database of single-copy orthologs containing 5,991 genes (September 2022).
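As an illustration of two of the assembly statistics named above, the following minimal sketch computes N50 and N's per 100 kb from a list of scaffold sequences. It is not the QUAST implementation (QUAST also applies a minimum contig length, which is ignored here), and the function names and toy sequences are assumptions made for this example.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def ns_per_100kb(sequences):
    """Return the number of uncalled bases (N) per 100,000 assembled bases."""
    total = sum(len(seq) for seq in sequences)
    n_count = sum(seq.upper().count("N") for seq in sequences)
    return 100_000 * n_count / total if total else 0.0


# Toy scaffolds (hypothetical data, lengths 100, 50, and 20 bp).
toy_scaffolds = ["ACGT" * 25, "ACGTNNNNAC" * 5, "AC" * 10]
toy_lengths = [len(seq) for seq in toy_scaffolds]

summary = {
    "n_scaffolds": len(toy_scaffolds),
    "total_length": sum(toy_lengths),
    "largest_scaffold": max(toy_lengths),
    "n50": n50(toy_lengths),
    "ns_per_100kb": ns_per_100kb(toy_scaffolds),
}
```

In practice the sequences would be read from the assembly FASTA file; the toy list only keeps the sketch self-contained.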

Prediction and annotation of protein-coding genes

Genome annotation was performed using Maker-v2.31.10 (Holt and Yandell 2011) following the tutorial at http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018 and https://gist.github.com/darencard/bb1001ac1532dd4225b030cf0cd61ce2 (accessed in September 2018). Initially, RepeatMasker-v4.0 (Smit et al. 2015) was used within Maker to mask low complexity and repetitive regions of the genome based on Repbase (Jurka et al. 2005) repeat library from all available species (April 2019). Gene prediction proceeded in 2 rounds. In the first round, “est2genome” and “protein2genome” options were enabled (set to 1), using a combination of the de novo assembled transcriptome of T. diversipes (Santos et al. 2018) and bee protein sequences from the UniRef90 database (April 2019) (Suzek et al. 2007) as input. To build the bee proteins database, we accessed the UniProt website, selected the UniRef database, searched for Apoidea, and downloaded the fasta sequences from the 90% cluster.

In the second round, SNAP (Korf 2004) and Augustus (Stanke and Waack 2003) were trained to refine gene prediction. Gene models predicted in the first round with an annotation edit distance < 0.25 and a protein length of 50 or more amino acids served as input to train SNAP. Augustus was trained inside BUSCO-v3.0.2 using the Hymenoptera database (September 2018, containing 4,415 genes) with the “--long” argument to optimize the hidden Markov model search. After training, Augustus and SNAP were used to predict the gene models, with the “est2genome” and “protein2genome” options set to 0. This training and prediction cycle was repeated 3 times, until the best results were observed, as indicated by a larger average gene size and more predicted proteins annotated with BUSCO. Finally, the functional annotation of the final T. diversipes gene set was generated by running Blastp (Johnson et al. 2008) against the SwissProt database (April 2019) (Bairoch and Apweiler 1996) and InterProScan-v5.34-73.0 (Hunter et al. 2009).
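The selection of first-round gene models for SNAP training can be sketched as a simple filter. This is only an illustration of the two thresholds stated above (annotation edit distance below 0.25 and protein length of at least 50 amino acids); the `GeneModel` record and the example values are hypothetical and do not reproduce the MAKER output format.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str          # hypothetical identifier
    aed: float            # annotation edit distance, 0 (best) to 1 (worst)
    protein_length: int   # length of the predicted protein in amino acids


def select_training_models(models, max_aed=0.25, min_protein_length=50):
    """Keep models with AED strictly below max_aed and protein length
    of at least min_protein_length amino acids."""
    return [
        m for m in models
        if m.aed < max_aed and m.protein_length >= min_protein_length
    ]


# Invented first-round predictions, for illustration only.
first_round = [
    GeneModel("gene_1", 0.10, 320),  # kept
    GeneModel("gene_2", 0.25, 400),  # dropped: AED is not below 0.25
    GeneModel("gene_3", 0.05, 49),   # dropped: protein shorter than 50 aa
    GeneModel("gene_4", 0.24, 50),   # kept: both thresholds satisfied
]

training_set = select_training_models(first_round)
```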

Repetitive elements

To identify transposable elements (TEs) in the genome, we developed a comprehensive custom library for T. diversipes using multiple strategies. Initially, 3 independent repeat libraries were created using RepeatModeler-v.open-1.0.11, TransposonPSI-v2.2.26, and LTRharvest (included in GenomeTools-v1.5.8; Gremme et al. 2013). Libraries from TransposonPSI and LTRdigest were then classified with RepeatClassifier (included in RepeatModeler). All classified libraries were concatenated and then merged based on similarity (≥80% identity) using USEARCH-v11.0.667 (Edgar 2010) to create a nonredundant library, which was classified once again with RepeatClassifier. Finally, the custom library was merged with the Apis mellifera repeat library contained in Dfam-v3.1, available within RepeatMasker-v4.1.0 (Smit et al. 2015). A genome-wide search for all TEs was conducted with RepeatMasker using our merged database. Statistical analyses and figures were generated in R-v3.4.3. Detailed parameters and scripts from this pipeline are available at https://github.com/nat2bee/repetitive_elements_pipeline.

Lateral gene transfers

Nuclear copies of mitochondrial DNA (NUMTs) were identified through a BLASTN search in which the complete mitochondrial genome (mtGenome) of T. diversipes (Françoso et al. 2020; GenBank accession number: MN732885.1) was used as the reference sequence against the nuclear genome described here. The search was conducted using Geneious-v2023.0.4 software (https://www.geneious.com) with default parameters and a low complexity filter. The methodology was based on Behura (2007) and Nacer and Raposo do Amaral (2017). NUMT candidates were manually inspected. Only hits with an expect value equal to or smaller than 10^-5, sequence length over 300 bp, and pairwise identity between 75% and 98% were considered. Hits with similarities over 98% were discarded as false positives.
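The hit-filtering criteria above can be written as a small predicate. This is a sketch only: the `BlastHit` record and the example hits are hypothetical, and the length threshold is treated as inclusive (300 bp or more) because the Results report NUMTs starting at 300 bp, which is an assumption about how the cutoff was applied.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    subject_id: str   # hypothetical scaffold name
    evalue: float     # BLAST expect value
    length: int       # alignment length in bp
    identity: float   # pairwise identity in percent


def is_numt_candidate(hit, max_evalue=1e-5, min_length=300,
                      min_identity=75.0, max_identity=98.0):
    """Apply the e-value, length, and identity thresholds to one hit.
    Hits above max_identity are rejected as likely false positives."""
    return (
        hit.evalue <= max_evalue
        and hit.length >= min_length  # inclusive bound: an assumption
        and min_identity <= hit.identity <= max_identity
    )


# Invented hits, for illustration only.
hits = [
    BlastHit("scaffold_a", 1e-40, 850, 91.2),   # kept
    BlastHit("scaffold_b", 1e-3, 900, 90.0),    # dropped: e-value too high
    BlastHit("scaffold_c", 1e-60, 250, 88.0),   # dropped: too short
    BlastHit("scaffold_d", 0.0, 1500, 99.6),    # dropped: identity over 98%
]

candidates = [hit for hit in hits if is_numt_candidate(hit)]
```

Hits passing this filter would still need the manual inspection described in the text.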

To identify LGTs of bacterial origin, a preliminary screen for bacterial scaffolds was performed following Wheeler et al. (2013). Subsequently, each scaffold was divided into 1-kb fragments, and each fragment was used as a query in a BLASTN search against a custom database containing 2,100 different bacterial species (Supplementary Table 2). LGT candidates at the beginning position of a large scaffold were considered to be misassembled. A second BLASTN search was carried out against the T. diversipes genome free of contaminants available at GenBank (accession number: GCA_033822845.1) and the nonredundant Procaryotae GenBank database to confirm the LGTs. Sequences present in the T. diversipes genome with bitscore = 0 against the animal database and bitscore > 75 against the bacterial database were considered final LGTs.
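Two steps of this screen lend themselves to a short sketch: cutting a scaffold into 1-kb query fragments and applying the final bitscore rule. The code below is illustrative only and is not the pipeline used by the authors; the function names, the fragment naming scheme, and the coordinate convention (0-based, half-open) are assumptions.

```python
def fragment_scaffold(name, sequence, size=1000):
    """Yield (fragment_id, start, end, subsequence) tuples covering the
    scaffold in consecutive windows of `size` bp. Coordinates are 0-based
    and half-open; the last fragment may be shorter than `size`."""
    for start in range(0, len(sequence), size):
        end = min(start + size, len(sequence))
        yield f"{name}:{start}-{end}", start, end, sequence[start:end]


def is_lgt_candidate(animal_bitscore, bacterial_bitscore, min_bacterial=75.0):
    """Final rule from the text: no hit against the animal database
    (bitscore of 0) and a bacterial bitscore above the threshold."""
    return animal_bitscore == 0 and bacterial_bitscore > min_bacterial


# Toy scaffold of 2,500 bp (hypothetical sequence).
fragments = list(fragment_scaffold("scaffold_1", "ACGTA" * 500))
fragment_lengths = [end - start for _, start, end, _ in fragments]
```

Each fragment would then be written to a FASTA file and used as a BLASTN query; that step is left out here to avoid assuming a particular BLAST invocation.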

Wolbachia endosymbiont of T. diversipes

We used RAST Server-v2.0 (Aziz et al. 2008) to annotate the genes of 3 scaffolds identified as originating from the Wolbachia endosymbiont. Subsequently, we compared the 5 genes (gatB, coxA, hcpA, fbpA, and ftsZ) that make up the multilocus sequence typing scheme (MLST; Baldo et al. 2006) with the PubMLST database (Jolley et al. 2018) to infer the Wolbachia alleles.

Large language models usage

ChatGPT-v3.5 was used to correct grammar and improve syntax throughout the text. The following prompt was given: “Correct the grammar [text pasted].”

Results and discussion

Genomic assembly and annotation

A total of 509,778,166 short reads and 1,190,034 long reads from Illumina and PacBio sequencing, respectively, were used for the assembly of the T. diversipes genome. The assembled genome, spanning 332,342,503 bp (332 Mbp), was distributed among 2,575 scaffolds (NCBI accession GCA_033822845.1). The largest scaffold measured 2,952,783 bp, the N50 for contigs > 200 bp was 395,597 bp, and the number of N's per 100 kb was 497.59. Of the 5,991 single-copy Hymenoptera orthologues, 93.9% were complete and single copy, 1.1% were complete and duplicated, 1.0% were fragmented, and 4.0% were missing in the T. diversipes genome assembly.

Gene prediction identified 15,028 protein-coding sequences with an average size of 493.6 bp (SE ± 4.4 bp). Among these sequences, 10,437 had matches against the SwissProt database. The final gene set fasta file and the gff annotation file can be accessed at https://github.com/pkfsantos/Tetrapedia_diversipes_genome. When compared with the Hymenoptera database in BUSCO, the entire gene set contained 89.9% complete and single-copy, 1.1% complete and duplicated, 3.1% fragmented, and 5.9% missing genes. The genome size, number of predicted genes, and quantitative and qualitative genome metrics fall within the range expected from other bee species (Fig. 2; Supplementary Table 3).

Fig. 2.

Boxplots showing a) genome assembly sizes and b) number of predicted proteins across bee families. Tetrapedia diversipes is highlighted within the Apidae family. Families shown with only a median line are represented by a single species.

Repetitive elements

The amount of repetitive elements reported in bee genomes is an intriguing feature that seems to be related to the level of social organization; the more complex the social organization (e.g. highly eusocial), the smaller the proportion of repetitive DNA (<20%) in the genome (Kapheim et al. 2015). Specifically, TEs might have multiple regulatory functions, including epigenetic regulation, gene expression, splicing, and genomic rearrangements (Kazazian 2004; Slotkin and Martienssen 2007; Bourque et al. 2018). In T. diversipes, repetitive elements accounted for 38.68% of the genome, including 2.32% of other noninterspersed repeats (Fig. 3). Class II TEs (DNA transposons, elements capable of moving from one site to another in the genome through a cut-and-paste transposition mechanism) were the most frequent, accounting for 12.54% of the genome. Among them, the most frequent superfamilies were Tc1/IS630/Pogo (7.83%), Helitron (1.31%), PiggyBac (0.81%), and hobo-Activator (0.76%). Class I elements (retrotransposons, elements that are transcribed into an RNA intermediate before their reverse transcription into DNA and incorporation through a copy-and-paste transposition mechanism) summed to 7.17% of the genome, with short interspersed nuclear elements (SINEs) and long interspersed nuclear elements (LINEs) accounting for 0.01 and 3.45% of the genome, respectively. Long terminal repeat (LTR) elements were the most abundant Class I elements, comprising 3.71% of the genome. In total, TEs were classified into 59 known class/family categories (Fig. 3). Compared with the 10 bee genomes assessed by Kapheim et al. (2015), the repetitive element content in the T. diversipes genome is high (38.68%). Only 2 other solitary bees, E. mexicana (49%) and Megachile rotundata (43.23%), have a higher proportion of repetitive elements (Supplementary Table 4.1; Kapheim et al. 2015).

Fig. 3.

Repetitive elements identified in the T. diversipes genome. Top: The proportion of all repetitive elements in the genome is split by class and type. Bottom: Distribution of repeats and TE families number identified. Colors correspond to the element's major class/type. Sequences labeled as “artifact” may resemble TEs but are not considered real, functional elements, while those labeled as “unclassified” are recognized as TEs but cannot be reliably assigned to any known TE family or superfamily.

A portion of these repetitive elements (4%) occurs around gene regions and could affect gene expression dynamics by interacting with promoter regions and transcription factors (Supplementary Table 4; Supplementary Fig. 1; Bourque et al. 2018; Uzunović et al. 2019). Regions within a 5-kb window upstream or downstream of a gene were enriched for elements from the LINE, LTR, Maverick, and terminal inverted repeat (TIR) groups (Supplementary Table 4.2, Fisher's exact test P < 0.01). On the other hand, 12 TE categories were not found within these regions (ARTEFACT/hAT-Tip100, DNA/Ginger-2, DNA/hAT-hAT19, DNA/MULE-MuDR, DNA/P, DNA/TcMar, DNA/TcMar-Ant1, DNA/TcMar-Pogo, LINE/RTE-BovB, LINE/Tad1, LTR/caulimovirus, and LTR/ERVL).
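To make the enrichment test concrete, the sketch below implements a one-sided Fisher's exact test on a 2x2 table using only the Python standard library. It is an illustration under stated assumptions: the source does not say whether the test was one- or two-sided or how the table was built, the authors report running their statistics in R, and the counts in the example are invented.

```python
from math import comb


def fisher_exact_enrichment(family_in, family_out, other_in, other_out):
    """One-sided Fisher's exact test for enrichment.

    family_in  : copies of the TE group inside gene-flanking windows
    family_out : copies of the TE group outside those windows
    other_in   : all other repeats inside the windows
    other_out  : all other repeats outside the windows

    Returns P(X >= family_in) under the hypergeometric null model.
    """
    family_total = family_in + family_out
    in_total = family_in + other_in
    grand_total = family_in + family_out + other_in + other_out

    denominator = comb(grand_total, family_total)
    upper = min(family_total, in_total)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    numerator = sum(
        comb(in_total, k) * comb(grand_total - in_total, family_total - k)
        for k in range(family_in, upper + 1)
    )
    return numerator / denominator


# Invented counts for one TE group, for illustration only.
p_value = fisher_exact_enrichment(
    family_in=30, family_out=70, other_in=400, other_out=9500
)
is_enriched = p_value < 0.01
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for large counts, at the cost of speed on very large tables.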

LGT—NUMTs and bacteria–host origin

LGT refers to the transfer of DNA between organisms without involving sexual reproduction. In animals, the most common transfers occur between mitochondrial and nuclear DNA (NUMTs), as well as between intracellular bacteria and their host (Klasson et al. 2014). We investigated both types of LGTs in the T. diversipes genome.

NUMTs are widespread among eukaryotes, comprising around 0.1% of the nuclear genome. Usually, the larger the genome, the greater the frequency of NUMTs (Hazkani-Covo et al. 2010). A total of 182 possible NUMT sequences were found based on the mtGenome of T. diversipes, representing 0.037% of the nuclear genome. This proportion is lower than the 0.08% described for A. mellifera (Behura 2007), despite the T. diversipes genome being larger. The length of NUMT sequences ranged from 300 to 2,077 bp, and the pairwise identity ranged from 75.1 to 97.3% (Supplementary Table 5). NUMTs appear to have originated equally from all mtGenome regions (Supplementary Table 5; Supplementary Fig. 2), with the exception of the AT-rich region and adjacent regions, which presented fewer NUMTs than the other regions. Given that NUMTs are predominantly composed of neutral regions, the notable variability observed in pairwise identity for each gene suggests multiple nuclear insertions.

After excluding 26 scaffolds identified as entirely of bacterial origin, the genome scan for bacterial LGTs returned 36 candidate regions across 26 different scaffolds. Out of the 36 candidates, 15 were trimmed or removed through the contamination exclusion pipeline performed by NCBI, 1 had bitscore > 0 against the animal database, and 2 are unlikely LGTs because the region that matches to bacterial material is located at the end of the scaffold and might be the result of misassembly. The remaining 18 regions varied from 45 to 217 bp, with 9 fragments identified as being from Wolbachia spp., the most common donor of LGTs (Supplementary Table 6).

Gene transfer from Wolbachia, while commonly observed among arthropods, tends to be evolutionarily recent and species-specific (Hotopp et al. 2007). Approximately one-third of invertebrate genomes exhibit recent Wolbachia gene insertions (Werren et al. 2008). Wolbachia is prevalent across various bee species (Gerth et al. 2013, 2015). Moreover, LGTs from this endosymbiont to host genomes have been identified not only in bees but also in other Hymenoptera, such as ants (Dhaygude et al. 2019) and wasps (Werren et al. 2010).

Wolbachia endosymbiont of T. diversipes

In addition to the LGTs from Wolbachia in the nuclear genome, 3 scaffolds were identified as being from the endosymbiont genome. The 3 scaffolds from Wolbachia are 1,851,538, 705,933, and 25,845 bp in size and can be found at: https://github.com/pkfsantos/Tetrapedia_diversipes_genome. Santos et al. (2022) screened for Wolbachia in 5 populations of T. diversipes and found that most individuals were infected by 2 different strains concurrently, possibly from 2 different supergroups A and B. The MLST genes, expected to be present in a single copy in the endosymbiont genomes, are commonly used to identify Wolbachia strains (Baldo et al. 2006). In our analysis, we identified 2 copies of each MLST gene in 2 Wolbachia scaffolds (Supplementary Table 7). Each gene copy corresponds to different alleles when compared against the PubMLST database (Supplementary Table 8). These data corroborate the previous findings of 2 strains infecting T. diversipes individuals in a population assay. However, all gene copies but one (from ftsZ gene) were found in the longest scaffold, suggesting a misassembly that combined the genomes of the 2 strains.

Conclusions

The genome of the solitary bee T. diversipes comprises 15,028 predicted protein-coding genes distributed across 332 Mbp. Similar to other solitary bee species, it is rich in repetitive elements. Additionally, our findings suggest the possible occurrence of LGT events involving both the mitochondrial and endosymbiont genomes. The data confirm previous findings of T. diversipes population infection by 2 Wolbachia strains.

Supplementary Material

jkae264_Supplementary_Data

Acknowledgments

The authors thank Susy Coelho for technical support at the Laboratory of Genetics and Evolution of Bees, USP, São Paulo, Brazil, Sammy Cheng from the Biology Department of the University of Rochester for providing the pipeline to identify LGTs from prokaryotes, and Isabel Alves dos Santos and Sheina Koffler for granting access to the Bee Laboratory, USP, São Paulo, Brazil, and facilitating the collection of the specimens.

Contributor Information

Priscila K F Santos, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, Rua do Matão, 277, CEP 05508-090, São Paulo, SP, Brazil; Department of Biology, Utah State University, 5305 Old Main Hill, Logan, UT 84322, USA.

Natalia de Souza Araujo, Department of Evolutionary Biology and Ecology, Université Libre de Bruxelles, Av. Franklin Roosevelt 50, 1050 Bruxelles, Belgium.

Elaine Françoso, Department of Biological Sciences, School of Life Sciences and the Environment, Royal Holloway University of London, Egham Hill, Egham TW20 0EX, UK.

John H Werren, Department of Biology, University of Rochester, 210 Hutchison Rd, Rochester, NY 14620, USA.

Karen M Kapheim, Department of Biology, Utah State University, 5305 Old Main Hill, Logan, UT 84322, USA.

Maria Cristina Arias, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, Rua do Matão, 277, CEP 05508-090, São Paulo, SP, Brazil.

Data availability

The raw data for the T. diversipes genome assembly are available under the BioProject PRJNA866601 in the NCBI database. The assembly can be accessed on GenBank—NCBI with the accession number JAOPTO000000000—GCA_033822845.1. The mtGenome can be located under MN732885.1. Steps and commands used in the genome assembly and annotation process, R code for Fig. 2, the 3 Wolbachia scaffolds, along with the T. diversipes gene set and genome annotation file, are accessible at https://github.com/pkfsantos/Tetrapedia_diversipes_genome. The pipeline for the repetitive elements analysis can be found at https://github.com/nat2bee/repetitive_elements_pipeline. Any other analysis details will be available upon request.

Supplemental material available at G3 online.

Funding

This work was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brazil (CAPES) (Finance Code 001). CAPES also provided financial support to P.K.F.S. (PhD scholarship and PDSE to visit Dr. Kapheim's laboratory); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) granted research sponsorship to M.C.A. (306932/2016-4); and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) supported research projects to M.C.A. (2013/12530-4 and 2016/24669-5) and to N.d.S.A. (2012/18531-0). J.H.W. was supported by the National Science Foundation (United States) DEB1257053 and DEB1950078.

Literature cited

  1. Alves-dos-Santos  I, Melo  GA, Rozen  JG. 2002. Biology and immature stages of the bee tribe Tetrapediini (Hymenoptera: Apidae). Am Mus Novit. 3377:1–45. doi:. [DOI] [Google Scholar]
  2. Andrews  S.  2010. FastQC: a quality control tool for high throughput sequence data; [cited 2018]. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
  3. Arias  MC, Aulagnier  S, Baerwald  EF, Barclay  RMR, Batista  JS, Beasley  RR, Bezerra  RA, Blanc  F, Bridge  ES, Cabria  MT, et al.  2016. Microsatellite records for volume 8, issue 1. Conserv Genet Resour. 8(1):43–81. doi: 10.1007/s12686-016-0522-2. [DOI] [Google Scholar]
  4. Aziz  RK, Bartels  D, Best  AA, DeJongh  M, Disz  T, Edwards  RA, Formsma  K, Gerdes  S, Glass  EM, Kubal  M, et al.  2008. The RAST server: rapid annotations using subsystems technology. BMC Genomics. 9(1):75. doi: 10.1186/1471-2164-9-75. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bairoch  A, Apweiler  R. 1996. The SWISS-PROT protein sequence data bank and its new supplement TREMBL. Nucleic Acids Res. 24(1):21–25. doi: 10.1093/nar/24.1.21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Baldo  L, Hotopp  JC, Jolley  KA, Bordenstein  SR, Biber  SA, Choudhury  RR, Hayashi  C, Maiden  MCJ, Werren  JH. 2006. Multilocus sequence typing system for the endosymbiont Wolbachia pipientis. Appl Environ Microbiol. 72(11):7098–7110. doi: 10.1128/AEM.00731-06. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Behura  SK. 2007. Analysis of nuclear copies of mitochondrial sequences in honeybee (Apis mellifera) genome. Mol Biol Evol. 24(7):1492–1505. doi: 10.1093/molbev/msm068. [DOI] [PubMed] [Google Scholar]
  8. Bolger  AM, Lohse  M, Usadel  B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30(15):2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Bonasio  R, Zhang  G, Ye  C, Mutti  NS, Fang  X, Qin  N, Donahue  G, Yang  P, Li  Q, Li  C, et al.  2010. Genomic comparison of the ants Camponotus floridanus and Harpegnathos saltator. Science. 329(5995):1068–1071. doi: 10.1126/science.1192428. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Bossert  S, Murray  EA, Almeida  EAB, Brady  SG, Blaimer  BB, Danforth  BN. 2018. Combining transcriptomes and ultraconserved elements to illuminate the phylogeny of Apidae. Mol Phylogenet Evol. 130(2019):121–131. doi: 10.1016/j.ympev.2018.10.012. [DOI] [PubMed] [Google Scholar]
  11. Bourque  G, Burns  KH, Gehring  M, Gorbunova  V, Seluanov  A, Hammell  M, Imbeault  M, Izsvák  Z, Levin  HL, Macfarlan  TS, et al.  2018. Ten things you should know about transposable elements. Genome Biol. 19(1):199. doi: 10.1186/s13059-018-1577-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Cappellari  SC, Melo  GAR, Aguiar  AJC, Neff  JL. 2012. Floral oil collection by male Tetrapedia bees (Hymenoptera: Apidae: Tetrapediini). Apidologie. 43(1):39–50. doi: 10.1007/s13592-011-0072-2. [DOI] [Google Scholar]
  13. Cordeiro  G, Boff  S, Alves-dos-Santos  I. 2019. Trap-nesting bees communities from protected areas of Atlantic Forest, Southeastern Brazil. Sociobiology. 66(2):306–315. doi: 10.13102/sociobiology.v66i2.3448. [DOI] [Google Scholar]
  14. Dhaygude  K, Nair  A, Johansson  H, Wurm  Y, Sundström  L. 2019. The first draft genomes of the ant Formica exsecta, and its Wolbachia endosymbiont reveal extensive gene transfer from endosymbiont to host. BMC Genomics. 20(1):1–16. doi: 10.1186/s12864-019-5665-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Edgar  RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 26(19):2460–2461. doi: 10.1093/bioinformatics/btq461. [DOI] [PubMed] [Google Scholar]
  16. Françoso  E, de Souza Araujo  N, Ricardo  PC, Santos  PKF, Zuntini  AR, Arias  MC. 2020. Evolutionary perspectives on bee mtDNA from mito-OMICS analyses of a solitary species. Apidologie. 51(4):531–544. doi: 10.1007/s13592-020-00740-x. [DOI] [Google Scholar]
  17. Gerth  M, Röthe  J, Bleidorn  C. 2013. Tracing horizontal Wolbachia movements among bees (Anthophila): a combined approach using multilocus sequence typing data and host phylogeny. Mol Ecol. 22(24):6149–6162. doi: 10.1111/mec.12549. [DOI] [PubMed] [Google Scholar]
  18. Gerth  M, Saeed  A, White  JA, Bleidorn  C. 2015. Extensive screen for bacterial endosymbionts reveals taxon-specific distribution patterns among bees (Hymenoptera, Anthophila). FEMS Microbiol Ecol. 91(6):fiv047. doi: 10.1093/femsec/fiv047. [DOI] [PubMed] [Google Scholar]
  19. Gremme  G, Steinbiss  S, Kurtz  S. 2013. Genome tools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans Comput Biol Bioinform. 10(3):645–656. doi: 10.1109/TCBB.2013.68. [DOI] [PubMed] [Google Scholar]
  20. Gurevich  A, Saveliev  V, Vyahhi  N, Tesler  G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 29(8):1072–1075. doi: 10.1093/bioinformatics/btt086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Hazkani-Covo  E, Zeller  RM, Martin  W. 2010. Molecular poltergeists: mitochondrial DNA copies (numts) in sequenced nuclear genomes. PLoS Genet. 6(2):e1000834. doi: 10.1371/journal.pgen.1000834. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Holt  C, Yandell  M. 2011. MAKER2: an annotation pipeline and genome database management tool for second-generation genome projects. BMC Bioinformatics. 12(1):491–505. doi: 10.1186/1471-2105-12-491. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Hotopp  JCD, Clark  ME, Oliveira  DCSG, Foster  JM, Fischer  P, Torres  MCM, Giebel  JD, Kumar  N, Ishmael  N, Wang  S, et al.  2007. Widespread lateral gene transfer from intracellular bacteria to multicellular eukaryotes. Science. 317(5845):1753–1756. doi: 10.1126/science.1142490. [DOI] [PubMed] [Google Scholar]
  24. Hunter  S, Apweiler  R, Attwood  TK, Bairoch  A, Bateman  A, Binns  D, Bork  P, Das  U, Daugherty  L, Duquenne  L, et al.  2009. InterPro: the integrative protein signature database. Nucleic Acids Res. 37(Database):211–215. doi: 10.1093/nar/gkn785. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Johnson  M, Zaretskaya  I, Raytselis  Y, Merezhuk  Y, McGinnis  S, Madden  TL. 2008. NCBI BLAST: a better web interface. Nucleic Acids Res. 36(Web Server):W5–W9. doi: 10.1093/nar/gkn201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Jolley  KA, Bray  JE, Maiden  MCJ. 2018. Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications. Wellcome Open Res. 3(124):124. doi: 10.12688/wellcomeopenres.14826.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Jurka  J, Kapitonov  VV, Pavlicek  A, Klonowski  P, Kohany  O, Walichiewicz  J. 2005. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 110(1–4):462–467. doi: 10.1159/000084979. [DOI] [PubMed] [Google Scholar]
  28. Kapheim  KM, Pan  H, Li  C, Salzberg  SL, Puiu  D, Magoc  T, Robertson  HM, Hudson  ME, Venkat  A. 2015. Genomic signatures of evolutionary transitions from solitary to group living. Science. 348(6239):1139–1144. doi: 10.1126/science.aaa4788. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Kazazian  HH. 2004. Mobile elements: drivers of genome evolution. Science. 303(5664):1626–1632. doi: 10.1126/science.1089670. [DOI] [PubMed] [Google Scholar]
  30. Klasson  L, Kumar  N, Bromley  R, Sieber  K, Flowers  M, Ott  SH, Tallon  LJ, Andersson  SGE, Dunning Hotopp  JC. 2014. Extensive duplication of the Wolbachia DNA in chromosome four of Drosophila ananassae. BMC Genomics. 15(1):1097. doi: 10.1186/1471-2164-15-1097. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27(5):722–736. doi: 10.1101/gr.215087.116.
  32. Korf I. 2004. Gene finding in novel genomes. BMC Bioinformatics. 5(1):59. doi: 10.1186/1471-2105-5-59.
  33. Nacer DF, Raposo do Amaral F. 2017. Striking pseudogenization in avian phylogenetics: Numts are large and common in falcons. Mol Phylogenet Evol. 115(2017):1–6. doi: 10.1016/j.ympev.2017.07.002.
  34. Noll FB, Nascimento FS, Vasconcelos H. 2018. Richness and composition of ground-dwelling ants in tropical rainforest and surrounding landscapes in the Colombian Inter-Andean Valley. Neotrop Entomol. 47(6):729–730. doi: 10.1007/s13744-017-0565-4.
  35. Santos PKF, Araujo NS, Françoso E, Zuntini AR, Arias MC. 2018. Diapause in a tropical oil-collecting bee: molecular basis unveiled by RNA-seq. BMC Genomics. 19(1):305. doi: 10.1186/s12864-018-4694-x.
  36. Santos PKF, do Prado LN, Cordeiro GD, Alves-dos-Santos I, Arias MC. 2022. The high Wolbachia infection does not drive Tetrapedia diversipes (Hymenoptera: Apidae) sex bias and population genetic structure. Apidologie. 53(6):66. doi: 10.1007/s13592-022-00974-x.
  37. Santos PKF, Françoso E, Cordeiro GD, Alves-dos-Santos I, Arias MC. 2020. Genetic analyses reveal female philopatric behavior and nest usage by multiple females of the solitary oil-collecting bee Tetrapedia diversipes (Hymenoptera: Apidae). Apidologie. 51(5):815–825. doi: 10.1007/s13592-020-00763-4.
  38. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19):3210–3212. doi: 10.1093/bioinformatics/btv351.
  39. Slotkin RK, Martienssen R. 2007. Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet. 8(4):272–285. doi: 10.1038/nrg2072.
  40. Smit AFA, Hubley R, Green P. 2015. RepeatMasker Open-4.0. 2013-2015; [cited 2019]. http://www.repeatmasker.org.
  41. Standage DS, Berens AJ, Glastad KM, Severin AJ, Brendel VP, Toth AL. 2016. Genome, transcriptome and methylome sequencing of a primitively eusocial wasp reveal a greatly reduced DNA methylation system in a social insect. Mol Ecol. 25(8):1769–1784. doi: 10.1111/mec.13578.
  42. Stanke M, Waack S. 2003. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 19(suppl_2):ii215–ii225. doi: 10.1093/bioinformatics/btg1080.
  43. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH. 2007. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 23(10):1282–1288. doi: 10.1093/bioinformatics/btm098.
  44. Uzunović J, Josephs EB, Stinchcombe JR, Wright SI. 2019. Transposable elements are important contributors to standing variation in gene expression in Capsella grandiflora. Mol Biol Evol. 36(8):1734–1745. doi: 10.1093/molbev/msz098.
  45. Werren JH, Baldo L, Clark ME. 2008. Wolbachia: master manipulators of invertebrate biology. Nat Rev Microbiol. 6(10):741–751. doi: 10.1038/nrmicro1969.
  46. Werren JH, Richards S, Desjardins CA, Niehuis O, Gadau J, Colbourne JK; Nasonia Genome Working Group. 2010. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science. 327(5963):343–349. doi: 10.1126/science.1178028.
  47. Wheeler D, Redding AJ, Werren JH. 2013. Characterization of an ancient lepidopteran lateral gene transfer. PLoS One. 8(3):e59262. doi: 10.1371/journal.pone.0059262.
  48. Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke J. 2013. The MaSuRCA genome assembler. Bioinformatics. 29(21):2669–2677. doi: 10.1093/bioinformatics/btt476.

Data Availability Statement

The raw data for the T. diversipes genome assembly are available under BioProject PRJNA866601 in the NCBI database. The assembly can be accessed in NCBI GenBank under accession number JAOPTO000000000 (GCA_033822845.1). The mitochondrial genome is available under accession number MN732885.1. The steps and commands used in the genome assembly and annotation process, the R code for Fig. 2, the 3 Wolbachia scaffolds, and the T. diversipes gene set and genome annotation file are available at https://github.com/pkfsantos/Tetrapedia_diversipes_genome. The pipeline for the repetitive elements analysis can be found at https://github.com/nat2bee/repetitive_elements_pipeline. Any other analysis details are available upon request.

Supplemental material available at G3 online.


Articles from G3: Genes | Genomes | Genetics are provided here courtesy of Oxford University Press
