Author manuscript; available in PMC: 2024 Mar 1.
Published in final edited form as: Ticks Tick Borne Dis. 2022 Nov 23;14(2):102090. doi: 10.1016/j.ttbdis.2022.102090

A draft of the genome of the Gulf Coast tick, Amblyomma maculatum

Jose MC Ribeiro 1, Natalia J Bayona-Vásquez 2, Khemraj Budachetri 3,4, Deepak Kumar 2, Julia Catherine Frederick 2, Faizan Tahir 3, Brant C Faircloth 5, Travis C Glenn 2, Shahid Karim 3
PMCID: PMC9898150  NIHMSID: NIHMS1854099  PMID: 36446165

Abstract

The Gulf Coast tick, Amblyomma maculatum, inhabits the Southeastern states of the USA bordering the Gulf of Mexico, Mexico, and other Central and South American countries. More recently, its U.S. range has extended west to Arizona and northeast to New York state and Connecticut. It is a vector of Rickettsia parkeri and Hepatozoon americanum. This tick species has become a model to study tick/Rickettsia interactions. To increase our knowledge of the basic biology of A. maculatum, we report here a draft genome of this tick and an extensive functional classification of its proteome. The DNA from a single male tick was used as the genomic source, and a 10X Genomics protocol yielded 28,460 scaffolds of 10 Kb or longer, totaling 1.98 Gb. The N50 scaffold size was 19,849 Kb. The BRAKER pipeline was used to find the protein-coding gene boundaries on the assembled A. maculatum genome, discovering 237,921 CDS. After trimming and classifying the transposable elements, bacterial contaminants, and truncated genes, a set of 25,702 CDS was annotated and classified as the core gene products. A BUSCO analysis revealed 83.4% complete BUSCOs. A hyperlinked spreadsheet is provided, allowing browsing of the individual gene products and their matches to several databases.

Introduction

The Gulf Coast tick, Amblyomma maculatum (Koch, 1844), is a vector of Rickettsia parkeri Luckman (Rickettsiales: Rickettsiaceae), which causes a febrile infection in humans [1–3], and also of Hepatozoon americanum, a pathogen of dogs [4–7]. The distribution of A. maculatum extends from the Southeastern states of the USA bordering the Gulf of Mexico into Mexico and several other Central and South American countries. In recent decades its range has extended northward and westward in the United States, including the states of Arkansas, Oklahoma, Kansas, and Southwestern Tennessee [8]. The northernmost range of this tick species includes Delaware, Connecticut, and New York [9–11]. Current work with this tick aims to understand its relationship with its symbionts and pathogens in general, and in particular the tick’s immunity pathways [12–15]. The availability of the genome sequence of A. maculatum would accelerate progress toward these research goals.

To a researcher interested in the biochemistry and physiology of ticks, the main advantage of having the organism’s genome resides in the availability of an annotated set of coding sequences (CDS) and their protein translations, which allows the building of hypotheses on the roles of these gene products and, for example, the planning of experiments using RNAi and genome editing to test these hypotheses. The availability of the genome will also facilitate the development of technologies that exploit the full potential of small RNAs, including microRNA (miRNA) and PIWI-interacting RNA (piRNA) biology in ticks.

In this work, we used the 10X Genomics platform to sequence the genome of a single male of the Gulf Coast tick, A. maculatum. To obtain the coordinates of the genome’s coding genes, we used available RNASeq data to train the BRAKER pipeline [16]. The derived CDS translations were compared to several databases and mapped to a hyperlinked spreadsheet that should allow researchers to search for their genes of interest and plan their experiments. The genome of A. maculatum will provide opportunities for comparative evolutionary analysis with other tick species and arthropod vectors, and allow researchers to explore tick-pathogen interactions and the ways ticks parasitize vertebrate hosts.

Material and Methods

Sample origin and DNA extraction and quality

Amblyomma maculatum ticks were maintained at the University of Southern Mississippi according to our modified methods [17]. A. maculatum uninfected and Rickettsia parkeri-infected colonies were established in our laboratory in 2013. Questing unfed adult ticks were collected from Mississippi Sandhill Crane National Wildlife Refuge, Gautier, Mississippi, using the drag cloth method on 28th July 2013. A total of 42 females and 62 males collected from the field were blood-fed on sheep and allowed to engorge and drop off. Each fully engorged female adult tick was kept separately in a snap vial for egg-laying. Individual uninfected and Rickettsia parkeri-infected egg clutches from individual gravid females were selected and allowed to hatch into unfed larvae. The unfed larval ticks were blood-fed by allowing them to infest golden Syrian hamsters until they dropped off. Fully engorged larvae were allowed to molt into nymphs and then blood-fed on hamsters. Fully engorged nymphs molted into male or female adult ticks. Closed colonies from the 6th generation of original wild-caught ticks were used in this study, from which five adult male ticks were selected. For each, the whole adult live tick was cut into four quarters and digested in 500 μL of buffer (10–100 mM Tris, 10–100 mM EDTA, 100–200 mM NaCl, 0.5–1% SDS) with 5 μL of proteinase K 10 mg/mL (QIAGEN, Hilden, Germany). The ticks and digestion mix were incubated in a dry bath overnight at 55°C and mixed by vortex ten times during that period. Then, for each sample, 5 μL of RNAse A (ThermoFisher Scientific, MA, USA) was added, vortexed, and incubated at room temperature for 30 minutes. A 400 μL aliquot was transferred to a new microcentrifuge tube and a Phenol-Chloroform-Isoamyl Alcohol (PCI) DNA extraction protocol was followed. The five extracted genomic DNA (gDNA) samples were individually hydrated in 200 μL of TE 1X buffer (Integrated DNA Technologies, Inc., IA, USA) at room temperature overnight. For verification and visualization of the products, 5 μL of each hydrated DNA sample was run in a 0.8% agarose gel.

The sample that showed the best banding pattern in the agarose gel (brightest, high-molecular weight band), from a male adult and therefore with an XO sex chromosome makeup, was further processed at the Georgia Genomics and Bioinformatics Core, where the gDNA concentration was estimated to be 23.3 ng/μl with a Qubit® Fluorometer using the High Sensitivity protocol. The sample was also assessed in a Fragment Analyzer (FA) Automated CE System (Advanced Analytical Technologies, CA, USA) using the HS Large Fragment 50Kb method and FA version 1.2.0.11. The FA report revealed a peak size of 24,754 bp, 0.7707 ng/μl, ranging from 4,550 to 100,798 bp with an average size of 27,872 bp.

Linked-Reads Genomic Library Prep and Sequencing

The gDNA sample was used as input for a library prep with the Chromium Genome Library Kit using the Chromium Genome Reagent Kits v2 (CG00022 Rev C), the Chromium Genome Gel Bead Kit (PN-120216), and the Chromium Genome Chip Kit (PN-120216), all from 10x Genomics (10x Genomics, CA, USA). The protocol followed the manufacturer’s instructions. In brief, we diluted the sample according to the standard for the genome protocol, that is 1 ng/μL, and verified that the concentration range was within acceptable limits. Then, the GEM generation sample mix was prepared and combined with both the Denaturing Agent and the gDNA. The mix was loaded into the Chromium Genome Chip where the Genome Gel Beads and Partitioning Oil were also loaded in the corresponding rows. The chip was placed in the Chromium Controller where the Genome Library program was run to partition and barcode each gDNA fragment. Barcodes were added to allow tracking of each resulting read to its original gDNA fragment. Then, the chip was ejected, and the GEMs were aspirated from the recovery well, transferred to a new tube, and isothermally incubated to generate 10x barcoded amplicons. Then the GEMs were cleaned-up with DynaBeads MyOne Silane (ThermoFisher Scientific, MA, USA), rinsed with 80% ethanol twice, and hydrated in Elution Solution. The library construction was finalized following end repair and A-tailing, adaptor ligation, post-ligation clean-up with SPRIselect, sample-index PCR using set SI-GA-A4 (contains barcodes TATGATTC, CCCACAG, ATGCTGAA, and GGATGCCG), and double-sided size selection SPRIselect. Finally, the library product was analyzed in the Fragment Analyzer Automated CE System (Advanced Analytical Technologies, CA, USA) using the NGS Fragment 1–6000bp method and quantified in a Qubit® Fluorometer using the High Sensitivity protocol. The FA report showed a peak size of 533 bp, with a 3.8 ng/μL concentration; the graph ranged from 1440 bp to 5,087 bp, with an average size of 705 bp. 
The Qubit measurement showed the library concentration to be 52.4 ng/μL.

The library was sequenced two independent times and the resulting reads were pooled. One of the sequencing runs was performed on an Illumina NovaSeq S4 and the second on an Illumina HiSeqX, both using PE150 kits at Novogene Co., Ltd (Beijing, China).

Genome Assembly

For each run, samples were demultiplexed using the four barcodes from the 10x sample index set, and the output files were merged together according to reads 1 and 2. Then, both sequencing runs were merged independently for read 1 and read 2. Raw data was used as input in Supernova v. 2.1.1 [18] using the run parameter allowing the use of 1,200 million reads with maxreads in an attempt to reach a 56x raw coverage and allowing the use of 28 cores and 980 Gb of memory. Then the option mkoutput was used to create raw, pseudohap, pseudohap2, and megabubble outputs. The summary files regarding the assembly characteristics can be found in supplemental file 1.
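The relationship between the read cap and the coverage target can be illustrated with simple arithmetic: raw coverage is approximately the number of reads times the read length divided by the genome size. The sketch below is only an illustration and is not part of the assembly pipeline. The function names are hypothetical, the 150 bp read length is taken from the PE150 kits, and it is assumed that the 1,200 million figure counts individual reads.

```python
def raw_coverage(n_reads: int, read_len_bp: int, genome_size_bp: float) -> float:
    """Approximate raw coverage: total sequenced bases / genome size."""
    return n_reads * read_len_bp / genome_size_bp


def implied_genome_size(n_reads: int, read_len_bp: int, coverage: float) -> float:
    """Genome size (bp) at which n_reads of read_len_bp give the stated coverage."""
    return n_reads * read_len_bp / coverage


# Assumed inputs: 1,200 million reads of 150 bp and a 56x target.
size_bp = implied_genome_size(1_200_000_000, 150, 56)
print(f"Implied genome size: {size_bp / 1e9:.2f} Gb")
print(f"Coverage check: {raw_coverage(1_200_000_000, 150, size_bp):.1f}x")
```

Under these assumptions the read cap corresponds to a genome-size estimate of a little over 3 Gb. With a different read-counting convention (pairs instead of individual reads) the figure would double.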

Genome annotation

The BRAKER/Augustus pipeline [16] was used to obtain the putative coding sequences (CDS) from the A. maculatum genome. The program was trained to find the CDS using RNAseq data available from the NCBI (accessions SRR959015 - salivary glands and SRR959016 - ovaries). These reads were concatenated and normalized using the Trinity program insilico_read_normalization.pl [19]. The normalized reads were mapped to the unmasked genome using the program STAR [20]. The mapped reads were used to train the gene-discovery pipeline BRAKER [16], which discovered a total of 380,129 coding sequences (CDS). The BUSCO program (version 5.0.0) [21] was run with the BRAKER predicted protein sequences against the lineage dataset arachnida_odb10, created on 2020-08-05, from 10 species and 2,934 BUSCOs. The program RepeatMasker version 4.1.2-p1 was used to identify transposable elements and repeat sequences. It was run in sensitive mode with rmblastn version 2.11.0+. The query species was assumed to be Arthropoda. The databases used were FamDB: CONS-Dfam_withRBRM_3.2. Transposable elements (TE) were identified using the HMMER tool [22] against a subset of the Dfam database [23] containing transposable element models, excluding repeats. The CDS were also compared to the RepBase [24] protein database to identify and classify TE. To classify genes according to their functional class, the deduced protein sequences were compared using blastp to a subset of the GenBank database containing sequences from the Arachnida, to the UniprotKB [25] database, to the Expasy Enzyme (EC) [26] database and to the MEROPS [27] database. Rpsblast was used to search the protein sequences against conserved motifs from the PFAM [28], SMART [29], KOG [30] and CDD [31] databases. To identify genes associated with a salivary function, the CDS were compared by Rpsblast to the TickSialoFam (TSF) database [32]. Matches that had a model coverage of > 66.6% and an e-value smaller than 1e-4 were considered as related to salivary function. General functional classification was achieved by using a set of ~400 key words that were searched in the definition line of the matches above. Each key word was associated with a functional class. The functional class of a sequence was determined by the first key word found in the definition line of the match, provided that the product of the % identity and the % coverage was larger than 0.25. If no key word was found, the sequence was assigned to an “Unknown” function. All sequences were also searched for the existence of a signal peptide indicative of secretion using the SignalP v. 3.0 program [33], for transmembrane domains using the tmhmm program [34] and for O-glycosylation sites indicative of mucins using the program NetOglyc [35]. Glycosyl-phosphate-inositol membrane anchors were identified by the DGPI program [36].
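The two filtering rules described above (the salivary-match threshold and the key-word classification) can be sketched as follows. This is a minimal illustration, not the code used in this work. The key-word table is a hypothetical three-entry stand-in for the ~400 key words, identity and coverage are assumed to be expressed as fractions between 0 and 1, and "first key word found" is assumed to mean the key word occurring earliest in the definition line.

```python
# Hypothetical key word -> functional class table (illustrative only).
KEYWORD_CLASSES = [
    ("kinase", "Signal transduction"),
    ("ribosomal", "Protein synthesis machinery"),
    ("transporter", "Transporters and channels"),
]


def is_salivary(model_coverage_pct: float, evalue: float) -> bool:
    """TSF match rule: model coverage > 66.6% and e-value < 1e-4."""
    return model_coverage_pct > 66.6 and evalue < 1e-4


def classify(definition: str, identity: float, coverage: float,
             threshold: float = 0.25) -> str:
    """Assign a functional class from the definition line of a match.

    identity and coverage are fractions (0-1); their product must exceed
    the threshold, otherwise the sequence is left as "Unknown".
    """
    if identity * coverage <= threshold:
        return "Unknown"
    text = definition.lower()
    # Record the position of every key word present in the definition line.
    hits = [(text.find(word), cls) for word, cls in KEYWORD_CLASSES if word in text]
    if not hits:
        return "Unknown"
    # Assumption: the key word appearing earliest in the line wins.
    return min(hits)[1]


print(classify("serine/threonine protein kinase", identity=0.8, coverage=0.9))
print(is_salivary(model_coverage_pct=70.0, evalue=1e-10))
```

An alternative reading of "first key word" is the first entry in the ordered key-word list that matches anywhere in the line; that variant would iterate over the table and return on the first match.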

The published genomes of Rhipicephalus microplus and R. sanguineus [37] were used as input to the BRAKER/Augustus pipeline [16], trained with publicly available protein sequences from these organisms.

Transcriptome mapping

Amblyomma maculatum transcriptome reads from the salivary glands and ovaries of adult ticks (NCBI accessions SRR13797277, SRR13797276, SRR13797275, SRR13797274, SRR13797296, SRR13797295, SRR13797294, SRR13797293, SRR13797292, SRR13797290, SRR13797289, SRR13797288, SRR13797287, SRR13797286, SRR13797285, SRR13797284, SRR13797283, SRR13797282, SRR13797305, SRR959015, SRR959016, SRR13797281, SRR13797280, SRR13797279, SRR13797278, SRR13797303, SRR13797302, SRR13797291, SRR13797304, SRR13797273, SRR13797272, SRR13797271, SRR13797270, SRR13797269, SRR13797268, SRR13797301, SRR13797300, SRR13797299, SRR13797298, SRR13797297) were mapped to the predicted CDS using Bowtie2 [38]. Read coverage was measured using samtools coverage program [39].

Phylogenetic Analysis

Protein sequences were aligned with Muscle [40]. Phylogenetic trees were built with the program IQ-tree [41]. The best amino acid evolutionary model was determined by ModelFinder [42]. The tree was bootstrapped using UFBoot2 [43] with the bnni correction. The resulting Newick trees were annotated with Mega X [44].

Data availability

This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JAJIZL000000000, BioProject accession PRJNA773936 and BioSample accession SAMN22546173. The reads used to assemble the genome can be found in the Sequence Read Archives (SRA) of the National Center for Biotechnology Information (NCBI) under the accession SRR16911356. The metagenome assembled Rickettsia parkeri genome was deposited in GenBank under the accession CP101541.

Results

We obtained a total of 942,809,836 paired-end reads from both sequencing runs. The genome assembly of A. maculatum resulted in 28,460 scaffolds of 10 Kb or longer, totaling 1.98 Gb. The N50 scaffold size was 19,849 Kb. If we add the contigs of 1,000 bp or longer, the total assembly size reaches 2.27 Gb. The number of contigs ranging from 1,000 to 9,999 bp is 101,575, and the contig N50 was 29.12 Kb. A search of the assembled genome for bacterial contaminants and duplicated contigs found 14 contigs, summing to 1.332 Mbp, that matched known bacterial genomes, including a contig of 1.296 Mbp that matched, with 99% identity, the genome of Rickettsia parkeri, a known endosymbiont of A. maculatum [45]. A further 4,558 contigs were found to be exactly duplicated, adding up to a total of 6.8 Mbp. These contaminants and duplicated contigs were removed from the final assembly. Table 1 lists the currently available tick genomes and their characteristics. Although the N50 for the A. maculatum assembly was in the low range compared with other tick genome assemblies (Table 1), a BUSCO analysis of the predicted 25,631 CDS from the A. maculatum genome indicated 83.4% complete BUSCOs, 66.8% complete and single-copy BUSCOs, 16.6% complete and duplicated BUSCOs, 1.7% fragmented BUSCOs and 14.9% missing BUSCOs. These results are above the average of those shown in Table 1 for the other tick genomes published so far.
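For readers unfamiliar with the N50 statistic reported above, it is the length L such that sequences of length L or longer contain at least half of the total assembled bases. A minimal sketch of the standard definition follows; the function name and the toy lengths are illustrative and are not taken from the assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the cumulative
        # length to at least half of the total.
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one length")


# Toy example with made-up lengths (not from the A. maculatum assembly).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 depends on which sequences are included, the value changes when short contigs are added to or excluded from the set, which is why the scaffold and contig figures are reported separately.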

Table 1:

Published tick genomes characteristics

Scientific name | Annotation | Size (Gbp) | Level | Contig N50 (kb) | Protein-coding | BioProject | Complete BUSCOs % | Reference
Amblyomma maculatum | | 1.98 | Contig | 198 | 25,704 | PRJNA773936 | 83.4 | This work
Dermacentor silvarum | TIGMIC Group | 2.47 | Chromosome | 340 | 26,696 | PRJNA633311 | 62.4 | [37]
Haemaphysalis longicornis | TIGMIC Group | 2.56 | Chromosome | 740 | 27,144 | PRJNA633311 | 60.6 | [37]
Hyalomma asiaticum | TIGMIC Group | 1.71 | Chromosome | 555 | 29,644 | PRJNA633311 | 65.0 | [37]
Ixodes persulcatus | TIGMIC Group | 1.90 | Scaffold | 533 | 28,574 | PRJNA633311 | 76.1 | [37]
Ixodes scapularis | Vectorbase | 2.08 | Contig | 836 | 20,488 | PRJNA345486 | 86.8 | [55]
Ixodes scapularis | Vectorbase | 2.30 | Contig | 1,735 | 19,062 | PRJNA678334 | 81.7 | [54]
Rhipicephalus annulatus | | 2.76 | Contig | 437 | N/A | PRJNA593711 | N/A | [Unpublished]
Rhipicephalus microplus | TIGMIC Group | 2.01 | Scaffold | 16 | 24,211 | PRJNA312025 | 55.4 | [37]
Rhipicephalus microplus | TIGMIC Group | 2.53 | Chromosome | 1,791 | 29,857 | PRJNA633311 | 95.9 | [37]
Rhipicephalus sanguineus | TIGMIC Group | 2.37 | Chromosome | 542 | 25,718 | PRJNA633311 | 60.4 | [37]

The BRAKER pipeline [16] was used to find the protein coding gene boundaries on the assembled A. maculatum genome, discovering 237,921 CDS. These were compared by blast and rpsblast [46] to several databases, including those at the NCBI (non-redundant and TSA protein sequences) deriving from Arachnida organisms and from Rickettsial bacteria, and the Uniprot database. After removing the sequences matching bacterial phages as well as those that represented fragments with less than 67% coverage to known proteins from the Uniprot and NCBI Arachnida sets, a set of 88,754 sequences were identified as TE (see below), and an additional set of 25,702 were annotated as the core gene products of A. maculatum (Supplemental spreadsheets 1 with all CDS, 2 with TEs and 3, with the core genome set).

The transposable element landscape within the genome of Amblyomma maculatum

The genome-coding DNA contains the information to determine the sequence of a peptide possibly containing 20 different amino acids and one stop codon. There are 64 possible codons, and 3 of them code for stops. So, a stop codon should arise on average once every 21–22 codons, or 63–66 bp. Accordingly, stretches of ORFs longer than 200 nucleotides are expected to indicate a region coding for polypeptides. However, transposable elements challenge the annotation of sequenced genomes, as they “contaminate” these longer ORFs with their coding sequences [47]. Transposable elements are virus-like organisms that parasitize the majority of eukaryotic genomes, frequently loading more than half of the full genomes with their sequences. Thus, to obtain an accurate transcriptome and proteome prediction of a genome, the TE coding sequences have to be filtered out. The transposable element (TE) landscape of the A. maculatum genome was explored by annotating the predicted coding sequences identified as TE based on blastp matches to sequences annotated as TE from the Swissprot database as well as to coding sequences deduced from the Repbase database [24].
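The stop-codon argument above can be checked with a short calculation. The sketch below assumes, for illustration only, that all 64 codons are equally likely and independent, which real genomes do not strictly satisfy; the variable and function names are ours.

```python
STOP_CODONS = 3
TOTAL_CODONS = 64

# Probability that a random codon is a stop, under the uniform assumption.
p_stop = STOP_CODONS / TOTAL_CODONS

# Expected spacing between stops (mean of a geometric distribution).
mean_codons_to_stop = 1 / p_stop
mean_bp_to_stop = 3 * mean_codons_to_stop


def prob_open_for(n_codons: int) -> float:
    """Probability that n consecutive random codons contain no stop."""
    return (1 - p_stop) ** n_codons


# Chance that a random stretch stays open for 200 nt (66 whole codons).
p_200nt = prob_open_for(200 // 3)

print(f"Mean spacing: {mean_codons_to_stop:.1f} codons ({mean_bp_to_stop:.0f} bp)")
print(f"P(no stop in 200 nt) = {p_200nt:.3f}")
```

Under these assumptions only a few percent of random 200-nucleotide windows remain open, which is why longer ORFs are taken as evidence of coding potential and why TE-derived ORFs, which are genuinely coding, pass this filter.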

Among the 88,734 transcripts identified as TEs, we were able to classify 86,752; 80.1% of these were of the Class I type, 19.8% were of the Class II type and 0.07% were from endogenous retroviruses (ERV) (Table 2 and Supplemental spreadsheet 2). Of the classified elements, 57.7% were Long Terminal Repeat (LTR) retrotransposons and 22.4% were non-LTR retrotransposons. Within the LTRs, Gypsy elements were the most abundant, constituting 96% of the total LTR elements. I elements were the most abundant within the non-LTR retrotransposons, reaching 40% of the 19,431 transcripts found within this group. Among these non-LTR elements the BovB LINE element was identified. This element is widespread in vertebrates and it was proposed that horizontal transfer of these elements among vertebrates was vectored by ticks [48, 49]. Among the Class II elements, the P/Tigger family was the most abundant, with 11,303 elements, or 65.7% of all Class II elements found. The Mariner/TC1 family was the second most abundant, reaching 21.3% of the 17,195 Class II elements identified.

Table 2:

Coding sequences from transposable elements found on the Amblyomma maculatum genome

Class | Type | Family | Transcripts found | Percent of total elements | Percent of class
CLASS I | LTR RETROTRANSPOSON | Gypsy | 48,274 | 55.65 | 96.42
CLASS I | LTR RETROTRANSPOSON | Bel-Pao | 742 | |
CLASS I | LTR RETROTRANSPOSON | CER6-I | 637 | |
CLASS I | LTR RETROTRANSPOSON | Copia | 116 | |
CLASS I | LTR RETROTRANSPOSON | Ngaro | 99 | |
CLASS I | LTR RETROTRANSPOSON | CER2-I | 55 | |
CLASS I | LTR RETROTRANSPOSON | CIRCE | 37 | |
CLASS I | LTR RETROTRANSPOSON | CER3-I | 26 | |
CLASS I | LTR RETROTRANSPOSON | CER13-I | 23 | |
CLASS I | LTR RETROTRANSPOSON | SKIPPER | 11 | |
CLASS I | LTR RETROTRANSPOSON | Tf2 | 11 | |
CLASS I | LTR RETROTRANSPOSON | CER15-I | 10 | |
CLASS I | LTR RETROTRANSPOSON | DIRS | 9 | |
CLASS I | LTR RETROTRANSPOSON | CER11-I | 6 | |
CLASS I | LTR RETROTRANSPOSON | CER10-I | 4 | |
CLASS I | LTR RETROTRANSPOSON | SACI | 3 | |
CLASS I | LTR RETROTRANSPOSON | TC1 | 3 | |
CLASS I | LTR RETROTRANSPOSON | HERV | 2 | |
CLASS I | LTR RETROTRANSPOSON | Total | 50,068 | 57.71 | 100.00
CLASS I | NON-LTR RETROTRANSPOSON | I | 6,870 | 9.89 | 39.95
CLASS I | NON-LTR RETROTRANSPOSON | RTE | 4,128 | | 24.01
CLASS I | NON-LTR RETROTRANSPOSON | REP2 | 1,804 | |
CLASS I | NON-LTR RETROTRANSPOSON | Loa | 1,083 | |
CLASS I | NON-LTR RETROTRANSPOSON | Tad1 | 814 | |
CLASS I | NON-LTR RETROTRANSPOSON | Outcast | 759 | |
CLASS I | NON-LTR RETROTRANSPOSON | Ingi | 605 | |
CLASS I | NON-LTR RETROTRANSPOSON | Jockey | 591 | |
CLASS I | NON-LTR RETROTRANSPOSON | Nimb | 517 | |
CLASS I | NON-LTR RETROTRANSPOSON | LINE | 502 | |
CLASS I | NON-LTR RETROTRANSPOSON | R1 | 432 | |
CLASS I | NON-LTR RETROTRANSPOSON | Tx1 | 303 | |
CLASS I | NON-LTR RETROTRANSPOSON | L1 | 232 | |
CLASS I | NON-LTR RETROTRANSPOSON | RTEX | 159 | |
CLASS I | NON-LTR RETROTRANSPOSON | Penelope | 147 | |
CLASS I | NON-LTR RETROTRANSPOSON | R2 | 143 | |
CLASS I | NON-LTR RETROTRANSPOSON | ORTE | 94 | |
CLASS I | NON-LTR RETROTRANSPOSON | Crack | 66 | |
CLASS I | NON-LTR RETROTRANSPOSON | CRE | 51 | |
CLASS I | NON-LTR RETROTRANSPOSON | NeSL | 42 | |
CLASS I | NON-LTR RETROTRANSPOSON | RandI | 28 | |
CLASS I | NON-LTR RETROTRANSPOSON | CR1 | 18 | |
CLASS I | NON-LTR RETROTRANSPOSON | SR2B | 12 | |
CLASS I | NON-LTR RETROTRANSPOSON | LIN10 | 9 | |
CLASS I | NON-LTR RETROTRANSPOSON | Proto1 | 6 | |
CLASS I | NON-LTR RETROTRANSPOSON | R4 | 5 | |
CLASS I | NON-LTR RETROTRANSPOSON | GENIE1 | 4 | |
CLASS I | NON-LTR RETROTRANSPOSON | Vingi | 3 | |
CLASS I | NON-LTR RETROTRANSPOSON | Daphne | 2 | |
CLASS I | NON-LTR RETROTRANSPOSON | Ambal | 1 | |
CLASS I | NON-LTR RETROTRANSPOSON | Hero | 1 | |
CLASS I | NON-LTR RETROTRANSPOSON | Total | 19,431 | 22.40 | 100.00
CLASS I total | | | 69,499 | 80.11 | 100.00
CLASS II | DNA TRANSPOSON | P/Tigger | 11,303 | 13.03 | 65.73
CLASS II | DNA TRANSPOSON | Mariner/Tc1 | 3,659 | 4.22 | 21.28
CLASS II | DNA TRANSPOSON | Ginger | 793 | 0.91 | 4.61
CLASS II | DNA TRANSPOSON | Harbinger | 520 | 0.60 | 3.02
CLASS II | DNA TRANSPOSON | Kolobok | 318 | 0.37 | 1.85
CLASS II | DNA TRANSPOSON | UNKNOWN | 130 | |
CLASS II | DNA TRANSPOSON | ISL2EU | 129 | |
CLASS II | DNA TRANSPOSON | EnSpm | 110 | |
CLASS II | DNA TRANSPOSON | hAT | 64 | |
CLASS II | DNA TRANSPOSON | piggyBac | 33 | |
CLASS II | DNA TRANSPOSON | Zisupton | 27 | |
CLASS II | DNA TRANSPOSON | CACTA | 24 | |
CLASS II | DNA TRANSPOSON | Helitron | 20 | |
CLASS II | DNA TRANSPOSON | Merlin | 18 | |
CLASS II | DNA TRANSPOSON | MuDR | 16 | |
CLASS II | DNA TRANSPOSON | MiniSatellite | 9 | |
CLASS II | DNA TRANSPOSON | mule | 8 | |
CLASS II | DNA TRANSPOSON | THAP9 | 6 | |
CLASS II | DNA TRANSPOSON | LOOPER | 4 | |
CLASS II | DNA TRANSPOSON | Academ | 2 | |
CLASS II | DNA TRANSPOSON | PIF-Harbinger | 2 | |
CLASS II total | | | 17,195 | 19.82 |
ERV | ENDOGENOUS RETROVIRUS | ERV3 | 38 | |
ERV | ENDOGENOUS RETROVIRUS | ERV1 | 15 | |
ERV | ENDOGENOUS RETROVIRUS | Endogenous Retrovirus | 5 | |
Endogenous retrovirus total | | | 58 | 0.07 |
Grand total | | | 86,752 | 100.00 |
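The group percentages in Table 2 follow directly from the transcript counts. As a quick consistency check, the sketch below recomputes each group's share of the classified elements from the four subtotals listed in the table; the dictionary keys are our own labels.

```python
# Subtotals of transcripts per group, as listed in Table 2.
counts = {
    "Class I LTR": 50_068,
    "Class I non-LTR": 19_431,
    "Class II": 17_195,
    "ERV": 58,
}

grand_total = sum(counts.values())

# Share of each group among all classified elements, in percent.
shares = {name: round(100 * n / grand_total, 2) for name, n in counts.items()}

# Class I is the sum of its LTR and non-LTR subtotals.
class_i_share = round(
    100 * (counts["Class I LTR"] + counts["Class I non-LTR"]) / grand_total, 2
)

print(grand_total)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
print(f"Class I total: {class_i_share:.2f}%")
```

The recomputed values agree with the subtotal rows of the table (57.71, 22.40, 19.82 and 0.07% of the 86,752 classified elements, with Class I at 80.11%).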

Among the predicted CDS coding for Mariner/TC1 transposases, there were 365 sequences with a predicted peptide length between 400 and 600 aa (the average length of full-length Mariner transposases is near 410 aa [50]), without internal stop codons, and containing the Pfam domains coding for the DDE superfamily endonuclease and the domain HTH_Tnp_Tc5, coding for the Tc5 transposase DNA-binding domain. Mariner/TC1 elements have been domesticated in vertebrates, as exemplified by the centromere-associated protein B (CENPB) and the genes named Tigger transposable element-derived 2 to 7 (TIGD2–7), so far found only in vertebrates [51, 52]. Representatives of these sequences were submitted for phylogenetic analysis, together with the Mariner/TC1 sequences deduced here from A. maculatum and similar proteins from other tick species found by blast of A. maculatum sequences against the non-redundant database from NCBI. Interestingly, a clade with high (99% bootstrap) support (Clade XI, supplemental figure 1) contained, in subclade XIb, the mammalian sequences orthologous to the human TIGD6 protein and, in subclade XIa, tick proteins from R. microplus, R. sanguineus, I. scapularis and A. maculatum. Transcription of g129797 was found in ovaries, attaining an FPKM (Fragments Per Kilobase of transcript per Million mapped reads) of 8.78 and a linear sequence coverage of 98.9%, while g180094 was found expressed in the salivary glands with an FPKM of 7.09 and a linear sequence coverage of 97.9%. It is possible that these transposable elements have also been domesticated in ticks.
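The two expression measures quoted above can be written out explicitly. The sketch below uses the standard FPKM definition and takes linear sequence coverage to be the percentage of transcript positions covered by at least one read. The function names and the example numbers are hypothetical and do not correspond to g129797 or g180094.

```python
def fpkm(fragments: int, transcript_len_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    kilobases = transcript_len_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


def linear_coverage_pct(covered_bases: int, transcript_len_bp: int) -> float:
    """Percentage of transcript positions covered by at least one read."""
    return 100 * covered_bases / transcript_len_bp


# Made-up example: 100 fragments on a 2,000 bp CDS in a library of
# 10 million mapped fragments, with 1,950 positions covered.
print(fpkm(100, 2_000, 10_000_000))
print(linear_coverage_pct(1_950, 2_000))
```

A high linear coverage together with a modest FPKM, as reported for the two CDS above, indicates that reads span nearly the full length of the transcript even though its overall abundance is low.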

To compare the TE identification based on putative coding transcripts which are based on protein sequence identity with the TE predictions done from DNA sequence homologies (that are not disturbed by intruding stop codons), we used the program RepeatMasker which identified 1,323,280 TE and other repetitive elements in the A. maculatum genome, representing 25% of the 2.35 GBases of scanned genome (Table 3). Class I elements covered 12.73% of the genome, totaling 838,798 elements, while class II elements (DNA transposons) represented 0.26% of the genome with 82,533 elements, the majority being from the Mariner/TC1 family (36,962 elements). Table 3 has additional information regarding TE and repetitive elements found in the A. maculatum genome.

Table 3:

Transposable elements identified by RepeatMasker in the Amblyomma maculatum genome. Total genome size scanned = 2,350,858,905 bases.

Element family Number of elements Base Pairs Percentage of Genome
Retroelements 667,681 227,801,712 9.24%
 SINEs: 189,995 30,535,656 1.24%
 Penelope 9,123 923,733 0.04%
 LINEs: 306,569 111,282,227 4.51%
 CRE/SLACS 7 415 0.00%
 L2/CR1/Rex 92,277 13,151,584 0.53%
 R1/LOA/Jockey 44,875 7,279,321 0.30%
 R2/R4/NeSL 3,298 305,707 0.01%
 RTE/Bov-B 127,837 87,675,663 3.56%
 L1/CIN4 1,891 100,605 0.00%
LTR elements: 171,117 85,983,829 3.49%
 BEL/Pao 10,454 1,214,535 0.05%
 Ty1/Copia 5,007 265,943 0.01%
 Gypsy/DIRS1 154,051 84,423,689 3.42%
 Retroviral 0 0 0.00%
Total Class I elements 838,798 313,785,541
DNA transposons 82,533 6,360,904 0.26%
 hobo-Activator 8,996 630,064 0.03%
 Tc1-IS630-Pogo 36,962 3,489,969 0.14%
 En-Spm 0 0 0.00%
 MuDR-IS905 0 0 0.00%
 PiggyBac 922 115,270 0.00%
 Tourist/Harbinger 945 118,864 0.00%
 Other (Mirage, P-element, Transib) 2,924 179,953 0.01%
Rolling-circles 11,361 683,173 0.03%
Unclassified: 5,696 432,101 0.02%
Total interspersed repeats 234,594,717 9.51%
Small RNA: 190,873 30,766,714 1.25%
Satellites: 1,573 135,324 0.01%
Simple repeats: 0 0 0.00%
Low complexity: 0 0 0.00%
Total 1,323,280 617,660,512 25.07
* Most repeats fragmented by insertions or deletions have been counted as one element.

The query species was assumed to be arthropoda

RepeatMasker version 4.1.2-p1, sensitive mode

Endogenous viral sequences

The CDS g178917.t1 codes for a nucleocapsid protein from a rhabdovirus [53] which appears to have been incorporated into the genomes of various tick species, as represented by the similar sequences found in the genomes of R. sanguineus (XP_037519053.1), R. microplus (XP_037281023.1), Dermacentor silvarum (XP_037579436.1), I. ricinus (ASY03265.1), I. persulcatus (KAG0426363.1) and I. scapularis (XP_040355436.1).

Annotation of the core genome of Amblyomma maculatum

By comparing the predicted gene products with several databases (see methods), 25,702 gene products were annotated in 29 classes, including 7,976 that were classified as “Unknown” (Table 4 and supplemental spreadsheet 3).

Table 4:

Classification and number of core gene products identified in the Amblyomma maculatum genome

Class Number of gene products
Putative salivary secreted 2,277
Cytoskeletal proteins 638
Detoxification 233
Oxidant metabolism/Detoxification 157
Extracellular matrix 383
Immunity 187
Amino acid metabolism 358
Carbohydrate metabolism 334
Energy metabolism 515
Intermediary metabolism 139
Lipid metabolism 682
Nucleotide metabolism 206
Nuclear export 33
Nuclear regulation 558
Protein export 2,361
Protein modification 691
Proteasome machinery 641
Protein synthesis machinery 606
Secreted protein 914
Signal transduction 3,293
Storage 39
Transcription factor 50
Transcription machinery 1,392
Transporters and channels 1,040
Unknown conserved 349
Unknown conserved membrane protein 211
Unknown product 7,065
Unknown membrane protein 351
Viral product 1
Total 25,704
Total - Unknown 17,728

Salivary proteins

The search for genes associated with secreted salivary proteins was done by matching the predicted proteins against the TSF database, revealing 2,277 gene products possibly coding for salivary proteins (Table 5 and supplemental spreadsheet 3). Among these, 170 lipocalins, 38 members of the anti-complement/8.9 kDa protein family and 17 evasins were found. Comparisons of the number of members of these protein families found in the proteome annotation of published tick species [37, 54, 55] revealed a much-increased diversity of these protein families in A. maculatum (Table 6A). A possible reason for this discrepancy could be a failure to annotate the salivary-coding transcripts in tick genomes, possibly due to their unique sequences. In support of this hypothesis, we found larger numbers of these sequences in the ab initio predicted proteins of the genomes of R. microplus and R. sanguineus. Additionally, we searched the published salivary transcriptomes of R. microplus [56] and R. sanguineus [57], where we found larger numbers of these protein family members than in the annotated genomes (Table 6B).

Table 5:

Classification and abundance of putative salivary expressed genes from Amblyomma maculatum predicted by the TickSialoFam database

Family / Subfamily  Number of CDS
12kDa family
 Generic 5
 Metastriate 11
 pk4/12kDa 3
12kDaBasic 1
13–14kDa
 13kDa 10
13kDa-Basic 1
23–24kDa
 23kDa 15
15kDaBasic 3
18kDa 19
19kDa 1
23–24kDa family
 23kDa 1
 24kDa 36
28kDa 8
8.9kDa 38
8kDa
 8 kDa metastriate 2
AlaRich 63
Amb-25–357 4
Antigen-5 8
BSMAP 1
Basic tail family
 Generic 22
 TSGP1 5
CalreticulinCalnexin 3
Cell adhesion molecule 7
Coiled-coil domain-containing 4
Complement receptor 1
Complement-binding protein 9
CUTA1 2
CystineKnotToxin 3
Cytotoxin 44
DAP-36 2
Down syndrome family of cell adhesion molecules
 Generic 2
 Ig_3 28
 IG_like 8
 Ig-domain 6
EFh_CREC_Calumenin_like 4
Evasin
 EvasinA 16
 EvasinB 1
Fasciclin-1 1
Ficolin/Ixoderin 11
Fukutin
Glycine Rich protein family
 Generic 7
 Cement 1
 GRP_cement_450 1
 GRP_cement_833 1
 Collagen-like 1
 Chitin binding 48
 Dystroglycan 1
 GGY 6
 GRP21 6
 Grp7_allergen 16
 Large GYY 8
 Large_GRP_II 2
Hematopoietic stem/progenitor cells -like 1
HVA22/Cytokine 1
Hyp_94 1
Hyp2009 2
Insulin_growth_factor 11
Integrin
 Alpha subunit 1
 Beta subunit 3
Interleukin17-like 14
Ixodegrin 25
Ixodegrin-like 1
Kielin/chordin-like 1
Laminin 9
Lipocalin 170
 His binding 98
 Generic 47
 lipocal-1 1 1
 Metastriate IgG-binding lipocalin 17
 94 7
Low-density lipoprotein receptor 21
ML_domain 20
Mucin
 Generic 78
 HRP 6
 Peritrophin 8
 Sialomucin 1
MYS-2 1
Mys-25–289 1
Mys-25–299 8
Mys-30–170 2
Mys-30–60 6
Mys-30–94 5
Niemann-Pick 6
OneOfEach 38
OSTMP1 1
Papa 2
Peptidoglycan_recognition_protein 5
Phosphatidylethanolamine binding 16
Prich 15
Prich 3
Rapp-25–325 1
Salp15/Ixostatin
 Ixostatin 7
Saposin 1
Selenoprotein 1
Serum amyloid A 4
Synaptotagmin 1 1
TGF-beta propeptide 7
TGF-beta propeptide 1
Tick Hirudin 1
Tick-MYS1 1
TMEM9 1
Toll4_associated 1
Toll-like 57
Tolloid-like 2
translocon-associated protein subunit alpha 1
Vitellogenin 4
Vitelogenin-VWF 4
YRP 2
HVA22/Cytokine 1
Hyp669 2
Malectin 1
Toll-like 5
Antimicrobial
5.3kDa
  Metastriate_5 2
DAE-2 1
Defensin 6
  Is4 6
Lysozyme 4
Microplusin 15
  Microplusin_2 20
Enzymes
 5’nucleotidase/Apyrase 9
 Coesterase 99
  Phospholipase A2 6
 Cysteinyl_peptidase 26
 Dehydrogenase 82
  Angiotensin converting enzyme 5
 Ectonucleotide pyrophosphatase/phosphodiesterase 3
 Endonuclease 14
 Epoxide hydrolase 18
 IPPase 4
  Multiple inositol polyphosphate phosphatase 2
 M13_peptidase 377
 Metalloprotease 63
 Sphingomyelinase 13
 Serine carboxypeptidase 22
 Zinc carboxypeptidase 2
 Serine protease 62
 Peroxidase 12
 Selenium dependent glutathione peroxidase 4
 Superoxide dismutase, cu2+/zn2+ superoxide dismutase sod1 8
 Tyrosine sulfotransferase 1
Catalytically inactive chitinase-like lectin 71
Proteinase inhibitors
 Longistatin 4
 Carboxypeptidase inhibitor 1
 Cystatin 14
 Thyropin 2
 Kunitz 90
 Serpin 68
 Kazal 1
 SPARC/Kazal 24
 TIL 20
Total 2,278

Table 6A:

Number of gene products coding for typical salivary proteins in tick genomes.

Species Lipocalins 8.9 kDa Evasins Reference
A. maculatum 170 38 17 This work
D. silvarum 27 3 0 [37]
H. asiaticum 48 7 0 [37]
H. longicornis 1 0 0 [37]
I. persulcatus 12 2 0 [37]
I. scapularis ISE6 37 12 2 [54]
I. scapularis Wikel 41 15 2 [55]
R. microplus 20 1 0 [37]
R. sanguineus 29 1 0 [37]

Table 6B:

Number of gene products or coding sequences coding for typical salivary proteins in published tick genomes, ab-initio genomes or transcriptomes.

Species Lipocalins 8.9 kDa Evasins Reference
R. sanguineus genome 29 1 0
R. sanguineus ab initio 48 8 1
R. sanguineus transcriptome 141 34 17 [57]
R. microplus genome 20 1 0
R. microplus ab initio 48 3 0
R. microplus transcriptome 140 22 12 [56]
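
The gap between annotated genomes and transcriptomes in Table 6B can be made explicit with a short calculation. The Python sketch below re-enters the counts from Table 6B and reports, for each family, how many more members the transcriptome contains than a chosen gene set. The dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Counts copied from Table 6B, in the order (lipocalins, 8.9 kDa, evasins).
TABLE_6B = {
    "R. sanguineus": {
        "genome": (29, 1, 0),
        "ab initio": (48, 8, 1),
        "transcriptome": (141, 34, 17),
    },
    "R. microplus": {
        "genome": (20, 1, 0),
        "ab initio": (48, 3, 0),
        "transcriptome": (140, 22, 12),
    },
}
FAMILIES = ("Lipocalins", "8.9 kDa", "Evasins")


def missing_from_annotation(species, source="genome"):
    """Transcriptome count minus the `source` count, per protein family.

    `source` is either "genome" (published annotation) or "ab initio".
    The name and signature are hypothetical, chosen for this illustration.
    """
    reference = TABLE_6B[species]["transcriptome"]
    observed = TABLE_6B[species][source]
    return {fam: ref - obs for fam, ref, obs in zip(FAMILIES, reference, observed)}


if __name__ == "__main__":
    for species in TABLE_6B:
        for source in ("genome", "ab initio"):
            print(species, source, missing_from_annotation(species, source))
```

For both species the difference shrinks when the ab initio predictions are used instead of the published annotation, but remains large for all three families.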

Digestive enzymes

The sole food of ticks is blood, which is digested intracellularly with the aid of lysosomal cathepsins [58]. Serine proteases may be involved in the late phase of tick engorgement [59]. We have annotated 370 protease genes in the A. maculatum genome, including metalloproteases, calpains, legumains, serine and cysteinyl cathepsins, serine proteases, dipeptidyl peptidases, amino and carboxy peptidases, and protein modification enzymes (Table 7 and supplemental spreadsheet 3, worksheet “Proteases”). Of note is the expansion of the M13 metalloproteases, with 447 genes, compared to 255 found in R. microplus and 41 in the I. scapularis annotated proteomes (Table 8). Other peptidases are listed in the worksheet “Protein modification” of supplemental spreadsheet 3.

Table 7:

Annotated proteases found in the Amblyomma maculatum genome

Class Number of genes
M10 metalloproteases 2
M12B metalloproteases 17
M13 metalloproteases 185
Calpains 5
Cathepsin B 4
Cathepsin D (Pepsin) 7
Cathepsin K (Papain) 2
Cathepsin L (Papain) 10
Cathepsin O (Papain) 2
Serine proteases 35
Dipeptidyl peptidase 2
Legumains 22
Amino and Carboxypeptidases 59
Other peptidases 13
Protein modification enzymes 5
Total 370
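
As a quick consistency check, the class counts in Table 7 can be summed and compared with the reported total. The sketch below copies the values from Table 7; the variable and function names are illustrative only.

```python
# Class counts copied from Table 7.
TABLE_7 = {
    "M10 metalloproteases": 2,
    "M12B metalloproteases": 17,
    "M13 metalloproteases": 185,
    "Calpains": 5,
    "Cathepsin B": 4,
    "Cathepsin D (Pepsin)": 7,
    "Cathepsin K (Papain)": 2,
    "Cathepsin L (Papain)": 10,
    "Cathepsin O (Papain)": 2,
    "Serine proteases": 35,
    "Dipeptidyl peptidase": 2,
    "Legumains": 22,
    "Amino and Carboxypeptidases": 59,
    "Other peptidases": 13,
    "Protein modification enzymes": 5,
}
REPORTED_TOTAL = 370  # "Total" row of Table 7


def table_total(counts):
    """Sum of all class counts."""
    return sum(counts.values())


def share(counts, key):
    """Fraction of the table total contributed by one class."""
    return counts[key] / table_total(counts)


if __name__ == "__main__":
    print(table_total(TABLE_7) == REPORTED_TOTAL)
    print(f"M13 share: {share(TABLE_7, 'M13 metalloproteases'):.0%}")
```

The classes add up to the reported total, and the M13 metalloproteases account for half of the proteases listed in this table.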

Table 8:

Number of gene products coding for M13 proteases within tick genomes

Species Number of genes Reference
A. maculatum 447 This work
D. silvarum 174 [37]
H. asiaticum 165 [37]
H. longicornis 120 [37]
I. persulcatus 59 [37]
R. sanguineus 129 [37]
R. microplus 115 [37]
I. scapularis Wikel 41 [55]
I. scapularis ISE6 41 [54]
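
The size of the M13 expansion relative to each other species can be expressed as a fold difference. The sketch below uses the counts from Table 8; the labels for the two I. scapularis assemblies and the function name are illustrative choices.

```python
# Gene counts copied from Table 8.
TABLE_8 = {
    "A. maculatum": 447,
    "D. silvarum": 174,
    "H. asiaticum": 165,
    "H. longicornis": 120,
    "I. persulcatus": 59,
    "R. sanguineus": 129,
    "R. microplus": 115,
    "I. scapularis (Wikel, ref. 55)": 41,
    "I. scapularis (cell line, ref. 54)": 41,
}


def fold_over(counts, focal="A. maculatum"):
    """Ratio of the focal species' count to every other species' count."""
    focal_n = counts[focal]
    return {sp: focal_n / n for sp, n in counts.items() if sp != focal}


if __name__ == "__main__":
    # Print species from the smallest to the largest fold difference.
    for species, fold in sorted(fold_over(TABLE_8).items(), key=lambda kv: kv[1]):
        print(f"{species}: {fold:.1f}x")
```

With these counts the fold difference ranges from about 2.6 (D. silvarum) to about 10.9 (I. scapularis).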

Protein modification enzymes

Within the “protein modification enzymes” we highlight the finding of a putative tyrosine sulfotransferase, an enzyme that adds a sulfate group to a tyrosine residue, an important protein modification in tick hormones [60] and some tick salivary peptides [61, 62].

Among other protein modification enzymes, we found several genes coding for members of the prolyl hydroxylase complex, which are important in the production of mature collagen proteins [63]. These can be browsed in the worksheet “Protein modification” from supplemental spreadsheet 3.

Protein glycosyltransferases add carbohydrate residues to proteins. In ticks, these enzymes have received recent attention due to the epidemic of alpha-gal allergies, which are thought to be triggered by alpha-galactosyl residues decorating the salivary proteins of some tick species, including Amblyomma americanum and Ixodes scapularis, but not Dermacentor variabilis or A. maculatum [64]. In I. scapularis, typical α-Gal transferases (GALT) were absent in the genome, but enzymes of the α1–4 and β1–4 GALT families were able to generate protein α-galactosylation [65]. These enzymes can be recognized by the “Lactosylceramide 4-alpha-galactosyltransferase” TSFam motif [32]. No enzymes matching this motif or other α-GALT enzymes were found in the A. maculatum genome. The worksheet named “glycosyltransferases” of supplemental spreadsheet 2 presents data on 192 glycosyltransferases.

Cytoskeletal and extracellular proteins

In supplemental spreadsheet 3, worksheet “Cytoskeletal”, annotations can be found for myosins, actins, tubulins and their interacting proteins; diverse collagen proteins, proteoglycans and their related enzymes; cuticle and other chitin-binding proteins; and gap-junction innexin proteins.

Immunity-related products

Annotation of genes associated with immunity revealed products belonging to: (1) the antimicrobial peptides defensins, microplusins, lysozymes, Is4 and DAE-2 (supplemental spreadsheet 3, see results on both the Salivary and Immunity worksheets); (2) the RNAi/antiviral response, including Argonaute, Armitage, Aubergine, Tudor, RM62 and Serrate; (3) several members of the alpha-macroglobulin family of complement-like thio-ester esterases; (4) several proteins associated with the interferon response; (5) three products with similarities to Interleukin-16 and IL-17; (6) chemokine-like products; (7) several proteins associated with the Tumor Necrosis Factor (TNF) response, including the TNF receptor protein; (8) members of the IMD pathway such as Bendless, Caspar, Caudal, Effete, IKK gamma protein kinase, TAB2, TAK1, Uev1a, IAP2 and akirins; (9) several products associated with pathogen-recognition motifs; (10) members of the SOCS-JAK-Stat pathway such as JAK Hopscotch tyrosine protein kinase, JAK receptor (Domeless), PIAS Sumo ligase, SOCS box SH2-domain-containing protein and Stat3; and (11) members of the TOLL pathway, namely Cactus, Dorsal, MYD88, Pelle, Tube, Spaetzle and several Toll-like receptors.

Epigenetic control and transcription factors

Products affecting epigenetic control, such as histone lysine methyl transferases, histone acetylases and acetyltransferases, histone deacetylases, sirtuins and several members of the chromatin remodeling complex, are identified in supplemental spreadsheet 3 under the row named “Epigenetic control”. Transcription factors (47 sequences) are also annotated in supplemental spreadsheet 2.

Oxidative and detoxification metabolisms

Catalases, peroxidases, superoxide dismutases, cytochrome P450, cytoglobins, selenoproteins, thioredoxins, sulfotransferases, aryl and glycosyl sulfatases and glutathione transferases are listed in the worksheet named “Detoxification” of supplemental spreadsheet 3.

Signal transduction

Worksheet “Signal transduction” of supplemental spreadsheet 3 lists several transcripts giving best matches to proteins annotated as 7 transmembrane receptors, G protein-coupled receptors, alpha-1a adrenergic receptor, and receptors for acetylcholine, dopamine, adenosine, serotonin, histamine, adiponectin, rmrfamide, ecdysone, allatostatins, leucokinin, atrial natriuretic factor, calcitonin, cholecystokinin, corticotropin, GABA, glycine, octopamine, gonadotropin-releasing hormone, melanocortin, neuropeptide Y, pyrokinin, relaxin, sifamide and vasopressin. These receptors can be targets of novel acaricides. Several hormonal precursors are also listed, including those for the crustacean chh/mih/gih neurohormone family, neurohypophysial hormones and several prohormones.

Additional annotations

Supplemental spreadsheet 3 also details genes coding for proteins implicated in nuclear regulation and nuclear export, transcription and translation machineries, protein export, amino acid, carbohydrate, lipid, and energy metabolisms and proteasome machinery.

Discussion

Using the Chromium Genome Library Kit and the 10X Genomics platform, we obtained a draft genome sequence of the tick Amblyomma maculatum, the first genome for this tick genus, using the DNA extracted from a single male tick. A total of 237,921 putative coding sequences were discovered by the Augustus/BRAKER pipeline trained with public RNAseq data. After excluding transposable elements and truncated sequences, we arrived at a core set of 25,702 coding genes that were functionally annotated and made available for browsing in hyperlinked spreadsheets, which we hope will be valuable for further research with this tick species and will contribute to the understanding of tick phylogeny.

Analysis of the expanded salivary gland-expressed families (such as the lipocalins) in the genomes of A. maculatum and 3 other tick species shows a considerable absence of sequences predicted by transcriptome assembly. It is possible that the “missing” salivary-coding genes derive from a higher polymorphism of these genes. Indeed, variable mutation rates are known to occur among different genes [66], being associated with high transcription [67] or with adaptation to variable environments, such as those caused by the host immune response [68]. Both conditions apply to the highly expressed salivary-coding genes, such as those coding for the lipocalins or metalloproteases. Additionally, increased recombination rates within salivary-coding genes, as observed in some organisms [69, 70], could cause large sequence variation among individual tick genomes, making the repertoire of genes at the population level much larger than at the individual level. This hypothesis could be tested by comparing the abundance and similarities of salivary-coding genes from genomes assembled from different individuals.
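
One simple way to express such a comparison between two individuals is the overlap of their salivary gene repertoires. The sketch below is only an illustration under a strong assumption: that salivary-coding genes from each genome have already been grouped into shared clusters (for example by sequence similarity), a step not described here. The cluster identifiers are invented placeholders, not real data, and the function name is hypothetical.

```python
def repertoire_overlap(genes_a, genes_b):
    """Compare two collections of gene-cluster identifiers.

    Returns the number of shared clusters, the clusters unique to each
    individual, and the Jaccard index (shared / union).
    """
    a, b = set(genes_a), set(genes_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Two empty repertoires are treated as identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical cluster identifiers for two individual ticks (NOT real data).
tick_1 = {"lipocalin_c01", "lipocalin_c02", "lipocalin_c03", "evasin_c01"}
tick_2 = {"lipocalin_c02", "lipocalin_c03", "lipocalin_c04", "evasin_c02"}

if __name__ == "__main__":
    print(repertoire_overlap(tick_1, tick_2))
```

Under the hypothesis above, a low Jaccard index between individuals for salivary families, compared with housekeeping gene families, would be consistent with a population-level repertoire larger than that of any single genome.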

Supplementary Material

1
2
3

Acknowledgements:

This work used the Georgia Advanced Computing Resource Center and the Georgia Genomics and Bioinformatics Core at UGA, the HPC@LSU and the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). We are grateful to Drs. John Andersen, Ben Mans and Isabel Santos for helpful comments on the manuscript.

Funding:

This publication was made possible by an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under Grant #P20GM103476; USDA NIFA (2017-67017-26171 & 2017-67016-26864); Pakistan-U.S. Science and Technology Cooperation Program (Phase 7) (10003290), the National Science Foundation Grant No. DGE-1545433 (JCF). JMCR was supported by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases (Vector-Borne Diseases: Biology of Vector Host Relationship, Z01 AI000810-18).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References:

  • 1. Sumner JW, Durden LA, Goddard J, Stromdahl EY, Clark KL, Reeves WK, Paddock CD: Gulf coast ticks (Amblyomma maculatum) and Rickettsia parkeri, United States. Emerging Infectious Diseases 2007, 13(5):751.
  • 2. Paddock CD, Finley RW, Wright CS, Robinson HN, Schrodt BJ, Lane CC, Ekenna O, Blass MA, Tamminga CL, Ohl CA: Rickettsia parkeri rickettsiosis and its clinical distinction from Rocky Mountain spotted fever. Clinical Infectious Diseases 2008, 47(9):1188–1196.
  • 3. Cumbie AN, Espada CD, Nadolny RM, Rose RK, Dueser RD, Hynes WL, Gaff HD: Survey of Rickettsia parkeri and Amblyomma maculatum associated with small mammals in southeastern Virginia. Ticks and Tick-borne Diseases 2020, 11(6):101550.
  • 4. Mathew J, Ewing S, Panciera R, Woods J: Experimental transmission of Hepatozoon americanum Vincent-Johnson et al., 1997 to dogs by the Gulf Coast tick, Amblyomma maculatum Koch. Veterinary parasitology 1998, 80(1):1–14.
  • 5. Ewing S, DuBois J, Mathew J, Panciera R: Larval Gulf Coast ticks (Amblyomma maculatum)[Acari: Ixodidae] as host for Hepatozoon americanum [Apicomplexa: Adeleorina]. Veterinary Parasitology 2002, 103(1–2):43–51.
  • 6. Mathew JS, Ewing SA, Panciera RJ, Kocan KM: Sporogonic development of Hepatozoon americanum (Apicomplexa) in its definitive host, Amblyomma maculatum (Acarina). The Journal of parasitology 1999, 85(6):1023–1031.
  • 7. Ewing SA, Panciera RJ: American canine hepatozoonosis. Clinical microbiology reviews 2003, 16(4):688–697.
  • 8. Anderson JM, Moore IN, Nagata BM, Ribeiro JMC, Valenzuela JG, Sonenshine DE: Ticks, Ixodes scapularis, feed repeatedly on white-footed mice despite strong inflammatory response: an expanding paradigm for understanding tick-host interactions. Frontiers in immunology 2017, 8:1784.
  • 9. Maestas LP, Reeser SR, McGay PJ, Buoni MH: Surveillance for Amblyomma maculatum (Acari: Ixodidae) and Rickettsia parkeri (Rickettsiales: Rickettsiaceae) in the State of Delaware, and Their Public Health Implications. Journal of medical entomology 2020, 57(3):979–983.
  • 10. Molaei G, Little EAH, Khalil N, Ayres BN, Nicholson WL, Paddock CD: Established Population of the Gulf Coast Tick, Amblyomma maculatum (Acari: Ixodidae), Infected with Rickettsia parkeri (Rickettsiales: Rickettsiaceae), in Connecticut. Journal of medical entomology 2021, 58(3):1459–1462.
  • 11. Ramirez-Garofalo JR, Curley SR, Field CE, Hart CE, Thangamani S: Established Populations of Rickettsia parkeri-Infected Amblyomma maculatum Ticks in New York City, New York, USA. Vector borne and zoonotic diseases (Larchmont, NY) 2021.
  • 12. Adamson SW, Browning RE, Budachetri K, Ribeiro JM, Karim S: Knockdown of selenocysteine-specific elongation factor in Amblyomma maculatum alters the pathogen burden of Rickettsia parkeri with epigenetic control by the Sin3 histone deacetylase corepressor complex. PLoS ONE 2013, 8(11):e82012.
  • 13. Budachetri K, Kumar D, Karim S: Catalase is a determinant of the colonization and transovarial transmission of Rickettsia parkeri in the Gulf Coast tick Amblyomma maculatum. Insect molecular biology 2017, 26(4):414–419.
  • 14. Saito TB, Bechelli J, Smalley C, Karim S, Walker DH: Vector tick transmission model of spotted fever rickettsiosis. The American journal of pathology 2019, 189(1):115–123.
  • 15. Karim S, Kumar D, Budachetri K: Recent advances in understanding tick and rickettsiae interactions. Parasite immunology 2021:e12830.
  • 16. Hoff KJ, Lomsadze A, Borodovsky M, Stanke M: Whole-genome annotation with BRAKER. In: Gene prediction. Springer; 2019: 65–95.
  • 17. Budachetri K, Kumar D, Crispell G, Beck C, Dasch G, Karim S: The tick endosymbiont Candidatus Midichloria mitochondrii and selenoproteins are essential for the growth of Rickettsia parkeri in the Gulf Coast tick vector. Microbiome 2018, 6(1):1–15.
  • 18. Weisenfeld NI, Kumar V, Shah P, Church DM, Jaffe DB: Direct determination of diploid genome sequences. Genome research 2017, 27(5):757–767.
  • 19. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M: De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature protocols 2013, 8(8):1494–1512.
  • 20. Dobin A, Gingeras TR: Mapping RNA-seq reads with STAR. Current protocols in bioinformatics 2015, 51(1):11.14.1–11.14.19.
  • 21. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM: BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics (Oxford, England) 2015, 31(19):3210–3212.
  • 22. Potter SC, Luciani A, Eddy SR, Park Y, Lopez R, Finn RD: HMMER web server: 2018 update. Nucleic acids research 2018, 46(W1):W200–W204.
  • 23. Hubley R, Finn RD, Clements J, Eddy SR, Jones TA, Bao W, Smit AF, Wheeler TJ: The Dfam database of repetitive DNA families. Nucleic acids research 2016, 44(D1):D81–D89.
  • 24. Bao W, Kojima KK, Kohany O: Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA-Uk 2015, 6(1):1–6.
  • 25. Poux S, Arighi CN, Magrane M, Bateman A, Wei C-H, Lu Z, Boutet E, Bye-A-Jee H, Famiglietti ML, Roechert B: On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics (Oxford, England) 2017, 33(21):3454–3460.
  • 26. Bairoch A: The ENZYME database in 2000. Nucleic acids research 2000, 28(1):304–305.
  • 27. Rawlings ND, Barrett AJ, Finn R: Twenty years of the MEROPS database of proteolytic enzymes, their substrates and inhibitors. Nucleic acids research 2016, 44(D1):D343–D350.
  • 28. Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A: The Pfam protein families database: towards a more sustainable future. Nucleic acids research 2016, 44(D1):D279–D285.
  • 29. Schultz J, Copley RR, Doerks T, Ponting CP, Bork P: SMART: a web-based tool for the study of genetically mobile domains. Nucleic acids research 2000, 28(1):231–234.
  • 30. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN: The COG database: an updated version includes eukaryotes. BMC bioinformatics 2003, 4(1):1–14.
  • 31. Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Marchler GH, Song JS: CDD/SPARCLE: the conserved domain database in 2020. Nucleic acids research 2020, 48(D1):D265–D268.
  • 32. Ribeiro JMC, Mans BJ: TickSialoFam (TSFam): A Database That Helps to Classify Tick Salivary Proteins, a Review on Tick Salivary Protein Function and Evolution, With Considerations on the Tick Sialome Switching Phenomenon. Frontiers in cellular and infection microbiology 2020, 10:374.
  • 33. Bendtsen JD, Nielsen H, Von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. Journal of molecular biology 2004, 340(4):783–795.
  • 34. Sonnhammer EL, Von Heijne G, Krogh A: A hidden Markov model for predicting transmembrane helices in protein sequences. In: Ismb: 1998; 1998: 175–182.
  • 35. Hansen JE, Lund O, Tolstrup N, Gooley AA, Williams KL, Brunak S: NetOglyc: prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate J 1998, 15(2):115–130.
  • 36. Kronegg J, Buloz D: Detection/prediction of GPI cleavage site (GPI-anchor) in a protein (DGPI). URL: http://129194 1999, 185.
  • 37. Jia N, Wang J, Shi W, Du L, Sun Y, Zhan W, Jiang JF, Wang Q, Zhang B, Ji P et al.: Large-Scale Comparative Analyses of Tick Genomes Elucidate Their Genetic Diversity and Vector Capacities. Cell 2020, 182(5):1328–1340 e1313.
  • 38. Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nature methods 2012, 9(4):357–359.
  • 39. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM: Twelve years of SAMtools and BCFtools. GigaScience 2021, 10(2):giab008.
  • 40. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 2004, 32(5):1792–1797.
  • 41. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, Von Haeseler A, Lanfear R: IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Molecular biology and evolution 2020, 37(5):1530–1534.
  • 42. Kalyaanamoorthy S, Minh BQ, Wong TK, Von Haeseler A, Jermiin LS: ModelFinder: fast model selection for accurate phylogenetic estimates. Nature methods 2017, 14(6):587–589.
  • 43. Hoang DT, Chernomor O, Von Haeseler A, Minh BQ, Vinh LS: UFBoot2: improving the ultrafast bootstrap approximation. Molecular biology and evolution 2018, 35(2):518–522.
  • 44. Kumar S, Stecher G, Li M, Knyaz C, Tamura K: MEGA X: molecular evolutionary genetics analysis across computing platforms. Molecular biology and evolution 2018, 35(6):1547.
  • 45. Budachetri K, Browning RE, Adamson SW, Dowd SE, Chao C-C, Ching W-M, Karim S: An insight into the microbiome of the Amblyomma maculatum (Acari: Ixodidae). Journal of medical entomology 2014, 51(1):119–129.
  • 46. Madden T: The BLAST sequence analysis tool. In: The NCBI Handbook [Internet] 2nd edition. National Center for Biotechnology Information (US); 2013.
  • 47. Permal E, Flutre T, Quesneville H: Roadmap for annotating transposable elements in eukaryote genomes. Methods in molecular biology (Clifton, NJ) 2012, 859:53–68.
  • 48. Walsh AM, Kortschak RD, Gardner MG, Bertozzi T, Adelson DL: Widespread horizontal transfer of retrotransposons. Proceedings of the National Academy of Sciences 2013, 110(3):1012–1016.
  • 49. Mans BJ, De Klerk D, Pienaar R, De Castro MH, Latif AA: Next-generation sequencing as means to retrieve tick systematic markers, with the focus on Nuttalliella namaqua (Ixodoidea: Nuttalliellidae). Ticks and tick-borne diseases 2015, 6(4):450–462.
  • 50. Robertson HM, Lampe DJ: Distribution of transposable elements in arthropods. Annual review of entomology 1995, 40:333–357.
  • 51. Etchegaray E, Naville M, Volff J-N, Haftek-Terreau Z: Transposable element-derived sequences in vertebrate development. Mobile DNA-Uk 2021, 12(1):1–24.
  • 52. Gao B, Wang Y, Diaby M, Zong W, Shen D, Wang S, Chen C, Wang X, Song C: Evolution of pogo, a separate superfamily of IS630-Tc1-mariner transposons, revealing recurrent domestication events in vertebrates. Mob DNA 2020, 11:25.
  • 53. Walker PJ, Firth C, Widen SG, Blasdell KR, Guzman H, Wood TG, Paradkar PN, Holmes EC, Tesh RB, Vasilakis N: Evolution of genome size and complexity in the Rhabdoviridae. PLoS pathogens 2015, 11(2):e1004664.
  • 54. Miller JR, Koren S, Dilley KA, Harkins DM, Stockwell TB, Shabman RS, Sutton GG: A draft genome sequence for the Ixodes scapularis cell line, ISE6. F1000Research 2018, 7.
  • 55. Gulia-Nuss M, Nuss AB, Meyer JM, Sonenshine DE, Roe RM, Waterhouse RM, Sattelle DB, De La Fuente J, Ribeiro JM, Megy K: Genomic insights into the Ixodes scapularis tick vector of Lyme disease. Nature communications 2016, 7(1):1–13.
  • 56. Tirloni L, Braz G, Nunes RD, Gandara ACP, Vieira LR, Assumpcao TC, Sabadin GA, da Silva RM, Guizzo MG, Machado JA et al.: A physiologic overview of the organ-specific transcriptome of the cattle tick Rhipicephalus microplus. Scientific reports 2020, 10(1):18296.
  • 57. Tirloni L, Lu S, Calvo E, Sabadin G, Di Maggio LS, Suzuki M, Nardone G, da Silva Vaz I Jr., Ribeiro JMC: Integrated analysis of sialotranscriptome and sialoproteome of the brown dog tick Rhipicephalus sanguineus (s.l.): Insights into gene expression during blood feeding. Journal of proteomics 2020, 229:103899.
  • 58. Horn M, Nussbaumerová M, Šanda M, Kovářová Z, Srba J, Franta Z, Sojka D, Bogyo M, Caffrey CR, Kopáček P: Hemoglobin digestion in blood-feeding ticks: mapping a multipeptidase pathway by functional proteomics. Chemistry & biology 2009, 16(10):1053–1063.
  • 59. Reyes J, Ayala-Chavez C, Sharma A, Pham M, Nuss AB, Gulia-Nuss M: Blood digestion by trypsin-like serine proteases in the replete Lyme disease vector tick, Ixodes scapularis. Insects 2020, 11(3):201.
  • 60. Donohue KV, Khalil SM, Ross E, Grozinger CM, Sonenshine DE, Roe RM: Neuropeptide signaling sequences identified by pyrosequencing of the American dog tick synganglion transcriptome during blood feeding and reproduction. Insect biochemistry and molecular biology 2010, 40(1):79–90.
  • 61. Franck C, Foster SR, Johansen-Leete J, Chowdhury S, Cielesh M, Bhusal RP, Mackay JP, Larance M, Stone MJ, Payne RJ: Semisynthesis of an evasin from tick saliva reveals a critical role of tyrosine sulfation for chemokine binding and inhibition. Proceedings of the National Academy of Sciences 2020, 117(23):12657–12664.
  • 62. Thompson RE, Liu X, Ripoll-Rozada J, Alonso-Garcia N, Parker BL, Pereira PJB, Payne RJ: Tyrosine sulfation modulates activity of tick-derived thrombin inhibitors. Nature chemistry 2017, 9(9):909–917.
  • 63. Gorres KL, Raines RT: Prolyl 4-hydroxylase. Critical reviews in biochemistry and molecular biology 2010, 45(2):106–124.
  • 64. Crispell G, Commins SP, Archer-Hartman SA, Choudhary S, Dharmarajan G, Azadi P, Karim S: Discovery of alpha-gal-containing antigens in North American tick species believed to induce red meat allergy. Frontiers in immunology 2019, 10:1056.
  • 65. Cabezas-Cruz A, Espinosa PJ, Alberdi P, Šimo L, Valdés JJ, Mateos-Hernández L, Contreras M, Rayo MV, de la Fuente J: Tick galactosyltransferases are involved in α-Gal synthesis and play a role during Anaplasma phagocytophilum infection and Ixodes scapularis tick vector development. Scientific reports 2018, 8(1):1–18.
  • 66. Hodgkinson A, Eyre-Walker A: Variation in the mutation rate across mammalian genomes. Nature reviews genetics 2011, 12(11):756–766.
  • 67. Park C, Qian W, Zhang J: Genomic evidence for elevated mutation rates in highly expressed genes. EMBO reports 2012, 13(12):1123–1129.
  • 68. Matic I: Mutation rate heterogeneity increases odds of survival in unpredictable environments. Molecular cell 2019, 75(3):421–425.
  • 69. Wallberg A, Glémin S, Webster MT: Extreme recombination frequencies shape genome variation and evolution in the honeybee, Apis mellifera. PLoS genetics 2015, 11(4):e1005189.
  • 70. Hey J: What’s so hot about recombination hotspots? PLoS biology 2004, 2(6):e190.

Data Availability Statement

This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JAJIZL000000000, BioProject accession PRJNA773936 and BioSample accession SAMN22546173. The reads used to assemble the genome can be found in the Sequence Read Archives (SRA) of the National Center for Biotechnology Information (NCBI) under the accession SRR16911356. The metagenome assembled Rickettsia parkeri genome was deposited in GenBank under the accession CP101541.
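
For readers who wish to retrieve a deposited sequence programmatically, the sketch below builds a request for the NCBI E-utilities `efetch` endpoint using only the Python standard library. It assumes that the Rickettsia parkeri accession CP101541 is served from the `nuccore` database in FASTA format; the helper names are illustrative, and the download function is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an efetch URL for one accession (hypothetical helper)."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def fetch_text(accession, **kwargs):
    """Download the record as text. Requires network access."""
    with urlopen(efetch_url(accession, **kwargs)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_text("CP101541") to download.
    print(efetch_url("CP101541"))
```

The whole genome shotgun master record and the SRA reads are generally retrieved through other routes (for example the NCBI web interface or the SRA Toolkit) rather than a single `efetch` call.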
