Skip to main content
PLOS ONE logoLink to PLOS ONE
. 2017 Nov 22;12(11):e0188044. doi: 10.1371/journal.pone.0188044

Ultraconserved elements (UCEs) resolve the phylogeny of Australasian smurf-weevils

Matthew H Van Dam 1,¤,*, Athena W Lam 1,¤, Katayo Sagata 2, Bradley Gewa 3, Raymond Laufa 3, Michael Balke 1,4, Brant C Faircloth 5, Alexander Riedel 6
Editor: Antonis Rokas7
PMCID: PMC5699822  PMID: 29166661

Abstract

Weevils (Curculionoidea) comprise one of the most diverse groups of organisms on earth. There is hardly a vascular plant or plant part without its own species of weevil feeding on it and weevil species diversity is greater than the number of fishes, birds, reptiles, amphibians and mammals combined. Here, we employ ultraconserved elements (UCEs) designed for beetles and a novel partitioning strategy of loci to help resolve phylogenetic relationships within the radiation of Australasian smurf-weevils (Eupholini). Despite being emblematic of the New Guinea fauna, no previous phylogenetic studies have been conducted on the Eupholini. In addition to a comprehensive collection of fresh specimens, we supplement our taxon sampling with museum specimens, and this study is the first target enrichment phylogenomic dataset incorporating beetle specimens from museum collections. We use both concatenated and species tree analyses to examine the relationships and taxonomy of this group. For species tree analyses we present a novel partitioning strategy to better model the molecular evolutionary process in UCEs. We found that the current taxonomy is problematic, largely grouping species on the basis of similar color patterns. Finally, our results show that most loci required multiple partitions for nucleotide rate substitution, suggesting that single partitions may not be the optimal partitioning strategy to accommodate rate heterogeneity for UCE loci.

Introduction

The Curculionoidea represents a diverse group of phytophagous insects with ~62,000 described species and an estimated 200,000 undescribed species [1]. They are not only highly diverse, but also of major economic importance because they cause destruction to many crop and ornamental plant species worldwide. Curculionoidea are a relatively old group dating to the Upper Jurassic 161–151 Ma [13]. Despite both their diversity and economic importance, there are still few resources for conducting phylogenomic research on this group. Here, we test a set of target enrichment baits designed to enrich ultraconserved elements (UCEs) across Coleoptera [4], and we use the resulting sequence data to examine the phylogeny of Australasian smurf-weevils in the tribe Eupholini. We had two main goals: 1) to examine the utility of these newly designed universal Coleoptera UCE baits for phylogenetic studies of weevils, using freshly collected individuals as well as older museum specimens (up to 65 years); and 2) to test the monophyly of the three main genera within the Eupholini, examining the validity of previous ecomorphology-driven classification efforts.

Ultraconserved genomic elements (UCEs, sensu Faircloth et al. 2012 [5]), provide a powerful approach to sequence many independent regions of the genome. UCEs have proven useful in resolving phylogenies at multiple phylogenetic scales both shallow and deep [69]. UCEs, like many other reduced representation techniques that rely on oligonucleotide “bait” capture procedures, also have the potential to perform well when collecting data from museum samples [10,11]. Given the challenges with obtaining samples for many rare taxa and/or from logistically challenging regions of the world, such approaches can be highly beneficial [12,10]. The challenge of obtaining weevil samples is even more complex because many species can be found only on specific host plants at very particular times of the year, and many weevil hosts are unknown. In addition to the challenges of sampling, many weevil groups have undergone rapid radiations [2,13] necessitating the development of large genome-wide data sets to take on such challenges.

The Eupholini represent a tribe of weevils with approximately 300 known species. New Guinea is the Eupholini’s center of diversity both in terms of species and genera. Currently, there are three main genera recognized within the Eupholini, although the monophyly of each is somewhat questionable: Eupholus Boisduval (63 species), Gymnopholus Heller (76 species) and Rhinoscapha Montrouzier (137 species) [14,15]. Eupholus species are brilliantly colored and can be divided in two morphologically distinct groups, i.e. the E. schoenherri-group and the E. loriae-group [16]. Members of the E. schoenherri-group appear to feed exclusively on toxic wild yams (Dioscorea) (Fig 1). Most Eupholus species (~58) occur in habitats below 1,000m elevation with few (~5) above 1,500m. Members of Gymnopholus are of dark or cryptic coloration and usually occur above 1,500m. Gymnopholus species in the subgenus Symbiopholus (33 species) occur above 2,000m and exhibit epizoic symbiosis, i.e. lichens, algae and moss growing on their integument that enhance camouflage (Fig 1) [17,18]. Gymnopholus species are polyphagous, browsing on a variety of plants. Species in the third genus, Rhinoscapha, are sometimes brilliantly colored and other times more cryptic. Although their hosts are less well known, Rhinoscapha species can be found at all elevational zones, with certain clades showing some elevational zonation.

Fig 1. ASTRAL species tree, input trees derived from multi-partitioned MrBayes analyses of individual gene trees (i.e. using the 6 character in sets in Fig 2 to inform the partitioning strategy).

Fig 1

Node values indicate support values of MrBayes posterior used as ASTRAL bootstrap replicates, branch colors correspond to species clades. Weevils from top clockwise: Eupholus azureus Macleay, Eupholus schoenherrii schoenherrii (Guérin-Méneville), Celebia arrogans (Boisduval), Rhinoscapha sp. “Large-brown Mt.Wilhelm 2700m”, Gymnopholus (Symbiopholus) acarifer Gressitt, Gymnopholus regalis Gressitt, Eupholus cuvierii (Guérin-Méneville), Rhinoscapha tricolor Faust, Rhinoscapha doriae Pascoe, Gymnopholus nitidus Gressitt & Sedlacek.

Materials and methods

UCE probe design

The 1,172 UCE loci that we tested with Eupholini were identified in Faircloth 2017. The 13,674 baits targeting these loci were designed from six different Coleoptera genomes and one genome from the sister Order Strepsiptera. Of these taxa, Dendroctonus ponderosae (Curculionidae: Scolytinae) is most closely related to the Eupholini (Curculionidae: Entiminae): they shared a most recent common ancestor ~105 Ma [2]. We had the bait set synthesized by MYcroarray (MYcroarray, Ann Arbor MI, USA) as described, with no modifications.

Laboratory protocols

We extracted DNA non-destructively from 48 samples, using DNeasy Tissue kits (Qiagen; Macherey-Nagel). Nine of these 48 samples were pinned museum specimens that ranged in age from 1952 to 2006. Tissue was extracted from the pronotum and/or leg of each specimen for specimens preserved in ethanol. For pinned specimens, tissue was taken out of the pronotum, mesothorax and abdomen using sterilized forceps (flame treated then soaked in 10% bleach between use) and placed in a 1.5–2.0 ml centrifuge tube. In the case of large pinned specimens, the tissue from the pronotum, mesothorax and abdomen were placed in separate centrifuge tubes, the resulting eluates from the specimen parts were later combined. We added 4μl of RNase-A to remove RNA from the samples during the extraction process according to manufacturer's protocol (http://www.bea.ki.se/documents/EN-DNeasy%20handbook.pdf). In the case of extractions obtained without the use of RNase-A during the extraction process, 1μl of 10 μg/mL RNase-A was added to each genomic DNA (gDNA) sample, the mixture was then incubated at 37°C for 30 minutes, followed by purification via ethanol precipitation. Extracted DNA quantity ranged from 253–1600 ng (mean 556 ng) measured by Qubit fluorometer (Life Technologies, Inc.). Subsequent library preparation and sequence capture steps were performed by RAPiD Genomics (Gainesville, FL, USA). Briefly, to confirm that ethanol preserved samples’ DNA was not degraded we visualized their DNA on a polyacrylamide gel for a few exemplar specimens, while all museum samples’ DNA was visualized on a polyacrylamide gel to assess the amount of degradation. Following visual examination, samples that were not highly degraded (>500 bp) were sonicated to a target size of 450 bp. Capture was performed with two pools of 24 samples each. Libraries were constructed by repairing the ends of the sheared fragments followed by the ligation of an adenine residue to the 3’-end of the blunt-end fragments. Next, custom indexed adapters suited for Illumina Sequencing platform were ligated to the libraries (see S1 Supporting Information Links). Finally, ligated fragments were PCR-amplified using standard cycling protocols (e.g. Mamanova et al. 2010 [19]), with Pre-capture PCR having 8–12 cycles, and Post-capture PCR with 13 cycles. Enrichment procedures followed the MYcroarray MYBaits kit Version-3 protocol [20], except that RAPiD Genomics used chicken C0t-1 DNA as blocking reagent. In Hymenoptera, chicken C0t-1 was shown to work better than “homebrew”, taxon specific C0t-1 DNA [21]. The 48 indexed libraries were pooled in equimolar ratios to a total of 250 ng/μl. The 48 MYBaits reactions were used at the full concentration. We sequenced two pools of these samples each using a half lane of paired-end, 100 base pair reads on an illumina HiSeq 3000. We performed these two sequencing runs primarily to see if this substantially improved the number of loci captured. Voucher specimens are stored in museum collections (SNSB-Zoological State Collection, München (ZSM), State Museum of Natural History Karlsruhe (SMNK), California Academy of Sciences (CAS)), see S3 Table for details.

Analysis of captured UCE data

We used ILLUMIPROCESSOR [22], which is a parallel wrapper of TRIMMOMATIC [2325], to clean and trim reads. We performed this step for each of the two sequencing runs, we then concatenated the two different sequencing runs together. After trimming, we generated summary stats on the trimmed reads using the PHYLUCE v1.6 package [26], script “phyluce_assembly_get_fastq_lengths”. All programs hereafter beginning with “phyluce” are PYTHON programs part of the PHYLUCE package. We assembled the cleaned/trimmed reads using “phyluce_assembly_assemblo_trinity” with TRINITY v2013-02-25 [27]. Next we generated summary statistics (counts/lengths) of the assembled contigs using “phyluce_assembly_get_fasta_lengths”. To identify UCE loci from our contigs we needed to match these to the probes, we used “phyluce_assembly_match_contigs_to_probes”, this was set to a minimum coverage of 80% and minimum identity of 80%. We used “phyluce_assembly_get_match_counts” to create an initial database of loci counts per taxon. To obtain average read depth of our contigs by taxon we used “phyluce_assembly_get_trinity_coverage”, the results of which were then used to calculate average read depth among our UCE loci per taxon with “phyluce_assembly_get_trinity_coverage_for_uce_loci” using the sqlite database created in the “phyluce_assembly_match_contigs_to_probes” step. Next, we used “phyluce_assembly_get_fastas_from_match_counts” to get a count of the number of UCE loci captured for each taxa. To examine the effect of preservation type on number of loci captured, we performed a Welch Two Sample t-test between museum-pinned and ethanol preserved samples. We then divided each UCE loci into a separate fasta file using “phyluce_assembly_explode_get_fastas_file”, as we later wanted to construct individual gene tree phylogenies, and then aligned the sequences and trimmed the edges using “phyluce_align_seqcap_align” which implements MAFFT v7.130b [28]. We did not use internal alignment trimming because our taxa are relatively closely related. However we eliminated columns that were entirely composed of “-”,”n” and or “?” with a custom R script using "deleteEmptyCells" function in the "ips" library [29]. Lastly we used “phyluce_align_get_only_loci_with_min_taxa” to select the minimum percentage of missing taxa in our UCE loci alignments.

UCE phylogenomics

We performed two different types of phylogenomic analyses: (1) a concatenated analyses using RAxML v8.0.19 [30], and (2) several species tree analyses using ASTRAL v5.1.0 with TRee ALgorithm III (https://github.com/chaoszhang/ASTRAL/tree/multiind) [3133] and SVDQuartets [34,35] which is implemented in PAUP* v4.0a152 [36].

Concatenated analyses

We set each UCE locus as its own character set and then used PartitionFinder v2.1.1 [37] implementing the “greedy” search algorithm [38] to select for the best partitioning strategy for the data under the General Time Reversible + gamma (GTRGAMMA) site rate substitution model using the AICc metric. We then conducted 20 maximum-likelihood (ML) searches in RAxML. We also performed non-parametric bootstrap replicates under GTRGAMMA using the autoMRE option to optimize the number of bootstrap replicates for this large dataset. We reconciled the bootstrap replicates with the best fitting ML tree.

Species tree analyses

Here, we performed several different analyses. First we reconstructed the ML gene tree estimated for each of the UCE loci in RAxML, below we list the specific process and settings in RAxML for our partitioned gene trees. Here we used an initial character set of the UCE core central (160bp) and divided up the remaining length of the locus into fifths. This character set was chosen based on a visual inspection of the number of phylogenetically informative sites over the proportion of loci length (Fig 2). This resulted in 5 character sets (6 including the UCE core), that were grouped together based on their proportional length from the conserved UCE core. We then used PartitionFinder v2.1.1 [37] implementing the “greedy” search algorithm [38] to select the best partitioning strategy for each locus, under the GTRGAMMA site rate substitution model according to the AICc metric. Next, we conducted 20 maximum-likelihood (ML), best tree searches for each partitioned locus in RAxML, as well as 200 non-parametric bootstrap replicates. We reconciled the bootstrap replicates with the best fitting ML tree of each locus. We also wanted to examine the effect of removing poorly supported/oversaturated gene trees on the topology and or support of the species tree. Using modified R code from Borowiec et al. 2015 [39], we first calculated the average of the bootstrap support values for each gene tree and then eliminated gene trees that were in the lowest 10% quantile of average bootstrap support. We then identified potentially over-saturated loci, based on departure from a linear regression between uncorrected p-distances and inferred distances of the tips for each of the UCE loci [40]. We then eliminated the outlier trees that were potentially over-saturated. Next we estimated a species tree using ASTRAL and assessed support with 200 bootstrap replicates for the complete set of genes, the set without the lowest 10% average bootstrap support, and the remaining trees without potentially oversaturated loci. Lastly, for comparative purposes, we also reconstructed a species tree in ASTRAL based on single character sets / partition and performed the same RAxML analyses as described above for the complete dataset.

Fig 2. Shows approximate position of character sets used for each locus in PartitionFinder 2 for ASTRAL analyses, overlaid on the frequency of PIS in the final UCE data set.

Fig 2

UCE Core refers to the section of the locus that corresponds to the length of UCE probe. Numbers 1–5 on left of UCE Core correspond to matching characters sets on the right, such that e.g. both sections 5 are the same character set. Character sets 1–5 correspond to one fifth of the length of the locus (left and right of UCE Core) minus the sites from the UCE Core.

Based on the results of the species tree topology and support values described above, we proceeded with gene tree reconstruction in MrBayes v3.2.5 [41]. We also used the same partitioning strategy described above, estimated the most appropriate site rate substitution model for MrBayes using PartitionFinder v2.1.1. We conducted 2 independent runs of 1 cold chain and 3 heated chains (default settings) for 1x107 Markov chain Monte Carlo (MCMC) generations sampling every 1000 generations in MrBayes. We removed the first 25% as the burnin. We combined the two independent runs and created a maximum clade credibility topology (MCCT) for each locus with “sumtrees.py” part of the DendroPy package [42] as well as with custom scripts, see S1 Supporting Information Links. These gene trees were then input to ASTRAL to create a species tree. In order to assess support of the species tree we used each of the MCMC generations as a bootstrap replicate in ASTRAL (combined 15000 samples of the run after the burnin for each of the gene trees), thus creating support values akin to posterior probabilities of nodes. Lastly, for comparative purposes, we also reconstructed a species tree in ASTRAL based on single character sets/partitions per-locus and performed the same model selection and gene tree reconstruction as described above in MrBayes. We will refer to the analyses based on single character sets / partitions per-locus as ASTRAL-(RAxML or MrBayes)-single-partition analyses and those based on multiple partitions as ASTRAL-(RAxML or MrBayes)-multi-partition analyses.

Finally, we used another species tree method to look for congruence between methods. We created a species tree using SVDQuartets [34,35] where we evaluated all of the quartets with the ‘evalQuartets = all’ command using the Quartet FM (QFM) algorithm [43]. We assessed support with 200 bootstrap replicates.

We also wanted to more objectively examine the differences that resulted from these methods rather than just a visual comparison. Therefore, we calculated three tree distance metrics that rely on the tree topology between two trees in the R package Phangorn [44]; the Robinson-Foulds distance (RF-dist) [45], the path distance metric between pairs of taxa (Path-dist) [46] and the approximate subtree prune and regraft distance (SPR-dist) [47,48]. Visualizing support values and color coding on trees was implemented in the R package phytools [49].

UCE capture and preservation

We investigated if specimen age or preservation type (museum pinned or ethanol preservation) had a greater effect on the number of UCEs captured using generalized linear models (GLM). Next, we compared our dataset to those of two other studies which utilized UCEs and museum samples, specifically Xylocopa (carpenter bees) [11] and Aphelocoma (scrub-jays) [10]. These studies were chosen because of their data availability and their systematic sampling by specimen age. We wanted to see if the number of UCE loci captured, given specimen age, was similar between studies. We first transformed the fraction of UCE loci captured within each taxon (number of UCE loci captured per-specimen divided by the maximum number of loci captured, within a clade e.g. Eupholini weevils or scrub-jays), and then logit transformed the data and performed a likelihood-ratio test, to see if a model with independent slopes vs one with a single slope fit the data better.

Results

We sequenced on average ~1.2x107 reads per sample with an average length of ~100 bp, see S1 Table. The UCE loci alignment results from the first round of sequencing and UCE loci alignment results from concatenation of reads (1st and 2nd rounds together), showed only a modest increase in the number of loci captured, for example at the 70% complete level the difference was only 36 loci (359 1st vs 395 2nd round of sequencing), see S2 Table. The following results refers to only those of the final concatenated dataset. The average per-sample read depth of all contigs was 11.8X, while the average per-sample read depth of UCE loci was 28.9X. Across all samples, we captured a total of 940 out of 1,172 UCE loci. After removing duplicate loci, we recovered an average of 451 UCE loci from each sample, with a range of 104–610 loci; SE ±13.0 loci. We recovered an average of 470 loci with a range of 251–610; SE ±11.0 loci from ethanol preserved specimens, and we recovered fewer loci from the 9 museum samples (mean of 370 loci with a range of 104–545; SE ±42.6 loci; ethanol vs museum pinned, Welch Two Sample t-test p-value of 0.04812) (Fig 3). Of the 806 UCE alignments containing >2 taxa, we selected 368 alignments to create a ≥70% complete matrix (n = 33 taxa) with a total of 171,290 bp for all further phylogenetic analyses. The mean number of phylogenetically informative sites (PIC) per locus was 195.2 (range 76–387 PIC; SE ±3.1) (Fig 4), the loci had a mean length of 465.4 sites (range of 231–802 sites; SE ±5.6), and locus length was correlated with the number of PIC (R-squared 0.5139, p-value 2.2e-16) (Fig 4).

Fig 3. Left: Boxplot of number of UCEs by preservation type.

Fig 3

Average number of UCEs for modern ethanol preserved specimens was higher than for museum pinned specimens. Right: Plot of number of UCEs vs their date of collection. Plot shows general trend of fewer UCEs captured for the older the specimens, however the exact rate of decrease would require more specimens systematically sampled by precise preservation type.

Fig 4. Left: Histogram of the number of potentially informative sites per locus from final data matrices, calculated with pis function it the R library ips.

Fig 4

Right shows linear regression between the number of phylogenetically informative sites and loci length. Loci of shorter length tend to have a narrower range of informative sites whereas longer loci tend to have a wider range, giving the distribution a clubbed appearance.

Phylogenetic analyses

Concatenated analyses

PartitionFinder suggested that the optimal scheme for the concatenated alignment contained 126 partitions, and ML analysis inferred a tree where most of the nodes were highly supported (≥95% bootstrap support (bs)) (Fig 5). We recovered the Leptopini as a paraphyletic grade containing the Eupholini. The genus Celebia was sister to all of the other Eupholini in all analyses. The three main genera of the Eupholini (Rhinoscapha, Eupholus and Gymnopholus) were polyphyletic. Rhinoscapha appeared in three separate places, the R. tricolor-clade sister to R. usta-clade which is sister to the Gymnopholus clade containing the two subgenera Gymnopholus s. str. and Symbiopholus. The Eupholus loriae-group was sister to the remaining taxa. These include the third subgenus of Gymnopholus, Niphetoscapha which was sister to the R. tuberculata-clade. The clade of Niphetoscapha+R. tuberculata-clade was sister to the R. doriae-clade. The clade of Niphetoscapha+R. tuberculata-clade+R. doriae-clade is sister to the remaining members of Eupholus.

Fig 5. Phylogenetic tree results of the Eupholini weevils, branch colors correspond to species clades: LEFT: RAxML tree from concatenation of loci.

Fig 5

Dashed lines denote nodes that differ between trees. Node values indicate bootstrap support values. RIGHT: ASTRAL species tree, input trees derived from multi-partitioned MrBayes analyses of individual gene trees. Node values are derived from posterior distribution of MrBayes trees (minus burn-in) where each sample of the MCMC generation represents a bootstrap sample to ASTRAL.

Species tree analyses

The PartitionFinder results, using the initial 6 character sets under the GTRGAMMA site rate substitution model (for use in RAxML), had the majority of loci with between 1–2 character partitioning schemes: 53 loci with a single partition, 204 loci with 2 partitions, 102 loci with 3 partitions and 9 loci with 4 partitions. There was no difference in topology between the ASTRAL species tree for the full analyses, and the ASTRAL species tree where we removed loci producing gene trees having the lowest 10% quantile of average bootstrap support removed, and the ASTRAL species tree where we removed both these low support species trees and loci we identified as being potentially over-saturated (see S1 and S2 Figs). There was also essentially no difference in support between these tree subsets. Although some nodes varied in their support across analyses, poorly supported nodes did not show increased support between the reduced subsets of trees and the complete set (S1 and S2 Figs). The relationships between the Leptopini and Eupholini remained the same between species tree topologies and concatenated gene tree. The species trees recovered the same species groups as in the concatenated analyses, but the backbone of the tree was substantially different. For the particular differences between species groups see Figs 59. The PartitionFinder results, using the initial 6 character sets in Bayesian phylogenetic analyses (MrBayes MCCT trees) selected the majority of loci with more than 2 partitions according to the AICc metric: 16 loci with 1 partition, 109 loci with 2 partitions, 148 loci with 3 partitions, 81 loci with 4 partitions and 14 loci with 5 partitions, 0 loci had 6 partitions. We also compared locus length to the number of partitions in the MrBayes and RaxML analyses, and found that in both analyses locus length was positively correlated with the number of partitions ([MrBayes: p-value = 5.251e-08, R-squared = 0.078], [RaxML: p-value = 2.2e-16, R-squared = 0.2237]), (see S1 File, partitioning data). We assessed convergence between the two independent MrBayes runs using the average standard deviation of split frequencies using a custom unix script (see S1 Supporting Information Links). All but two analyses were at or below 0.01 in average standard deviation of split frequencies, the exceptions being 0.02 and 0.03, we subsequently produced the MrBayes MCCT trees.

Fig 9. Phylogenetic tree results of the Eupholini weevils, branch colors correspond to species clades: LEFT: ASTRAL species tree, input trees derived from single-partitioned RAxML analyses (each gene tree reconstructed using a single partition), of individual gene trees.

Fig 9

RIGHT: ASTRAL species tree, input trees derived from multi-partitioned RAxML analyses of individual gene trees. Node values indicate bootstrap support values.

The most substantial topological difference is between the RAxML-concatenated gene tree and the species tree analyses (see Fig 5, Table 1). Although the partitioning results differed in the number of partitions, the ASTRAL species trees derived from MrBayes-multi-partition gene trees were only slightly different from the RAxML-multi-partition derived gene trees (Fig 6). The Eupholus loriae-group and the Rhinoscapha tricolor-clade switched as being the sister clade to the remaining Eupholini. This node was not well supported in both analyses. These clades showed similar differences between both ASTRAL-RAxML-multi-partition and ASTRAL-RAxML-single-partition (Fig 7). The same difference also appears between the Eupholus loriae-group and the Rhinoscapha tricolor-clade in the ASTRAL MrBayes-single-partition and ASTRAL-MrBayes-multi-partition (Fig 8). The two MrBayes (single and multi-partitioned) derived ASTRAL species trees had the additional difference between the placement of the E. geoffroyi clade. The ASTRAL-RAxML-single-partition and MrBayes-multi-partition species trees only differed between the placement of the E. geoffroyi clade and the placement of Letopius clavis (Fig 9).

Table 1. Pairwise comparisons of topological distances between different tree reconstruction methods used.

Tree to tree topological distance metrics used the subtree prune and regraft distance (SPR_DIST), Robinson-Foulds distance (RF_DIST), path distance metric between pairs of taxa (PATH.DIFF_DIST). The ASTRAL trees based on multiple partitions are the ASTRAL RAxML tree using six character-sets (astral_raxmltree), ASTRAL RAxML tree based on single character-sets/partitions (astral_singlepart_raxmltree), ASTRAL MrBayes tree using six character-sets (astral_mrbtree) and ASTRAL MrBayes tree based on single character-sets/partitions (astral_singlepart_mrbtree). Results of lower triangle shown.

SPR_DIST raxml_concat astral_raxmltree astral_singlepart_raxmltree astral_mrbtree astral_singlepart_mrbtree svdq_tree
raxml_concat -          
astral_raxmltree 5 -        
astral_singlepart_raxmltree 5 2 -      
astral_mrbtree 7 3 3 -    
astral_singlepart_mrbtree 5 1 1 4 -  
svdq_tree 3 5 3 5 4 -
RF_DIST raxml_concat astral_raxmltree astral_singlepart_raxmltree astral_mrbtree astral_singlepart_mrbtree svdq_tree
raxml_concat -          
astral_raxmltree 14 -        
astral_singlepart_raxmltree 14 4 -      
astral_mrbtree 18 8 8 -    
astral_singlepart_mrbtree 12 2 2 10 -  
svdq_tree 10 10 6 10 8 -
PATH.DIFF_DIST raxml_concat astral_raxmltree astral_singlepart_raxmltree astral_mrbtree astral_singlepart_mrbtree svdq_tree
raxml_concat -          
astral_raxmltree 46.5 -        
astral_singlepart_raxmltree 44.8 27.7 -      
astral_mrbtree 54.4 24.1 29.1 -    
astral_singlepart_mrbtree 37.5 23.0 15.9 33.3 -  
svdq_tree 42.0 33.3 19.0 32.7 24.6 -
Fig 6. LEFT: ASTRAL species tree derived from multi-partitioned RAxML trees.

Fig 6

Node values indicate bootstrap support values. RIGHT: ASTRAL species tree, input trees derived from multi-partitioned MrBayes analyses of individual gene trees. Node values are derived from posterior distribution of MrBayes trees (minus burn-in) where each sample of the MCMC generation represents a bootstrap sample to ASTRAL.

Fig 7. Phylogenetic tree results of the Eupholini weevils, branch colors correspond to species clades: LEFT: SVDQuartets species tree.

Fig 7

Dashed lines denote nodes that differ between trees. Node values indicate bootstrap support values. RIGHT: ASTRAL species tree, input trees derived from multi-partitioned MrBayes analyses of individual gene trees. Node values indicate support values of MrBayes posterior (minus burn-in) used as ASTRAL bootstrap replicates.

Fig 8. Phylogenetic tree results of the Eupholini weevils, branch colors correspond to species clades: LEFT: ASTRAL species tree, input trees derived from single-partitioned MrBayes analyses (each gene tree reconstructed using a single partition), of individual gene trees.

Fig 8

RIGHT: ASTRAL species tree, input trees derived from multi-partitioned MrBayes analyses of individual gene trees. Node values indicate support values of MrBayes posterior used as ASTRAL bootstrap replicates.

The results from the SVDQuartets species tree were much the same as those from the other two species tree analyses. In particular, the backbone topology was most similar to the results of the ASTRAL species tree derived from the MrBayes-multi-partition analyses. One difference is that E. compositus is sister to E. azureus in the SVDQuartets species tree (Fig 10). Also, E. cf. schoenherrii is sister to the E. geoffroyi-clade in the SVDQuartets and ASTRAL-MrBayes-multi-partition trees, whereas the ASTRAL-RAxML-multi-partition tree have the E. geoffroyi-clade sister to the other E. schoenherrii species (Figs 6 and 7). Only the placement of E. cf. schoenherii was strongly supported in both the SVDQuartets and ASTRAL-MrBayes-multi-partition trees, but no other conflicting nodes were strongly supported.

Fig 10. Linear regression of logit proportion of UCE loci captured versus specimen age.

Fig 10

Number of UCE loci and specimen age for; Xylocopa (carpenter bees) from Blaimer et al. 2016, Aphelocoma (scrub-jays) from McCormack et al. 2016, Eupholini (smurf weevils) from this study. Specimen age is in years from when individual was first collected and preserved. Regressions show a decline in the number of UCE loci captured as specimen age increases, the rate of decline is similar between studies.

The topological tree-to-tree metrics derived from the RF-dist, Path-dist and SPR-dist all give relatively the same result (Table 1). These values show that the ASTRAL-RAxML-single-partition and ASTRAL-MrBayes-single-partition trees have the most similar topology. They are then most similar to the ASTRAL-RAxML species tree based on multiple partitions. The ASTRAL-RAxML-single-partition species tree is the most similar of the species tree methods to the RAxML-concatenated gene tree. The Path-dist metric shows that the ASTRAL-MrBayes species tree based on multiple partitions and SVDQuartets species trees are slightly more similar to each other than is the ASTRAL-RAxML-multi-partition species tree to the SVDQuartets species trees, while the other metrics give equal difference between the ASTRAL species trees and the SVDQuartets species trees. Most importantly the species tree methods are more similar to each other, except for the SVDQuartets species tree as mentioned above, than they are to the concatenated RAxML topology.

UCE capture and preservation

Within the Eupholini dataset, we used a GLM to test which variable (specimen age and or preservation type) has the most influence on the UCE number, we found that specimen age was the only significant contributor to the number of UCEs captured (p-value of 0.00886 for the coefficient of specimen age vs p-value of 0.75418 for collection type). Next, we compared our dataset to those of two other studies which utilized UCEs and museum samples, specifically Xylocopa (carpenter bees) [11] and Aphelocoma (scrub-jays) [10]. From the likelihood-ratio test results we find that the preferred model is one with a single slope (Chi-square test p-value 0.8598), indicating that UCE capture rate through time is approximately the same between datasets (Fig 11).

Fig 11. Linear regression of logit proportion of UCE loci captured verses specimen age.

Fig 11

Number of UCE loci and specimen age for; Xylocopa (carpenter bees) from Blaimer et al. 2016, Aphelocoma (scrub-jays) from McCormack et al. 2016, Eupholini (smurf weevils) from this study. Specimen age is in years from when individual was first collected and preserved. Regressions show a decline in the number of UCE loci captured as specimen age increases, the rate of decline is similar between studies.

Discussion

UCEs are now widely used for different vertebrate taxa [5,7,5056]. In invertebrates, UCEs have been used in Hymenoptera (primarily ants [8,21,11,5760]), but also Coleoptera [61] and arachnids [62]. Transcriptome-based analyses constitute another major source of phylogenomic data in arthropods [6369]. For hyperdiverse beetles, there is great potential for both UCE- and exon-based markers to help resolve relationships within this diverse group. Here, we tested the utility of UCEs designed across Coleoptera on a radiation of weevils from Australasia. We captured significantly more loci than Baca et al. (2017) [61] which focused on Adephaga beetles (305 at 50% vs [368 at 70%, 537 at 50% this study]). This is probably due to the phylogenetic distance between the species from which the probes were designed and the target taxa. The Coleoptera UCE markers capture a similar number of loci when compared to the 1st round of Hymenoptera UCEs. This suggests that with more forthcoming weevil genomes (personal comm. D. McKenna), capturing thousands of UCE loci could be possible. Why UCEs are not more universal within Coleoptera as compared to vertebrate lineages is still an open question. One possible explanation is that beetles are extremely diverse both in terms of life history and genomics. Recent studies of beetle genomes show a large diversity in their size and complexity, potentially even incorporating fungal genomic elements [70]. Additionally, Coleoptera originated ~300 Ma [71,72], compounding the problem, for comparison the Mammalia–Root ~223 Ma [73], and the Palaeognathae (ratite) and Neognathae (birds) divergence of ~100 Ma [74].

Here, we also find that we can enrich UCE loci from Coleoptera museum specimens, which can significantly aid sampling efforts for future studies, similar to other studies [11,75,76]. Our results are consistent with the findings of McCormack et al. 2016 [10] and Blaimer et al. 2016 [11], who showed a gradual decline in UCE capture rate with specimen age (Fig 11). Generally speaking, the capture rate we observed correlates similarly to the aforementioned studies, according to a likelihood ratio test between models’ slopes, which is re-assuring because Coleoptera, due to their thicker exoskeletons compared to other insects (e.g. Drosophila), may undergo a different decomposition process following collection. We hypothesize that in large insects and or those with thick cuticle, their initial drying time (period spent air drying for preservation after dispatching the insect) can be longer than other insects, and length of initial drying time is key to preventing decay and preserving the specimens for later use [10,11,77,78]. We suspect that in the case of tropical Coleoptera with thick exoskeletons, more decay tends to occur because of their thickened cuticle and the humid conditions under which they are collected, however this needs to be formally quantified. Additionally, there is substantial variance in the procedures through which beetle specimens are preserved. For example, when collecting weevils on the Solomon Islands in 1959, weevil researcher Dr. Charles O’Brien, would actively dry his weevils at the end of the day on top of a kerosene powered lantern before packing them into a tin with cotton (personal communication). With such information one can start to make better sense as to why some historic samples fail and others work relatively well. For example potentially overheating samples (as described above) could shear DNA as well as specimens that were not dried quickly could have DNA degradation due to enzymatic decomposition. Having such information will be invaluable to identify particular collections that may harbor particularly valuable specimens for genomic work. More systematic studies need to be undertaken to fully explore this area of museum phylogenomics in beetles and preservation methods, but our results at least show the feasibility of using tropical beetle specimens to bolster sampling.

All of our phylogenetic reconstructions show the same result concerning the classification of the Eupholini: the main genera Eupholus, Gymnopholus and Rhinoscapha are polyphyletic. This should not be too surprising because they were described and delineated in the late 19th and early 20th centuries when there were fewer known species and inadequate examination of internal structures (such as genitalia) during classification. For instance, in Eupholus, species were apparently assigned to that genus mostly based on coloration, i.e. all bright blue or green species were placed in Eupholus [16].

Lastly, for the majority of loci we found that by using multiple character sets which resulted in multiple partition schemes was a better fit for the data, than a single partitioning scheme according to AICc model selection. This indicates that using each UCE locus as a single partition as commonly performed may not be the best fit for the data. With these data, differences appear between species tree topologies based on both the partitioning strategy and initial method used to reconstruct the gene trees. This indicates that using only single partitions may not be adequate for many UCE loci. Additionally, the method used to reconstruct the gene trees also has an effect on the resulting species tree topology. What general effect gene tree partitioning has on species tree reconstruction and resulting branch lengths remains an open question.

We find great potential for UCE loci to tackle some of the more challenging issues in beetle systematics such as in the Eupholini radiation. We suggest that the use of UCEs and similar phylogenomic markers may also provide insights into the deeper relationships of diverse, higher-order taxonomic groups such as the Curculionoidea. Museum collections should also be recognized as a more prominent genomic resource, especially for tropical arthropods. Here, we demonstrate with a logistically challenging group how collection efforts of the past can aid current and future studies by increasing sampling, an undertaking that would otherwise require many years of effort to recollect the same samples. What steps museums should take to help preserve this genomic resource should be studied in a systematic fashion across institutions, as many existing museum specimens may be irreplaceable.

Supporting information

S1 Table. Data results.

(XLS)

S2 Table. Results of UCE loci alignments percent completeness (number of taxa missing from total number of taxa).

Matrix % complete indicate the percent completeness of the UCE loci alignments. 1st round of sequencing indicates the number of UCE loci alignments from the first round of sequencing of the 48 samples on an illumina HiSeq 3000 half lane. 2nd round of sequencing indicates the number UCE loci alignments from the concatenation of reads from the second round of sequencing (same 48 samples on an illumina HiSeq 3000 half lane) and the first round of sequencing.

(XLS)

S3 Table. Table of specimens used in this stud with museum where they are deposited the year the specimen was dispatched and the subsequent preservation type.

Museum abbreviations as follows: SNSB-Zoological State Collection, München (ZSM), State Museum of Natural History Karlsruhe (SMNK), California Academy of Sciences (CAS).

(XLS)

S1 Fig. ASTRAL-RAxML partitioned Eupholini tree, minus 10% lowest quantile of average bootstrap values loci.

(PDF)

S2 Fig. ASTRAL-RAxML partitioned Eupholini species tree, minus 10% lowest quantile of average bootstrap values and potentially oversaturated loci.

(PDF)

S1 Supporting Information Links. Links to supporting R/unix code, alignments and raw sequence reads.

(DOCX)

S1 File. Partitioning strategy by loci.

(ZIP)

Acknowledgments

The authors would like to thank Joe Russack and Brian Simison for help with using the linux cluster and software installs at the CAS. We would also like to thank Misha Leong and Jun Ying Lim for their help with statistical analyses. We would like to thank Marek Borowiec and the other 3 anonymous reviewers for their comments that helped to improve this manuscript. The New Guinea Binatang Research Center has provided extremely valuable assistance with collections in Papua New Guinea. Moreover, Bega Inaho (Goroka) and Agus Faknik (Abepura) helped collect specimens. The field work in New Guinea would not have been possible without the generous hospitality and help of many local people, and we thank all of them very warmly. M.H.V.D. and R.L. (salary and lab work/sequencing) were funded by NSF award DBI #1402102, A.R. by DFG BA2152/10-1 & 3 / RI1718/3-1. We are indebted to the PNG Department of Environment and Conservation for issuing permits (#014282) in PNG; to the Queensland Government, Department of Environment and Heritage Protection for granting collecting permits in many Australian National Parks, and the Australian Government, Department of Environment for an export permit.

Data Availability

Links to all UCE loci use in phylogenetic analyses: 10.6084/m9.figshare.5172478. R/UNIX CODE at MHVD’s github page: https://github.com/matthewhvandam/weevil-UCE/tree/master. Raw sequence data: NCBI BioProject ID: PRJNA394929.

Funding Statement

M.H.V.D. and R.L. (salary and lab work/sequencing) were funded by NSF award DBI #1402102 and A.R. by DFG BA2152/10-1 & 3 / RI1718/3-1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Table. Data results.

(XLS)

S2 Table. Results of UCE loci alignments percent completeness (number of taxa missing from total number of taxa).

Matrix % complete indicate the percent completeness of the UCE loci alignments. 1st round of sequencing indicates the number of UCE loci alignments from the first round of sequencing of the 48 samples on an illumina HiSeq 3000 half lane. 2nd round of sequencing indicates the number UCE loci alignments from the concatenation of reads from the second round of sequencing (same 48 samples on an illumina HiSeq 3000 half lane) and the first round of sequencing.

(XLS)

S3 Table. Table of specimens used in this stud with museum where they are deposited the year the specimen was dispatched and the subsequent preservation type.

Museum abbreviations as follows: SNSB-Zoological State Collection, München (ZSM), State Museum of Natural History Karlsruhe (SMNK), California Academy of Sciences (CAS).

(XLS)

S1 Fig. ASTRAL-RAxML partitioned Eupholini tree, minus 10% lowest quantile of average bootstrap values loci.

(PDF)

S2 Fig. ASTRAL-RAxML partitioned Eupholini species tree, minus 10% lowest quantile of average bootstrap values and potentially oversaturated loci.

(PDF)

S1 Supporting Information Links. Links to supporting R/unix code, alignments and raw sequence reads.

(DOCX)

S1 File. Partitioning strategy by loci.

(ZIP)

Data Availability Statement

Links to all UCE loci use in phylogenetic analyses: 10.6084/m9.figshare.5172478. R/UNIX CODE at MHVD’s github page: https://github.com/matthewhvandam/weevil-UCE/tree/master. Raw sequence data: NCBI BioProject ID: PRJNA394929.


Articles from PLoS ONE are provided here courtesy of PLOS

RESOURCES