DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes
2024 Feb 16;31(2):dsae005. doi: 10.1093/dnares/dsae005

The genome of a globally invasive passerine, the common myna, Acridotheres tristis

Katarina C Stuart 1,2,, Rebecca N Johnson 3, Richard E Major 4, Kamolphat Atsawawaranunt 5, Kyle M Ewart 6,7, Lee A Rollins 8, Anna W Santure 9, Annabel Whibley 10
PMCID: PMC10917472  PMID: 38366840

Abstract

In an era of global climate change, biodiversity conservation is receiving increased attention. Conservation efforts are greatly aided by genetic tools and approaches, which seek to understand patterns of genetic diversity and how they affect species health and the ability to persist under future climate regimes. Invasive species offer vital model systems in which to investigate questions regarding adaptive potential, with a particular focus on how changes in genetic diversity and effective population size interact with novel selection regimes. The common myna (Acridotheres tristis) is a globally invasive passerine and an excellent model species for research both into the persistence of low-diversity populations and into the mechanisms of biological invasion. To underpin research on the invasion genetics of this species, we present the genome assembly of the common myna. We describe the genomic landscape of this species, including genome-wide allelic diversity, methylation, repeats, and recombination rate, and examine gene family evolution. Finally, we use demographic analysis to show that populations in some native regions underwent a dramatic increase between the two most recent glacial periods, and reveal artefactual impacts of genetic bottlenecks on demographic analysis.

Keywords: Sturnidae, genome assembly, methylation, transposable elements, demographic history

1 Introduction

Invasive species are organisms that successfully establish outside their native range, whose populations expand demographically and spatially, and which may cause negative ecological, environmental, or economic impacts.1,2 Invasive species can undergo rapid evolution following introduction to a novel environment, where there may be a radically different set of biotic and/or abiotic conditions.3–5 Introductions of small numbers of individuals often result in a genetic bottleneck (a sudden reduction in genetic diversity). These features of introduction provide an interesting avenue through which to investigate theories and concepts regarding rapid adaptation in the context of low genetic diversity, inbreeding, mutational load, and novel genetic variation.6 Additionally, studying evolutionary processes within invasive populations may be useful for invasive species management (e.g. range forecasting7,8), and, given the bottlenecks invasive species typically experience, can elucidate small-population paradigms relevant to the management of vulnerable and/or declining native populations.9 Although the increasing affordability of genetic sequencing offers an opportunity to assess the mechanisms of rapid adaptation, genomic resources are missing for most invasive species.2

Sturnidae are a songbird family comprising more than 100 species spanning a wide range of habitats across the globe.10 Included in this number is the invasive Acridotheres tristis, the common myna (Fig. 1a). The common myna is listed by the IUCN as one of only three bird species on the ‘World’s 100 worst’ invasive species list.11 The species is native to a range spanning central to southeast Asia, and has been either deliberately or accidentally introduced to Australia, New Zealand, Israel, South Africa, and the United States of America, as well as numerous smaller islands and island groups (e.g. Mauritius, Réunion, Fiji, New Caledonia, Hong Kong, Cayman Islands, Seychelles) (Fig. 1b).12–14 The common myna’s ecological impacts are particularly pronounced in island ecosystems15,16 and have also been studied within Australia, New Zealand, and South Africa, where its widespread geographic coverage has elicited research interest into its invasion history and impacts.14, 17–19 This species has been the focus of ecological and conservation research owing to its aggressive territorial behaviour, which may exclude native avian species from nesting structures and foraging areas,20–22 though evidence for this is mixed in some environments.23 There are also concerns about this species regarding pathogen spread, nesting nuisance, and crop interference, all of which have led to it being categorized as an agricultural pest in most of its introduced ranges.24–26

Figure 1.

The common myna (Acridotheres tristis), and its global distribution across native and invasive ranges. Panel (a) Acridotheres tristis (photo credit Rameez Remy). Panel (b) depicts the global distribution of the species, with native range indicated in brown, invasive range indicated in blue.

Population genomics studies of the common myna are relevant for understanding the impacts of reduced genetic diversity on adaptive potential. This species has undergone multiple concurrent and sequential bottlenecks across its globally invasive range, and yet has established itself across a diverse array of environments.17, 19, 27, 28 Thus, this system provides the opportunity to study the factors underlying invasion success, rapid adaptation, and population persistence. Further, the common myna is a relative of the European starling (Sturnus vulgaris), another globally invasive passerine with a similar introduction history.29 Such interspecies comparisons will provide an opportunity to examine whether molecular evolutionary processes behave in a conserved or stochastic manner across invasions of similar phylogenetic, ecological, and historical nature.

Here we present the genome assembly of the common myna, A. tristis, assembled through a combination of long-read (Oxford Nanopore Technologies) and linked-read (10x Chromium) sequencing. This reference genome will aid further research into the population genetics and evolutionary genomics of this ecologically important species. To provide genomic context for these studies, we describe the genomic landscape of this species, including recombination, DNA methylation, and single nucleotide polymorphisms (SNPs), as well as briefly examining gene family expansions and the demographic history of native and invasive populations.

2 Materials and methods

2.1 Sampling and sequencing

A male common myna was collected from Newcastle, Australia (−32.935, 151.751) on 15/07/2014 and the snap-frozen tissues (liver, heart, testis, breast muscle) were stored at −80°C (Australian Museum Registration Number O.76569, local ID 13099). Nucleic acid extractions (DNA from heart, liver and breast muscle, and total RNA from all tissues) were conducted, and 10x Chromium linked-read, MinION Oxford Nanopore Technologies (ONT) long-read, and short-read cDNA sequencing was performed (Supplementary Appendix 1: sampling, extraction, and sequencing).

Additionally, representative common myna individuals from globally distributed native and invasive populations18, 19 were whole genome resequenced (WGR) using short-read sequencing on the Illumina NovaSeq platform (150 bp paired-end reads), with sequencing brokered by Custom Science, Australasia (Supplementary Table S1).

2.3 Genome assembly

The assembly process is summarized in Fig. 2. First, we processed raw ONT long reads with guppy v6.2.1 (Oxford Nanopore Technologies; for config settings see Supplementary Table S2). After basecalling, porechop v0.2.4 was used to detect and remove residual sequencing adaptors.30 These reads were assembled into an initial assembly with flye v2.9.1,31 using the settings ‘--nano-raw’, ‘--no-alt-contigs’ and ‘--scaffold’. The ONT reads were mapped back to the draft assembly and polished using Medaka v1.4.3,32 with default parameters and the r941_min_sup_g507 model. The assembly was then further polished through two iterations of nextpolish v1.4.1,33 using 10x Chromium reads that had been stripped of their barcodes using Scaff10x v5.0 (https://github.com/wtsi-hpag/Scaff10X) and then quality filtered using trimgalore v0.6.7,34 with default parameters plus the ‘--2colour’ flag enabled. Reads were mapped to the draft genome using bwa v0.7.17,35 mem, and alignments were sorted and compressed using samtools v1.14.36 The linked-read data were not used for scaffolding because they did not improve assembly contiguity or provide useful long-range information. The polished genome was then manually curated to correct misassemblies (Supplementary Appendix 2: Manual genome curation). Although Hi-C data were not available at the time of assembly construction, a common myna genome had recently been released through the Vertebrate Genomes Project (VGP), which provided a valuable species-specific means of assigning chromosomal identities and order to the curated contigs. The assembly was aligned to the VGP common myna genome (GCA_027559615.1) using ragtag v2.1.0,37 to produce synteny-based pseudo-chromosomes (one VGP contig was excluded from scaffolding; see Supplementary Appendix 3, Supplementary Figs. S4 and S5). These scaffolds were then renamed based on their synteny with the major chromosomes of the zebra finch genome (Supplementary Fig. S6).
The final genome assembly produced from the above protocol is referred to as AcTris_vAus2.0.

Figure 2.

The common myna (Acridotheres tristis) genome workflow summary information. ONT = Oxford Nanopore Technologies long reads. 10x = 10x Chromium linked Illumina reads. RNA = Illumina short reads. WGR = Illumina whole genome resequencing.

2.4 Assembly evaluation

We used k-mer frequency analysis with jellyfish v2.3.0,38 and genomescope39 to estimate genome size. A k-mer histogram was produced from all of the trimmed linked-read gDNA data using an initial k of 20, chosen for an approximate genome size of just above 1 Gb (k values of 18, 19, 21, and 22 were also assessed). K-mers with counts of 7 or below were attributed to extremely rare reads or sequencing errors and were removed. Genome size was then estimated by dividing the total number of k-mers, summed across all remaining count classes, by the mean k-mer coverage.
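The size calculation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' script: it assumes a two-column histogram (k-mer multiplicity, number of distinct k-mers), the layout written by `jellyfish histo`, and it assumes the histogram peak stands in for the mean k-mer coverage. The function names and the toy histogram are hypothetical.

```python
def parse_histogram(lines):
    """Parse 'multiplicity count' lines into {multiplicity: number of distinct k-mers}."""
    histogram = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        histogram[int(fields[0])] = int(fields[1])
    return histogram


def estimate_genome_size(histogram, max_error_count=7):
    """Genome size ~ total k-mers / k-mer coverage, after dropping low-count k-mers.

    K-mers seen max_error_count times or fewer are treated as sequencing errors
    or extremely rare reads and discarded. The histogram peak is used as the
    coverage term (an assumption; the text refers to mean coverage).
    """
    kept = {m: n for m, n in histogram.items() if m > max_error_count}
    if not kept:
        raise ValueError("no k-mers remain after removing low-count classes")
    # total k-mers = sum over classes of (multiplicity x distinct k-mers)
    total_kmers = sum(m * n for m, n in kept.items())
    peak_coverage = max(kept, key=kept.get)
    return total_kmers / peak_coverage


# Toy histogram (not real data): an error tail at 1-2x and a peak at 20x
toy = ["1 1000", "2 300", "10 50", "20 400", "21 300", "40 10"]
print(estimate_genome_size(parse_histogram(toy)))  # 760.0
```

With real data the same arithmetic yields a size in base pairs; the choice of coverage term (peak versus a weighted mean) noticeably shifts the estimate, which is one reason jellyfish-based and genomescope-based sizes can differ.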

In addition, merqury v1.3,40 was used to assess completeness and the assembly consensus quality value (QV). A k-mer size of 20 was selected to construct a meryl v1.4 database, based on a genome size of 1.04 Gb.
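Both quantities in this step can be illustrated with the formulas published alongside merqury. The sketch below assumes the best-k rule k = log4(G(1 − p)/p) with a tolerated k-mer collision rate p = 0.001, and the k-mer-based consensus QV, QV = −10 log10(1 − (1 − K_asm/K_total)^(1/k)), where K_asm counts assembly k-mers absent from the read set and K_total is the total number of assembly k-mers. The function names and the example k-mer counts are hypothetical.

```python
import math


def best_k(genome_size, collision_rate=0.001):
    """k that keeps random k-mer collisions rare: log4(G * (1 - p) / p)."""
    return math.log(genome_size * (1 - collision_rate) / collision_rate, 4)


def consensus_qv(asm_only_kmers, total_asm_kmers, k):
    """Phred-scaled consensus quality from k-mers found only in the assembly."""
    frac_unsupported = asm_only_kmers / total_asm_kmers
    # Per-base error E = 1 - (1 - f)^(1/k), computed stably with log1p/expm1
    error_rate = -math.expm1(math.log1p(-frac_unsupported) / k)
    return -10 * math.log10(error_rate)


k = best_k(1.04e9)
print(f"best k = {k:.2f} -> use k = {round(k)}")
# Hypothetical counts: 1,000 unsupported k-mers out of 1e9 assembly k-mers
print(f"QV = {consensus_qv(1_000, 1_000_000_000, 20):.1f}")
```

For a 1.04 Gb genome the rule gives k ≈ 19.96, consistent with the k = 20 used here.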

We assessed the contiguity of the AcTris_vAus2.0 assembly (as well as the VGP common myna genome) with seqsuite v1.27.0,41 and its completeness with busco v5.3.2,42 using genome mode and the ‘Aves’ lineage from the ODB10 dataset.

2.5 Transcriptome assembly

We constructed a genome-guided transcriptome assembly from the Illumina RNA-seq data of three tissues (liver, heart, and testes). Raw reads were trimmed and quality filtered using fastp v0.23.2,43 with default settings, before being mapped to AcTris_vAus2.0 using hisat2 v2.2.1,44 (settings: --rna-strandness FR --dta --phred33). The resulting SAM files were sorted and converted to BAM using samtools v1.15.1, before being assembled into a GTF file using stringtie v2.2.0.45 Each tissue was first assembled into an individual GTF file, and these tissue-specific transcriptomes were then combined using the stringtie --merge function. The overlap across tissues within this merged transcriptome was assessed using gffcompare v0.12.6.46 We assessed the completeness of this three-tissue transcriptome using busco (--mode transcriptome).

2.6 Genome annotation

For the genome annotation of the common myna, we generated ab initio gene predictions using braker v3.0.2,47, 48 homology-based gene predictions using gemoma v1.9,49 and merged these into a single annotation using tsebra v1.1.0.50 First, we soft-masked the genome using the joint repeat library described below (see section 2.7.3: Repeat and transposable element annotation). Braker was provided with the three separate RNA-seq BAM files (one for each tissue; see 2.5: Transcriptome assembly) and the UniProt/Swiss-Prot database51 as evidence. Braker uses both RNA and protein evidence to run genemark-ept,52 prothint,53 and train augustus,54 with redundant training gene structures filtered out using diamond.55 gemoma was run in parallel and was provided with the annotation information for 26 avian species (Supplementary Table S3) (parameters: tblastn = false GEMOMA.m = 200000 GEMOMA.Score = ReAlign AnnotationFinalizer.r = SIMPLE pc = true o = true).

These two annotations were then merged using tsebra, with the braker gene set enforced so that the final annotation would retain the species-specific ab initio sequences predicted by braker while merging in the homology-based sequences called by gemoma (configuration file settings: hint weightings P 0.1, E 10, C 5, M 1; intron_support 0.1, stasto_support 1; e_1 0.1, e_2 0.5, e_3 0.05, e_4 0.18). We generated functional annotations of protein-coding genes using eggnog-mapper v2.1.10,56 with diamond (-m diamond) performing the protein sequence searches.

We assessed the completeness of the final annotation (as well as the individual braker and gemoma annotations) using busco (--mode transcriptome). We assessed the quality of predicted transcripts using saaga v0.7.7,57 with the Ensembl Gallus gallus proteome (GCF_016699485.2) as a reference, and summarized these statistics overall as well as separately for macro- and microchromosomes, where we define macrochromosomes as chromosomes 1, 1A, 2, 3, 4, and 5 (Supplementary Fig. S7). We generated annotation summary statistics using the agat58 agat_sp_functional_statistics.pl script, and used bedtools59 to calculate gene density in 1 Mb windows across the whole genome.
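The windowed gene-density summary and the macro/micro split can be sketched as follows. This is a minimal illustration in the spirit of counting features per fixed window, not the bedtools implementation: it assumes genes are given as 0-based, half-open (chrom, start, end) tuples lying within their chromosome, that chromosomes are named "1", "1A", and so on, and that anything outside the macrochromosome list is treated as micro (sex chromosomes and unplaced scaffolds may deserve separate handling). All names and the toy data are hypothetical.

```python
from collections import defaultdict

# Macrochromosomes as defined in the text; everything else is treated as micro here
MACRO = {"1", "1A", "2", "3", "4", "5"}


def window_gene_counts(chrom_lengths, genes, window=1_000_000):
    """Count genes overlapping each window; genes are (chrom, start, end), 0-based half-open."""
    counts = {}
    for chrom, length in chrom_lengths.items():
        n_windows = (length + window - 1) // window  # last window may be partial
        for i in range(n_windows):
            counts[(chrom, i)] = 0
    for chrom, start, end in genes:
        # a gene spanning a window boundary is counted in every window it touches
        for i in range(start // window, (end - 1) // window + 1):
            counts[(chrom, i)] += 1
    return counts


def mean_density_by_class(counts):
    """Mean genes per window for macro- versus microchromosomes."""
    sums = defaultdict(int)
    n_windows = defaultdict(int)
    for (chrom, _), n in counts.items():
        cls = "macro" if chrom in MACRO else "micro"
        sums[cls] += n
        n_windows[cls] += 1
    return {cls: sums[cls] / n_windows[cls] for cls in sums}


lengths = {"1": 2_500_000, "7": 1_000_000}  # hypothetical chromosome lengths
genes = [("1", 100, 2_000), ("1", 999_000, 1_001_000), ("7", 10, 20), ("7", 500_000, 500_100)]
counts = window_gene_counts(lengths, genes)
print(mean_density_by_class(counts))  # {'macro': 1.0, 'micro': 2.0}
```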

2.7 Genomic landscape

We explored the landscape of SNPs, methylation, repeat, and transposable element (TE) content, and linkage disequilibrium (LD) based inference of recombination along the genome assembly of AcTris_vAus2.0, to provide context and resources for future genomic studies on this species.

2.7.1 SNP variant density

The WGR data from 15 individuals from two native range sample sites in India (Tamil Nadu = TN and Madhya Pradesh = MP; Supplementary Table S1) were used to quantify genome-wide SNP density. Raw reads were processed using trim_galore v0.6.7, mapped to the AcTris_vAus2.0 genome assembly using bwa v0.7.17 mem, and processed by samtools into sorted BAM files. Duplicate reads were marked using picard v2.26.10,60 MarkDuplicates, and variants were jointly called across samples using the bcftools v1.13,36 mpileup (-C 50 -q 20 -Q 25), call, and view functions. Indels were excluded from the dataset, SNPs were filtered for a minimum depth of 5 and a maximum depth of 50, and non-variant sites were removed using vcftools v0.1.15,61 (--mac 1). SNP density was then calculated in 1 Mb bins along the genome and summarized across macro- and microchromosomes.
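The filtering logic and per-bin SNP counting described above can be sketched as below. This is a simplified illustration, not the vcftools/bcftools behaviour: it treats depth as a single per-site value (vcftools can also apply depth filters per genotype), keeps only biallelic SNPs, and uses the minor allele count computed from genotype strings to drop non-variant sites, mirroring `--mac 1`. The record layout and function names are hypothetical.

```python
from collections import Counter


def allele_counts(genotypes):
    """Count reference ('0') and alternate alleles in GT strings such as '0/1' or '1|0'."""
    n_ref = n_alt = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call
            if allele == "0":
                n_ref += 1
            else:
                n_alt += 1
    return n_ref, n_alt


def passes_filters(ref, alt, depth, genotypes, min_dp=5, max_dp=50, min_mac=1):
    """Biallelic SNP, site depth within [min_dp, max_dp], minor allele count >= min_mac."""
    if len(ref) != 1 or len(alt) != 1:
        return False  # indel, multi-allelic or complex record
    if not min_dp <= depth <= max_dp:
        return False
    n_ref, n_alt = allele_counts(genotypes)
    return min(n_ref, n_alt) >= min_mac  # drops non-variant sites


def snp_density(records, bin_size=1_000_000):
    """Count filtered SNPs per (chrom, bin); records are (chrom, pos, ref, alt, depth, gts)."""
    counts = Counter()
    for chrom, pos, ref, alt, depth, gts in records:
        if passes_filters(ref, alt, depth, gts):
            counts[(chrom, (pos - 1) // bin_size)] += 1  # VCF POS is 1-based
    return counts


records = [
    ("1", 10, "A", "G", 20, ["0/1", "0/0"]),         # kept
    ("1", 20, "A", "AT", 20, ["0/1", "0/0"]),        # indel
    ("1", 30, "C", "T", 3, ["0/1", "0/0"]),          # too shallow
    ("1", 40, "C", "T", 60, ["0/1", "0/0"]),         # too deep
    ("1", 50, "C", ".", 20, ["0/0", "0/0"]),         # non-variant site
    ("1", 1_000_001, "G", "A", 10, ["1|0", "./."]),  # kept, second bin
]
print(snp_density(records))
```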

2.7.2 DNA methylation profiling

We used guppy v6.2.1 to perform extended DNA methylation basecalling of the ONT long reads against the AcTris_vAus2.0 genome, with each flow cell batch run separately (for config files see Supplementary Table S2). Reads assigned a ‘pass’ score were then combined and sorted using samtools, before modified base counts were aggregated using modbam2bed v0.9.0 (https://github.com/epi2me-labs/modbam2bed) (-m 5mC) to identify 5-methylcytosine (5mC) in a CpG context. bedtools coverage was then used to assess genome methylation coverage for the three flow cell types separately in 1 Mb windows to check for consensus (Supplementary Fig. S8), before the data from all three flow cell types were combined into a joint modbam2bed run. The proportion of methylated reads at each methylated CpG site (with a minimum read depth of 5) was calculated, and CpG site counts were calculated for 1 Mb windows along the genome using bedtools coverage (separately for all CpG sites and for those with 75% or more methylated reads). The individual CpG methylation sites (filtered for a minimum coverage of 5) were then merged into methylated regions using dmrfinder v0.3,62 with default settings except for a minimum count number of 5 (-r 5) and a minimum of 15 CpG sites per region (-c 15), because we chose to focus only on windows with a high density of CpGs. The proportion of methylated reads in each region was then summarized across macro- and microchromosomes, within 5 kb upstream of gene sequences, and within TEs.
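The per-site methylation proportion and the two window counts (all covered CpGs, and CpGs with at least 75% methylated reads) can be sketched as follows. This is an illustration under stated assumptions, not a parser for modbam2bed output: each site is assumed to be a hypothetical (chrom, 0-based position, modified-read count, unmodified-read count) tuple.

```python
from collections import Counter


def methylation_windows(sites, min_depth=5, high_threshold=0.75, window=1_000_000):
    """Per-window counts of CpGs passing depth, and of those >= high_threshold methylated.

    sites: (chrom, pos, n_modified_reads, n_unmodified_reads), pos 0-based.
    """
    all_cpg, high_cpg = Counter(), Counter()
    for chrom, pos, n_mod, n_unmod in sites:
        depth = n_mod + n_unmod
        if depth < min_depth:
            continue  # too few reads to estimate a methylation proportion
        key = (chrom, pos // window)
        all_cpg[key] += 1
        if n_mod / depth >= high_threshold:
            high_cpg[key] += 1
    return all_cpg, high_cpg


sites = [("1", 100, 8, 2), ("1", 200, 1, 1), ("1", 300, 3, 3), ("1", 1_500_000, 5, 0)]
all_cpg, high_cpg = methylation_windows(sites)
print(dict(all_cpg), dict(high_cpg))
```

Here the second site is dropped for having only two reads, the third (50% methylated) counts only towards the all-CpG total, and the first and fourth count towards both.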

2.7.3 Repeat and transposable element annotation

A repeat library was generated for the AcTris_vAus2.0 assembly using several approaches. We first generated a species-specific repeat library following the maker2 advanced repeat library construction protocol (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced), with miniature inverted-repeat TEs identified using mite-tracker.63 We also identified TEs in the genome using earlgrey v2.0,64 which employs repeatmasker v4.1.2,65 repeatmodeler v2.0.2,66 and the Dfam 3.6 database67 in a fully automated TE annotation pipeline to create de novo consensus sequences. We then assessed the repeat content of the AcTris_vAus2.0 genome with repeatmasker, using a joint repeat library that combined the maker2 advanced repeat library, the earlgrey consensus sequences, and the Aves lineage-specific sequences from the repeatmasker repeat database. We also ran the VGP common myna genome through earlgrey and annotated repeats using repeatmasker (using the same repeat library generated above, but with the assembly-specific earlgrey library).

2.7.4 Recombination profile

To generate a linear recombination landscape for each of the common myna pseudo-chromosomes, we used the SNP data generated above (see 2.7.1: SNP variant density) to run ldhat v2.2,68 which estimates recombination rates from LD between SNPs in population genetic data. We analysed the two sample sites (Tamil Nadu = TN and Madhya Pradesh = MP) together because we judged that the benefit of a larger sample size for resolving broad recombination patterns along chromosomes outweighs the possible impacts of pooling individuals from two geographic sample sites. Because ldhat must be run separately for each scaffold, the SNP variant file was split by chromosome and converted into ldhat format using vcftools. We did not apply minor allele frequency filtering, because removing singletons does not have a large impact on program output when high-quality reference genomes are used.69 However, we did use vcftools to thin the SNP data (--thin 1000) to reduce computation time and resources. Each chromosome was then run through the ldhat interval program (-its 10000000 -bpen 5 -samp 5000), and the output was summarized with the ldhat stats program (-burn 250). The population-scaled recombination rate ρ = 4Ne·r (where Ne is the effective population size and r is the per-generation recombination rate) was then plotted along each scaffold to illustrate the linear landscape of recombination. We also examined LD decay between pairs of SNPs on the same chromosome (up to 10 Mb apart) using the vcftools ‘--geno-r2’ function (--ld-window-bp 10000000).
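Three small calculations underlie this step: thinning SNPs to a minimum spacing, converting ρ back to a per-generation rate (which requires an assumed Ne), and the genotype r² used for LD decay, which vcftools computes as the squared correlation between genotypes coded 0, 1, and 2. The sketch below is a greedy thinning in the spirit of `--thin`, not its exact implementation; the Ne value in the example is hypothetical, and missing genotypes are represented as `None`.

```python
import math


def thin_positions(positions, min_distance=1000):
    """Greedy thinning: keep a site only if it is >= min_distance bp from the last kept site."""
    kept = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] >= min_distance:
            kept.append(pos)
    return kept


def rho_to_r(rho, ne):
    """Convert population-scaled rho = 4 * Ne * r into a per-generation r (Ne must be assumed)."""
    return rho / (4 * ne)


def genotype_r2(x, y):
    """Squared Pearson correlation of 0/1/2 genotype dosages, skipping missing (None)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return math.nan  # monomorphic among non-missing samples
    return cov * cov / (vx * vy)


print(thin_positions([1, 500, 1001, 1500, 2600]))       # [1, 1001, 2600]
print(rho_to_r(0.04, ne=10_000))                         # hypothetical Ne
print(genotype_r2([0, 0, 2, 2], [0, 2, 0, 2]))           # 0.0
```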

2.8 Gene family evolution

We identified gene families and orthologous gene clusters across 14 species, and expansions within these, using orthofinder v2.5.2,70 and cafe5.71 We used four outgroup species, Homo sapiens (GCF_000001405.26), Mus musculus (GCF_000001635.2), Salvator merianae (GCF_000534875.1), and Podarcis muralis (GCF_004329235.1), and nine ingroup bird species, Lonchura striata domestica (GCF_002197715.1), Taeniopygia guttata (GCA_003957565.1), Gallus gallus (GCF_016699485.2), Ficedula albicollis (GCF_000247815.1), Cyanistes caeruleus (GCF_002901205.1), Serinus canaria (GCF_000534875.1), Parus major (GCF_001522545.2), Zonotrichia albicollis (GCF_000385455.1), and Sturnus vulgaris (vAU1.0,57), alongside the AcTris_vAus2.0 genome annotation. The longest transcript for each gene in each species' annotation file was identified using the agat agat_sp_keep_longest_isoform.pl script, and the protein sequence was extracted using the matching genome assembly and gffread v0.12.7.46 We used seqkit v2.4,72 to remove protein sequences shorter than 30 amino acids. In addition, a small number of species had translated protein sequences with premature stop codons; these were also filtered out with seqkit. We ran these final protein sequences through orthofinder (-M msa -S blast -I 1.3) to produce a species tree and identify phylogenetic hierarchical orthogroups (HOGs) that had expanded or contracted on specific branches. timetree73 was then used to make the species tree ultrametric, and this tree was used alongside a summary of HOGs across the 14 species as input for cafe5 to determine which HOGs had significant expansions or contractions in particular taxa and lineages. We ran cafe5 five times with the gamma rate category count (-k) set to 1, 2, 3, 4, and 5, respectively. The log files of each run were checked for convergence, and the run with the lowest -lnL score (i.e. the highest likelihood) was retained (k = 4).
We identified HOGs with a significant contraction or expansion across the phylogeny (P < 0.05) and filtered these for HOGs with expansions or contractions identified in AcTris_vAus2.0. These HOGs were then mapped to their corresponding orthogroups (OGs), and the five largest sequences within each OG were functionally annotated with their associated gene ontology (GO) terms using interproscan v5.51,74 (using the --goterms flag). The GO biological process terms were then summarised separately for expanding and contracting sequences using revigo.75
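The protein preprocessing that precedes orthofinder (one isoform per gene, a 30-amino-acid minimum, and removal of sequences with premature stops) can be sketched as follows. This is a hypothetical re-implementation for illustration only, not the agat or seqkit behaviour: it picks the longest isoform by protein length as a proxy (agat's own selection criterion may differ), assumes stop codons are written as `*`, and ignores a single terminal stop.

```python
def longest_isoforms(records):
    """records: iterable of (gene_id, transcript_id, protein_seq); keep the longest per gene."""
    best = {}
    for gene, tx, seq in records:
        core = seq.rstrip("*")  # ignore a terminal stop when measuring length
        if gene not in best or len(core) > len(best[gene][1]):
            best[gene] = (tx, core)
    return best


def filter_proteins(best, min_length=30):
    """Drop proteins shorter than min_length aa or containing an internal stop ('*')."""
    kept = {}
    for gene, (tx, seq) in best.items():
        if len(seq) < min_length:
            continue
        if "*" in seq:
            continue  # premature stop codon
        kept[gene] = (tx, seq)
    return kept


records = [
    ("g1", "t1", "M" * 40 + "*"),
    ("g1", "t2", "M" * 50),                 # longer isoform of g1
    ("g2", "t3", "M" * 10),                 # too short
    ("g3", "t4", "M" * 20 + "*" + "M" * 20),  # internal stop
]
print(sorted(filter_proteins(longest_isoforms(records))))  # ['g1']
```

As in the text, isoform selection happens before the length and stop-codon filters, so a gene whose longest isoform fails a filter is dropped rather than replaced by a shorter isoform.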

2.9 Demographic inference from assembly

To estimate effective population size (Ne) over ancient timescales for the common myna, we used psmc (Pairwise Sequentially Markovian Coalescent) v0.6.5,76 which uses a hidden Markov model within a coalescent framework to infer the local time to the most recent common ancestor, and hence historical Ne, along a single diploid genome. A single representative WGR individual was selected from each of four native range sample sites and six invasive range sample sites (Supplementary Table S1). The raw reads of these individuals were individually processed through a variant calling pipeline similar to that described above (see section 2.7.1: SNP variant density), except that SNPs within 10 bp of an indel or overlapping repeat regions were excluded (see section 2.7.3: Repeat and transposable element annotation), following the psmc protocol used by Schield et al.77 We ran psmc for 30 iterations (-N 30), with the upper limit of the time to the most recent common ancestor set to 5 (-T 5), an initial θ/ρ ratio of 5 (-r 5), and the free atomic time intervals set to 4+30*2+4+6+10, based on recommendations for birds.77 We performed bootstrapping (50 iterations) to check for variation in Ne estimates. Results were then scaled using an estimated generation time of 2 years (approximate age of first breeding10) and a yearly mutation rate of 2.3 × 10⁻⁹, based on estimates for related avian species.77, 78
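Two parts of this step lend themselves to a worked example: reading the atomic-interval pattern, and scaling psmc output into years and individuals. In a psmc pattern, a term a*b denotes a free parameters that each span b atomic intervals, and a bare term denotes one parameter spanning that many intervals, so 4+30*2+4+6+10 gives 84 atomic intervals and 34 free parameters. The scaling function follows our understanding of the approach used by psmc's plotting script (N0 = θ0/(4μs) with bin size s = 100 bp by default, time = 2·N0·t·g, Ne = λ·N0); treat it as an assumption, and note that θ0, t_k, and λ_k in the example are hypothetical values, not results from this study.

```python
def parse_psmc_pattern(pattern):
    """Return (n_atomic_intervals, n_free_parameters) for a psmc pattern string."""
    atomic = free = 0
    for term in pattern.replace(" ", "").split("+"):
        if "*" in term:
            n_params, span = (int(v) for v in term.split("*"))
        else:
            n_params, span = 1, int(term)  # one parameter spanning `span` intervals
        atomic += n_params * span
        free += n_params
    return atomic, free


def scale_psmc(theta0, t_k, lambda_k, mu_per_gen, gen_time, bin_size=100):
    """Scale one psmc time interval to years ago and effective population size."""
    n0 = theta0 / (4 * mu_per_gen * bin_size)
    years = 2 * n0 * t_k * gen_time
    ne = lambda_k * n0
    return years, ne


atomic, free = parse_psmc_pattern("4+30*2+4+6+10")
print(atomic, free)  # 84 34

mu_gen = 2.3e-9 * 2  # yearly rate x 2-year generation time
years, ne = scale_psmc(theta0=0.01, t_k=0.1, lambda_k=1.5, mu_per_gen=mu_gen, gen_time=2)
print(f"{years:.0f} years ago, Ne ~ {ne:.0f}")
```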

To understand more recent changes in Ne, we used stairway plot v2,79 which infers demographic history from the population site frequency spectrum (SFS). We first ran this analysis on one of the native range sample sites (Tamil Nadu, India, IND TN N = 8) using the SNP data called above (section 2.7.1: SNP variant density). Then, because we were interested in how strong genetic bottlenecks within invasive ranges affect demographic inference, we also called and analysed SNP data for an invasive range sample site (Leigh, New Zealand, NZ LEI N = 8). SFS information was obtained for variant sites using vcf2sfs in R,80 and the total number of observed nucleotide sites (L) was set to the total number of sites in our SNP data (variant + non-variant) once filtered to include only individuals from each sample site. The per-generation mutation rate was set to 4.6 × 10⁻⁹ (the yearly mutation rate above multiplied by the generation time used in the psmc analysis). All other parameters were run at their default settings.
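To make the SFS input concrete, the sketch below folds per-site alternate-allele counts into a minor-allele frequency spectrum for N = 8 diploid individuals (16 sampled alleles), keeping only the polymorphic classes 1 to 8, while L counts every observed site, including invariant ones. It is not the vcf2sfs implementation; it assumes complete genotype data at every site, and the allele counts are hypothetical.

```python
def folded_sfs(alt_counts, n_haploid):
    """Folded SFS: number of sites with minor-allele count 1..n_haploid // 2."""
    sfs = [0] * (n_haploid // 2 + 1)  # index 0 collects monomorphic sites
    for c in alt_counts:
        if not 0 <= c <= n_haploid:
            raise ValueError(f"allele count {c} outside 0..{n_haploid}")
        sfs[min(c, n_haploid - c)] += 1
    return sfs[1:]  # polymorphic classes only


# Hypothetical alternate-allele counts at 7 sites for 8 diploids (16 alleles)
alt_counts = [1, 15, 8, 3, 0, 16, 0]
L = len(alt_counts)  # all observed sites, variant and non-variant
print(folded_sfs(alt_counts, 16), L)
```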

3 Results and discussion

3.1 Genome assembly

A combination of long- and short-read technologies was used to assemble, scaffold, and polish the genome of the common myna (AcTris_vAus2.0). A total of 23.5 Gb of ONT long reads (Table 1) were used to create an initial genome assembly of 1,648 contigs with a contig N50 of 11.3 Mb and a total length of 1.046 Gb, a relatively contiguous assembly given that ONT coverage was lower than for many other avian genomes.40, 81, 82 Sequence curation and trimming reduced this to 597 contigs, with a contig N50 of 10,406,399 bp and a total length of 1.041 Gb. After species-specific synteny scaffolding, the final genome comprised 256 scaffolds, with a median scaffold length of 3,369 bp, and 98.9% of the assembly could be anchored to putative chromosomes (Table 2, Supplementary Fig. S7).
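The N50 and L50 values reported here and in Table 2 follow a standard definition: sort sequences from longest to shortest and find the length (N50) and count (L50) at which the running total first reaches half of the assembly length. A minimal sketch with hypothetical contig lengths:

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for i, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, i
    raise ValueError("no sequences given")


# Hypothetical contig lengths: total 300, half reached after 80 + 70
print(n50_l50([10, 20, 30, 40, 50, 70, 80]))  # (70, 2)
```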

Table 1.

Library information of all sequencing data used in the construction of the Acridotheres tristis reference genome and annotation

Genetic input | Sequencing platform | Library | Mean read length (bp) | Total read count | Total read length (Gb)
gDNA | HiSeq X Ten | Paired-end 10x Chromium | 151 | 369,166,113 | 55.38
gDNA | Oxford Nanopore Technologies | Ligation | 7,084.09 | 3,324,660 | 23.5
cDNA (heart) | Illumina | Paired-end | 75 | 151,033,702 | 11.3
cDNA (liver) | Illumina | Paired-end | 75 | 160,333,493 | 12.0
cDNA (testes) | Illumina | Paired-end | 75 | 156,981,058 | 11.8

Total read count, total read length, and median/mean insert size are all raw totals before quality filtering. All sequence data were obtained from the same individual (O.76569).

Table 2.

Genome assembly statistics for the Acridotheres tristis AcTris_vAus2.0 genome

Assembly statistic AcTris_vAus2.0
Total length (bp) 1,040,539,946
Number of scaffolds 256
Scaffold N50 (bp) 72,486,765
Scaffold L50 5
Largest scaffold (bp) 150,861,042
Mean scaffold length (bp) 4,064,609.16
Median scaffold length (bp) 3,369
Number of Contigs 597
Contig N50 (bp) 10,406,399
Contig L50 30
Gap (N) length (bp) 34,958 (0.00%)
GC content (%) 41.85%
busco (genome: Aves) 8,338
 Complete 8,108 (97.2%)
 Complete (single copy) 8,077 (96.9%)
 Complete (duplicated) 31 (0.4%)
 Fragmented 44 (0.5%)
 Missing 186 (2.2%)

Assembly contiguity statistics were calculated with seqsuite v1.27.0, and completeness with busco v5.3.2 using genome mode and ‘Aves’ lineage from the ODB10 dataset.

Chromosome identities were assigned to pseudo-chromosomes based on synteny with the model passerine, the zebra finch, given that avian karyotypes are highly conserved.83, 84 Like many other avian genome assemblies, ours was missing chromosome 16, which is difficult to assemble because of the many copies of MHC genes it contains.85 The largest pseudo-chromosome was 151 Mb long (Table 2, Supplementary Fig. S7) and corresponded to autosomal chromosome 2, which is the largest chromosome in most passerine species (though not in all birds86), while the Z sex chromosome was the fourth largest of the assembled pseudo-chromosomes, consistent with earlier karyotyping efforts that reported the same approximate macrochromosome size pattern.83

3.2 Genome assembly evaluation

Analysis of genome completeness using busco indicated that 97.3% of expected single-copy orthologs were complete within the genome (Table 2). Genome size for the common myna was estimated by k-mer analysis of the 55.38 Gb of raw short reads (Table 1) to be approximately 1.162 Gb using jellyfish (Supplementary Fig. S9, Table S4) and 1.074 Gb using genomescope. This places AcTris_vAus2.0 at 90–97% completeness, assuming the k-mer estimates were not biased by fluctuations in sequencing coverage. The assembly has a consensus quality value (QV) of 44.1 estimated by merqury, and genomescope estimated a global heterozygosity rate of 0.71% and a genome repeat percentage of 15.6%. Together, this evidence suggests that the unassembled portion of the genome is likely biased towards non-coding regions.87 This gap between the estimated genome size and the final assembly length is not unusual: even with their relatively small and repeat-light genomes, previous avian genome assemblies have been found to be missing 7–42% of their expected genome length.88 While this is improving as long-read sequencing becomes more commonplace, enabling characterization of more hard-to-assemble regions,89, 90 many questions on the evolution of avian genomes remain underexplored or poorly understood.91–94
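The completeness range quoted above follows from dividing the assembly length in Table 2 by each k-mer-based size estimate, and the QV can be read as a per-base error rate through the Phred relationship QV = −10 log10(error rate). A small worked check using only values reported in this section and in Table 2:

```python
ASSEMBLY_BP = 1_040_539_946  # AcTris_vAus2.0 total length (Table 2)
KMER_ESTIMATES = {"jellyfish": 1.162e9, "genomescope": 1.074e9}

for tool, size in KMER_ESTIMATES.items():
    print(f"{tool}: {100 * ASSEMBLY_BP / size:.1f}% of estimated genome size")
# prints 89.5% (jellyfish) and 96.9% (genomescope)

qv = 44.1
error_rate = 10 ** (-qv / 10)  # Phred: QV = -10 * log10(error rate)
print(f"QV {qv}: ~1 error per {1 / error_rate:,.0f} bp")
```

A QV of 44.1 therefore corresponds to roughly one consensus error every 26 kb.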

3.3 Comparison with VGP genome

We compared the overall completeness statistics of this version of the common myna genome with those of the VGP common myna genome. AcTris_vAus2.0 is approximately 150 Mb shorter than the VGP genome (Supplementary Table S5), but most of this extra length is contained on smaller contigs that are presently unassigned to a chromosome (Supplementary Fig. S1), and it places the VGP genome's total length of 1.193 Gb above the k-mer-based genome size estimate for our genome. The two genome versions have comparable busco completeness scores (97.3% for AcTris_vAus2.0, 97.1% for the VGP genome), suggesting that these additional unplaced contigs are likely repeat-heavy and/or non-protein-coding regions, as is typical of genomic regions that are harder to place within larger scaffolds.95 Additionally, the repeat and TE content of the VGP genome was much higher than that of AcTris_vAus2.0 (18.8% and 9.8%, respectively; Supplementary Fig. S10), and the ONT reads mapped very poorly to these fragments. With passerine genomes typically containing 10% or less repeat sequence,96, 97 it is thus likely that some of this extra length contains inappropriately expanded repeats81 or haplotypes.98 These differences in genome length and repeat content (and genome structure: Supplementary Appendix 3) may in part reflect differences in the primary sequencing technologies used for each assembly (PacBio for the VGP genome, ONT for AcTris_vAus2.0), though they will also reflect pipeline decisions downstream of the initial genome assembly and possibly some genuine biological differences between source populations (AcTris_vAus2.0 is from an Australian bird, whereas the VGP common myna genome was sourced from Israel). Although we generated the 10x linked-read dataset to aid in scaffolding, initial exploration revealed that it did not significantly improve assembly contiguity for the genome presented in this study.
The VGP genome therefore provided a species-specific means of generating chromosome-level pseudo-chromosomes from the initial long-read assembly in the absence of Hi-C data for AcTris_vAus2.0, demonstrating how valuable research efforts such as the VGP93 and B10k99 are for enhancing the science of smaller consortia and lab groups. In an era in which genome assemblies are generated en masse by large consortia, there is still an important place for the creation and release of additional genome versions of the same species, particularly when these are produced with different sequencing technologies. Such efforts are complementary and will ultimately form the backbone of the data needed for high-quality genome graphs and pan-genomes, which better capture structurally diverse regions of the genome and reduce reference bias, particularly for species with considerable genetic structure across their range.44, 100

3.4 Transcriptome and Annotation

We obtained a total of 35.1 Gb of short-read RNA sequencing data from three tissues (Table 1), from which we generated 32,617 transcript sequences. RNA sequencing of the testes yielded the highest number of unique transcripts, likely because of the alternative splicing that occurs in this organ,101, 102 while unique RNA contributions to the transcriptome were nearly identical for the heart and liver (Supplementary Fig. S11). busco analysis of the transcriptome revealed that, although just three tissues were combined in its creation, only 14.9% of busco genes were missing from the final transcriptome (Table 3), an amount comparable to other short-read transcriptomes.103 Nevertheless, expanding tissue diversity in the common myna is an important next step for fully characterizing the transcriptomic landscape of the species, because a large portion of missing transcripts is thought to be due to tissue-specific expression.104 The transcriptome was then used for gene prediction when completing the annotation of the AcTris_vAus2.0 genome assembly.

Table 3.

Transcriptome and proteome statistics for the Acridotheres tristis AcTris_vAus2.0 genome

Statistic: AcTris_vAus2.0 value
RNA-seq produced transcriptome (a)
busco (transcriptome: Aves), total genes searched: 8,338
  Complete: 6,917 (83.0%)
  Complete (single copy): 4,162 (49.9%)
  Complete (duplicated): 2,755 (33.0%)
  Fragmented: 184 (2.2%)
  Missing: 1,237 (14.8%)
Gene prediction produced annotation (b, *)
busco (transcriptome: Aves), total genes searched: 8,338
  Complete: 8,211 (98.5%)
  Complete (single copy): 8,157 (97.8%)
  Complete (duplicated): 54 (0.6%)
  Fragmented: 47 (0.6%)
  Missing: 80 (1.0%)
Genes
  Total number: 19,836
  Average length: 23,344 bp
  Mean transcripts per gene: 2.1
Transcripts
  Total number: 41,104
  Average length: 32,068 bp
  Mean exons per transcript: 12.2
CDS
  Total number: 41,104
  Average length: 1,955 bp
  Average intron length within CDS: 2,680 bp
Exons
  Total number: 502,825
  Mean length: 160 bp
Gene function
  Gene Ontology term: 17,016
  Protein family: 16,289

Transcriptome and annotation completeness were calculated with busco v5.3.2 using genome mode and ‘Aves’ lineage from the ODB10 dataset. Annotation completeness statistics were calculated using the agat agat_sp_functional_statistics.pl script.

a Based on the three-tissue RNA-seq transcriptome assembled in this study using stringtie.

b Based on the tsebra-merged braker and gemoma annotation.

* Filtered for the longest predicted transcript.
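As a quick arithmetic check on Table 3, the sketch below recomputes each busco percentage as count / 8,338 × 100 from the reported counts and confirms that the categories add up. It is an illustrative snippet rather than part of the authors' pipeline, and one-decimal rounding is an assumption about how the table was reported.

```python
# Recompute Table 3 busco percentages from the reported counts.
# Counts are copied from Table 3; one-decimal rounding is assumed.

TOTAL_AVES_BUSCOS = 8338  # genes searched in the Aves ODB10 lineage set, per Table 3

transcriptome = {
    "complete": 6917,
    "complete_single": 4162,
    "complete_duplicated": 2755,
    "fragmented": 184,
    "missing": 1237,
}
annotation = {
    "complete": 8211,
    "complete_single": 8157,
    "complete_duplicated": 54,
    "fragmented": 47,
    "missing": 80,
}


def busco_percentages(counts: dict, total: int) -> dict:
    """Express each busco category as a percentage of all genes searched."""
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


for name, counts in (("transcriptome", transcriptome), ("annotation", annotation)):
    # Complete, fragmented, and missing together should cover every busco gene,
    # and single-copy plus duplicated should equal complete.
    assert counts["complete"] + counts["fragmented"] + counts["missing"] == TOTAL_AVES_BUSCOS
    assert counts["complete_single"] + counts["complete_duplicated"] == counts["complete"]
    print(name, busco_percentages(counts, TOTAL_AVES_BUSCOS))
```

Both sets of counts sum to 8,338, and the recomputed values match the percentages listed in the table.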

The final annotation of AcTris_vAus2.0 identified 19,836 genes and 41,104 transcript sequences across the genome. When restricted to the longest transcript per gene, total gene sequence coverage was 31,728,307 bp, equating to 3.05% of the genome's length (Table 3). The final genome annotation had a busco completeness score of 98.5% (Table 3) and was a merge of two gene model sets, one produced by gemoma and one by braker. gemoma, being homology-based and thus biased towards easier-to-predict genes in more conserved genomic regions, achieved the higher busco score (97.7%, Supplementary Table S6), with total gene and transcript counts of 22,216 and 69,476, respectively. Nevertheless, braker performed well, identifying 13,773 gene sequences and 18,261 transcript sequences, with a fairly high busco score (88.0%, Supplementary Table S6). For the final annotation, eggnog assigned a functional annotation to 17,016 genes (Table 3).

Annotation quality was assessed using saaga. The 19,836 longest-transcript protein sequences were mapped to the high-quality Gallus gallus reference proteome (GCF_016699485.2), with 15,212 returning successful hits (76.7%) and 4,624 returning no hit against this reference (23.3%). Sequences with successful hits were on average longer (600 vs 309 amino acids) and contained more exons (11 vs 4 exons per sequence) than sequences that failed to match. While the longer unmatched proteins may represent legitimate novel sequences, the unmatched set likely also contains short, incorrectly predicted, or fragmented gene models.

3.5 Genomic landscape

Here we explore the genomic landscape of the common myna genome version AcTris_vAus2.0. Because of their repeat-sparse genomes, short generation times, and diverse ecological interactions, invasive passerines are promising model systems in which to investigate eco-evolutionary processes such as rapid adaptation. Further, comparative genomics across different invasive species offers an opportunity to better understand the molecular mechanisms that underpin success in novel environments. For example, comparing the genomic landscape of the myna to that of other invasive avians, including its close relative the European starling, Sturnus vulgaris, will help us understand whether rapid evolution in invasive avians is predictable or stochastic.105 To this end, we use our newly constructed genome to characterize patterns along the genome (macrochromosomes, microchromosomes, and the major sex chromosome) for important genetic features, including single nucleotide polymorphisms (SNPs), gene density, methylation (specifically at CpG sites), repeat and TE content, and finally linkage disequilibrium (LD)-based recombination estimates.

3.5.1 Single nucleotide polymorphisms

Across the 15 whole genomes used to characterize allelic diversity, we identified a total of 22,992,315 SNPs post-filtering (variant sites with a minimum depth of 5 and a maximum allele depth of 50), representing 2.2% of the genome. We plotted these whole-genome variant data (Fig. 3; track 1) to visualize regions of high variant density (peaks) and regions of locally reduced variant density (troughs), which may be interpreted as regions of high conservation among conspecifics. Variant density was fairly consistent across the genome, with chromosome ends showing relatively fewer SNPs, likely due to difficulties in mapping to these low-complexity regions.106,107 The deficit of SNPs on chromosome Z in the common myna is interesting given the disproportionate role the major sex chromosome plays in adaptation and speciation,107–109 though it may also reflect variant calling from hemizygous (female) individuals. When variant density was examined across macrochromosomes and microchromosomes separately, the two had very similar variant density profiles (Fig. 4a).
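The SNP density track rests on a simple windowed count. The sketch below assumes an uncompressed VCF read as text lines and 1 Mb windows; the helper names `iter_vcf_sites` and `window_density` are hypothetical, and this is not the authors' code (which is linked in the Data Accessibility section). The same counting applies to the gene density track if gene start coordinates are supplied instead of variant positions.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

WINDOW_BP = 1_000_000  # 1 Mb windows, matching the SNP density track


def iter_vcf_sites(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (chromosome, 1-based position) for each record in plain-text VCF lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information lines, the header row, and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        yield chrom, int(pos)


def window_density(sites: Iterable[Tuple[str, int]], window: int = WINDOW_BP) -> Counter:
    """Count sites falling in each (chromosome, window index) bin."""
    counts: Counter = Counter()
    for chrom, pos in sites:
        counts[(chrom, (pos - 1) // window)] += 1  # window 0 covers positions 1..window
    return counts


# Toy example: three variants on chromosome 1 and one on Z.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t10\t.\tA\tG",
    "1\t999999\t.\tC\tT",
    "1\t1000001\t.\tG\tA",
    "Z\t5\t.\tT\tC",
]
density = window_density(iter_vcf_sites(toy_vcf))
print(sorted(density.items()))
```

For a real call set, a streaming VCF parser would replace the string splitting, but the binning logic stays the same.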

Figure 3.


Chromosome coverage plots for the Acridotheres tristis AcTris_vAus2.0 genome. circlize plot of the 30 largest pseudo-chromosomes, with tracks (from the outside in): SNP density (1 Mb windows), gene density (1 Mb windows), CpG site density (1 Mb windows; CpG sites with 0–75% methylated reads in white and 75–100% methylated reads in grey), repeat density (0.5 Mb windows), and recombination rate Rho (log corrected; not plotted for the Z chromosome).

Figure 4.


Genome summary information for the Acridotheres tristis AcTris_vAus2.0 genome. Panel (a) is the histogram of SNP density per 1 Mb window across macro- and microchromosomes. Panel (b) is the histogram of the proportion of methylated reads per CpG region across macro- and microchromosomes. Panel (c) is the saaga annotation quality assessment of the proteome generated by the final annotation against the Gallus gallus reference proteome. Panel (d) is the average linkage disequilibrium (R²) in 1 Mb bins for macro- and microchromosomes. Panel (e) is the repeatmasker annotation of the macro- and microchromosomes separately using the custom assembled repeat library. Panel (f) is the earlgrey plot of Kimura distances for the different classes of TEs across the genome.

3.5.2 Gene density

From our final annotation of 19,836 genes, we plotted gene density to reveal genomic regions of interest (Fig. 3; track 2). For instance, some gene density peaks aligned with regions of very low variant density (e.g. ~20% along chromosome 1), which suggests that the genes in these regions are highly conserved, indicating some level of purifying selection against novel variation.110 Conversely, some regions of high gene density coincide with regions of high variant density (e.g. the start of the Z chromosome), pointing to high inter-individual diversity and diversifying selection.108 Gene density was generally higher on microchromosomes than on macrochromosomes. Predicted gene sequences on the two groups of chromosomes had similar quality profiles, as indicated by comparisons to the Gallus gallus reference proteome (Fig. 4c).

3.5.3 Repeat profiling

Profiling repeats across the common myna genome, we identified slightly more repeat coverage on microchromosomes than on macrochromosomes, though the difference was minimal (Fig. 4e). The lower repeat content of the macrochromosomes and microchromosomes (Fig. 4e) compared to the overall assembly (Supplementary Fig. S10c) indicates that the contig fragments not incorporated into the primary pseudo-chromosomes of the assembly are very high in repeat content, as would be expected for hard-to-assemble regions.88 We further examined TEs across the genome and found that the majority were long interspersed nuclear elements (LINEs), specifically chicken repeat 1 (CR1), and long terminal repeat (LTR) elements, specifically endogenous retroviruses (ERVs) (Fig. 4e and f, Supplementary Fig. S10), reflecting similar profiles found in other avians.111 The largest peak of TE expansion was driven mostly by LINE/CR1 elements, coupled with LTR/ERVL (Fig. 4f). Conversely, the more recent and smaller burst of TEs was dominated by several groups of ERVs, in order of contribution ERVK, ERV1, and ERVL (Fig. 4f). This most recent burst of TEs has been seen in other avian species,82 though it is unusual, as most have TE peaks at greater Kimura distances.97, 112
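The TE landscape in Fig. 4f places copies along an axis of Kimura distance to their consensus, with older insertions further from the consensus. The sketch below implements the textbook Kimura two-parameter distance, d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. It is an illustration only: earlgrey derives its distances from repeatmasker alignments, which also apply a CpG adjustment not reproduced here, and the function name `kimura_2p` is hypothetical.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns with gaps or ambiguous bases are skipped. Raises ValueError if the
    sequences differ in length, share no comparable sites, or are too divergent
    for the formula (log of a non-positive number).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# A single A->G transition across 10 aligned sites.
print(round(kimura_2p("ACGTACGTAC", "GCGTACGTAC"), 4))  # -> 0.1116
```

Separating transitions from transversions matters because transitions accumulate faster; a raw mismatch proportion would underestimate the age of old insertions.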

3.5.4 DNA methylation

Using the ONT data, we quantified DNA methylation proportions at 18,501,863 CpG sites across the genome. Methylated CpG site density was generally higher on microchromosomes than on macrochromosomes (Fig. 3; track 3), though the two had similar methylation profiles (Fig. 4b). A high density of highly methylated CpG sites (75% or more of reads methylated) was noted in several genomic regions, such as midway along Chr 1A and 4A, and throughout the major sex (Z) chromosome (Fig. 3; track 3). These CpG sites were then summarized into 175,596 methylated regions, each containing on average 36 methylated sites over an average span of 370 bp. Of these methylated regions, 9,524 occurred within 5 kb upstream of gene start sites and showed signals of hypomethylation (35.8% of reads methylated) compared to the genome-wide average of methylated reads per region (48.1%). In contrast to these gene-associated methylation patterns, we found hypermethylation in methylated regions overlapping TEs (63.0% of reads methylated). The common myna methylome has a similar overall level of CpG methylation, as well as similar gene- and TE-associated patterns, to other avian species, reflecting the role CpG modifications play in silencing TE transcription and regulating gene transcription.113, 114 Interspecific differences in methylome patterns are hard to compare directly because they are strongly affected by methylation profiling methods and analytical decisions.82, 115, 116 However, profiling these patterns in the common myna lays important groundwork for future studies of genome-methylome interactions.117, 118
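The comparison between gene-upstream regions and the genome-wide average can be sketched as follows. The text does not state the exact overlap rule, so this sketch assumes a strand-aware window of 5 kb immediately upstream of each gene start site and counts a methylated region as upstream if it overlaps that window at all. The classes and functions (`Gene`, `MethRegion`, `is_upstream`, `upstream_vs_genome_mean`) are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import mean

UPSTREAM_BP = 5_000  # "5 kb (or less) upstream of gene start sites"


@dataclass
class Gene:
    chrom: str
    start: int   # leftmost coordinate (1-based, inclusive)
    end: int     # rightmost coordinate (1-based, inclusive)
    strand: str  # "+" or "-"


@dataclass
class MethRegion:
    chrom: str
    start: int
    end: int
    frac_methylated: float  # proportion of methylated reads across the region


def is_upstream(region: MethRegion, gene: Gene, distance: int = UPSTREAM_BP) -> bool:
    """True if the region overlaps the `distance` bp upstream of the gene start site.

    The start site is `start` for + strand genes and `end` for - strand genes.
    """
    if region.chrom != gene.chrom:
        return False
    if gene.strand == "+":
        lo, hi = gene.start - distance, gene.start - 1
    else:
        lo, hi = gene.end + 1, gene.end + distance
    return region.start <= hi and region.end >= lo


def upstream_vs_genome_mean(regions, genes):
    """Mean methylation of gene-upstream regions versus all regions.

    Brute force over all region/gene pairs; genome-scale data needs an interval index.
    """
    upstream = [r.frac_methylated for r in regions if any(is_upstream(r, g) for g in genes)]
    return mean(upstream), mean(r.frac_methylated for r in regions)


genes = [Gene("1", 10_000, 20_000, "+"), Gene("1", 50_000, 60_000, "-")]
regions = [
    MethRegion("1", 6_000, 6_370, 0.30),    # upstream of the + strand gene
    MethRegion("1", 61_000, 61_370, 0.40),  # upstream of the - strand gene
    MethRegion("1", 30_000, 30_370, 0.80),  # intergenic
]
print(upstream_vs_genome_mean(regions, genes))
```

Handling strand explicitly matters because "upstream" of a minus-strand gene lies at higher coordinates than the gene itself.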

3.5.5 Recombination

Using LD-based inference, we resolved the linear recombination profile of each autosomal chromosome. Macrochromosomes had consistently higher recombination at their telomeric ends, while recombination patterns along the microchromosomes were less predictable and often higher in mid-chromosomal regions (Fig. 3; track 5), reflecting trends seen in other avians.119 Pairwise linkage disequilibrium decay was similar over macrochromosomes and microchromosomes, though the former had consistently higher average R² scores until background levels were reached at a binned distance of approximately 10 Mb (Fig. 4d). Avian genomes generally show stronger linkage disequilibrium on macrochromosomes than on microchromosomes,120, 121 though we note that some differences in linkage disequilibrium patterns for immediately adjacent markers120 may be affected by analytical methods and marker density filtering.
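The LD decay summary in Fig. 4d averages pairwise R² within distance bins. The sketch below illustrates that binning using the squared correlation of genotype dosages (0/1/2), a common proxy for unphased data. It does not reproduce the recombination rate (Rho) inference itself, the function name `ld_decay` is hypothetical, and the toy genotypes are invented for illustration.

```python
import numpy as np


def ld_decay(positions: np.ndarray, genotypes: np.ndarray, bin_bp: int = 1_000_000) -> dict:
    """Mean pairwise r^2 per distance bin on one chromosome.

    positions: (n_snps,) integer coordinates.
    genotypes: (n_snps, n_individuals) allele counts coded 0/1/2; monomorphic
               SNPs should be removed first, since their correlation is undefined.
    """
    r2 = np.corrcoef(genotypes) ** 2             # squared genotype correlation for all SNP pairs
    i, j = np.triu_indices(len(positions), k=1)  # each unordered pair once
    pair_r2 = r2[i, j]
    dist_bin = np.abs(positions[j] - positions[i]) // bin_bp
    return {int(b): float(pair_r2[dist_bin == b].mean()) for b in np.unique(dist_bin)}


positions = np.array([100, 200, 1_500_000, 2_600_000])
genotypes = np.array([
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to SNP 0: r^2 = 1
    [2, 1, 0, 2, 1, 0],  # mirror image of SNP 0: r = -1, so r^2 = 1
    [0, 0, 1, 2, 2, 1],  # uncorrelated with SNPs 0-2: r^2 = 0
])
print(ld_decay(positions, genotypes))
```

Plotting the bin means against bin distance for macro- and microchromosomes separately gives decay curves of the kind summarized in Fig. 4d; the full correlation matrix used here is only practical for small SNP sets.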

3.6 Gene family evolution in Acridotheres tristis

We used orthofinder and cafe5 to summarize gene orthogroups over 14 species and to identify gene family expansion and contraction events. We identified a total of 19,060 phylogenetic hierarchical orthogroups (HOGs) across the 14 species included in our analysis (excluding HOGs that had only a single copy across all species or that showed no variability across species). The common myna genome had a similar number of gene families gained and lost (Fig. 5). However, when considering only significant HOGs (Supplementary Table S7), contractions outnumbered expansions, with the common myna having 133 HOG expansions and 167 HOG contractions, out of a total of 1,523 HOGs flagged as significantly expanding or contracting across all species. In general, most avian proteomes across the species included in our analysis (Fig. 5) and in previous similar analyses122, 123 show more gene family contractions, owing to the gene loss that characterizes avian genomes.96 The only other avian species included in our analysis that did not conform to this trend was the zebra finch, Taeniopygia guttata (Fig. 5). This pattern is seen in other studies of T. guttata,122, 123 but not in the zebra finch's nearest relative, Lonchura striata domestica, despite only 10 million years' divergence,73 though this result may also be an artifact of differing annotation approaches.
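The distinction drawn above between all gene family changes and significant changes on the myna branch can be illustrated with a small tally. This sketch assumes that per-branch count changes and family-level p-values have already been extracted from the cafe5 output and joined by HOG; the `BranchChange` record, the `tally_changes` function, and the 0.05 threshold are assumptions for illustration, and the example HOGs are invented.

```python
from typing import Iterable, NamedTuple


class BranchChange(NamedTuple):
    hog: str        # hierarchical orthogroup ID
    change: int     # gene-count change on the focal branch (+ gain, - loss)
    p_value: float  # family-level p-value from the gene family model


def tally_changes(changes: Iterable[BranchChange], alpha: float = 0.05) -> dict:
    """Count expanded and contracted HOGs on one branch, overall and among significant HOGs."""
    tally = {"expanded": 0, "contracted": 0, "sig_expanded": 0, "sig_contracted": 0}
    for c in changes:
        if c.change == 0:
            continue  # no change on this branch
        kind = "expanded" if c.change > 0 else "contracted"
        tally[kind] += 1
        if c.p_value < alpha:
            tally["sig_" + kind] += 1
    return tally


example = [
    BranchChange("HOG_A", 2, 0.010),
    BranchChange("HOG_B", -1, 0.200),
    BranchChange("HOG_C", -3, 0.001),
    BranchChange("HOG_D", 0, 0.500),
]
print(tally_changes(example))
```

Keeping both counts side by side shows how a branch can have balanced gains and losses overall while significant changes lean towards contraction, as reported for the common myna.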

Figure 5.


Gene family evolution analysis across Aves species, and within Acridotheres tristis. This figure depicts the total number of gene families (specifically, hierarchical orthogroups or HOGs) that had expanded or contracted within branches. The green (+) and red (−) numbers represent the expanded and contracted gene families, respectively, within the Sturnidae lineage, indicated in purple. Pictures taken from https://www.phylopic.org/.

We then investigated gene function for common myna gene families that had undergone significant expansion or contraction, using GO terms annotated by interproscan. Of those that were successfully annotated, significantly expanded gene families were associated with biological processes such as lipid and glutathione metabolic processes, response to pheromone, steroid biosynthetic process, and keratinization (Supplementary Fig. S12, Supplementary Table S8). Significantly contracted gene families were associated with tachykinin receptor signalling pathways, cobalamin transport, immune response, and viral processes (Supplementary Fig. S12, Supplementary Table S8). In some cases the same GO term appeared in both expanding and contracting families, because expansions and contractions are calculated per HOG and annotated GO terms are therefore not guaranteed to be unique to a single HOG. Such overlap was seen for some common biological processes and may indicate higher redundancy in those biological pathways.124, 125

3.7 Demographic history inference

We used two different historical demographic inference approaches, because the pairwise sequentially Markovian coalescent method implemented in psmc is more informative about demographic changes in the distant past, while the site frequency spectrum-based stairway plot performs better over more recent generations.126 Using psmc to analyse demographic changes from single genomes, we observed that many individuals' genomes signalled a large increase in effective population size ~60 kya (Fig. 6a), followed by a sharp population decrease ~10–20 kya. This population increase occurred roughly after marine isotope stage 4 (MIS4)127 and before MIS2, the last glacial maximum.128 This pattern is similar to that in some other Eurasian passerine species over this time period,77, 129 though we note that this result is purely correlative and highly dependent on the choice of generation time and mutation rate. We also observe that the size of the population increase is highly variable across analysed genomes and, curiously, individuals from more bottlenecked invasive populations exhibited dramatically larger population expansion estimates than native range individuals, despite the ancestors of both presumably occurring in the native range at the time of population expansion.19 This strongly suggests that recent demographic shifts may introduce systematic bias into coalescent estimates of historical effective population size, signalling the need for caution when interpreting relative signals in highly bottlenecked populations.
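The dependence on mutation rate and generation time noted above follows directly from how psmc output is rescaled. Following the scaling described in the psmc documentation, the reference size is N0 = θ0 / (4μs), where s is the number of bases per psmc input bin (100 by default); interval times in years are 2·N0·t_k·g and effective sizes are N0·λ_k. The function name `scale_psmc` is hypothetical, and the θ0, μ, and g values below are placeholders, not the values used in this study.

```python
from typing import List, Sequence, Tuple


def scale_psmc(theta0: float, t_k: Sequence[float], lambda_k: Sequence[float],
               mu: float, gen_time: float, bin_size: int = 100) -> Tuple[List[float], List[float]]:
    """Convert scaled psmc times and sizes into years and effective population sizes.

    theta0: scaled mutation rate estimated by psmc (per input bin)
    t_k, lambda_k: scaled interval start times and relative population sizes
    mu: per-site, per-generation mutation rate
    gen_time: generation time in years
    bin_size: bases per psmc input bin
    """
    n0 = theta0 / (4 * mu * bin_size)             # reference effective population size
    years = [2 * n0 * t * gen_time for t in t_k]  # time is scaled in units of 2 * N0 generations
    ne = [n0 * lam for lam in lambda_k]           # sizes are scaled relative to N0
    return years, ne


# Placeholder inputs, chosen only to show the scaling.
theta0, t_k, lambda_k = 0.004, [0.1, 1.0], [2.0, 50.0]
years, ne = scale_psmc(theta0, t_k, lambda_k, mu=1e-8, gen_time=2.0)
years_fast, ne_fast = scale_psmc(theta0, t_k, lambda_k, mu=2e-8, gen_time=2.0)
print(years, ne)
print(years_fast, ne_fast)  # doubling mu halves both times and sizes
```

Because both axes scale with 1/μ and the time axis also scales with g, the timing of the inferred ~60 kya expansion relative to MIS4 and MIS2 shifts with these choices, whereas the shape of the curve does not.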

Figure 6.


Demographic history of Acridotheres tristis. Panel (a) depicts the ancient demographic history as generated by psmc from the resequenced whole genomes of 10 individuals, with native range Indian individuals (IND) in browns, and invasive range individuals (FIJI, AUS, and NZ) in blue and green. Panel (b) depicts the more recent demography as generated by stairway plot from individuals from one native range sample site and one introduced range sample site (IND TN N = 8, NZ LEI N = 8; lighter lines indicate 75% and 95% confidence intervals). Full sample site names are given in Supplementary Table S3.

This paradox might alternatively be explained by the observation that the dramatic population expansion was not ubiquitous across all samples analysed, and indeed was absent in some of the native range individuals. The mixed signal for population expansion within the native range was particularly evident when bootstrapping was performed (Supplementary Fig. S13). When the number of native range samples analysed was increased (Supplementary Fig. S14), we noted that the mixed population expansion signal was visible in all native range sample sites except Maharashtra subpopulation A (MAa), a sample site previously observed to be more bottlenecked than the rest of the native range and which clustered closely with some invasive populations19 (though we acknowledge that the small sample size in this analysis limits the strength of this conclusion). This result indicates two things. First, this population expansion event may not have covered the entire native range, but may instead have differed according to climate or biome throughout Eurasia across MIS4–MIS2.130, 131 Second, the lack of a mixed signal in the invasive range samples analysed (Fig. 6a) supports previous conclusions that the ancestors of invasive populations were likely sourced from similar locations for each separate introduction effort.19

When demographic patterns over more recent generations were estimated using the population-based site frequency spectrum approach employed by stairway plot, both the native and invasive range populations produced an estimated peak effective population size of 400,000–600,000 individuals, followed by a slow decline towards the present day (Fig. 6b). While both populations showed an increase in population size at around 600 kya, this increase fell within the first 10 steps of both populations' plots, and it is best practice not to overinterpret this region.132 This increase does, however, correspond to a similar time frame as the population increase reported by psmc. In both analyses, we observe that the bottlenecked invasive populations had a more rapid estimated rate of demographic change. Despite these populations having diverged only within the last 200 years,14 the invasive NZ LEI population shows a more pronounced decrease in effective population size towards the present day. Interestingly, the initial point of effective population size decline estimated for this population corresponds to the psmc-estimated decline at 20 kya. However, NZ LEI continues to decline sharply until the present day and produces recent population size estimates of 2,000 individuals, well below those estimated for IND TN (160,000) or for both native and introduced individuals by psmc (50,000–400,000).
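Stairway plot takes a site frequency spectrum as its input. The sketch below builds a folded SFS from hard diploid genotype calls, assuming no missing data; the function name `folded_sfs` is hypothetical, and the study may have derived its spectra differently (for example, with missing-data projection). With eight individuals per population, as for IND TN and NZ LEI, there are 16 chromosomes and the folded spectrum has entries for minor-allele counts 0 to 8.

```python
import numpy as np


def folded_sfs(genotypes: np.ndarray) -> np.ndarray:
    """Folded site frequency spectrum from diploid genotypes.

    genotypes: (n_sites, n_individuals) alternate-allele counts coded 0/1/2,
    with no missing data. Entry k of the result counts sites whose minor
    allele appears k times; entry 0 holds monomorphic sites.
    """
    n_chrom = 2 * genotypes.shape[1]
    alt = genotypes.sum(axis=1)
    minor = np.minimum(alt, n_chrom - alt)  # fold: ignore which allele is ancestral
    return np.bincount(minor, minlength=n_chrom // 2 + 1)


# Toy data: five sites in three diploid individuals (six chromosomes).
genotypes = np.array([
    [0, 0, 1],
    [2, 2, 2],
    [1, 1, 0],
    [0, 1, 2],
    [2, 1, 2],
])
print(folded_sfs(genotypes))
```

Folding avoids the need to polarize alleles with an outgroup, at the cost of losing the distinction between low- and high-frequency derived alleles.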

4 Conclusions

Here we present a chromosome-level assembly of the common myna, A. tristis, one of the most successful invasive avian species recorded over the past 200 years. Substantial resources and effort are being directed towards understanding this species in the regions where it has become invasive, and we believe genomic insights will provide valuable tools for those efforts. Conversely, as increasing numbers of species suffer reduced population sizes, successfully introduced and invasive species like the common myna provide unique insight into how some species can inhabit a diversity of ecological conditions, in this case starting from a small founding population. This high-quality chromosome-level assembly and annotation will provide valuable genomic context for future studies of this species. Through this work, we describe the genomic landscape of this species, including genome-wide allelic diversity, DNA methylation, repeats, and recombination, as well as an examination of gene family evolution. Using demographic analysis, we provide the first whole-genome-level insight into this species' demography through ancient and recent time. We identify that some native regions underwent a dramatic population increase between the two most recent periods of glaciation, but also reveal artefactual impacts of recent bottlenecks on historical demographic analysis.

Supplementary Material

dsae005_suppl_Supplementary_Material
dsae005_suppl_Supplementary_Table_S7

Acknowledgments

We extend many thanks to Dinindu Senanayake and Joseph Guhlin for their technical support for analyses. Huge thanks to Andrew King for sample preparation and extraction. We are very grateful to Aaron Darling, Dominique Gorse, and their groups for their work in sequencing and assembling an earlier version of the myna genome. Thanks to Stella Loke for hosting A.W. and K.C.S. for ONT training. And our thanks to Australian Museum staff, including Leah Tsang, Scott Ginn, and Tracey McVea, for their support in sample loans and research collaboration contracts. Finally, our thanks to Mark Peck at the Royal Ontario Museum and to the many individual collectors in Australia and New Zealand who contributed myna samples.

Contributor Information

Katarina C Stuart, School of Biological Sciences, University of Auckland, Auckland, Aotearoa, New Zealand; Evolution and Ecology Research Centre, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, Australia.

Rebecca N Johnson, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA.

Richard E Major, Australian Museum Research Institute, Australian Museum, Sydney, Australia.

Kamolphat Atsawawaranunt, School of Biological Sciences, University of Auckland, Auckland, Aotearoa, New Zealand.

Kyle M Ewart, Australian Museum Research Institute, Australian Museum, Sydney, Australia; School of Life and Environmental Sciences, University of Sydney, Sydney, Australia.

Lee A Rollins, Evolution and Ecology Research Centre, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, Australia.

Anna W Santure, School of Biological Sciences, University of Auckland, Auckland, Aotearoa, New Zealand.

Annabel Whibley, School of Biological Sciences, University of Auckland, Auckland, Aotearoa, New Zealand.

Data Accessibility

All raw data associated with this project is available on NCBI. The sequencing data used to create the genome assembly and annotation (10x Chromium linked-read, MinION Oxford Nanopore Technologies (ONT) long-read, and short-read cDNA) is available under the BioProject accession number PRJNA928557. The Illumina whole genome resequencing data is available under the BioProject accession number PRJNA1054049. The genome assembly is available on NCBI (accession GCA_027559615.1). The code used for this project is available through Zenodo (10.1101/2023.08.22.554353) and GitHub (https://github.com/katarinastuart/At1_MynaGenome).

Funding

We gratefully acknowledge funding from The New Zealand Royal Society Te Apārangi Marsden Grant (grant number UOA1911), Australian Museum Foundation and City of Sydney Environmental Grants Program. Seed funding was also generously provided by the University of Auckland’s Faculty of Science Sustainability Theme, Computational Biology Theme and the Digital Biology Interface Institute. A University of Auckland Doctoral Scholarship supports K.A.

Author contributions

A.W., K.C.S., R.N.J., R.M., L.A.R., and A.W.S. designed the research, A.W. conducted the ONT sequencing and led the myna genome assembly, K.C.S. led the genome analyses. R.M. and K.M.E. coordinated data collection from Australia for the genome individual and population samples. A.W.S. coordinated data collection for New Zealand with K.A. contributing insights on the global population structure. K.C.S. led the writing of the paper, with input from A.W.S. and feedback from all authors. All authors read and approved the final manuscript.

Conflict of interest

The authors declare no conflicts of interest.

References

  • 1. Lockwood, J. L., Hoopes, M. F., and Marchetti, M. P.. 2013, Invasion Ecology. 2nd edition. Chichester, West Sussex, UK: Wiley-Blackwell. [Google Scholar]
  • 2. Matheson, P. and McGaughran, A.. 2022, Genomic data is missing for many highly invasive species, restricting our preparedness for escalating incursion rates, Sci. Rep., 12, 13987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Thompson, J.N. 1998, Rapid evolution as an ecological process, Trends Ecol. Evol., 13, 329–32. [DOI] [PubMed] [Google Scholar]
  • 4. Whitney, K.D. and Gabler, C.A.. 2008, Rapid evolution in introduced species, ‘invasive traits’ and recipient communities: challenges for predicting invasive potential, Divers. Distrib., 14, 569–80. [Google Scholar]
  • 5. Buswell, J.M., Moles, A.T., and Hartley, S.. 2011, Is rapid evolution common in introduced plant species? J. Ecol., 99, 214–24. [Google Scholar]
  • 6. Schrieber, K. and Lachmuth, S.. 2017, The Genetic Paradox of Invasions revisited: the potential role of inbreeding × environment interactions in invasion success, Biol. Rev. Camb. Philos. Soc., 92, 939–52. [DOI] [PubMed] [Google Scholar]
  • 7. Clements, D.R. and Ditommaso, A.. 2011, Climate change and weed adaptation: can evolution of invasive plants lead to greater range expansion than forecasted? Weed Res, 51, 227–40. [Google Scholar]
  • 8. Sillero, N., Huey, R.B., Gilchrist, G., Rissler, L., and Pascual, M.. 2020, Distribution modelling of an introduced species: do adaptive genetic markers affect potential range? Proc. Biol. Sci., 287, 20201791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Willi, Y., Kristensen, T.N., Sgrò, C.M., Weeks, A.R., Ørsted, M., and Hoffmann, A.A.. 2022, Conservation genetics as a management tool: The five best-supported paradigms to assist the management of threatened species, Proc. Natl. Acad. Sci. U.S.A., 119, e2105076119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Feare, C., and Craig, A.. 1999, Starlings and Mynas. Princeton, New Jersey: Princeton University Press. [Google Scholar]
  • 11. Lowe, S., Browne, M., and Boudjelas, S.. 2000, 100 of the world’s worst invasive alien species. A selection from the global invasive species database. Auckland, New Zealand: Invasive Species Specialist Group. [Google Scholar]
  • 12. Holzapfel, C., Levin, N., Hatzofe, O., and Kark, S.. 2006, Colonisation of the Middle East by the invasive Common Myna Acridotheres tristis L., with special reference to Israel, Sandgrouse, 28, 44–51. [Google Scholar]
  • 13. Magory Cohen, T., McKinney, M., Kark, S., and Dor, R.. 2019, Global invasion in progress: modeling the past, current and potential global distribution of the common myna, Biol. Invasions, 21, 1295–309. [Google Scholar]
  • 14. Beesley, A., Whibley, A., Santure, A.W., and Battles, H.T.. 2023, The introduction and distribution history of the common myna (Acridotheres tristis) in New Zealand. N. Z, J. Zool., 0, 1–13. [Google Scholar]
  • 15. Hughes, B.J., Martin, G.R., and Reynolds, S.J.. 2017, Estimating the extent of seabird egg depredation by introduced Common Mynas on Ascension Island in the South Atlantic, Biol. Invasions, 19, 843–57. [Google Scholar]
  • 16. Feare, C.J., Bristol, R.M., and Crommenacker, J.V.D.. 2022, Eradication of a highly invasive bird, the Common Myna Acridotheres tristis, facilitates the establishment of insurance populations of island endemic birds, Bird Conserv. Int., 32, 439–59. [Google Scholar]
  • 17. Peacock, D.S., Janse van Rensburg, B., and Robertson, M.P.. 2007, The distribution and spread of the invasive alien common myna, Acridotheres tristis L. (Aves: Sturnidae), in southern Africa: research article, South Afr. J. Sci., 103, 465–73. [Google Scholar]
  • 18. Ewart, K.M., Griffin, A.S., Johnson, R.N., et al. 2019, Two speed invasion: assisted and intrinsic dispersal of common mynas over 150 years of colonization, J. Biogeogr., 46, 45–57. [Google Scholar]
  • 19. Atsawawaranunt, K., Ewart, K.M., Major, R.E., Johnson, R.N., Santure, A.W., and Whibley, A.. 2023, Tracing the introduction of the invasive common myna using population genomics, Heredity, 131, 56–67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Tindall, S.D., Ralph, C.J., and Clout, M.N.. 2007, Changes in bird abundance following common myna control on a New Zealand island, Pac. Conserv. Biol., 13, 202–12. [Google Scholar]
  • 21. Grarock, K., Tidemann, C.R., Wood, J., and Lindenmayer, D.B.. 2012, Is it benign or is it a Pariah? Empirical evidence for the impact of the common Myna (Acridotheres tristis) on Australian Birds, PLoS One, 7, e40622. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Rogers, A.M., Griffin, A.S., van Rensburg, B.J., and Kark, S.. 2020, Noisy neighbours and myna problems: interaction webs and aggression around tree hollows in urban habitats, J. Appl. Ecol., 57, 1891–901. [Google Scholar]
  • 23. Lowe, K.A., Taylor, C.E., and Major, R.E.. 2011, Do common mynas significantly compete with native birds in urban environments? J. Ornithol., 152, 909–21. [Google Scholar]
  • 24. Tracey, J. and Saunders, G.. 2003, Bird damage to the wine grape industry (Report to the Bureau of Rural Sciences, Department of Agriculture, Fisheries, and Forestry), Vertebrate Pest Research Unit, New South Wales Agriculture, Orange, Australia. [Google Scholar]
  • 25. Koopman, M.E. and Pitt, W.C.. 2007, Crop diversification leads to diverse bird problems in Hawaiian agriculture, Hum.-Wildl. Confl., 1, 235–43. [Google Scholar]
  • 26. Clark, N.J., Olsson-Pons, S., Ishtiaq, F., and Clegg, S.M. 2015, Specialist enemies, generalist weapons and the potential spread of exotic pathogens: malaria parasites in a highly invasive bird, Int. J. Parasitol., 45, 891–9.
  • 27. Magory Cohen, T., Major, R.E., Kumar, R.S., et al. 2021, Rapid morphological changes as agents of adaptation in introduced populations of the common myna (Acridotheres tristis), Evol. Ecol., 35, 443–62.
  • 28. Atsawawaranunt, K., Whibley, A., Cain, K.E., Major, R.E., and Santure, A.W. 2024, Projecting the current and potential future distribution of New Zealand’s invasive sturnids, Biol. Invasions, doi:10.1007/s10530-024-03246-0.
  • 29. Stuart, K.C., Hofmeister, N.R., Zichello, J.M., and Rollins, L.A. 2023, Global invasion history and native decline of the common starling: insights through genetics, Biol. Invasions, 25, 1291–316.
  • 30. Wick, R., Volkening, J., and Loman, N. 2017, Porechop, GitHub (https://github.com/rrwick/Porechop).
  • 31. Kolmogorov, M., Yuan, J., Lin, Y., and Pevzner, P.A. 2019, Assembly of long, error-prone reads using repeat graphs, Nat. Biotechnol., 37, 540–6.
  • 32. Oxford Nanopore Technologies. 2018, Medaka, GitHub (https://github.com/nanoporetech/medaka).
  • 33. Hu, J., Fan, J., Sun, Z., and Liu, S. 2020, NextPolish: a fast and efficient genome polishing tool for long-read assembly, Bioinformatics, 36, 2253–5.
  • 34. Krueger, F. 2021, Trim Galore!, Babraham Bioinformatics (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/).
  • 35. Li, H. 2013, Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM, arXiv:1303.3997 [q-bio].
  • 36. Danecek, P., Bonfield, J.K., Liddle, J., et al. 2021, Twelve years of SAMtools and BCFtools, GigaScience, 10, giab008.
  • 37. Alonge, M., Lebeigle, L., Kirsche, M., et al. 2022, Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing, Genome Biol., 23, 258.
  • 38. Marçais, G. and Kingsford, C. 2011, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers, Bioinformatics, 27, 764–70.
  • 39. Vurture, G.W., Sedlazeck, F.J., Nattestad, M., et al. 2017, GenomeScope: fast reference-free genome profiling from short reads, Bioinformatics, 33, 2202–4.
  • 40. Rhie, A., Walenz, B.P., Koren, S., and Phillippy, A.M. 2020, Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies, Genome Biol., 21, 245.
  • 41. Edwards, R.J. 2020, SLiMSuite v1.9.1, GitHub (https://github.com/slimsuite/SLiMSuite).
  • 42. Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov, E.M. 2015, BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs, Bioinformatics, 31, 3210–2.
  • 43. Chen, S., Zhou, Y., Chen, Y., and Gu, J. 2018, fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, 34, i884–90.
  • 44. Kim, D., Paggi, J.M., Park, C., Bennett, C., and Salzberg, S.L. 2019, Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype, Nat. Biotechnol., 37, 907–15.
  • 45. Kovaka, S., Zimin, A.V., Pertea, G.M., Razaghi, R., Salzberg, S.L., and Pertea, M. 2019, Transcriptome assembly from long-read RNA-seq alignments with StringTie2, Genome Biol., 20, 278.
  • 46. Pertea, G. and Pertea, M. 2020, GFF Utilities: GffRead and GffCompare, F1000Research, 9, 304.
  • 47. Hoff, K.J., Lange, S., Lomsadze, A., Borodovsky, M., and Stanke, M. 2016, BRAKER1: unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS, Bioinformatics, 32, 767–9.
  • 48. Brůna, T., Hoff, K.J., Lomsadze, A., Stanke, M., and Borodovsky, M. 2021, BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database, NAR Genom. Bioinform., 3, lqaa108.
  • 49. Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S.O., and Grau, J. 2018, Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi, BMC Bioinf., 19, 189.
  • 50. Gabriel, L., Hoff, K.J., Brůna, T., Borodovsky, M., and Stanke, M. 2021, TSEBRA: transcript selector for BRAKER, BMC Bioinf., 22, 566.
  • 51. UniProt Consortium. 2019, UniProt: a worldwide hub of protein knowledge, Nucleic Acids Res., 47, D506–15.
  • 52. Brůna, T., Lomsadze, A., and Borodovsky, M. 2023, GeneMark-ETP: automatic gene finding in eukaryotic genomes in consistence with extrinsic data, bioRxiv, 2023.01.13.524024.
  • 53. Brůna, T., Lomsadze, A., and Borodovsky, M. 2020, GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins, NAR Genom. Bioinform., 2, lqaa026.
  • 54. Stanke, M., Diekhans, M., Baertsch, R., and Haussler, D. 2008, Using native and syntenically mapped cDNA alignments to improve de novo gene finding, Bioinformatics, 24, 637–44.
  • 55. Buchfink, B., Xie, C., and Huson, D.H. 2015, Fast and sensitive protein alignment using DIAMOND, Nat. Methods, 12, 59–60.
  • 56. Cantalapiedra, C.P., Hernández-Plaza, A., Letunic, I., Bork, P., and Huerta-Cepas, J. 2021, eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale, Mol. Biol. Evol., 38, 5825–9.
  • 57. Stuart, K.C., Edwards, R.J., Cheng, Y., et al. 2022, Transcript- and annotation-guided genome assembly of the European starling, Mol. Ecol. Resour., 22, 3141–60.
  • 58. Dainat, J. 2020, AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format, Zenodo, doi:10.5281/zenodo.3552717.
  • 59. Quinlan, A.R. and Hall, I.M. 2010, BEDTools: a flexible suite of utilities for comparing genomic features, Bioinformatics, 26, 841–2.
  • 60. Broad Institute. 2019, Picard toolkit.
  • 61. Danecek, P., Auton, A., Abecasis, G., et al.; 1000 Genomes Project Analysis Group. 2011, The variant call format and VCFtools, Bioinformatics, 27, 2156–8.
  • 62. Gaspar, J.M. and Hart, R.P. 2017, DMRfinder: efficiently identifying differentially methylated regions from MethylC-seq data, BMC Bioinf., 18, 528.
  • 63. Crescente, J.M., Zavallo, D., Helguera, M., and Vanzetti, L.S. 2018, MITE Tracker: an accurate approach to identify miniature inverted-repeat transposable elements in large genomes, BMC Bioinf., 19, 348.
  • 64. Baril, T., Imrie, R.M., and Hayward, A. 2022, Earl Grey: a fully automated user-friendly transposable element annotation and analysis pipeline, bioRxiv, 2022.06.30.498289.
  • 65. Smit, A., Hubley, R., and Green, P. 2013–2015, RepeatMasker Open-4.0 (http://www.repeatmasker.org).
  • 66. Flynn, J.M., Hubley, R., Goubert, C., et al. 2020, RepeatModeler2 for automated genomic discovery of transposable element families, Proc. Natl. Acad. Sci. U.S.A., 117, 9451–7.
  • 67. Storer, J., Hubley, R., Rosen, J., Wheeler, T.J., and Smit, A.F. 2021, The Dfam community resource of transposable element families, sequence models, and genome annotations, Mob. DNA, 12, 2.
  • 68. McVean, G.A.T., Myers, S.R., Hunt, S., Deloukas, P., Bentley, D.R., and Donnelly, P. 2004, The fine-scale structure of recombination rate variation in the human genome, Science, 304, 581–4.
  • 69. Stukenbrock, E.H. and Dutheil, J.Y. 2018, Fine-scale recombination maps of fungal plant pathogens reveal dynamic recombination landscapes and intragenic hotspots, Genetics, 208, 1209–29.
  • 70. Emms, D.M. and Kelly, S. 2019, OrthoFinder: phylogenetic orthology inference for comparative genomics, Genome Biol., 20, 238.
  • 71. Mendes, F.K., Vanderpool, D., Fulton, B., and Hahn, M.W. 2020, CAFE 5 models variation in evolutionary rates among gene families, Bioinformatics, 36, 5516–8.
  • 72. Shen, W., Le, S., Li, Y., and Hu, F. 2016, SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation, PLoS One, 11, e0163962.
  • 73. Kumar, S., Suleski, M., Craig, J.M., et al. 2022, TimeTree 5: an expanded resource for species divergence times, Mol. Biol. Evol., 39, msac174.
  • 74. Quevillon, E., Silventoinen, V., Pillai, S., et al. 2005, InterProScan: protein domains identifier, Nucleic Acids Res., 33, W116–20.
  • 75. Supek, F., Bošnjak, M., Škunca, N., and Šmuc, T. 2011, REVIGO summarizes and visualizes long lists of Gene Ontology terms, PLoS One, 6, e21800.
  • 76. Li, H. and Durbin, R. 2011, Inference of human population history from individual whole-genome sequences, Nature, 475, 493–6.
  • 77. Nadachowska-Brzyska, K., Li, C., Smeds, L., Zhang, G., and Ellegren, H. 2015, Temporal dynamics of avian populations during Pleistocene revealed by whole-genome sequences, Curr. Biol., 25, 1375–80.
  • 78. Smeds, L., Qvarnström, A., and Ellegren, H. 2016, Direct estimate of the rate of germline mutation in a bird, Genome Res., 26, 1211–8.
  • 79. Liu, X. and Fu, Y.-X. 2020, Stairway Plot 2: demographic history inference with folded SNP frequency spectra, Genome Biol., 21, 280.
  • 80. Liu, S., Ferchaud, A.-L., Grønkjaer, P., Nygaard, R., and Hansen, M.M. 2018, Genomic parallelism and lack thereof in contrasting systems of three-spined sticklebacks, Mol. Ecol., 27, 4725–43.
  • 81. Peona, V., Blom, M.P.K., Xu, L., et al. 2021, Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise, Mol. Ecol. Resour., 21, 263–86.
  • 82. Bailey, S., Guhlin, J., Senanayake, D.S., et al. 2023, Assembly of female and male hihi genomes (stitchbird; Notiomystis cincta) enables characterization of the W chromosome and resources for conservation genomics, Mol. Ecol. Resour.
  • 83. Sharma, G.P., Mittal, O.P., and Gupta, N. 1980, Somatic chromosomes of Acridotheres fuscus fuscus Wagler and Acridotheres tristis tristis Linnaeus, Cytologia, 45, 403–10.
  • 84. O’Connor, R.E., Kiazim, L., Skinner, B., et al. 2019, Patterns of microchromosome organization remain highly conserved throughout avian evolution, Chromosoma, 128, 21–9.
  • 85. Miller, M.M. and Taylor, R.L. 2016, Brief review of the chicken major histocompatibility complex: the genes, their distribution on chromosome 16, and their contributions to disease resistance, Poult. Sci., 95, 375–92.
  • 86. Waters, P.D., Patel, H.R., Ruiz-Herrera, A., et al. 2021, Microchromosomes are building blocks of bird, reptile, and mammal chromosomes, Proc. Natl. Acad. Sci. U.S.A., 118, e2112494118.
  • 87. Beauclair, L., Ramé, C., Arensburger, P., et al. 2019, Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies, BMC Genomics, 20, 734.
  • 88. Peona, V., Weissensteiner, M.H., and Suh, A. 2018, How complete are ‘complete’ genome assemblies?—An avian perspective, Mol. Ecol. Resour., 18, 1188–95.
  • 89. Driver, R.J. and Balakrishnan, C.N. 2021, Highly contiguous genomes improve the understanding of avian olfactory receptor repertoires, Integr. Comp. Biol., 61, 1281–90.
  • 90. He, K., Minias, P., and Dunn, P.O. 2021, Long-read genome assemblies reveal extraordinary variation in the number and structure of MHC loci in birds, Genome Biol. Evol., 13, evaa270.
  • 91. Botero-Castro, F., Figuet, E., Tilak, M.-K., Nabholz, B., and Galtier, N. 2017, Avian genomes revisited: hidden genes uncovered and the rates versus traits paradox in birds, Mol. Biol. Evol., 34, 3123–31.
  • 92. Bravo, G.A., Schmitt, C.J., and Edwards, S.V. 2021, What have we learned from the first 500 avian genomes?, Annu. Rev. Ecol. Evol. Syst., 52, 611–39.
  • 93. Rhie, A., McCarthy, S.A., Fedrigo, O., et al. 2021, Towards complete and error-free genome assemblies of all vertebrate species, Nature, 592, 737–46.
  • 94. Peona, V., Blom, M.P.K., Frankl-Vilches, C., et al. 2022, The hidden structural variability in avian genomes, bioRxiv.
  • 95. Weissensteiner, M.H. and Suh, A. 2019, Repetitive DNA: the dark matter of avian genomics. In: Kraus, R.H.S. (ed.), Avian Genomics in Ecology and Evolution: From the Lab into the Wild, Cham: Springer International Publishing, pp. 93–150.
  • 96. Zhang, G., Li, C., Li, Q., et al.; Avian Genome Consortium. 2014, Comparative genomics reveals insights into avian genome evolution and adaptation, Science, 346, 1311–20.
  • 97. Kapusta, A. and Suh, A. 2017, Evolution of bird genomes—a transposon’s-eye view, Ann. N. Y. Acad. Sci., 1389, 164–85.
  • 98. Guiglielmoni, N., Houtain, A., Derzelle, A., Van Doninck, K., and Flot, J.-F. 2021, Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms, BMC Bioinf., 22, 303.
  • 99. Zhang, G., Rahbek, C., Graves, G.R., Lei, F., Jarvis, E.D., and Gilbert, M.T.P. 2015, Bird sequencing project takes off, Nature, 522, 34.
  • 100. Sherman, R.M. and Salzberg, S.L. 2020, Pan-genomics in the human genome era, Nat. Rev. Genet., 21, 243–54.
  • 101. Song, H., Wang, L., Chen, D., and Li, F. 2020, The function of pre-mRNA alternative splicing in mammal spermatogenesis, Int. J. Biol. Sci., 16, 38–48.
  • 102. Mueller, R.C., Ellström, P., Howe, K., et al. 2021, A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck), GigaScience, 10, giab081.
  • 103. Richardson, M.F., Sherwin, W.B., and Rollins, L.A. 2017, De novo assembly of the liver transcriptome of the European starling, Sturnus vulgaris, J. Genomics, 5, 54–7.
  • 104. Yin, Z.-T., Zhu, F., Lin, F.-B., et al. 2019, Revisiting avian ‘missing’ genes from de novo assembled transcripts, BMC Genomics, 20, 4.
  • 105. Stuart, K.C., Sherwin, W.B., Edwards, R.J., and Rollins, L.A. 2023, Evolutionary genomics: insights from the invasive European starlings, Front. Genet., 13, 1010456.
  • 106. Li, H. 2014, Toward better understanding of artifacts in variant calling from high-coverage samples, Bioinformatics, 30, 2843–51.
  • 107. Stuart, K.C., Sherwin, W.B., Austin, J.J., et al. 2022, Historical museum samples enable the examination of divergent and parallel evolution during invasion, Mol. Ecol., 31, 1836–52.
  • 108. Lavretsky, P., Dacosta, J.M., Hernández-Baños, B.E., Engilis, A., Sorenson, M.D., and Peters, J.L. 2015, Speciation genomics and a role for the Z chromosome in the early stages of divergence between Mexican ducks and mallards, Mol. Ecol., 24, 5364–78.
  • 109. Meisel, R.P. and Connallon, T. 2013, The faster-X effect: integrating theory and data, Trends Genet., 29, 537–44.
  • 110. Cvijović, I., Good, B.H., and Desai, M.M. 2018, The effect of strong purifying selection on genetic diversity, Genetics, 209, 1235–78.
  • 111. Gao, B., Wang, S., Wang, Y., et al. 2017, Low diversity, activity, and density of transposable elements in five avian genomes, Funct. Integr. Genomics, 17, 427–39.
  • 112. Prost, S., Armstrong, E.E., Nylander, J., et al. 2019, Comparative analyses identify genomic features potentially involved in the evolution of birds-of-paradise, GigaScience, 8, giz003.
  • 113. Li, Q., Li, N., Hu, X., et al. 2011, Genome-wide mapping of DNA methylation in chicken, PLoS One, 6, e19428.
  • 114. Derks, M.F.L., Schachtschneider, K.M., Madsen, O., Schijlen, E., Verhoeven, K.J.F., and van Oers, K. 2016, Gene and transposable element methylation in great tit (Parus major) brain and blood, BMC Genomics, 17, 332.
  • 115. Viitaniemi, H.M., Verhagen, I., Visser, M.E., Honkela, A., van Oers, K., and Husby, A. 2019, Seasonal variation in genome-wide DNA methylation patterns and the onset of seasonal timing of reproduction in great tits, Genome Biol. Evol., 11, 970–83.
  • 116. Höglund, A., Henriksen, R., Fogelholm, J., et al. 2020, The methylation landscape and its role in domestication and gene regulation in the chicken, Nat. Ecol. Evol., 4, 1713–24.
  • 117. Sun, D., Layman, T.S., Jeong, H., et al. 2021, Genome-wide variation in DNA methylation linked to developmental stage and chromosomal suppression of recombination in white-throated sparrows, Mol. Ecol., 30, 3453–67.
  • 118. Sepers, B., Chen, R.S., Memelink, M., Verhoeven, K.J.F., and van Oers, K. 2023, Variation in DNA methylation in avian nestlings is largely determined by genetic effects, Mol. Biol. Evol., 40, msad086.
  • 119. Groenen, M.A.M., Wahlberg, P., Foglio, M., et al. 2009, A high-density SNP-based linkage map of the chicken genome reveals sequence features correlated with recombination rate, Genome Res., 19, 510–9.
  • 120. Stapley, J., Birkhead, T.R., Burke, T., and Slate, J. 2010, Pronounced inter- and intrachromosomal variation in linkage disequilibrium across the zebra finch genome, Genome Res., 20, 496–502.
  • 121. Fu, W., Dekkers, J.C., Lee, W.R., and Abasht, B. 2015, Linkage disequilibrium in crossbred and pure line chickens, Genet. Sel. Evol., 47, 11.
  • 122. Liu, S., Chen, H., Ouyang, J., et al. 2022, A high-quality assembly reveals genomic characteristics, phylogenetic status, and causal genes for leucism plumage of Indian peafowl, GigaScience, 11, giac018.
  • 123. Karawita, A.C., Cheng, Y., Chew, K.Y., et al. 2023, The swan genome and transcriptome, it is not all black and white, Genome Biol., 24, 13.
  • 124. Nowak, M.A., Boerlijst, M.C., Cooke, J., and Smith, J.M. 1997, Evolution of genetic redundancy, Nature, 388, 167–71.
  • 125. Delattre, M. and Félix, M.-A. 2009, The evolutionary context of robust and redundant cell biological mechanisms, BioEssays, 31, 537–45.
  • 126. Patton, A.H., Margres, M.J., Stahlke, A.R., et al. 2019, Contemporary demographic reconstruction methods are robust to genome assembly quality: a case study in Tasmanian devils, Mol. Biol. Evol., 36, 2906–21.
  • 127. De Deckker, P., Arnold, L.J., van der Kaars, S., et al. 2019, Marine Isotope Stage 4 in Australasia: a full glacial culminating 65,000 years ago – global connections and implications for human dispersal, Quat. Sci. Rev., 204, 187–207.
  • 128. Lisiecki, L.E. and Raymo, M.E. 2005, A Pliocene-Pleistocene stack of 57 globally distributed benthic δ18O records, Paleoceanography, 20, PA1003.
  • 129. Nadachowska-Brzyska, K., Burri, R., Smeds, L., and Ellegren, H. 2016, PSMC analysis of effective population sizes in molecular ecology and its application to black-and-white Ficedula flycatchers, Mol. Ecol., 25, 1058–72.
  • 130. Ray, N. and Adams, J. 2001, A GIS-based vegetation map of the world at the Last Glacial Maximum (25,000–15,000 BP), Internet Archaeol., 11.
  • 131. Guo, C., Nisancioglu, K.H., Bentsen, M., Bethke, I., and Zhang, Z. 2019, Equilibrium simulations of Marine Isotope Stage 3 climate, Clim. Past, 15, 1133–51.
  • 132. Liu, X. and Fu, Y.-X. 2015, Exploring population size changes using SNP frequency spectra, Nat. Genet., 47, 555–9.


Supplementary Materials

dsae005_suppl_Supplementary_Material
dsae005_suppl_Supplementary_Table_S7

Data Availability Statement

All raw data associated with this project are available on NCBI. The sequencing data used to create the genome assembly and annotation, namely 10x Chromium linked reads, Oxford Nanopore Technologies (ONT) MinION long reads, and short-read cDNA data, are available under BioProject accession PRJNA928557. The Illumina whole-genome resequencing data are available under BioProject accession PRJNA1054049. The genome assembly is available on NCBI (accession GCA_027559615.1). The code used for this project is available through Zenodo (10.1101/2023.08.22.554353) and GitHub (https://github.com/katarinastuart/At1_MynaGenome).


Articles from DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes are provided here courtesy of Oxford University Press
