Microbiome. 2025 Dec 22;14:42. doi: 10.1186/s40168-025-02296-3

Healthy pangolin virome reveals mammalian viral diversity and zoonotic risk

Tianyi Dong 1,2,#, Qi Wang 3,#, Tengcheng Que 6,7,#, Haorui Si 3, Jia Su 1,2, Ying Chen 5, Kaixin Yang 3, Cong Li 4, Mengjie Qin 1,2, Bei Li 1, Yan Zhu 1, Shousheng Li 6, Yingjiao Li 6, Meihong He 6, Yanli Zhong 6, Qingyu Xiao 3, Ben Hu 1, Leiping Zeng 1
PMCID: PMC12837103  PMID: 41430329

Abstract

Background

Pangolins, the world’s most trafficked mammals, have emerged as critical subjects of study due to their potential role as intermediate hosts for zoonotic viruses. While previous studies have primarily focused on diseased pangolins, the virome composition of healthy individuals remains largely unexplored.

Results

To address this knowledge gap, we performed comprehensive metatranscriptomic analysis of 83 healthy pangolins, in comparison with virome data of 52 diseased individuals derived from previously published datasets. We identified 51 viral operational taxonomic units (vOTUs) across six mammalian-associated viral families: Parvoviridae, Picornaviridae, Papillomaviridae, Circoviridae, Flaviviridae, and Paramyxoviridae. Notably, we observed recombination in Morbillivirus canis isolate BJ16B35, Canine distemper virus strain PS, and UN_MBA191024-Paramyxoviridae-1 from pangolins and domestic dogs, suggesting cross-species transmission dynamics. Co-infection analysis revealed a strong positive correlation between Copiparvovirus P171T/pangolin/2018 and Pangolin protoparvovirus, suggesting possible shared transmission pathways. Several viruses, including Orthopneumovirus hominis and Orthorubulavirus mammalis, were exclusively detected in diseased pangolins, implicating their potential role in pathogenesis. Zoonotic risk assessment identified 16 vOTUs with high predicted potential for human infection, including Pangolin pestivirus and Manis javanica papillomavirus 1.

Conclusions

Our findings significantly expand our understanding of viral diversity in healthy pangolins and help distinguish commensal viral communities from potentially pathogenic ones. This research underscores the importance of continued wildlife viral surveillance for both conservation and public health preparedness.


Supplementary Information

The online version contains supplementary material available at 10.1186/s40168-025-02296-3.

Keywords: Pangolins, Metatranscriptomics, Zoonotic potential, Virome, Recombination, Viral diversity

Background

In an era where global health security faces unprecedented challenges, zoonotic pathogens—particularly those emerging from wildlife—have emerged as formidable adversaries, threatening human health, economies, and ecological stability [1]. With nearly 70% of emerging infectious diseases originating from animal reservoirs, viral spillover events underscore a critical intersection of ecology, evolution, and public health urgency [2]. Among wildlife species, pangolins (order Pholidota), the world’s most trafficked mammals, have garnered intense scientific scrutiny. Their dual role as both victims of illegal wildlife trade and potential intermediaries in zoonotic transmission networks raises alarming questions about their virome dynamics and spillover risks [3, 4]. While pangolins harbor an array of viruses, their immune vulnerabilities—such as the pseudogenization of interferon epsilon, a key antiviral gene in epithelial barriers—expose critical gaps in our understanding of their baseline viral ecology [5].

Despite growing interest in the pangolin virome, research to date has predominantly focused on diseased individuals, often in crisis-response scenarios, leaving the virome of healthy individuals (a population that is rare, ecologically elusive, and seldom sampled) underexplored [6–8]. This omission obscures our ability to distinguish commensal viral communities from pathogenic ones, hindering efforts to predict disease emergence or mitigate spillover risks. Although a prior study has provided foundational insights into the virome of healthy pangolins, it was limited in scope, either sampling a small number of individuals or focusing on specific viral groups [8].

Building on this groundwork, our study expands the virome survey of healthy pangolins, a population critical to understanding baseline viral ecology yet rarely studied due to their rarity and conservation status. Through a systematic analysis of 83 healthy pangolins and 52 diseased individuals, we employ metatranscriptomics and machine learning-driven risk assessment. Our findings uncover striking disparities in viral diversity and abundance. We detected six families of viruses commonly associated with mammals in healthy pangolins, while eight families were detected in the diseased group. Notably, flaviviruses, paramyxoviruses, and picornaviruses were far more diverse and common in sick pangolins. A key discovery, a novel morbillivirus sequence in a healthy pangolin that is potentially linked to canine hosts, underscores the complexity of cross-species transmission dynamics and highlights the importance of studying rare but healthy animal populations to uncover cryptic viral interactions. Furthermore, we evaluate the zoonotic potential of the newly identified viruses through predictive modeling, offering a framework for proactive surveillance.

Overall, our work extends existing research and highlights key differences in the viruses carried by healthy versus sick pangolins. Our findings improve understanding of the virome composition of a species that is significant for public health, and they stress the necessity of continued research on understudied wildlife reservoirs to better understand viral risks and prevent future outbreaks.

Methods

Sample collection

In 2019, a batch of smuggled pangolins was intercepted by Chinese customs and, after rescue, transported to the Guangxi Zhuang Autonomous Region Land Wildlife Medical Rescue and Epidemic Surveillance Research Center under ethical approval. Of these, 83 healthy pangolins were sampled: 83 oral swabs and 81 anal swabs were collected, transported, and stored following standard procedures according to the animal ethics permit (WIVA05201705) issued by the Wuhan Institute of Virology. The samples were stored in −80 °C freezers until further processing.

RNA extraction, library preparation, and sequencing

A total of 164 pangolin samples were retrieved from −80 °C freezers, thawed, and subjected to vortex agitation for 1 min. Following this, the samples were centrifuged at 4 °C, 21,000 × g for 5 min, and 200 µL of supernatant was collected for nucleic acid extraction. Total RNA was extracted from anal swab using the VAMNE Virus DNA/RNA Extraction Kit (Cat. No.: RM503-02, Vazyme, China) following the manufacturer’s instructions. An RNA library was then constructed using the MGIEasy RNA Library Prep Set (96 RXN, BGI, China) (Cat. No.: 1000006384). Paired-end (150 bp) sequencing of the RNA library was performed on the DNBSEQ-T7 platform.

Viral contigs assembly and annotation

Raw sequencing data were processed using fastp (v0.23.4) [9] to remove adaptor sequences and low-quality bases. Quality-controlled reads were aligned to the SILVA database using Bowtie2 (v2.5.2) [10] to filter out ribosomal RNA (rRNA). The remaining clean reads were assembled into contigs using megahit (v1.2.9) [11], with default parameters and the minimum contig length set to 1000 bp. The resulting contigs were compared against the non-redundant protein database (NR) using DIAMOND (v2.1.8.162) [12] with an E-value cut-off of 1 × 10⁻⁵. Vertebrate-associated contigs were selected based on their taxonomic classification. The completeness of each viral sequence was assessed using CheckV (v1.0.1) [13]. For contigs corresponding to incomplete genomes, clean reads were mapped to the subject sequences identified in their blastn results using Geneious (v2023.0.4) [14], thereby extracting and annotating the viral consensus sequences. Finally, hallmark sequences within these sequences were identified using DIAMOND blastx against an in-house viral hallmark sequence library (e.g., RdRp for RNA viruses and conserved replication-associated proteins for DNA viruses).

Quantification of virus abundance

The abundance of each newly identified viral species was assessed using a read mapping approach. Clean data, following the removal of adapter sequences, low-quality bases, and rRNA, were mapped to the viral consensus sequences. The abundance of each virus species was then quantified as RPM (reads per million mapped reads). To reduce false positives, the criteria of RPM ≥ 1 and coverage ≥ 10% were applied.
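The abundance normalization and positivity filter described above can be sketched as follows. This is a minimal illustration rather than the study's pipeline: the function names, the choice of denominator (all mapped reads in a library), and the per-base coverage inputs are assumptions made for the example.

```python
def rpm(virus_reads: int, total_mapped_reads: int) -> float:
    """Reads per million mapped reads for one virus in one library."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    # Multiply before dividing to keep the arithmetic exact for integers.
    return virus_reads * 1_000_000 / total_mapped_reads


def is_positive(virus_reads: int, total_mapped_reads: int,
                covered_bases: int, genome_length: int,
                min_rpm: float = 1.0, min_coverage: float = 0.10) -> bool:
    """Apply the two thresholds from the text: RPM >= 1 and coverage >= 10%."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    coverage = covered_bases / genome_length  # fraction of the genome covered
    return rpm(virus_reads, total_mapped_reads) >= min_rpm and coverage >= min_coverage


# Hypothetical library: 250 viral reads out of 50 million mapped reads,
# covering 1,500 bases of a 10,000-base consensus sequence.
print(rpm(250, 50_000_000), is_positive(250, 50_000_000, 1_500, 10_000))
```

Requiring both thresholds means that a virus with many reads piled onto a short region, or with broad but extremely shallow coverage, is not called positive.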

Host species identification

For each sample, the assembled contigs were compared against a customized database of pangolin Cytb sequences and mitochondrial libraries using blastn. For each healthy pangolin, the more complete Cytb contigs were selected from either the anal or oral sample, and the host taxonomy was identified by combining these results with mitochondrial sequence analysis. For samples of ill pangolins where no Cytb sequence could be identified, the mitochondrial library results were used. If still no match was found, the host taxonomy was determined based on the published article. The customized pangolin Cytb database was constructed as follows: all mitochondrial sequences and Cytb nucleotide sequences of pangolins available in NCBI (https://www.ncbi.nlm.nih.gov/) prior to May 8, 2024, were downloaded. Sequences shorter than 700 bp were excluded, and the Cytb region within mitochondrial sequences was extracted using nhmmer in HMMER (v3.4). The resulting sequences were integrated and clustered using cd-hit (v4.8.1) [15] with an average nucleotide identity threshold of 99.9%.

Phylogenetic analysis

For each viral family, replication-associated proteins and representative marker proteins from the same family (based on ICTV representative strains) were aligned using MAFFT (v7.525) [16] with the “auto” parameter. Maximum likelihood trees were constructed using IQ-TREE (v2.4.0) [17] with 1000 bootstrap replicates under default settings. To construct phylogenetic trees of Orthorubulavirus mammalis and Morbillivirus canis, all available genome sequences were first downloaded from GenBank. Redundant sequences were removed using MMseqs2 (v. 16.747c6) [18], with the following parameters: easy-linclust --min-seq-id 0.99 -e 0.01 -c 0.99 --cov-mode 1 --cluster-mode 2 --kmer-per-seq-scale 0.4 --spaced-kmer-mode 0. After deduplication, multiple sequence alignments were performed, and phylogenetic trees were constructed.
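For readers who wish to reproduce the deduplication step, the MMseqs2 parameters listed above can be assembled into a command line as sketched below. The flags are taken verbatim from the text; the file names (`genomes.fasta`, `dedup`, `tmp`) and the helper function are hypothetical, and the sketch only builds the command rather than running it.

```python
import shlex


def build_linclust_command(input_fasta: str, output_prefix: str, tmp_dir: str) -> list:
    """Return the argument list for an MMseqs2 easy-linclust run
    using the parameters reported in the Methods."""
    return [
        "mmseqs", "easy-linclust", input_fasta, output_prefix, tmp_dir,
        "--min-seq-id", "0.99",          # minimum sequence identity
        "-e", "0.01",                    # E-value threshold
        "-c", "0.99",                    # coverage threshold
        "--cov-mode", "1",
        "--cluster-mode", "2",
        "--kmer-per-seq-scale", "0.4",
        "--spaced-kmer-mode", "0",
    ]


# Hypothetical input and output names, for illustration only.
cmd = build_linclust_command("genomes.fasta", "dedup", "tmp")
print(shlex.join(cmd))
```

With MMseqs2 installed, the list could be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell-quoting problems with unusual file names.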

Recombination analysis

The 102 newly identified viral sequences were first segmented using the sliding command in SeqKit (v2.6.1) [19], with a window size of 1000 bp and a step size of 500 bp. Each segment was then subjected to BLASTn (v2.15.0) against the nt database, and the highest bitscore match for each segment was recorded, ensuring a minimum identity of 60%. The top-matching subject sequence from the output was used as the reference sequence for recombination analysis within each viral family. Next, the 102 viral sequences were clustered into viral operational taxonomic units (vOTUs) using MMseqs2, with the following parameters: easy-linclust --min-seq-id 0.99 -e 0.01 -c 0.99 --cov-mode 1 --cluster-mode 2 --kmer-per-seq-scale 0.4 --spaced-kmer-mode 0. A total of 51 vOTUs were identified and subsequently merged with the corresponding viral family reference sequences, followed by multiple sequence alignment using MAFFT with default parameters. To detect potential recombination events, six methods were employed within RDP5 (v5.23) [20]: RDP, GENECONV, Bootscan, MaxChi, SiScan, and 3Seq, with a significance threshold of p ≤ 0.01. A recombination event was considered valid when supported by at least three of these methods. Further validation of recombination signals was conducted using SimPlot (v3.5.1) under the Kimura model, with a window size of 600 bp and a step size of 60 bp. The detected recombination events were then visualized in circos and similarity plots using circlize package (v0.4.16) in R (v4.2.2) and SimPlot (v3.5.1).
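Two steps of this workflow lend themselves to a short sketch: cutting a genome into overlapping windows (1000 bp window, 500 bp step) and accepting a recombination event only when at least three of the six methods reach p ≤ 0.01. The code below is an illustrative reimplementation, not the SeqKit or RDP5 code; the function names are hypothetical, and discarding a final window shorter than the window size is an assumption.

```python
RDP_METHODS = ("RDP", "GENECONV", "Bootscan", "MaxChi", "SiScan", "3Seq")


def sliding_windows(sequence: str, window: int = 1000, step: int = 500,
                    keep_tail: bool = False) -> list:
    """Return (start, segment) pairs for overlapping windows.

    A trailing segment shorter than `window` is dropped unless keep_tail=True.
    """
    segments = []
    for start in range(0, len(sequence), step):
        segment = sequence[start:start + window]
        if len(segment) < window:
            if keep_tail and segment:
                segments.append((start, segment))
            break
        segments.append((start, segment))
    return segments


def is_supported(p_values: dict, alpha: float = 0.01, min_methods: int = 3) -> bool:
    """True when at least `min_methods` of the six methods give p <= alpha.

    `p_values` maps method name to p-value; methods without a value are skipped.
    """
    supporting = [m for m in RDP_METHODS
                  if p_values.get(m) is not None and p_values[m] <= alpha]
    return len(supporting) >= min_methods


# Hypothetical 2,500-base sequence and hypothetical p-values.
toy_genome = "ACGT" * 625
print([start for start, _ in sliding_windows(toy_genome)])
print(is_supported({"RDP": 1e-5, "GENECONV": 0.003, "MaxChi": 0.01}))
```

Because the step is half the window, every internal position is covered by two segments, so a breakpoint that falls near the edge of one window sits near the middle of the next.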

Correlation analysis

Pearson correlation analysis was used to analyze viral co-infection based on viral abundance data (RPM). The correlation matrix was calculated using the corrplot package (v0.94) in R with the Pearson method. Statistical significance of correlations was determined using cor.mtest, conf.level = 0.95. Heatmaps were generated using the corrplot package to visualize correlation relationships.
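The Pearson coefficient underlying this analysis can be written out directly. The sketch below is a plain-Python illustration, not the corrplot implementation used in the study; the function name and the abundance values are hypothetical.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation between two equal-length abundance vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical RPM values across five libraries: both viruses are
# detected in the same single library and absent everywhere else.
virus_a = [0.0, 0.0, 0.0, 12.0, 0.0]
virus_b = [0.0, 0.0, 0.0, 3.5, 0.0]
print(round(pearson_r(virus_a, virus_b), 6))
```

With sparse abundance data, two viruses that co-occur in only one library are perfectly linearly related by construction, so a coefficient near 1 is best read alongside the number of co-positive samples.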

Zoonotic risk prediction

The annotation of the 51 newly identified vOTUs was performed using ORF-finder (https://www.ncbi.nlm.nih.gov/orffinder/), followed by the prediction of human infection probability through the machine learning model zoonotic_rank [21]. This model leverages viral genomic features that are partially independent of viral classification, allowing for predictions across divergent viruses and enabling the differentiation of closely related viruses to assess their zoonotic risk. The output includes a prediction results table (Supplementary Table 11). The ggplot2 package (v3.5.1) in R was used to visualize the zoonotic risk of these newly discovered viruses.

Results

Overview of pangolin samples

This study includes 83 healthy pangolins showing no apparent disease symptoms at sampling. Throat and anal swabs were collected from each individual and subjected to next-generation sequencing (NGS). Successful sequencing libraries were constructed for 83 oral swabs and 81 anal swabs (Supplementary Table 1), yielding 13.8 billion reads and 4001.3 Gb of data after quality filtering (Supplementary Table 2). We supplemented our analysis with 62 previously published metatranscriptomic datasets from 52 diseased pangolins, comprising diverse samples including lung, spleen, lymph node, muscle, skin, mixed organ, and fecal specimens (Fig. 1A and Supplementary Table 1).

Fig. 1. Sample characterization and phylogenetic relationships of pangolins in this study. A Tissue distribution of pangolin samples: healthy (n = 83) and diseased (n = 52; from previous studies). Blue represents Manis javanica and pink for Manis pentadactyla. B Maximum likelihood phylogenetic tree based on Cytb sequences, showing genetic and geographic relationships of pangolin samples. Healthy samples are marked with light blue dots, diseased samples with orange dots. Clade-specific geographic origins are annotated outside the gray arc

Host species identification was performed through alignment of cytochrome b (Cytb) and mitochondrial sequences. Among the 83 healthy pangolins, 80 were identified as Manis javanica (Malayan pangolin) and 3 as Manis pentadactyla (Chinese pangolin). For the 52 diseased pangolins, 43 were Manis javanica and 9 were Manis pentadactyla. In cases where host species could not be determined through sequence analysis (4 samples), previously published classifications were referenced (Supplementary Table 3). To investigate geographical distribution of these pangolins, we compiled representative pangolin Cytb sequences from the NCBI database and constructed a phylogenetic tree. Our analysis revealed that Manis javanica in this study spanned all three major clades of the species, with five novel Cytb genotypes identified in healthy individuals. Manis pentadactyla healthy pangolins originated exclusively from the clade distributed in China, while diseased individuals were distributed across both clades. Additionally, one novel Cytb genotype with low homology was identified in this group (Fig. 1B).

Characterization of the pangolin viromes

To ensure standardized results, we reanalyzed the published data on diseased pangolins using our in-house developed ViralScan pipeline, which was also applied to the healthy pangolin data generated in this study. ViralScan represents a customized bioinformatics pipeline specifically designed for viral detection in metatranscriptomic datasets, incorporating specialized databases and quality control measures for wildlife samples. From the assembled metatranscriptomic data, we identified 80,790 virus-containing contigs after filtering for contig length and e-value (Supplementary Table 4). We focused our analysis on mammalian-associated viruses, excluding bacteriophages and plant viruses. Viruses with relatively high abundance (reads per million total reads > 1) and coverage (coverage > 30%) were classified as positive detections. Ultimately, we identified 43 mammalian-associated virus species across 8 families (Retroviridae was excluded from further analysis), after removing confirmed contaminants from published datasets (verified through correspondence with the original authors). In healthy pangolins, we identified 8 RNA virus species belonging to three families (Flaviviridae, Paramyxoviridae, Picornaviridae) as well as 17 DNA virus species from 3 families (Circoviridae, Papillomaviridae, Parvoviridae). In diseased pangolins, 15 RNA virus species were detected across four families (Flaviviridae, Paramyxoviridae, Pneumoviridae, Picornaviridae) along with 8 DNA virus species from 3 families (Anelloviridae, Circoviridae, Parvoviridae). Several of these viruses exhibited high abundance levels (Fig. 2A and Supplementary Table 5).

Fig. 2. Characterization of mammalian-associated viruses in pangolins. A Heatmap showing mammalian-associated virus abundance in pangolin samples. Columns represent libraries; rows represent virus species. Abundance is log-transformed reads per million. Colored blocks indicate sample characteristics and viral taxonomy. B Viral prevalence differs between oral (green) and anal (brown) samples from healthy pangolins, ranked by detection frequency. C Rarefaction curves showing viral species diversity relative to sampling effort, with shaded 95% confidence intervals

In healthy pangolin samples, we detected six mammalian-associated virus families: Circoviridae, Flaviviridae, Papillomaviridae, Paramyxoviridae, Parvoviridae, and Picornaviridae. Notably, Pangolin pestivirus and Hunnivirus amagyari were found in both healthy and diseased pangolins, suggesting they may not be primary pathogens in pangolin disease. In contrast, Pneumoviridae and Anelloviridae were detected exclusively in diseased pangolins. Despite differences in the sample types between healthy and diseased individuals, several viruses with known or suspected zoonotic potential (Orthopneumovirus hominis, Orthorubulavirus mammalis, Pangolin respirovirus, and Orthorubulavirus laryngotracheitidis) were identified only in diseased pangolins. We hypothesize these viruses may contribute to disease onset or progression in pangolins.

Our analysis of 83 healthy pangolins revealed that Parvoviridae had the highest prevalence in both anal and oral swabs, with detection rates of 33.3% and 42.2%, respectively. Circoviridae was the second most prevalent family (Extended Data Fig. 1A and Supplementary Table 6). Notably, Pangolin densovirus A1 showed the highest detection rate in both anal and oral samples. Within Parvoviridae, Densovirinae typically infects invertebrates such as insects, crustaceans, and echinoderms [22]. Given that adult pangolins consume up to 70 million insects annually [8], we believe the densovirus-related sequences identified in this study likely originate from their insect diet. Morbillivirus canis (Paramyxoviridae) was exclusively detected in anal swabs (Fig. 2B).

The diversity of viral species varied significantly across sample types. Anal samples harbored 20 mammalian-associated viruses across 34 samples, while oral samples yielded 42 virus-positive cases but only 18 mammalian-associated viral species (Fig. 2C and Extended Data Fig. 1B). These findings suggest anal samples may contain a more diverse and abundant viral community compared to oral samples.

Evolutionary relationships of mammalian-associated viruses in pangolins

A total of 102 viral genomes were newly identified and assembled—78 from healthy pangolins and 24 from diseased pangolins (Supplementary Table 7). Among these, 91 were high-quality genomes (completeness > 90%), while 10 were medium-quality (completeness > 50%) genomes (Fig. 3A). The genomes within Picornaviridae, Flaviviridae, and Parvoviridae displayed marked genetic diversity relative to existing databases (Fig. 3B). Notably, 48 of 56 Parvoviridae sequences were high-quality, likely due to high viral abundance and homology with reference strains (Fig. 3A, B and Supplementary Table 5).

Fig. 3. Diversity of mammalian-associated viruses in pangolins. A Genome completeness of viral families (>50% medium-quality, >90% high-quality). B Identity comparison of viral genome across virus families (mean ± SD). C Maximum likelihood phylogenetic trees (midpoint rooted) based on conserved proteins: RdRp (RNA viruses), Rep (Circoviridae), L1 (Papillomaviridae), NS1 (Parvoviridae). Symbols: blue dots (healthy pangolins, this study), red dots (diseased pangolins, published), pink squares (genera with human viruses), dark blue squares (genera without human infections), gray triangles (collapsed clades)

To infer viral phylogenetic relationships, we analyzed conserved viral genes: RNA-dependent RNA polymerase (RdRp) for RNA viruses (Flaviviridae, Paramyxoviridae, Picornaviridae) and hallmark proteins (Rep for Circoviridae, L1 for Papillomaviridae, NS1 for Parvoviridae) for DNA viruses (Fig. 3C).

Flaviviridae: this family of enveloped ssRNA(+) viruses includes the genus Pestivirus, among others [23]. We identified seven pangolin-derived Pestivirus sequences, sharing 69.3–80.3% amino acid identity with their closest ICTV reference strain. A distinct monophyletic cluster of seven RdRp protein sequences suggests novel Pestivirus lineages in pangolins. As no significant association was observed between Pestivirus presence and host health status, it is unlikely to be the causative agent of disease in pangolins (Supplementary Table 8).

Paramyxoviridae: enveloped viruses with an ssRNA(−) genome [24]. We identified three RdRp protein sequences: two from diseased pangolins exhibited 99.5% amino acid identity to Parainfluenza virus 5 (genus Orthorubulavirus), phylogenetically clustering with a swine-derived strain PP189887.1 (Extended Data Fig. 2). A third sequence from a healthy pangolin matched Morbillivirus canis, sharing 99.5% amino acid identity. This virus, commonly known as Canine Distemper Virus (CDV) [25], is a contagious pathogen that spreads through the lymphatic, epithelial, and nervous systems of domestic dogs and wildlife, and has been documented in at least six different mammalian orders and over 20 families [26]. Phylogenetic analysis revealed that this pangolin sequence clustered with mink-derived CDV strains (Extended Data Fig. 3). The detection of Orthorubulavirus mammalis and Morbillivirus canis sequences highlights the diversity of Paramyxoviridae in pangolins, warranting further surveillance and phylogenetic analysis to assess their evolutionary origins and zoonotic potential (Supplementary Table 8).

Picornaviridae: five novel viral genomes were identified. One clustered within the genus Shanbavirus, sharing 65.2% amino acid identity with the closest ICTV strain, suggesting a potentially novel lineage. Three others aligned with Hunnivirus amagyari (genus Hunnivirus) but exhibited low RdRp identity (19.8–45.2%) compared to seven representative strains, indicating substantial divergence. Collectively, these newly identified sequences highlight unexplored viral diversity within Picornaviridae (Supplementary Table 8).

Circoviridae: we identified 15 Rep protein sequences. Twelve from healthy pangolins cluster within Circovirus porcine 1 (92–98.3% amino acid identity), while one from a diseased pangolin groups with Cyclovirus irsi and exhibits 68.9% identity to human Cyclovirus maanav. Two sequences divergent from ICTV reference strains show 99.6–100% identity to viruses from birds (Fringilla montifringilla) and mollusks (Gonidea angulata), underscoring cross-host evolutionary links (Supplementary Table 8).

Papillomaviridae: we identified six L1 protein sequences from healthy pangolins, genetically distinct from the ICTV reference strains, but closely related to three unclassified pangolin Papillomaviridae sequences. Among them, two cluster with Manis javanica papillomavirus 1 (98.4–99.2% identity), three with Papillomavirus manis 7551 (99–99.8% identity), and one with Manis pentadactyla papillomavirus 1 (99.4% identity). These findings highlight extensive genetic diversity and uncharacterized lineages within this family (Supplementary Table 8).

Parvoviridae: non-enveloped ssDNA viruses, replicating via a rolling-hairpin mechanism [27]. We identified 53 NS1 protein sequences in both healthy and diseased pangolins, which clustered with Copiparvovirus and Protoparvovirus. For Copiparvovirus, the pangolin-derived sequences showed 79.5–95.7% amino acid identity to three ICTV reference strains isolated from pangolins. In Protoparvovirus, the closest match was Protoparvovirus carnivoran 1, sharing 99.2–100% amino acid identity. This viral species encompasses Canine parvovirus and Feline panleukopenia virus. These viruses are known to infect diverse hosts, including cats [28], dogs [29], blue foxes [30], minks [31], giant pandas [32], and raccoons [33], and now pangolins. Additionally, one NS1 sequence from a sick pangolin was classified within Blattambidensovirus, sharing 99.2% amino acid identity with Blattambidensovirus incertum 1. Notably, 39 NS1 proteins exhibited substantial genetic divergence from ICTV reference strains. Following an expanded reference database search, these sequences showed 81.8–99.6% amino acid identity to Zophobas morio black wasting virus, further highlighting the broad genetic diversity within this virus family (Supplementary Table 8).

Recombination analysis of novel pangolin viruses

We obtained 51 unique vOTUs from clustering 102 novel pangolin viral genomes across six families. Recombination analysis was performed for these vOTUs along with ICTV representative strains using recombination detection program (RDP5) and confirmed with SimPlot. We identified three recombination events within Parvoviridae and Paramyxoviridae (Fig. 4A and Supplementary Table 9).

Fig. 4. Recombination in Parvoviridae and Paramyxoviridae. A Circular representation of recombination events. Major parent (deep rose), minor parent (purple), and recombinant (coral) segments are connected according to recombination regions. B Similarity plot analysis of recombinant sequence MF926599-Dog against references JN896331-dog and UN_MBA191024-Paramyxoviridae-1. Kimura model, 600 bp window, 60 bp step. C Phylogenetic trees of pre-recombination, recombination, and post-recombination regions. Colored dots mark recombinant sequences; gray triangles mark collapsed clades

Notably, we detected recombination involving a Morbillivirus canis (Paramyxoviridae)-associated sequence from a healthy pangolin (named UN_MBA191024-Paramyxoviridae-1) and two canine Morbillivirus canis strains (named MF926599-dog and JN896331-dog). Similarity plot analysis revealed a high nucleotide identity (99.2%) between MF926599-Dog and the pangolin-derived sequence within the 4217–5480 bp region (Fig. 4B and Supplementary Table 9). Phylogenetic analysis showed that while MF926599-Dog clustered with JN896331-dog in non-recombinant regions, it grouped closely with the pangolin sequence within the recombinant segment (Fig. 4C). This evidence of cross-species recombination between pangolin- and dog-derived Morbillivirus canis sequences highlights potential expanded viral host range and transmission dynamics. These findings underscore the need for enhanced surveillance of paramyxoviruses in pangolins and their ecological contacts, particularly given the zoonotic potential and conservation implications of such viral exchanges.

Co-infection correlation analysis of novel viruses in healthy pangolins

The correlation analysis of viral co-infections in healthy pangolins, based on Pearson correlation and using RPM as a measure of abundance, revealed significant positive correlations among certain viruses from Circoviridae, Paramyxoviridae, Parvoviridae, and Picornaviridae (Fig. 5). Notably, among the seven virus pairs with correlation coefficients exceeding 0.75, Copiparvovirus P171T/pangolin/2018 and Pangolin protoparvovirus within Parvoviridae exhibited a strong positive correlation (r = 0.87), while Fringilla montifringilla Circoviridae sp. showed high co-infection rates with these two viruses (r ≥ 0.88). Similarly, Rhinolophus bat shanbavirus and Morbillivirus canis demonstrated a significant correlation (r = 1.0), suggesting potential shared transmission routes. In contrast, some viruses, such as Hunnivirus P236T (Picornaviridae) and Circovirus porcine 1 (Circoviridae), showed weak correlations, implying independent infection patterns. While certain viruses displayed weak negative correlations, these were not statistically significant and are thus not discussed further (Supplementary Table 10). These findings provide insights into viral community dynamics in pangolin populations and may inform future research on co-infection mechanisms and ecological interactions.

Fig. 5. Co-infection analysis in healthy pangolins. Heatmap depicting Pearson correlation of viral abundance levels. Blue shades represent positive correlations, white represents no correlation, and red denotes negative correlations. Statistically significant correlations (p < 0.05) are marked with asterisks (*p < 0.05, **p < 0.01, ***p < 0.001). Numbers within cells show correlation values

Identifying viruses with zoonotic potential

Theoretical and empirical evidence suggests that viral genomes may contain signatures predictive of human infectivity [34]. We assessed the zoonotic potential of 44 vOTUs (excluding Densovirinae) using zoonotic_rank, a machine learning framework integrating genomic features, host range, and ecological characteristics to predict zoonotic risk [21]. This approach provides a rapid, cost-effective risk assessment tool for viral surveillance.

Results revealed a spectrum of zoonotic risks: 16 vOTUs were classified as “high,” 17 as “medium,” and 11 as “low” risk (Supplementary Table 11). Notably, viruses from Flaviviridae, Parvoviridae, Papillomaviridae, and Picornaviridae families show high zoonotic potential. Pangolin pestivirus, Pangolin protoparvovirus, and Manis javanica papillomavirus 1, detected in healthy pangolins, exhibit particularly high predicted probabilities of human infection, suggesting potential cross-species transmission risks.

In contrast, Circovirus, Hunnivirus, and unclassified Parvoviridae sequences demonstrate lower zoonotic potential, suggesting more restricted host ranges. The 95% interquartile range bars reveal the variability in predictions, with some high-risk viruses showing wide intervals, reflecting model uncertainties likely due to ambiguous host classifications in the training data (Fig. 6).

Fig. 6. Predicted zoonotic probability of novel pangolin viruses. Predicted probabilities of human infection for the novel vOTUs identified in pangolin. Species names and sequence counts corresponding to each vOTU are annotated on the left, while genus and family classifications appear on the right. Asterisks (*) mark viruses exclusively detected in diseased pangolins. Each bar represents the 95% interquartile range of predicted probability, calculated from the top 10% of model iterations excluding the target species. Circles denote mean predicted probabilities, with a dashed line showing the threshold of 0.293

Discussion

The emergence of zoonotic diseases from wildlife reservoirs represents one of the most significant challenges to global public health [35]. Understanding the virome of wildlife species, particularly those with established zoonotic potential, is critical for developing proactive strategies to prevent future pandemics. Given the role of pangolins as critical intermediate hosts for cross-species viral transmission, this study characterizes the virome of 83 healthy pangolins and 52 previously published diseased individuals, integrating metatranscriptomics, phylogenetics, recombination analysis, and machine learning-based zoonotic risk assessment. Our findings significantly advance the understanding of pangolin-associated viruses, revealing novel viral diversity, complex co-infection patterns, and potential zoonotic threats.

Previous studies of the pangolin virome have primarily focused on diseased or deceased individuals [4, 36, 37], leaving a critical gap in our knowledge of baseline viral ecology in healthy individuals. By expanding the virome dataset on healthy pangolins, our study provides valuable reference information for distinguishing commensal viruses from potential pathogens carried by pangolins. Although a prior virome study of healthy pangolins identified several viruses with zoonotic potential (e.g., circoviruses, rotaviruses, astroviruses) [8], we identified 23 mammalian-associated viruses across six families (Circoviridae, Parvoviridae, Picornaviridae, Papillomaviridae, Paramyxoviridae, and Flaviviridae), significantly broadening the known pangolin virome. This comprehensive overview allows for meaningful comparisons between different viral communities and suggests that certain viruses detected exclusively in diseased pangolins, including Orthopneumovirus hominis and Orthorubulavirus mammalis, may contribute to morbidity and mortality, especially given their high potential for cross-species transmission. In contrast, viruses such as Pangolin pestivirus and Hunnivirus amagyari, found in both healthy and diseased hosts, likely represent components of the commensal virome rather than pathogens. Our analysis of oral and anal swabs detected no coronavirus in healthy pangolins. This contrasts with prior tissue-based studies of diseased pangolins and implies that detectable viral replication and shedding may be elevated in compromised animals [38–40].

Phylogenetic analyses revealed several highly divergent viral sequences with low amino acid identity to ICTV-designated reference strains, indicating the presence of potentially novel viral taxa. Notably, certain Circoviridae sequences exhibit substantial genetic divergence from representative strains, underscoring the need for updated viral classification frameworks to accommodate this expanded genetic diversity. These findings not only broaden our knowledge of the pangolin virome but also provide critical data for evaluating cross-species transmission risks.

The co-infection analysis revealed significant positive correlations between specific viruses, suggesting potential viral interactions or shared transmission pathways within pangolin hosts. The strong correlation between Copiparvovirus P171T/pangolin/2018 and Pangolin protoparvovirus, along with their co-occurrence with Fringilla montifringilla Circoviridae sp., indicates possible ecological or biological factors facilitating their joint persistence. Furthermore, the recombination event involving Morbillivirus canis sequences from both pangolins and domestic dogs highlights the dynamic nature of viral evolution at the wildlife-domestic animal interface and underscores the importance of cross-species surveillance.
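To make the notion of a positive co-occurrence correlation concrete, the sketch below computes the phi coefficient (the Pearson correlation of two binary variables) between the presence/absence profiles of two viruses across samples. The statistic used in the study is not stated in this section, so the choice of phi is an assumption, and the function name and the example vectors are hypothetical.

```python
from math import sqrt


def phi_coefficient(a, b):
    """Phi coefficient between two equal-length 0/1 presence vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must cover the same samples")
    # Counts of the 2x2 contingency table (virus A present/absent x virus B present/absent).
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return None  # undefined when a virus is present in all samples or in none
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical presence (1) / absence (0) calls across eight samples.
virus_x = [1, 1, 0, 0, 1, 0, 0, 0]
virus_y = [1, 1, 0, 0, 1, 0, 0, 1]
print(phi_coefficient(virus_x, virus_y))
```

A value near 1 indicates that the two viruses tend to be detected in the same samples. With small sample counts, such a coefficient should be accompanied by a significance test and a correction for multiple comparisons before being interpreted.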

Using the machine learning-based tool zoonotic_rank [21], we assessed the zoonotic potential of newly identified vOTUs. While none of the viruses detected in healthy pangolins was classified as “very high” risk, 16 vOTUs exhibited “high” predicted zoonotic potential, including Pangolin pestivirus, Pangolin protoparvovirus, and Manis javanica papillomavirus 1. These predictions demonstrate the utility of genome-based risk assessment tools in prioritizing viruses for further study before clinical emergence [34]. This approach enables efficient allocation of surveillance resources and enhances early warning systems for potential zoonotic spillover events. However, the high variability of some predictions underscores the need for cautious interpretation of predictive models and incorporation of ecological context when assessing zoonotic threats. While these predictions provide valuable directional insights for surveillance prioritization, further genomic characterization and functional studies are needed to better understand spillover potential.

Our findings reinforce the necessity of long-term virome surveillance in wildlife populations, including asymptomatic individuals, to detect potential zoonotic threats before they impact human or animal health. However, several limitations should be acknowledged. A key constraint arises from the nature of our samples; as the pangolins were rescued from the illegal wildlife trade, their precise geographic origins are uncertain, which precludes meaningful spatial analysis linking the detected viruses to human disease prevalence in source locations. Furthermore, since our focus was on mammalian viruses infecting eukaryotic hosts, our study did not include an analysis of bacteriophages. Other limitations include the restricted sample types, the absence of functional validation of viral infectivity, and the potential confounding effects of dietary or environmental contamination. Future work should incorporate host transcriptomic responses, virus isolation efforts, and experimental infection studies to evaluate the potential of these viruses for cross-species infection and pathogenesis.

Conclusions

In conclusion, this study significantly advances our understanding of the pangolin virome and its relevance to zoonotic surveillance. By establishing baseline viral profiles in healthy individuals and identifying high-risk candidates, we provide a foundation for more targeted monitoring and risk assessment. As wildlife continues to face unprecedented environmental pressures, proactive virome characterization can be employed as a critical strategy in global health preparedness, offering important insight into the viral landscape before cross-species transmission events occur.

Supplementary Information

Supplementary Material 1. (24.2KB, xlsx)
Supplementary Material 3. (458.1KB, xlsx)
Supplementary Material 7. (21.6KB, xlsx)
Supplementary Material 9. (74.6KB, xlsx)

Acknowledgements

We thank Zhengli Shi, Peng Zhou, Mang Shi, Daxi Wang and Nailou Zhang for helpful discussions and suggestions. We acknowledge the support of the Data Science Platform of Guangzhou National Laboratory and the Bio-medical Big Data Operating System (Bio-OS).

Authors’ contributions

T.Y.D. and L.P.Z. conceived the idea and planned the experiments and analyses. T.Y.D. performed the bioinformatic analyses with the help of Q.W. T.Y.D. and L.P.Z. wrote the manuscript. T.C.Q., S.S.L., Y.J.L., M.H.H., Y.L.Z., and H.R.S. contributed to the collection of the pangolin samples. B.L., Y.Z., Y.C., C.L., and M.J.Q. performed PCR and sequencing experiments. J.S. and K.X.Y. performed part of the data collation. B.H. and Q.Y. reviewed and revised the manuscript. All authors read and commented on the manuscript.

Funding

This study was funded by grants from the National Key R&D Program of China (2022YFC2305100), the Major Project of Guangzhou Laboratory (Grant No. GZNL2023A01001), and the Open Research Fund Program of CAS Key Laboratory of Special Pathogens and Biosafety, Chinese Academy of Sciences (2020SPCAS001).

Data availability

The datasets supporting the conclusions of this article are included within the article and its additional files. The raw read data from healthy pangolin samples generated in this study were submitted to the Genome Sequence Archive (GSA; https://ngdc.cncb.ac.cn/gsa/) [41, 42] under Project Accession: CRA025689. This study reused raw sequencing data from diseased pangolins, obtained from the following repositories: NCBI SRA (accessions: SRR10168373-93, SRR11119759, SRR11119762-67, SRR12053850, SRR17481175-76, SRR17481184, SRR17481195, SRR17481206, SRR17481246-47, SRR17481249, SRR17509909-11, SRR17509913, SRR17509917, SRR17509928, SRR17509939; https://www.ncbi.nlm.nih.gov/sra), GSA (accessions: CRR204937, CRR501292-CRR501307), and the China National GeneBank Sequence Archive (CNSA; accession: CNP0001573; https://db.cngb.org/search/project/CNP0001573/). See Supplementary Table 1 for details. The newly assembled viral genome sequences generated in this study have been deposited in GenBase under accession numbers C_AA107816.1–C_AA107893.1 (from healthy pangolins) and C_AA107792.1–C_AA107815.1 (from diseased pangolins). All data used for plotting are provided in the source data (Source data). Intermediate analysis files and code related to this study are available on GitHub at https://github.com/Eager144000/Healthy-Pangolin-Virome.
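For readers who wish to retrieve the reused runs, the abbreviated accession ranges above need to be expanded into individual accessions. The following minimal sketch does this under the assumption that a short range end (e.g., the “93” in SRR10168373-93) replaces the trailing digits of the start accession; this interpretation, and the helper name `expand_accessions`, are ours and should be checked against Supplementary Table 1.

```python
import re


def expand_accessions(spec: str) -> list[str]:
    """Expand 'SRR10168373-93' or 'CRR501292-CRR501307' into single accessions."""
    if "-" not in spec:
        return [spec]
    start, end = spec.split("-", 1)
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", start)
    if m is None:
        raise ValueError(f"unrecognized accession: {start!r}")
    prefix, digits = m.groups()
    # The range end may repeat the letter prefix or give digits only.
    end_digits = end[len(prefix):] if end.startswith(prefix) else end
    if not end_digits.isdigit() or len(end_digits) > len(digits):
        raise ValueError(f"unrecognized range end: {end!r}")
    # Assumption: a short end such as '93' replaces the trailing digits of the start.
    full_end = digits[: len(digits) - len(end_digits)] + end_digits
    first, last = int(digits), int(full_end)
    if last < first:
        raise ValueError(f"descending range: {spec!r}")
    return [f"{prefix}{n:0{len(digits)}d}" for n in range(first, last + 1)]


runs = expand_accessions("SRR10168373-93")
print(len(runs), runs[0], runs[-1])
```

The resulting list can then be passed to whichever download tool is preferred; single accessions such as SRR11119759 are returned unchanged.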

Ethics approval and consent to participate

A batch of smuggled pangolins was intercepted by Chinese customs and, after rescue, transported to the Guangxi Zhuang Autonomous Region Land Wildlife Medical Rescue and Epidemic Surveillance Research Center under ethical approval. Samples, including 83 oral swabs and 81 anal swabs, were collected from 83 healthy pangolins and were transported and stored following standard procedures according to the animal ethics permit (WIVA05201705) issued by the Wuhan Institute of Virology.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Tianyi Dong, Qi Wang and Tengcheng Que contributed equally to this work.

Contributor Information

Qingyu Xiao, Email: xiao_qingyu@gzlab.ac.cn.

Ben Hu, Email: huben@wh.iov.cn.

Leiping Zeng, Email: zenglp@wh.iov.cn.

References

1. Tan CCS, van Dorp L, Balloux F. The evolutionary drivers and correlates of viral host jumps. Nat Ecol Evol. 2024. 10.1038/s41559-024-02353-4.
2. Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, Gittleman JL, et al. Global trends in emerging infectious diseases. Nature. 2008;451(7181):990–3. 10.1038/nature06536.
3. Ye RZ, Wang XY, Li YY, Wang BY, Song K, Wang YF, et al. Systematic review and integrated data analysis reveal diverse pangolin-associated microbes with infection potential. Nat Commun. 2023;14(1):6786. 10.1038/s41467-023-42592-w.
4. Shi W, Shi M, Que TC, Cui XM, Ye RZ, Xia LY, et al. Trafficked Malayan pangolins contain viral pathogens of humans. Nat Microbiol. 2022;7(8):1259–69. 10.1038/s41564-022-01181-1.
5. Choo SW, Rayko M, Tan TK, Hari R, Komissarov A, Wee WY, et al. Pangolin genomes and the evolution of mammalian scales and immunity. Genome Res. 2016;26(10):1312–22. 10.1101/gr.203521.115.
6. Peng MS, Li JB, Cai ZF, Liu H, Tang X, Ying R, et al. The high diversity of SARS-CoV-2-related coronaviruses in pangolins alerts potential ecological risks. Zool Res. 2021;42(6):834–44. 10.24272/j.issn.2095-8137.2021.334.
7. Li L, Wang X, Hua Y, Liu P, Zhou J, Chen J, et al. Epidemiological study of betacoronaviruses in captive Malayan pangolins. Front Microbiol. 2021;12:657439. 10.3389/fmicb.2021.657439.
8. Tian FJ, Li J, Liu WL, Liu YJ, Hu YJ, Tu QH, et al. Virome in healthy pangolins reveals compatibility with multiple potentially zoonotic viruses. Zool Res. 2022;43(6):977–88. 10.24272/j.issn.2095-8137.2022.246.
9. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–90. 10.1093/bioinformatics/bty560.
10. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9. 10.1038/nmeth.1923.
11. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–6. 10.1093/bioinformatics/btv033.
12. Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021;18(4):366–8. 10.1038/s41592-021-01101-x.
13. Nayfach S, Camargo AP, Schulz F, Eloe-Fadrosh E, Roux S, Kyrpides NC. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat Biotechnol. 2021;39(5):578–85. 10.1038/s41587-020-00774-7.
14. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–9. 10.1093/bioinformatics/bts199.
15. Li W, Godzik A. CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9. 10.1093/bioinformatics/btl158.
16. Nakamura T, Yamada KD, Tomii K, Katoh K. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics. 2018;34(14):2490–2. 10.1093/bioinformatics/bty121.
17. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37(5):1530–4. 10.1093/molbev/msaa015.
18. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35(11):1026–8. 10.1038/nbt.3988.
19. Shen W, Sipos B, Zhao L. SeqKit2: a Swiss army knife for sequence and alignment processing. Imeta. 2024;3(3):e191. 10.1002/imt2.191.
20. Martin DP, Varsani A, Roumagnac P, Botha G, Maslamoney S, Schwab T, et al. RDP5: a computer program for analyzing recombination in, and removing signals of recombination from, nucleotide sequence datasets. Virus Evol. 2021;7(1):veaa087. 10.1093/ve/veaa087.
21. Mollentze N, Babayan SA, Streicker DG. Identifying and prioritizing potential human-infecting viruses from their genome sequences. PLoS Biol. 2021;19(9):e3001390. 10.1371/journal.pbio.3001390.
22. Cotmore SF, Agbandje-McKenna M, Canuti M, Chiorini JA, Eis-Hubinger AM, Hughes J, et al. ICTV virus taxonomy profile: Parvoviridae. J Gen Virol. 2019;100(3):367–8. 10.1099/jgv.0.001212.
23. Simmonds P, Becher P, Bukh J, Gould EA, Meyers G, Monath T, et al. ICTV virus taxonomy profile: Flaviviridae. J Gen Virol. 2017;98(1):2–3. 10.1099/jgv.0.000672.
24. Rima B, Balkema-Buschmann A, Dundon WG, Duprex P, Easton A, Fouchier R, et al. ICTV virus taxonomy profile: Paramyxoviridae. J Gen Virol. 2019;100(12):1593–4. 10.1099/jgv.0.001328.
25. Colina SE, Williman MM, Tizzano MA, Serena MS, Echeverria MG, Metz GE. Morbillivirus canis infection induces activation of three branches of unfolded protein response, MAPK and apoptosis. Viruses. 2024. 10.3390/v16121846.
26. Duque-Valencia J, Sarute N, Olarte-Castillo XA, Ruiz-Saenz J. Evolution and interspecies transmission of canine distemper virus-an outlook of the diverse evolutionary landscapes of a multi-host virus. Viruses. 2019. 10.3390/v11070582.
27. Chen S, Liu F, Yang A, Shang K. For better or worse: crosstalk of parvovirus and host DNA damage response. Front Immunol. 2024;15:1324531. 10.3389/fimmu.2024.1324531.
28. Stuetzer B, Hartmann K. Feline parvovirus infection and associated diseases. Vet J. 2014;201(2):150–5. 10.1016/j.tvjl.2014.05.027.
29. Temizkan MC, Sevinc Temizkan S. Canine parvovirus in Turkey: first whole-genome sequences, strain distribution, and prevalence. Viruses. 2023. 10.3390/v15040957.
30. Xiao-ying C, Zhi-jing X, Zhong-peng Z, Shi-jin J, Hong-kun Z, Yan-li Z, et al. Genetic diversity of parvovirus isolates from dogs and wild animals in China. J Wildl Dis. 2011;47(4):1036–9. 10.7589/0090-3558-47.4.1036.
31. Fei-fei D, Yong-feng Z, Jian-li W, Xue-hua W, Kai C, Chuan-yi L, et al. Molecular characterization of feline panleukopenia virus isolated from mink and its pathogenesis in mink. Vet Microbiol. 2017;205:92–8. 10.1016/j.vetmic.2017.05.017.
32. Zhao S, Hu H, Lan J, Yang Z, Peng Q, Yan L, et al. Characterization of a fatal feline panleukopenia virus derived from giant panda with broad cell tropism and zoonotic potential. Front Immunol. 2023;14:1237630. 10.3389/fimmu.2023.1237630.
33. Chen C, Tao J, Tang L, Sun T, Sun Z, Xu H, et al. First isolation and characterization of feline panleukopenia virus from wild raccoon dogs in the residential area of Shanghai, China. Vet Med Sci. 2024;10(6):e70071. 10.1002/vms3.70071.
34. Babayan SA, Orton RJ, Streicker DG. Predicting reservoir hosts and arthropod vectors from evolutionary signatures in RNA virus genomes. Science. 2018;362(6414):577–80. 10.1126/science.aap9072.
35. Morens DM, Fauci AS. Emerging pandemic diseases: how we got to COVID-19. Cell. 2020;182(5):1077–92. 10.1016/j.cell.2020.08.021.
36. Que T, Li J, He Y, Chen P, Lin W, He M, et al. Human parainfluenza 3 and respiratory syncytial viruses detected in pangolins. Emerg Microbes Infect. 2022;11(1):1657–63. 10.1080/22221751.2022.2086071.
37. Ning S, Dai Z, Zhao C, Feng Z, Jin K, Yang S, et al. Novel putative pathogenic viruses identified in pangolins by mining metagenomic data. J Med Virol. 2022;94(6):2500–9. 10.1002/jmv.27564.
38. Xiao K, Zhai J, Feng Y, Zhou N, Zhang X, Zou JJ, et al. Isolation of SARS-CoV-2-related coronavirus from Malayan pangolins. Nature. 2020;583(7815):286–9. 10.1038/s41586-020-2313-x.
39. Lam TT, Jia N, Zhang YW, Shum MH, Jiang JF, Zhu HC, et al. Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Nature. 2020;583(7815):282–5. 10.1038/s41586-020-2169-0.
40. Cui X, Fan K, Liang X, Gong W, Chen W, He B, et al. Virus diversity, wildlife-domestic animal circulation and potential zoonotic viruses of small mammals, pangolins and zoo animals. Nat Commun. 2023;14(1):2488. 10.1038/s41467-023-38202-4.
41. Chen T, Chen X, Zhang S, Zhu J, Tang B, Wang A, et al. The genome sequence archive family: toward explosive data growth and diverse data types. Genomics Proteomics Bioinformatics. 2021;19(4):578–83. 10.1016/j.gpb.2021.08.001.
42. CNCB-NGDC Members and Partners. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res. 2022;50(D1):D27–38. 10.1093/nar/gkab951.



Articles from Microbiome are provided here courtesy of BMC
