2022 Jul 8;119(28):e2122301119. doi: 10.1073/pnas.2122301119

Fig. 1.

Assembly of transcriptome. (A) Assembly and annotation pipeline. Illumina paired-end reads (n = 2.97 billion) were trimmed with Trimmomatic and processed through three assemblers: DN-Trinity, which was used in all pilot assemblies; genome-guided de novo Trinity (GG-Trinity); and StringTie. The three assemblies were combined using PASA, and both the PASA contigs (n = 341,114) and the DN-Trinity contigs (n = 583,596) were evaluated with TransRate and filtered using custom combined TransRate contig scores (see SI Appendix, Methods). The combined 159,559 contigs were clustered based on ORFs with CD-HIT, and the longest ORF per cluster was identified as the unigene, yielding 71,104 unigenes, which were reduced to 39,486 by applying a minimum ORF length of 400 bp. Annotation was performed with Trinotate, supplemented with manual annotation. (*Custom Unigene Caller and §Manual annotation curation are described in SI Appendix, Methods.)
(B) A set of 57 Reference Transcripts was used to assess each set of contigs to obtain a more complete transcriptome assembly. This plot shows the percent coverage of Reference Transcripts by the longest contig per transcript from the Broad 2013 Trinity transcriptome assembly, the final DN-Trinity assembly, and the combined set of contigs from PASA and DN-Trinity, after filtering through TransRate. 5HTR, serotonin receptor; AC, adenylyl cyclase; CBP, CREB-binding protein; CaV2.2, voltage-gated calcium channel type 2; eEF2K, eukaryotic elongation factor 2 kinase; eIF4G, eukaryotic initiation factor 4G; Eph_R_Part, ephrin receptor partial; FMRP, fragile X mental retardation protein; KHC, kinesin heavy chain; KIBRA, kidney and brain adapter protein; NaCh, sodium channel; RICTOR, rapamycin-insensitive companion of mammalian target of rapamycin; PKC, protein kinase C; TRK, tropomyosin receptor kinase. Asterisks indicate substantial improvement of Reference Transcript sequences with the inclusion of contigs from genome-guided assemblers, relative to DN-Trinity contigs.
(C) BUSCO scores for the 978 metazoan conserved proteins for predicted protein sequences from the 2013 Broad assembly, the DN-Trinity assembly, and the final combined set of unigenes.
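
The unigene selection step in panel A (longest ORF per CD-HIT cluster, then a minimum ORF length of 400 bp) can be illustrated with a minimal sketch. This is not the authors' Custom Unigene Caller, which is described in SI Appendix, Methods. The function name `pick_unigenes`, the input tuple layout, and the example records are hypothetical and chosen only for illustration; the 400 bp threshold is the only value taken from the caption.

```python
# Illustrative sketch only: NOT the Custom Unigene Caller from SI Appendix.
# Assumed (hypothetical) input: one record per ORF, already assigned to a
# cluster, as (cluster_id, contig_id, orf_length_bp).

MIN_ORF_LENGTH_BP = 400  # minimum ORF length stated in the caption


def pick_unigenes(orfs, min_orf_len=MIN_ORF_LENGTH_BP):
    """Return {cluster_id: (contig_id, orf_length_bp)} for retained unigenes.

    Step 1: keep the longest ORF in each cluster (first one wins on ties).
    Step 2: drop clusters whose longest ORF is shorter than min_orf_len.
    """
    longest = {}
    for cluster_id, contig_id, orf_len in orfs:
        best = longest.get(cluster_id)
        if best is None or orf_len > best[1]:
            longest[cluster_id] = (contig_id, orf_len)
    return {
        cluster_id: rep
        for cluster_id, rep in longest.items()
        if rep[1] >= min_orf_len
    }


# Hypothetical example records (not data from the paper).
example_orfs = [
    ("cluster_1", "contig_a", 350),
    ("cluster_1", "contig_b", 900),
    ("cluster_2", "contig_c", 300),
    ("cluster_3", "contig_d", 400),
]
example_unigenes = pick_unigenes(example_orfs)
```

In this toy input, `cluster_1` is represented by its longer ORF, `cluster_2` is dropped because its longest ORF is below the threshold, and `cluster_3` is kept because the threshold is inclusive in this sketch; whether the published filter is inclusive is an assumption.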