Table 2. Gene model options
| Gene Model | Strengths | Weaknesses |
|---|---|---|
| Ensembl | Reliable cross-species gene annotation via combined manual & automated methods; supports transcript diversity & comparative genomics; regularly updated, with VEP & BioMart links; core genome browser support. | Annotation quality varies by species; complex transcripts may be modelled inconsistently; dependent on the quality of the genome assembly. |
| GENCODE | High-quality gene annotation for human/mouse; includes lncRNAs, pseudogenes, & transcripts; integrates manual curation & automation; captures transcript diversity; used within Ensembl, RefSeq, & UCSC. | Not all transcripts have experimental support; many lncRNAs/pseudogenes are redundant or of unclear function; manual curation limits scalability; inter-version differences may affect coordinate tracking. |
| RefSeq | High-quality, curated reference sequences for genomic, transcript, & protein data; consistent annotation across species; integrates manual curation with scalable automation; widely adopted in tools like VEP, ANNOVAR, GATK. | Tends to be conservative and includes fewer transcript isoforms; updated less frequently than other models; centrally managed by NCBI with no community input. |
| UCSC | Curated gene models from mRNA/protein alignments enhance RNA-seq quantification; emphasizes reliable transcripts & simplifies isoform sets for reproducible gene counts; integrated with UCSC Genome Browser. | Limited coverage of transcript diversity, isoforms, & non-coding RNAs; fewer splice junctions reduce RNA-seq accuracy; biased toward canonical genes. |
| UniProt | Provides detailed protein-level annotation, including domains, function, & subcellular localization. | Protein-focused; does not directly annotate variants or regulatory elements. |