2015 Aug 15;404(2):149–163. doi: 10.1016/j.ydbio.2015.05.014
Terminology
The following terms used in this study are explained explicitly to avoid possible confusion with similar terms used elsewhere.
EST: Expressed Sequence Tag; a single-pass Sanger sequence from either end of a cloned mRNA.
EST cluster: computationally organised and assembled discrete group of ESTs, ideally containing all the ESTs from one gene and no ESTs from other genes.
Cluster consensus sequence: predominant sequence determined over the multiple aligned sequences in an EST cluster. It compensates for errors in single-pass sequencing and may yield an accurate mRNA sequence.
Sub-cluster: an EST cluster may be composed of one layer of sub-clusters. These arise either by joining primary clusters after initial cluster assembly, based on paired-end data or similarity metrics, or by post-assembly decomposition into distinct transcripts. Sub-clusters have their own consensus sequence, and post-assembly sub-clusters undergo independent ORF analysis.
Gene model: physical map of exons and introns identified as belonging to a gene locus. Gene models may be generated by gene modelling computer programs (the usual case) or manually.
Transcript model: gene model for a specific transcript of a gene locus.
Singleton: EST sequence not assembled with other ESTs into a cluster.
Full-ORF clone: cDNA clone determined (usually computationally) to contain both the initiator methionine of the encoded protein and the stop codon.
Gene coverage: the proportion of gene loci for which we have one or more full-ORF cDNA clones in our GATEWAY collection.
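
Three of the terms above (cluster consensus sequence, full-ORF clone, gene coverage) describe simple computations, which can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names and input layouts are hypothetical, the consensus is a plain column-wise majority vote over pre-aligned sequences with "-" marking uncovered positions, and the full-ORF test only looks for an ATG followed in frame by a stop codon.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def consensus(aligned, gap="-"):
    """Column-wise majority vote over equal-length, pre-aligned sequences.

    Assumption: `gap` marks positions an EST does not cover; such
    characters are ignored when voting. Ties go to the base seen first.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*aligned):
        bases = [base for base in column if base != gap]
        if bases:  # skip columns covered by no sequence
            result.append(Counter(bases).most_common(1)[0][0])
    return "".join(result)


def has_full_orf(cdna):
    """Return True if some ATG is followed in frame by a stop codon.

    Assumption: a deliberately naive stand-in for real full-ORF
    determination, which would normally use additional evidence.
    """
    seq = cdna.upper()
    start = seq.find("ATG")
    while start != -1:
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return True
        start = seq.find("ATG", start + 1)
    return False


def gene_coverage(clones_by_locus, full_orf_clones):
    """Fraction of gene loci with at least one full-ORF clone.

    `clones_by_locus` maps a locus name to its clone identifiers;
    `full_orf_clones` is the set of clones judged to be full-ORF.
    """
    if not clones_by_locus:
        return 0.0
    covered = sum(
        1
        for clones in clones_by_locus.values()
        if any(clone in full_orf_clones for clone in clones)
    )
    return covered / len(clones_by_locus)
```

For example, `consensus(["ACGT-", "ACCTA", "ACGTA"])` outvotes the single discordant base in the third column, and `gene_coverage` counts a locus once no matter how many full-ORF clones it has.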