. 2019 Jun 4;12:315. doi: 10.1186/s13104-019-4343-8

Table 2.

Number and length of human exons and introns in protein-coding transcripts

| | Exons (E) | Coding exons^a | Introns (I) |
|---|---|---|---|
| Number | | | |
| Total entries | 562,164 | 512,303 | 512,530 |
| Total non-redundant entries | 159,652 | 151,285 | 148,092 |
| Median per transcript | 9.0 | 8.0 | 8.0 |
| Mean per transcript | 11.3 | 10.3 | 10.3 |
| SD per transcript | 9.6 | 9.6 | 8.6 |
| Min per transcript | 1 (1074 transcripts; 1068 genes) | 1 (3157 transcripts; 2117 genes) | 1 (1960 transcripts; 1572 genes) |
| Max per transcript | 363 (TTN, chr2) | 362 (TTN, chr2) | 362 (TTN, chr2) |
| Length | | | |
| Median | 131 bp; not last^b: 124 bp | 120 bp | 1747 bp |
| Median non-redundant | 142 bp; not last^b: 130 bp | 121 bp | 1742 bp |
| Mean | 311 bp; not last^b: 159 bp | 160 bp | 6938 bp |
| Mean non-redundant | 371 bp; not last^b: 177 bp | 171 bp | 7397 bp |
| SD | 744 bp; not last^b: 205 bp | 254 bp | 22,163 bp |
| SD non-redundant | 828 bp; not last^b: 242 bp | 293 bp | 24,263 bp |
| Shortest | 2 bp (GRK6, E16; SEPT7, E2) | 1 bp (e.g., GSTP1, last base of E1) | 26 bp (XBP1, I4); 30 bp (RBP5, I2 and MST1L, I9) |
| Longest | 27,303 bp (GRIN2B, E13, last, with 1857 coding bp) | 21,693 bp (MUC16, E3) | 1,160,411 bp (ROBO2, I2) |
| Total | 174,797,813 bp | 82,144,360 bp | 3,555,747,074 bp |
| Total non-redundant | 59,281,518 bp | 25,840,698 bp | 1,095,434,245 bp |

Median, mean, SD, min and max numbers of exons or coding exons per transcript were calculated using Excel functions in the Transcripts.xlsx file, which contains data exported from the GeneBase “Transcripts” table, i.e. records with a VALIDATED or REVIEWED RefSeq status and an “NM_” type RefSeq RNA accession number, belonging to genes with a VALIDATED or REVIEWED RefSeq status, excluding “not in current annotation release” records. The number of introns per transcript was estimated as (number of exons − 1). The minimum number of introns per transcript was found excluding mono-exonic genes. The number of genes with one exon can be retrieved by filtering Excel rows for Exons_per_RNA equal to 1, copying the retrieved gene symbols into a new sheet and applying the Excel “Advanced Filter” option “Unique records only”. The number of genes with one intron can be found with the same procedure, filtering Excel rows for Exons_per_RNA greater than 1.

Length values were calculated using Excel functions in the Gene_Table.xlsx file, which contains data exported from the GeneBase “Gene_Table” table (retrieved as above). When calculations were performed on filtered data, the Excel “AGGREGATE” function was used. Non-redundant exon and intron sets were obtained by counting only one exon or intron for each group of exons or introns present in multiple transcript isoforms, i.e. by filtering for Excel rows containing “Yes” in the relevant Non_Redundant column. When “non-redundant” is not specified, values were calculated on the total number of entries. The total number of entries was calculated in the Gene_Table.xlsx file using the Excel “Count number” function for each column containing length_bp values, filtering to select non-redundant entries when indicated. The total length for each feature was calculated in the Gene_Table.xlsx file using the Excel “Sum” function for each column, filtering to select non-redundant entries when indicated.
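The filtering and summary steps described in this legend can be sketched outside Excel. The following is a minimal Python illustration, not the authors' workflow: the rows are invented toy data, the helper names `lengths` and `summarize` are hypothetical, and the use of the sample standard deviation is an assumption, since the legend does not say which Excel SD function was applied. Only the column names (Exons_per_RNA, Non_Redundant, Last_Exon, length_bp) are taken from the legend.

```python
from statistics import mean, median, stdev

# Invented toy rows standing in for exon records exported from Gene_Table.xlsx.
exon_rows = [
    {"length_bp": 120, "Non_Redundant": "Yes", "Last_Exon": ""},
    {"length_bp": 120, "Non_Redundant": "", "Last_Exon": ""},  # same exon, second isoform
    {"length_bp": 150, "Non_Redundant": "Yes", "Last_Exon": ""},
    {"length_bp": 2400, "Non_Redundant": "Yes", "Last_Exon": "Yes"},
    {"length_bp": 2400, "Non_Redundant": "", "Last_Exon": "Yes"},
]

# Invented Exons_per_RNA values standing in for Transcripts.xlsx (one per transcript).
exons_per_rna = [1, 2, 5, 5]


def lengths(rows, non_redundant_only=False, exclude_last=False):
    """Return length_bp values after applying the legend's two filters."""
    selected = []
    for row in rows:
        if non_redundant_only and row["Non_Redundant"] != "Yes":
            continue
        if exclude_last and row["Last_Exon"] == "Yes":
            continue
        selected.append(row["length_bp"])
    return selected


def summarize(values):
    """Count, total, median, mean and sample SD (assumed) of a list of numbers."""
    return {
        "n": len(values),
        "total": sum(values),
        "median": median(values),
        "mean": mean(values),
        "sd": stdev(values),
    }


# Introns per transcript estimated as (number of exons - 1),
# excluding mono-exonic transcripts as in the legend.
introns_per_rna = [n - 1 for n in exons_per_rna if n > 1]

if __name__ == "__main__":
    print("all exons:", summarize(lengths(exon_rows)))
    print("non-redundant:", summarize(lengths(exon_rows, non_redundant_only=True)))
    print("not last:", summarize(lengths(exon_rows, exclude_last=True)))
    print("min introns per transcript:", min(introns_per_rna))
```

Keeping the two filters as independent flags mirrors the way the table reports each statistic for all entries, for non-redundant entries, and for exons other than the last one.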

SD standard deviation, min minimum, max maximum, chr chromosome, bp base pair

^a In this column, numbers and lengths are shown considering only the protein-coding portion of exons, including stop codons

^b These values were calculated excluding records corresponding to the last exon, which is usually the longest one, by filtering for Excel rows not containing “Yes” in the Last_Exon column