2021 May 31;22:290. doi: 10.1186/s12859-021-04208-2

Fig. 2.


Comparison of called gene borders to TAIR10 and Araport11. A The stacked barplot shows the counts of High Confidence (HC), Medium Confidence (MC) and Low Confidence (LC) genes which either: (a) have no overlap with any annotated gene (white); (b) have a unique mate in the annotation (i.e. overlap a single annotated gene which in turn does not overlap any other called gene; dark grey); (c) have multiple matches in the annotation (light grey). Only overlaps on the same strand are considered valid. Only the uniquely matched pairs of genes were used in the subsequent subfigures. B Distribution of pairwise overlap values between called HC (red), MC (green) or LC genes (blue) and their unique mates in TAIR10 or Araport11. Y axis shows the number of gene pairs, X axis shows the overlap calculated as the ratio of intersection (common length) to union (total length) of the overlapping genomic intervals. Area under the curve is proportional to the number of gene pairs in the group. C Distribution of differences between 5' or 3' borders (left and right panels, respectively) of the matched gene pairs. Y axis shows the number of gene pairs, X axis shows the difference of genomic coordinates (in bp). A negative (positive) difference value means that the border of the called gene is located upstream (downstream) from the respective border of its mate in TAIR10 (upper panel) or Araport11 (lower panel). A narrow peak with its summit at zero on the X axis means that the gene pairs most often have identical border positions. A smooth peak with multiple summits indicates a high incidence of mismatched gene borders. Area under the curve is proportional to the number of gene pairs in the group. D Metagene profile of TSS-seq signal over 5' gene borders. The HC/TAIR10 and HC/Araport11 matched gene pairs were joined into HC/TAIR10/Araport11 triads. Thus, each gene has three alternative 5' borders predicted by TranscriptomeReconstructoR, TAIR10 and Araport11.
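The overlap ratio in panel B and the signed border differences in panel C can be illustrated with a minimal Python sketch. It is not the code used to produce the figure: the `Gene` class, the function names and the 1-based inclusive coordinate convention are assumptions made for this example only. The sign convention follows the caption, so a negative value means the called border lies upstream of the annotated one in the direction of transcription.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Gene:
    """Genomic interval; 1-based inclusive coordinates are assumed here."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlap_ratio(a: Gene, b: Gene) -> float:
    """Intersection length divided by union length of two intervals.

    Returns 0.0 for different chromosomes, opposite strands or disjoint
    intervals (only same-strand overlaps count as valid).
    """
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0.0
    inter = min(a.end, b.end) - max(a.start, b.start) + 1
    if inter <= 0:
        return 0.0
    union = (a.end - a.start + 1) + (b.end - b.start + 1) - inter
    return inter / union


def border_differences(called: Gene, ref: Gene) -> Tuple[int, int]:
    """Signed (5', 3') border differences in bp for a matched gene pair.

    Negative: the called border is upstream of the reference border.
    Positive: the called border is downstream of the reference border.
    """
    if called.strand != ref.strand:
        raise ValueError("matched genes must be on the same strand")
    if called.strand == "+":
        # On the plus strand the 5' border is the start coordinate.
        return called.start - ref.start, called.end - ref.end
    # On the minus strand the 5' border is the end coordinate and
    # "upstream" means a larger coordinate, so the sign is flipped.
    return ref.end - called.end, ref.start - called.start


# Hypothetical pair: the called gene starts 50 bp upstream of its mate.
called = Gene("Chr1", 100, 1000, "+")
ref = Gene("Chr1", 150, 1000, "+")
print(overlap_ratio(called, ref))       # 851 / 901
print(border_differences(called, ref))  # (-50, 0)
```

Under these assumptions, identical borders give an overlap of 1.0 and differences of (0, 0), which corresponds to the narrow peak at zero described for panel C.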
Fixed-length genomic intervals (50 bp) were centered on the 5' gene borders in each of the three groups. TSS-seq signal (which is proportional to TSS usage) was averaged across the genomic windows. Y axis shows the average sequencing coverage, X axis shows the genomic coordinates relative to the 5' gene border (zero corresponds to the predicted gene start). The color of the wiggle line indicates the origin of the genomic windows: blue for TAIR10, red for Araport11 and green for the called HC genes. E Metagene profile of Helicos 3' DRS-seq signal over 3' gene borders. Both TSS-seq (in panel D) and 3' DRS-seq (in panel E) show a sharp peak at the respective gene borders derived from TAIR10 and HC genes, but not from Araport11 genes. This indicates that gene borders predicted by Araport11 often disagree with the experimental evidence. Moreover, the HC peak (green) is substantially higher than the TAIR10 peak (blue) in both TSS-seq and 3' DRS-seq. This means that TSS and PAS positions predicted from the HC gene set are in better agreement with the experimental data than the TAIR10