Table 2.
Gene Ontology (GO) terms relating to biological processes that were significantly enriched in the gene set associated with the major axes of variation in the principal components analysis (PCA).
GO.ID | Term | Ann | Sig | Exp | KS | Gene | Loading # | Comp | Treat | Loading score |
---|---|---|---|---|---|---|---|---|---|---|
GO:1902358 | Sulfate transmembrane transport | 12 | 1 | 0.02 | 0.015 * | THAPS_23437 (hypothetical protein) | 2 | comp1 | FS | 0.0559 |
GO:0006357 | Regulation of transcription from RNA polymerase II promoter | 44 | 1 | 0.03 | 0.015 * | THAPSDRAFT_7370 (hypothetical protein) | 27 | comp1 | FS | 0.0483 |
GO:0001522 | Pseudouridine synthesis | 44 | 1 | 0.03 | 0.025 * | THAPS_23513 (hypothetical protein) | 40 | comp1 | 26 | 0.0472 |
GO:0034599 | Cellular response to oxidative stress | 23 | 1 | 0.03 | 0.029 * | THAPSDRAFT_25121 (hypothetical protein) | 22 | comp2 | FS | 0.0570 |
GO:0008152 | Metabolic processes | 77 | 1 | 0.03 | 0.035 * | THAPSDRAFT_23040 (hypothetical protein) | 6 | comp1 | 26 | 0.0533 |
GO:0045454 | Cell redox homeostasis | 77 | 1 | 0.03 | 0.035 * | THAPSDRAFT_2086 (hypothetical protein) | 31 | comp1 | 26 | 0.0478 |
GO:0000103 | Sulfate assimilation | 29 | 1 | 0.02 | 0.037 * | THAPSDRAFT_25121.2 (hypothetical protein) | 22 | comp2 | FS | 0.0570 |
GO:0016567 | Protein ubiquitination | 3706 | 5 | 4.75 | 0.254 | THAPS_23513, THAPSDRAFT_2086, THAPSDRAFT_7370 (hypothetical proteins) | Various | comp1 and comp2 | FS and 26 | Various |
GO.ID is the Gene Ontology identifier retrieved from protists.ensembl.org, Term is the biological process associated with the GO.ID, and Ann is the number of genes annotated to the GO.ID. Sig and Exp denote the numbers of significant and expected annotations for the GO.ID category among the genes associated with the top 100 PCA loadings for the two principal components displayed in Fig. 4a, compared with the reference “gene universe” made up of the entire T. pseudonana genome. KS is the p value from a Kolmogorov–Smirnov test, which replaces Fisher’s exact test when working with scores (see Methods); p < 0.05 is marked with * to indicate significant enrichment. Loading # indicates the position of the gene in the PCA’s top 100 loadings, and Comp indicates whether the gene is more strongly associated with the first or second principal component (see Fig. 4a). The loading score is a numerical value for the strength of the association with a component; higher absolute values indicate a stronger association. Gene gives the locus tag of the gene that was found to be strongly associated with the axis of variation. All genes code for hypothetical proteins and were assigned GO terms based on the most likely function of the protein given its amino acid sequence. Treat denotes the treatment that SNVs in that gene were most likely to be associated with, based on the PCA.
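
As an illustration of two quantities described in the note above, the following minimal Python sketch shows (i) how an expected annotation count can be derived when a gene set is compared against a gene universe and (ii) how genes can be ranked by absolute loading score to obtain a "Loading #". It is not the authors' pipeline: the function names, the proportional formula for the expected count, and all numbers are assumptions chosen for illustration and are not taken from Table 2.

```python
def expected_annotations(annotated, n_selected, n_universe):
    """Expected number of selected genes carrying a GO term if genes were
    drawn at random from the universe (assumed proportional model)."""
    if n_universe <= 0:
        raise ValueError("n_universe must be positive")
    return annotated * n_selected / n_universe


def rank_by_abs_loading(loadings, top_n=100):
    """Rank genes by decreasing absolute loading score.

    `loadings` maps a gene identifier to its loading on one component.
    Returns a list of (rank, gene, score) tuples for the top_n genes.
    """
    ordered = sorted(loadings.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(rank, gene, score)
            for rank, (gene, score) in enumerate(ordered[:top_n], start=1)]


# Hypothetical example: a GO term annotated to 50 genes in a universe of
# 10,000 genes, with 100 genes selected from the top loadings.
exp_count = expected_annotations(annotated=50, n_selected=100, n_universe=10_000)

# Hypothetical loadings; the sign is ignored when ranking.
example_loadings = {"geneA": 0.01, "geneB": -0.05, "geneC": 0.03}
top_two = rank_by_abs_loading(example_loadings, top_n=2)
```

Ranking on the absolute value reflects the statement in the note that higher absolute loading scores indicate a stronger association with a component, regardless of sign.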