2018 Apr 30;9:1719. doi: 10.1038/s41467-018-03906-5

Table 2.

Table of Gene Ontology (GO) terms. GO terms relating to biological processes that were significantly enriched in the gene set associated with the major axes of variation in the principal components analysis.

| GO.ID | Term | Ann | Sig | Exp | KS | Gene | Loading # | Comp | Treat | Loading score |
|---|---|---|---|---|---|---|---|---|---|---|
| GO:1902358 | Sulfate transmembrane transport | 12 | 1 | 0.02 | 0.015 * | THAPS_23437 (hypothetical protein) | 2 | comp1 | FS | 0.0559 |
| GO:0006357 | Regulation of transcription from RNA polymerase | 44 | 1 | 0.03 | 0.015 * | THAPSDRAFT_7370 (hypothetical protein) | 27 | comp1 | FS | 0.0483 |
| GO:0001522 | Pseudouridine synthesis | 44 | 1 | 0.03 | 0.025 * | THAPS_23513 (hypothetical protein) | 40 | comp1 | 26 | 0.0472 |
| GO:0034599 | Cellular response to oxidative stress | 23 | 1 | 0.03 | 0.029 * | THAPSDRAFT_25121 (hypothetical protein) | 22 | comp2 | FS | 0.0570 |
| GO:0008152 | Metabolic processes | 77 | 1 | 0.03 | 0.035 * | THAPSDRAFT_23040 (hypothetical protein) | 6 | comp1 | 26 | 0.0533 |
| GO:0045454 | Cell redox homeostasis | 77 | 1 | 0.03 | 0.035 * | THAPSDRAFT_2086 (hypothetical protein) | 31 | comp1 | 26 | 0.0478 |
| GO:0000103 | Sulfate assimilation | 29 | 1 | 0.02 | 0.037 * | THAPSDRAFT_25121.2 (hypothetical protein) | 22 | comp2 | FS | 0.0570 |
| GO:0016567 | Protein ubiquitination | 3706 | 5 | 4.75 | 0.254 | THAPS_23513, THAPSDRAFT_2086, THAPSDRAFT_7370 (hypothetical protein) | Various | comp1 and comp2 | FS and 26 | Various |

GO.ID is the Gene Ontology identifier retrieved from protists.ensembl.org, Term is the biological process associated with the GO.ID, and Ann is the number of genes annotated to the GO.ID. Sig and Exp denote the numbers of significant and expected annotations for the GO.ID category among the genes associated with the top 100 PCA loadings for the two principal components displayed in Fig. 4a, compared with the reference “gene universe” made up of the entire T. pseudonana genome. KS is the p value from a Kolmogorov–Smirnov test, which replaces Fisher’s exact test when working with scores (see Methods); p < 0.05 is marked with * to indicate significant enrichment. Loading # is the rank of the gene within the PCA’s top 100 loadings, and Comp indicates whether the gene is more strongly associated with the first or second principal component (see Fig. 4a). Loading score is a numerical value for the strength of the association with a component; higher absolute values indicate a stronger association. Gene gives the locus tag of the gene that was strongly associated with the axis of variation. All genes code for hypothetical proteins and were assigned GO terms based on the most likely function of the protein given its amino acid sequence. Treat denotes the treatment with which SNVs in that gene were most likely to be associated, based on the PCA.
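
To make the Loading #, Comp and Exp columns more concrete, the following is a minimal Python sketch and not the authors' pipeline. It assumes that each gene is assigned to whichever of the two components it loads on more strongly and that genes are then ranked by absolute loading in a single combined list; the legend does not say whether ranking was done per component or jointly. It also assumes the common convention that the expected count equals Ann multiplied by the fraction of the gene universe that was selected. The function names, gene names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_top_loadings(
    loadings: Dict[str, Tuple[float, float]], top_n: int = 100
) -> List[Tuple[int, str, str, float]]:
    """Rank genes by their strongest absolute loading on comp1 or comp2.

    Returns (rank, gene, component, loading score) tuples; rank 1 is strongest.
    Assumption: one combined ranking across both components.
    """
    strongest = []
    for gene, (comp1, comp2) in loadings.items():
        # Assign the gene to the component it loads on more strongly.
        if abs(comp1) >= abs(comp2):
            strongest.append((gene, "comp1", comp1))
        else:
            strongest.append((gene, "comp2", comp2))
    # Higher absolute loading means a stronger association with the axis.
    strongest.sort(key=lambda row: abs(row[2]), reverse=True)
    return [(i + 1, g, c, s) for i, (g, c, s) in enumerate(strongest[:top_n])]


def expected_annotations(annotated: int, n_selected: int, universe_size: int) -> float:
    """Expected number of selected genes carrying a GO term under random draws.

    Assumption: Exp = Ann * (selected genes / genes in the universe).
    """
    return annotated * n_selected / universe_size


if __name__ == "__main__":
    # Toy loadings (comp1, comp2) for three made-up genes.
    toy = {
        "gene_a": (0.05, -0.01),
        "gene_b": (0.02, -0.06),
        "gene_c": (0.001, 0.002),
    }
    for rank, gene, comp, score in rank_top_loadings(toy, top_n=2):
        print(rank, gene, comp, score)
    # Toy numbers only: 10 annotated genes, 50 selected, universe of 1000.
    print(expected_annotations(annotated=10, n_selected=50, universe_size=1000))
```

Under these assumptions, a term with Sig well above Exp is over-represented among the top-loading genes, which is the intuition behind the Sig and Exp columns. The KS column instead comes from a score-based test and is not reproduced here.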