2018 Apr 30;9:1719. doi: 10.1038/s41467-018-03906-5

Table 2.

Gene Ontology (GO) terms. Biological-process GO terms significantly enriched in the gene set associated with the major axes of variation in the principal component analysis (PCA)

GO.ID	Term	Ann	Sig	Exp	KS	Gene	Loading #	Comp	Treat	Loading score
GO:1902358	Sulfate transmembrane transport	12	1	0.02	0.015 *	THAPS_23437 (hypothetical protein)	2	comp1	FS	0.0559
GO:0006357	Regulation of transcription from RNA polymerase	44	1	0.03	0.015 *	THAPSDRAFT_7370 (hypothetical protein)	27	comp1	FS	0.0483
GO:0001522	Pseudouridine synthesis	44	1	0.03	0.025 *	THAPS_23513 (hypothetical protein)	40	comp1	26	0.0472
GO:0034599	Cellular response to oxidative stress	23	1	0.03	0.029 *	THAPSDRAFT_25121 (hypothetical protein)	22	comp2	FS	0.0570
GO:0008152	Metabolic processes	77	1	0.03	0.035 *	THAPSDRAFT_23040 (hypothetical protein)	6	comp1	26	0.0533
GO:0045454	Cell redox homeostasis	77	1	0.03	0.035 *	THAPSDRAFT_2086 (hypothetical protein)	31	comp1	26	0.0478
GO:0000103	Sulfate assimilation	29	1	0.02	0.037 *	THAPSDRAFT_25121.2 (hypothetical protein)	22	comp2	FS	0.0570
GO:0016567	Protein ubiquitination	3706	5	4.75	0.254	THAPS_23513, THAPSDRAFT_2086, THAPSDRAFT_7370 (hypothetical proteins)	Various	comp1 and comp2	FS and 26	Various

GO.ID is the Gene Ontology identifier retrieved from protists.ensembl.org, Term is the biological process associated with the GO.ID, and Ann is the number of genes annotated to the GO.ID. Sig and Exp are the numbers of significant and expected annotations for the GO.ID category in the set of genes associated with the top 100 PCA loadings for the two principal components shown in Fig. 4a, relative to the reference “gene universe” comprising the entire T. pseudonana genome. KS is the p value from a Kolmogorov–Smirnov test, which replaces Fisher’s exact test when working with scores (see Methods); * marks significant enrichment (p < 0.05). Gene gives the locus tag of the gene found to be strongly associated with the axis of variation. All of these genes encode hypothetical proteins and were assigned GO terms based on the most likely function of the protein as inferred from its amino acid sequence. Loading # gives the gene’s rank among the top 100 PCA loadings, and Comp indicates whether the gene is more strongly associated with the first or second principal component (see Fig. 4a). Treat denotes the treatment with which SNVs in that gene were most likely to be associated, based on the PCA. The loading score quantifies the strength of a gene’s association with a component; higher absolute values indicate a stronger association.
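To illustrate what a score-based (KS) enrichment test does, the minimal sketch below compares the score distribution of genes annotated to one GO term with that of all remaining genes. It assumes, hypothetically, that each gene's score is the absolute value of its PCA loading; the gene names and scores are invented for illustration. The two-sided test and the permutation p value (used here instead of the asymptotic Kolmogorov distribution) are choices made for this sketch, so it is not necessarily how the p values in this table were computed. An enrichment-oriented analysis would typically use a one-sided alternative (term genes scoring higher).

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking every observed value finds the maximum gap.
    for x in set(a_sorted) | set(b_sorted):
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def ks_permutation_pvalue(in_term, others, n_perm=2000, seed=0):
    """Permutation p value: how often a random relabelling gives D >= observed."""
    observed = ks_statistic(in_term, others)
    pooled = list(in_term) + list(others)
    k = len(in_term)
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:k], pooled[k:]) >= observed - 1e-12:
            hits += 1
    # +1 in numerator and denominator avoids reporting p = 0
    return (hits + 1) / (n_perm + 1)


def go_term_ks(scores, term_genes, n_perm=2000, seed=0):
    """Hypothetical helper: KS test of genes in a GO term vs all other genes.

    scores: dict mapping gene ID -> numeric score (e.g. |PCA loading|).
    term_genes: set of gene IDs annotated to the GO term.
    """
    in_term = [s for g, s in scores.items() if g in term_genes]
    others = [s for g, s in scores.items() if g not in term_genes]
    if not in_term or not others:
        raise ValueError("need genes both inside and outside the GO term")
    d = ks_statistic(in_term, others)
    p = ks_permutation_pvalue(in_term, others, n_perm, seed)
    return d, p


if __name__ == "__main__":
    # Invented example: three term genes with high scores, three others low.
    scores = {"gene_a": 0.056, "gene_b": 0.048, "gene_c": 0.047,
              "gene_d": 0.012, "gene_e": 0.009, "gene_f": 0.004}
    d, p = go_term_ks(scores, {"gene_a", "gene_b", "gene_c"})
    print(f"D = {d:.2f}, permutation p = {p:.3f}")
```

Working with scores rather than a significant/non-significant cut-off is what distinguishes this test from Fisher's exact test: every gene contributes through its rank, and no threshold has to be chosen. With only six genes the smallest achievable permutation p value is limited (here about 0.1), which is one reason real analyses run over the whole gene universe.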