Figure - PMC

Skip to main content

An official website of the United States government

Here's how you know

Here's how you know

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

View full-text article in PMC

. 2020 Dec 18;9:e62208. doi: 10.7554/eLife.62208

Search in PMC
Search in PubMed
View in NLM Catalog
Add to search

© 2020, Cook et al

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

PMC Copyright notice

Figure 6. — (A) Two machine learning algorithms, Stochastic Gradient Boosting (GBM) and Random Forest (RF), were used to predict Lineage-Specific (LS) regions from 15 independent training-test splits (80/20). Classifier performance was measured for each of the 15 trials, and summarized as a boxplot with each trial represented as a point. (B) Venn diagram showing the overlap between the results of the two classifiers and the original observations of LS regions (de Jonge et al., 2013; Faino et al., 2016). Each slice of the diagram shows the number of LS regions predicted, see Materials and methods for additional details. (C) Schematic representation of the eight chromosomes (labeled on right) of V. dahliae strain JR2. Core (gray) and LS (green) classification for 10 kb windows. The consensus predictions were those made by both the GBM and RF model (in total 280). (D) Boxplot showing a significant difference for in planta gene induction between core and LS genes, Mann-Whitney U test p-value=1.34e-50. (E) Density distribution for core (gray) and LS (orange) elements based on absence counts over 100 bp windows. The mean absence counts are shown as a dashed vertical line. (F) Similar to E but the analysis was conducted for TEs. (G) Boxplot showing no significant difference between core and LS TE elements for absence counts, Mann-Whitney U test p-value=0.92. (H) Similar to E but the analysis was conducted for genes. (I) Boxplot showing a significant difference between core and LS genes for absence counts, Mann-Whitney U test p-value=3.82e-104. ns, non-significant; **** p-value<1.00e-4.

Figure 6—source data 1. Consensus LS classification genomic regions.

elife-62208-fig6-data1.txt^{(1.9KB, txt)}

Figure 6—source data 2. Gene presence and absence counts.

elife-62208-fig6-data2.update.txt^{(977.3KB, txt)}

Figure 6—source data 3. TE presence and absence counts.

elife-62208-fig6-data3.update.txt^{(234.1KB, txt)}

Figure 6—figure supplement 1. — (A) Two machine learning algorithms, Stochastic Gradient Boosting (GBM) and Random Forest (RF), were used to predict Lineage-Specific (LS) regions from 15 independent training-test splits (80/20). Classifier performance was measured for each of the 15 trials, and summarized as a boxplot with each trial represented as a point. (B) Venn diagram showing the overlap between the results of the two classifiers and the original observations of LS regions (de Jonge et al., 2013; Faino et al., 2016). Each slice of the diagram shows the number of LS regions predicted, see Materials and methods for additional details. (C) Schematic representation of the eight chromosomes (labeled on right) of V. dahliae strain JR2. Core (gray) and LS (green) classification for 10 kb windows. The consensus predictions were those made by both the GBM and RF model (in total 280). (D) Boxplot showing a significant difference for in planta gene induction between core and LS genes, Mann-Whitney U test p-value=1.34e-50. (E) Density distribution for core (gray) and LS (orange) elements based on absence counts over 100 bp windows. The mean absence counts are shown as a dashed vertical line. (F) Similar to E but the analysis was conducted for TEs. (G) Boxplot showing no significant difference between core and LS TE elements for absence counts, Mann-Whitney U test p-value=0.92. (H) Similar to E but the analysis was conducted for genes. (I) Boxplot showing a significant difference between core and LS genes for absence counts, Mann-Whitney U test p-value=3.82e-104. ns, non-significant; **** p-value<1.00e-4.

Figure 6—source data 1. Consensus LS classification genomic regions.

elife-62208-fig6-data1.txt^{(1.9KB, txt)}

Figure 6—source data 2. Gene presence and absence counts.

elife-62208-fig6-data2.update.txt^{(977.3KB, txt)}

Figure 6—source data 3. TE presence and absence counts.

elife-62208-fig6-data3.update.txt^{(234.1KB, txt)}