
Nucleic Acids Research. 2022 Apr 12;50(8):4545–4556. doi: 10.1093/nar/gkac227


© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.


Figure 4. Islands are partially predicted by coding density. (A) Distribution of gene sizes in islands (red) and deserts (blue), displayed as violin and box plots. ***P < 0.0001, Mann–Whitney–Wilcoxon test. (B) Size distributions of convergent, tandem, and divergent intergenic regions in islands and deserts. ***P < 0.0001, Mann–Whitney–Wilcoxon test. (C) Non-overlapping 5-kb genomic windows were assigned island or desert identity (see Materials and Methods), and the coding density of each window was calculated. Coding density ranges from 0 (all base pairs in the window are intergenic sequence) to 1 (all base pairs overlap annotated ORFs). Top: histograms of the number of island and desert windows as a function of coding density. Bottom: fitted curve for the data shown on top. Coding density was selected as a significant feature when training a logistic regression model (P < 0.0001). (D) ROC curve showing specificity versus sensitivity for a model trained on 80% of the data to predict island or desert identity in the remaining 20% of the data. AUC, area under the curve. The diagonal indicates random association.
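The window-level quantities in panels (C) and (D) can be illustrated with a minimal sketch. It assumes ORFs are given as half-open `(start, end)` base-pair intervals on a single chromosome, and that a trailing window shorter than 5 kb is dropped; both are assumptions, as are the helper names `windows`, `coding_density`, and `auc`, which are hypothetical and not taken from the article. AUC is computed through its equivalence with the normalized Mann–Whitney U statistic. If coding density were the model's only predictor and its fitted coefficient were positive, the logistic model's predicted probabilities would rank windows exactly as the raw densities do, so the AUC would be the same. The article's actual model and feature set may differ.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs (assumed convention)


def windows(chrom_length: int, size: int = 5000) -> List[Interval]:
    """Non-overlapping windows of `size` bp; a shorter final window is dropped (assumption)."""
    return [(s, s + size) for s in range(0, chrom_length - size + 1, size)]


def coding_density(window: Interval, orfs: List[Interval]) -> float:
    """Fraction of base pairs in `window` that overlap at least one ORF.

    Overlapping ORFs are merged first so shared base pairs are counted once:
    0.0 means the window is entirely intergenic, 1.0 means fully coding.
    """
    w_start, w_end = window
    # Clip each ORF to the window and keep only those that overlap it.
    clipped = sorted(
        (max(s, w_start), min(e, w_end))
        for s, e in orfs
        if s < w_end and e > w_start
    )
    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Start of a new merged block; bank the previous one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent ORF: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (w_end - w_start)


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    Returns the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0), counting ties as 1/2.
    A value of 0.5 corresponds to the diagonal (random association).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, on a hypothetical 12-kb chromosome with ORFs `[(100, 1100), (900, 2000), (7000, 9000)]`, `windows(12000)` yields two 5-kb windows with coding densities of 0.38 and 0.4. Under the 80/20 scheme described in (D), `auc` would be evaluated only on the held-out 20% of windows, with label 1 for one class (island or desert) and 0 for the other.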