
Cell. 2021 Jun 24;184(13):3376–3393.e17. doi: 10.1016/j.cell.2021.05.002

© 2021 The Author(s)

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Microbial signatures

(A) Schematic of GeoDNA representation generation. Raw sequences of individual samples for all cities are transformed into lists of unique k-mers (left). After filtration, the k-mers are assembled into a graph index database. Each k-mer is then associated with its respective city label and other informative metadata, such as geo-location and sampling information (top middle). Arbitrary input sequences (top right) can then be efficiently queried against the index, returning a ranked list of matching paths in the graph together with metadata and a score indicating the percentage of k-mer identity (bottom right). The geo-information of each sample is used to highlight the locations of samples that contain sequences identical or close to the queried sequence (middle right).
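The query step in (A) can be illustrated with a minimal Python sketch. It is not the GeoDNA implementation: a flat dictionary stands in for the graph index, the filtration step is omitted, and the names `unique_kmers`, `build_index`, and `query` are hypothetical. The score is the percentage of the query's distinct k-mers found under each city label, which is one plausible reading of "percentage of k-mer identity".

```python
from collections import defaultdict


def unique_kmers(sequence, k):
    """Return the set of distinct k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(samples, k):
    """Map each k-mer to the set of city labels whose samples contain it.

    `samples` is an iterable of (city_label, sequence) pairs. A real index
    would be a graph and would carry further metadata; this is an assumption
    made to keep the sketch short.
    """
    index = defaultdict(set)
    for city, sequence in samples:
        for kmer in unique_kmers(sequence, k):
            index[kmer].add(city)
    return index


def query(index, sequence, k):
    """Rank cities by the percentage of the query's k-mers they contain."""
    kmers = unique_kmers(sequence, k)
    if not kmers:
        return []
    hits = defaultdict(int)
    for kmer in kmers:
        for city in index.get(kmer, ()):
            hits[city] += 1
    scores = [(city, 100.0 * n / len(kmers)) for city, n in hits.items()]
    # Highest score first; ties broken alphabetically by city label.
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```

For example, with two toy samples labeled `city_a` and `city_b`, `query(build_index(samples, 4), "ACGTAC", 4)` returns a list of `(city, percent)` pairs ordered from best to worst match.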

(B) Classification accuracy of a random forest model for assigning city labels to samples as a function of the size of the training set.

(C) Distribution of endemicity scores (term frequency-inverse document frequency, TF-IDF) for taxa in each region.
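The caption does not give the exact weighting behind the endemicity score, so the following is only a sketch of a textbook TF-IDF, treating each region as a "document" and each taxon as a "term". The function name `endemicity_scores`, the input layout, and the choice of relative abundance for the term frequency are all assumptions.

```python
import math


def endemicity_scores(region_counts):
    """Compute a TF-IDF style score per taxon and region.

    `region_counts` maps region -> {taxon: count}. Term frequency is the
    taxon's share of the region's total count; inverse document frequency
    is log(number of regions / number of regions containing the taxon).
    Both choices are assumptions, not the paper's stated definition.
    """
    n_regions = len(region_counts)

    # Count how many regions contain each taxon at least once.
    regions_with = {}
    for counts in region_counts.values():
        for taxon, count in counts.items():
            if count > 0:
                regions_with[taxon] = regions_with.get(taxon, 0) + 1

    scores = {}
    for region, counts in region_counts.items():
        total = sum(counts.values())
        scores[region] = {
            taxon: (count / total) * math.log(n_regions / regions_with[taxon])
            for taxon, count in counts.items()
            if count > 0
        }
    return scores
```

Under this definition, a taxon present in every region scores zero everywhere, while a taxon that is abundant in a single region scores highest there.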

(D) Prediction accuracy of a random forest model for a given feature (rows) in samples from a city (columns) that were not present in the training set. Rows and columns are sorted by average accuracy. Continuous features (e.g., population) were discretized.

See also Figure S4.