
. 2021 Mar 25;12:614957. doi: 10.3389/fmicb.2021.614957


Copyright © 2021 Volpiano, Sant’Anna, Ambrosini, de São José, Beneduzi, Whitman, de Souza, Lisboa, Vargas and Passaglia.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.


Graph-based clustering using ProKlust compared with hierarchical clustering. (A) The average of each pair of values in the pairwise input matrix or matrices is obtained. Each matrix is then converted into a Boolean matrix according to the cut-off values chosen by the user. If more than one matrix is used as input, the final matrix is obtained by multiplying the corresponding elements of the Boolean matrices. A graph is formed by connecting the nodes that have positive values. In this example, nodes correspond to genomes and edges correspond to ANI ≥95% with alignment coverage ≥50%. The data can be filtered to retain components containing more than one species name or unconnected nodes containing the same species name. Filters to remove isolated nodes (“filterRemoveIsolated”) or the largest component (“filterOnlyLargerComponent”) are also available. The tool generates four types of output: (i) “maxCliques,” the maximal cliques, each being a largest subset of nodes in which every node is directly connected to every other node in the subset, i.e., all the possible species groups that could be delimited in the graph, which may result in groups having genomes in common; (ii) “components,” which contains the isolated nodes or groups formed of complete graphs; (iii) “graph,” an igraph object that can be further handled by the user; and (iv) “plot,” in which the final graph can be visualized. (B) Overview of the hierarchical clustering approach. Such approaches return tree-shaped diagrams with non-overlapping clusters.
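
The steps in panel (A) can be illustrated with a minimal Python sketch. It is not ProKlust's implementation (the caption indicates the tool returns an igraph object); it uses only the standard library. All function names, the four genome labels, and the ANI and coverage values are hypothetical and chosen for illustration. The sketch also assumes that "components" means the connected components of the graph, and it finds maximal cliques with a basic Bron-Kerbosch recursion.

```python
from itertools import combinations

def average_pairs(m):
    """Average m[i][j] and m[j][i] so the pairwise matrix is symmetric."""
    n = len(m)
    return [[(m[i][j] + m[j][i]) / 2 for j in range(n)] for i in range(n)]

def boolean_matrix(m, cutoff):
    """1 where the value meets the user-chosen cut-off, else 0."""
    return [[1 if v >= cutoff else 0 for v in row] for row in m]

def multiply_elementwise(matrices):
    """Combine Boolean matrices: an entry stays 1 only if it is 1 in all."""
    n = len(matrices[0])
    out = [[1] * n for _ in range(n)]
    for m in matrices:
        for i in range(n):
            for j in range(n):
                out[i][j] *= m[i][j]
    return out

def build_graph(names, b):
    """Adjacency sets: connect two nodes when their combined entry is positive."""
    adj = {name: set() for name in names}
    for i, j in combinations(range(len(names)), 2):
        if b[i][j]:
            adj[names[i]].add(names[j])
            adj[names[j]].add(names[i])
    return adj

def connected_components(adj):
    """Depth-first search; isolated nodes come out as single-node components."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def maximal_cliques(adj):
    """Basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return cliques

# Hypothetical pairwise values for four genomes (percent).
names = ["g1", "g2", "g3", "g4"]
ani = [[100, 96.2, 94.0, 80.0],
       [95.8, 100, 95.6, 81.0],
       [94.2, 95.4, 100, 79.0],
       [80.0, 81.0, 79.0, 100]]
cov = [[100, 70, 60, 20],
       [72, 100, 55, 25],
       [58, 57, 100, 22],
       [20, 25, 22, 100]]

combined = multiply_elementwise([
    boolean_matrix(average_pairs(ani), 95),  # ANI >= 95%
    boolean_matrix(average_pairs(cov), 50),  # coverage >= 50%
])
graph = build_graph(names, combined)
components = connected_components(graph)
cliques = maximal_cliques(graph)
```

With these made-up values, g1 and g3 are each linked to g2 but not to each other, so they share one component while falling into two different maximal cliques that both contain g2. This is the "groups having genomes in common" situation the caption describes, and it is what a tree from panel (B), with its non-overlapping clusters, cannot represent.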