Figure - PMC

2011 Jan;21(1):137–145. doi: 10.1101/gr.111278.110

Copyright © 2011 by Cold Spring Harbor Laboratory Press

Figure 1. The RepeatNet algorithm. (A) The layout of the alphoid repeat array in the centromere and of the paired-end inserts in the centromeric region. Note that since the centromere is larger than the inserts (fosmids, plasmids, BACs, or short inserts used in next-generation sequencing), both ends of the same insert contain alphoid sequence. (B) Close-up view of a paired-end insert over the alphoid repeat array. We also show all possible k-mers (sliding by 1 bp) that can be generated from the reads. (C) The ideal case for the k-mer structure in the end sequences. When both ends of a paired-end insert contain alphoid sequence, we expect the k-mers in the forward end to be represented by their reverse-complement counterparts in the reverse end. For simplicity, we show only the nonoverlapping k-mers; however, RepeatNet considers all possible overlapping k-mers. In this figure, w₁-m₁, w₂-m₂, w₃-m₃, w′₁-m′₁, w′₂-m′₂, w′₃-m′₃, w′′₁-m′′₁, w′′₂-m′′₂, w′′₃-m′′₃ are the k-mer pairs that are reverse complements of each other, and the triplet k-mer groups (w₁-w′₁-w′′₁), (w₂-w′₂-w′′₂), (w₃-w′₃-w′′₃) are highly similar k-mers. In the case of exact repeats, the k-mers within each triplet are identical. (D) Since k-mer pairs w₁-m₁, w₂-m₂, and w₃-m₃ exist in the same read pairs, we put an edge between the nodes that represent such k-mers. (E) The repeat graph for the ideal case of a 31-mer tandem repeat with exact repeat units. This graph includes 20 vertices for the 20 k-mer pairs that can be generated from a 31-mer repeat structure, and an edge exists between every pair of vertices. Note that this graph is a clique of size 20. For non-ideal cases, the clique property is lost; however, the graph remains very dense in terms of the average degree of its vertices. RepeatNet finds such dense subgraphs of the repeat graph with a heuristic that selects the vertex with the highest degree together with all vertices that share an edge with it.
Alternatively, a maximum density subgraph algorithm can be used (Fratkin et al. 2006), although it has a high running-time complexity of O[n·m·log(n²/m)].
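
The graph construction and the highest-degree heuristic described in this caption can be sketched roughly as follows. This is a minimal illustration, not the RepeatNet implementation: the function names, the choice to label each vertex by its forward k-mer, the exact-match test for reverse complements, and the toy reads with k = 4 are all assumptions made for the example.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """All overlapping k-mers of seq, sliding by 1 bp."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_repeat_graph(read_pairs, k):
    """Build an adjacency-set graph from (forward, reverse) read pairs.

    Assumption: a vertex stands for a k-mer pair (w, m) in which w occurs in
    the forward read and m, its exact reverse complement, occurs in the
    reverse read. The vertex is labelled by w. Two vertices are joined when
    both k-mer pairs are found in the same read pair.
    """
    graph = {}
    for forward, reverse in read_pairs:
        reverse_kmers = set(kmers(reverse, k))
        matched = {w for w in kmers(forward, k)
                   if reverse_complement(w) in reverse_kmers}
        for w in matched:
            graph.setdefault(w, set())
        for u, v in combinations(sorted(matched), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def select_dense_subgraph(graph):
    """Heuristic from the caption: take the highest-degree vertex and all
    vertices that share an edge with it."""
    if not graph:
        return set()
    hub = max(graph, key=lambda v: len(graph[v]))
    return {hub} | graph[hub]


# Toy usage with made-up reads: the first pair mimics the ideal case, the
# second pair shares no reverse-complement k-mers and adds no vertices.
forward_read = "ACGGTCAT"
pairs = [(forward_read, reverse_complement(forward_read)),
         ("AAAAAAAA", "CCCCCCCC")]
repeat_graph = build_repeat_graph(pairs, k=4)
dense = select_dense_subgraph(repeat_graph)
print(len(repeat_graph), len(dense))  # 5 5: a clique on five vertices
```

In this toy ideal case every vertex is adjacent to every other, so the heuristic returns the whole clique. With real reads the exact-match test would likely need to tolerate the sequence divergence between repeat units that the caption mentions, which this sketch does not attempt.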