Inverted Index [1]
|
|
|
Designed only for document search/retrieval; could be applied to genomic/sequence search |
Long construction time; impractical for larger datasets; best case needs minimal perfect hashing (MPH) and a known k-mer (term) distribution |
BIGSI/COBS [2,3]
|
|
|
A hybrid between an inverted index and Bloom filters (COBS); high false positive rate; no benchmark with mutation rate by ANI/Mash |
Query time is linear in N; small index size |
Sequence Bloom Tree [4]
|
|
|
Given the k-mers from a query sequence, the task is to determine which of the N documents contain all the k-mers present in the query; no benchmark with mutation rate by ANI/Mash |
The sequential query process is a bottleneck; designed for sequential implementation |
RAMBO [5]
|
|
|
Similar to SBT: finds which of the N documents/genomes contain all the k-mers present in the query; no benchmark with mutation rate by ANI/Mash |
Only for < 1, query time is sub-linear |
MinHash [6]
|
|
|
Average Nucleotide Identity (ANI) or mutation rate via Mash distance |
Query time is linear in N |
GSearch (MinHash-like + HNSW) |
|
|
Average Nucleotide Identity (ANI) via Mash-like mutation rate/index |
Long database construction time, but users do not have to build the database themselves. |
FLINNG [7]
|
|
|
No benchmark with Average Nucleotide Identity (ANI) via Mash-like mutation rate/index; only 15% of RefSeq genomes meet the -stable criteria |
The -stable query condition is a relatively strong requirement on the query. Limitation: works only for queries whose neighbors are all above a (relatively high) similarity threshold to the query
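
Two of the tasks named in the table can be illustrated with a brute-force sketch: the all-k-mer containment query listed for Sequence Bloom Tree and RAMBO, and the Mash distance listed for MinHash. This is not the implementation of any tool in the table. The function names and toy sequences are hypothetical, exact k-mer sets stand in for Bloom filters and MinHash sketches, and the containment scan is deliberately linear in N. The distance uses the standard Mash formula D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard index of the two k-mer sets.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containing_documents(query, documents, k):
    """Names of documents whose k-mer set contains every k-mer of the query.

    Brute force: one exact set comparison per document, so linear in N.
    """
    q = kmers(query, k)
    return [name for name, seq in documents.items() if q <= kmers(seq, k)]


def mash_distance(a, b, k):
    """Mash distance computed from the exact Jaccard index j.

    Mash itself estimates j from MinHash sketches; exact sets are an
    assumption made here to keep the example short.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        raise ValueError("both sequences are shorter than k")
    j = len(ka & kb) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: distance is capped at 1
    return -math.log(2 * j / (1 + j)) / k


# Toy data (hypothetical sequences, not real genomes).
documents = {"g1": "ACGTACGTGA", "g2": "TTTTACGTAC", "g3": "GGGGGGGG"}
hits = containing_documents("ACGTAC", documents, k=4)
```

With these toy sequences, `hits` contains `g1` and `g2`, since both hold the three query 4-mers `ACGT`, `CGTA`, and `GTAC`, while `g3` shares none of them.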