Low-density hashing with even coverage. (a) Random projections onto subspaces (left) cover all positions evenly only in expectation; with a small number of hash functions, they give uneven coverage. Using Gallager-inspired LDPC codes guarantees even coverage of all positions in the $k$-mer (right) with a small number of hash functions. (b) Intuitively, one can think of a $(k, t)$-hash function as a 0/1 vector of length $k$ with $t$ 1's specifying the locations in the $k$-mer that are selected. Given any $(k, t)$-hash function $h$ (e.g. the vector with $t$ 1's followed by $k - t$ 0's), one can uniformly randomly construct another $(k, t)$-hash function by permuting the entries of $h$. The key to Opal's Gallager-inspired LSH design is that instead of starting with a single hash function and permuting it repeatedly, we start with a hash function matrix $H$, which is an LDPC matrix. $H$ is designed such that in the first row $h_1$ the first $t$ entries are 1, in the second row $h_2$ the second $t$ entries are 1, and so on, until each column of $H$ has exactly one 1. Permuting the columns of $H$ repeatedly generates random LSH functions that together cover all positions evenly, ensuring that we do not waste coding capacity on any particular position in the $k$-mer. Additionally, for very long $k$-mers, we can construct the Gallager LSH functions in a hierarchical way to further capture compositional dependencies from both local and global contexts (see Section 2). (c) The rows of $H$ are then used as hash functions.
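
The construction in panel (b) can be sketched in a few lines of Python. This is only an illustrative sketch, not Opal's implementation: it writes k for the k-mer length and t for the number of selected positions (symbol names assumed here), assumes that t divides k, and the function names `base_matrix`, `permute_columns`, `gallager_hashes`, and `apply_hash` are hypothetical. The hierarchical variant for very long k-mers is not covered.

```python
import random

def base_matrix(k, t):
    """Block matrix: row i has 1's in columns i*t .. (i+1)*t - 1,
    so every column contains exactly one 1."""
    if k % t != 0:
        raise ValueError("this sketch assumes t divides k")
    return [[1 if i * t <= j < (i + 1) * t else 0 for j in range(k)]
            for i in range(k // t)]

def permute_columns(H, rng):
    """Apply one shared random column permutation to every row of H."""
    perm = list(range(len(H[0])))
    rng.shuffle(perm)
    return [[row[p] for p in perm] for row in H]

def gallager_hashes(k, t, m, seed=0):
    """Return m hash functions, each given as a list of selected positions.

    Each permuted copy of the base matrix contributes k // t rows that
    jointly cover every position exactly once.
    """
    rng = random.Random(seed)
    H = base_matrix(k, t)
    hashes = []
    while len(hashes) < m:
        for row in permute_columns(H, rng):
            hashes.append([j for j, bit in enumerate(row) if bit])
            if len(hashes) == m:
                break
    return hashes

def apply_hash(kmer, positions):
    """Project a k-mer onto the selected positions."""
    return "".join(kmer[j] for j in positions)

if __name__ == "__main__":
    hashes = gallager_hashes(k=12, t=4, m=6, seed=1)
    coverage = [sum(j in h for h in hashes) for j in range(12)]
    print(coverage)  # every position is selected the same number of times
    print([apply_hash("ACGTTGCAAGCT", h) for h in hashes])
```

When the number of hash functions m is a multiple of k / t, every position is selected exactly m · t / k times; otherwise the last batch of rows is truncated and per-position counts differ by at most one. Independent random projections offer no such guarantee for small m.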