2020 Dec;30(12):1789–1801. doi: 10.1101/gr.267997.120

Figure 1.

The key components of VarNote. (A) Architecture of the VarNote index system. A bgzip-compressed annotation database (.bgz file) is first converted to a VarNote positioning file (.vanno file). For each original compressed block (OB) of the annotation database, the system tailors and encodes the block's information into a reduced virtual block (ROB) that keeps only query-dependent information. The bgzip-compressed VarNote positioning file consists of concatenated compressed blocks that store the ROB bytes. Summary information of each reduced virtual block (SROB) is then linearly indexed to generate the VarNote index file (.vi file).

(B) Bit encoding of record position information. The system uses an 8-bit “RecordFlag” to encode the position of each annotation record within its corresponding OB. The first bit marks the start of an annotation record; the second through fourth bits encode the storage size of the beginning (Beg) offset from the previous record; the fifth and sixth bits encode the storage size of the distance between End and Beg for the current record; the seventh bit is the direction (sign) of the block offset; and the eighth bit indicates the storage size of the block offset relative to the average.

(C) Workflow of random sweep. The algorithm accepts two example query intervals on the same chromosome (query 1 spans 120–150; query 2 spans 255–260) and efficiently intersects them with the annotations on that chromosome by leveraging both the VarNote index system and the original annotation database. Query 1 is first stream-compared with the position information of each SROB in the VarNote index file (.vi file) to determine which ROBs it intersects (here, ROB2 and ROB3). The algorithm skips unrelated ROBs and jumps directly to the intersected ones by random access (ROB1 is skipped entirely in the subsequent search). Because a ROB contains only the query-dependent information of its annotation records, VarNote can sweep it more efficiently and with fewer disk reads (query 1 intersects one annotation record in ROB2 and another in ROB3). Once all intersected annotation records within a ROB are identified, the algorithm immediately seeks the full annotation information of those hits from the corresponding OB (for query 1, only two records need to be read, from OB2 and OB3 of the annotation database). Similarly, for query 2, VarNote skips ROB4 and ROB5, sweeps only ROB6, and finally seeks one annotation record in OB6.
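To make the 8-bit “RecordFlag” layout in panel B concrete, here is a minimal Python sketch that packs and unpacks the five fields. It is an illustration under stated assumptions, not VarNote's implementation. The caption does not specify bit order, so "bit 1" is assumed to be the most significant bit. Each "storage size" field is assumed to hold a small integer code, here a byte count. The names `RecordFlag`, `storage_size`, `pack_flag`, and `unpack_flag` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordFlag:
    is_start: bool        # bit 1: marks the start of an annotation record
    beg_size: int         # bits 2-4: storage size of the Beg offset from the previous record (0-7)
    len_size: int         # bits 5-6: storage size of End - Beg (0-3)
    block_negative: bool  # bit 7: direction (sign) of the block offset
    block_size: int       # bit 8: storage size of the block offset relative to the average (0-1)


def storage_size(value: int) -> int:
    """Bytes needed to hold a non-negative value (0 for zero); an assumed size convention."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return (value.bit_length() + 7) // 8


def pack_flag(flag: RecordFlag) -> int:
    """Pack the fields into one byte, assuming 'bit 1' is the most significant bit."""
    if not (0 <= flag.beg_size <= 7 and 0 <= flag.len_size <= 3 and flag.block_size in (0, 1)):
        raise ValueError("field out of range")
    return (
        (int(flag.is_start) << 7)
        | (flag.beg_size << 4)
        | (flag.len_size << 2)
        | (int(flag.block_negative) << 1)
        | flag.block_size
    )


def unpack_flag(byte: int) -> RecordFlag:
    """Inverse of pack_flag."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte")
    return RecordFlag(
        is_start=bool((byte >> 7) & 1),
        beg_size=(byte >> 4) & 0b111,
        len_size=(byte >> 2) & 0b11,
        block_negative=bool((byte >> 1) & 1),
        block_size=byte & 1,
    )


# Hypothetical record: Beg offset 300 (2 bytes), End - Beg = 1 (1 byte).
flag = RecordFlag(True, storage_size(300), storage_size(1), False, 1)
packed = pack_flag(flag)
print(hex(packed))  # 0xa5
assert unpack_flag(packed) == flag
```

The flag only records how many bytes each following value occupies. Under this reading, small offsets between neighbouring records can be stored in one or two bytes rather than a fixed-width integer, which fits the caption's point that ROBs keep only compact, query-dependent information.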
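The random-sweep workflow in panel C can be sketched in memory as follows. This is a simplified model, not VarNote's code. A list of (smallest Beg, largest End) pairs, one per ROB, stands in for the linear .vi index, and a dictionary lookup stands in for seeking a full record in the OB. The classes, the function `random_sweep`, and the helper `make_rob` are hypothetical. Intervals are assumed closed, and queries are assumed sorted by start. The toy coordinates are invented: they were chosen only so that the figure's two queries reproduce the narrative (query 1 hits one record each in ROB2 and ROB3, and query 2 hits one record in ROB6).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReducedRecord:
    beg: int        # record start (closed interval, assumption)
    end: int        # record end
    ob_offset: int  # hypothetical position of the full record inside its OB


@dataclass(frozen=True)
class ReducedBlock:
    ob_id: int                          # the original block (OB) this ROB was derived from
    records: tuple[ReducedRecord, ...]  # sorted by beg

    def summary(self) -> tuple[int, int]:
        """SROB stand-in: smallest Beg and largest End among the block's records."""
        return self.records[0].beg, max(r.end for r in self.records)


def random_sweep(queries, robs, obs):
    """Intersect sorted (beg, end) queries on one chromosome with ROBs sorted by beg.

    obs maps (ob_id, ob_offset) to the full annotation line, standing in for
    seeking into the original .bgz database.
    """
    index = [rob.summary() for rob in robs]  # stand-in for the linear .vi index
    ptr = 0
    results = []
    for q_beg, q_end in queries:
        # SROBs ending before this query cannot overlap it or any later sorted query.
        while ptr < len(index) and index[ptr][1] < q_beg:
            ptr += 1
        hits = []
        i = ptr
        # Stream forward until an SROB starts after the query ends.
        while i < len(index) and index[i][0] <= q_end:
            if index[i][1] >= q_beg:  # SROB overlaps: visit this ROB by random access
                rob = robs[i]
                for rec in rob.records:
                    if rec.beg > q_end:
                        break  # remaining records start after the query
                    if rec.end >= q_beg:
                        # Hit: seek the full record from the corresponding OB.
                        hits.append(obs[(rob.ob_id, rec.ob_offset)])
            i += 1
        results.append(hits)
    return results


def make_rob(ob_id, intervals):
    """Build a toy ROB; each record's ob_offset is just its rank in the block."""
    return ReducedBlock(
        ob_id, tuple(ReducedRecord(b, e, k) for k, (b, e) in enumerate(intervals))
    )


# Hypothetical coordinates for ROB1..ROB6 (not taken from the figure).
robs = [
    make_rob(1, [(10, 20), (50, 60)]),
    make_rob(2, [(100, 110), (140, 145)]),
    make_rob(3, [(148, 152), (160, 170)]),
    make_rob(4, [(180, 190), (200, 210)]),
    make_rob(5, [(220, 230), (240, 250)]),
    make_rob(6, [(252, 256), (270, 280)]),
]
obs = {
    (rob.ob_id, r.ob_offset): f"OB{rob.ob_id}:{r.beg}-{r.end}"
    for rob in robs
    for r in rob.records
}
print(random_sweep([(120, 150), (255, 260)], robs, obs))
# [['OB2:140-145', 'OB3:148-152'], ['OB6:252-256']]
```

The pointer only ever moves past SROBs whose End lies before the current query's start. Because later sorted queries start no earlier, those blocks can never be needed again, so the index is read as a single forward stream. A real implementation would decompress ROB bytes from the .vanno file and seek into the .bgz file instead of indexing Python lists.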