2022 Apr 27;2(5):100129. doi: 10.1016/j.xgen.2022.100129

Summary table of the V2.0 GIAB genome stratifications

| Stratification Group | Description | # Strats | Example Stratifications | Useful for |
| --- | --- | --- | --- | --- |
| FunctionalRegions | Coding regions | 2 | CDS, not in CDS | Evaluating performance in coding regions more likely to be functional |
| GC-content | Various ranges of GC-content | 14 | GC < 25%; 30% < GC < 55% | Identifying GC bias in variant calling performance |
| Low Complexity | | 22 | | Evaluating performance in locally repetitive, difficult to sequence contexts |
| Low Complexity: Homopolymers | Identification of homopolymers by length | 4 | Homopolymers >101 bp; imperfect homopolymers >10 bp | Evaluating performance in homopolymers, where systematic sequencing errors and complex variants frequently occur |
| Low Complexity: Simple Repeats | Di-, tri-, and quad-nucleotide repeats of different lengths | 9 | Di-nucleotide repeats 11–50 bp; di-nucleotide repeats >200 bp | Evaluating performance in exact Short Tandem Repeats where systematic sequencing errors and complex variants frequently occur, and variant calls are challenging if the read length is insufficient to traverse the entire repeat |
| Low Complexity: Tandem Repeats | Tandem repeats of different lengths | 5 | Tandem repeats between 51 and 200 bp; tandem repeats >10 kb | Evaluating performance in exact Short Tandem Repeats and Variable Number Tandem Repeats where systematic sequencing errors and complex variants frequently occur, and variant calls are challenging if the read length is insufficient to traverse the entire repeat |
| Other Difficult | Various difficult regions of the genome | 6 | MHC; VDJ | Evaluating performance in or excluding regions where variants are difficult to call and represent due to limitations of the reference genome (e.g., gaps or errors) or being highly polymorphic in the population (MHC) |
| Segmental Duplications | Segmental duplications defined using multiple methods and limited to segdups >10 kb | 9 | Segdups >10 kb; selfChain | Regions with multiple similar copies in the reference, making them challenging to map and assemble |
| Genome Specific | Difficult regions of the genome specific to one or more of the GIAB genomes, including but not limited to complex variants, copy number variants, and structural variants | 65 | CNVs, complex variants | Evaluating performance in or excluding regions in each GIAB reference sample where small variants can be challenging to call (e.g., complex variants) or represent (e.g., CNVs and SVs) |

The updated stratification set includes the union of multiple stratifications as well as “not in” stratifications, which are useful in evaluating performance outside specific difficult genomic contexts.
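
The union and "not in" operations mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes each stratification is a list of half-open `(start, end)` intervals on a single chromosome, as in BED files. The function names `union` and `not_in`, the example coordinates, and the chromosome length are all hypothetical.

```python
from typing import List, Tuple

# Half-open interval [start, end), the convention used by BED files.
Interval = Tuple[int, int]


def union(*interval_sets: List[Interval]) -> List[Interval]:
    """Merge any number of interval lists into one sorted, non-overlapping list."""
    merged: List[Interval] = []
    for start, end in sorted(iv for s in interval_sets for iv in s):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def not_in(intervals: List[Interval], chrom_length: int) -> List[Interval]:
    """Return the complement of `intervals` within [0, chrom_length)."""
    complement: List[Interval] = []
    cursor = 0
    for start, end in union(intervals):
        if start > cursor:
            complement.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        complement.append((cursor, chrom_length))
    return complement


# Hypothetical coordinates on a 100 bp toy chromosome.
homopolymers = [(10, 20), (50, 60)]
tandem_repeats = [(15, 30), (80, 90)]

all_low_complexity = union(homopolymers, tandem_repeats)
not_in_low_complexity = not_in(all_low_complexity, chrom_length=100)

print(all_low_complexity)      # [(10, 30), (50, 60), (80, 90)]
print(not_in_low_complexity)   # [(0, 10), (30, 50), (60, 80), (90, 100)]
```

In this sketch the "not in" set is the complement of the union, so every position on the toy chromosome belongs to exactly one of the two sets. That is what lets performance be compared inside and outside a difficult context.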