2023 Oct 5;24:221. doi: 10.1186/s13059-023-03061-1

Table 1.

Benchmark datasets, in chronological order, for different variant types (point mutations, insertions, deletions, and structural variants) in healthy and patient samples

| Publication title | Project name | Year | DOI | PMID | Data | Number of samples | Technology | Status | Sample cell | Variants | Reference included (%) | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A comprehensive catalogue of somatic mutations from a human cancer genome | The catalogue of somatic mutations | 2010 | https://doi.org/10.1038/nature08658 | 20016485 | Whole genome sequencing | 1 sample (COLO-829) | Illumina GAII | Patient | Somatic | SNV and indel < 50 bp | N/A | NCBI36 |
| A map of human genome variation from population-scale sequencing | 1000 Genomes Project | 2010 | https://doi.org/10.1038/nature09534 | 20981092 | Whole genome sequencing, exon-targeted sequencing | 882 samples (low-coverage whole-genome sequencing of 179 individuals; high-coverage sequencing of two mother–father–child trios; exon-targeted sequencing of 697 individuals) | 454 GS FLX, Illumina Genome Analyzer, and AB SOLiD System | Healthy | Germline | SNV and indel < 50 bp | 85 | NCBI36 |
| Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls | GIAB v.2.19 | 2014 | https://doi.org/10.1038/nbt.2835 | 24531798 | Whole genome sequencing, exome sequencing | 1 sample (NA12878, 11 whole-genome and 3 exome) | 454, Complete Genomics, Illumina, Ion Torrent, and SOLiD 4 | Healthy | Germline | SNV and indel < 50 bp | 77 | GRCh37 |
| svclassify: a method to establish benchmark structural variant calls | svclassify | 2016 | https://doi.org/10.1186/s12864-016-2366-2 | 26772178 | Whole genome sequencing | 1 sample (NA12878) | Illumina HiSeq, Moleculo, and PacBio | Healthy | Germline | SV and indel < 50 bp | N/A | GRCh37 |
| Extensive sequencing of seven human genomes to characterize benchmark reference materials | GIAB Public Data | 2016 | https://doi.org/10.1038/sdata.2016.25 | 27271295 | Whole genome sequencing | 7 samples (HG001-7) | 10x Genomics, BioNano, Complete Genomics (paired-end and LFR), GemCode WGS, Illumina (exome and WGS paired-end, mate-pair, and synthetic long reads), Ion Proton exome, ONT, PacBio, and SOLiD | Healthy | Germline | SNV, indel, and SV | N/A | GRCh37 |
| A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree | Platinum Genomes | 2017 | http://dx.doi.org/10.1101/gr.210500.116 | 27903644 | Whole genome sequencing | 2 samples (2 individuals with benchmarks, but using short-read WGS from 11 children and 4 grandparents from CEPH pedigree 1463) | Illumina | Healthy | Germline | SNV and indel < 50 bp | 96.7 | GRCh37 |
| A synthetic-diploid benchmark for accurate variant calling evaluation | CHM-eval, aka Syndip | 2018 | https://doi.org/10.1038/s41592-018-0054-7 | 30013044 | Whole genome sequencing | 2 samples (synthetic mixture of two effectively haploid hydatidiform mole cell lines) | PacBio CLR | Haploid cell lines | Germline | SNV, indel > 1 bp, and SV | 96 | GRCh37 and GRCh38 |
| An open resource for accurately benchmarking small variant and reference calls | GIAB v.3.3.2 | 2019 | https://doi.org/10.1038/s41587-019-0074-6 | 30936564 | Whole genome sequencing | 7 samples (HG001-7) | 10x Genomics, Illumina, Complete Genomics, Ion Torrent, and SOLiD 4 | Healthy | Germline | SNV and indel < 50 bp | 85.4 | GRCh37 and GRCh38 |
| A robust benchmark for detection of germline large deletions and insertions | NIST v0.6 SV benchmark set | 2020 | https://doi.org/10.1038/s41587-020-0538-8 | 32541955 | Whole genome sequencing | 1 sample (HG002) | 10x Genomics, Illumina, PacBio CLR, ONT | Healthy | Germline | indel ≥ 50 bp | 86 | GRCh37 |
| A diploid assembly-based benchmark for variants in the major histocompatibility complex | MHC benchmark | 2020 | https://doi.org/10.1038/s41467-020-18564-9 | 32963235 | Whole genome sequencing | 1 sample (HG002) | 10x Genomics, PacBio HiFi, and ONT | Healthy | Germline | SNV and indel < 50 bp | N/A | GRCh37 and GRCh38 |
| Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing | SEQC2 Tumor-normal | 2021 | https://doi.org/10.1038/s41587-021-00993-6 | 34504347 | Whole genome sequencing, exome sequencing | 1 tumor/normal cell line pair | 10x Genomics, Illumina, Ion Torrent, and PacBio HiFi | Patient | Somatic | SNV and indel < 50 bp | N/A | GRCh38 |
| A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency | SEQC2 Cancer panel | 2021 | https://doi.org/10.1186/s13059-021-02316-z | 33863366 | Targeted sequencing | Mixed tumor cell lines | Targeted Illumina sequencing | Patient | Somatic | SNV and indel | N/A | GRCh37 and GRCh38 |
| Benchmarking challenging small variants with linked and long reads | GIAB v.4.2.1 | 2022 | https://doi.org/10.1016/j.xgen.2022.100128 | 36452119 | Whole genome sequencing | 7 samples (HG001-7) | 10x Genomics, Complete Genomics, Illumina, PacBio HiFi | Healthy | Germline | SNV and indel < 50 bp | 92.2 | GRCh37 and GRCh38 |
| Curated variation benchmarks for challenging medically relevant autosomal genes | CMRG v1.00 | 2022 | https://doi.org/10.1038/s41587-021-01158-1 | 35132260 | Whole genome sequencing | 1 sample (HG002) | PacBio HiFi | Healthy | Germline | SNV and SV | N/A | GRCh37 and GRCh38 |
| A multi-platform reference for somatic structural variation detection | Somatic SV truth set | 2022 | https://doi.org/10.1016/j.xgen.2022.100139 | 36778136 | Whole genome sequencing | 1 sample (COLO-829) | 10x Genomics, Bionano, Illumina, ONT, PacBio | Patient | Somatic | SV and indel | N/A | GRCh37 and GRCh38 |
| Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet | Chinese Quartet | 2022 | https://doi.org/10.1101/2022.09.08.504083 | N/A | Whole genome sequencing | Two monozygotic twin daughters and their biological parents | Illumina, BGI, PacBio, and Oxford Nanopore Technology | Healthy | Germline | SNVs, indels, and SVs | N/A | GRCh38 |