2019 Sep 30;48(D1):D964–D970. doi: 10.1093/nar/gkz822

Table 1.

Information provided in the OGRDB standardized genotype

| Field | Description |
| --- | --- |
| sequence_id | Identifier of the allele (either the IMGT name, or the name assigned by the submitter to an inferred gene) |
| sequences | Overall number of sequences assigned to this allele |
| closest_reference | For inferred alleles, the closest reference gene and allele, as inferred by the tool |
| closest_host | For inferred alleles, the closest reference gene and allele that is in the subject's inferred genotype |
| nt_diff | For inferred alleles, the number of nucleotides that differ between this sequence and the closest reference gene and allele |
| nt_diff_host | For inferred alleles, the number of nucleotides that differ between this sequence and the closest reference gene and allele that is in the subject's inferred genotype |
| nt_substitutions | For inferred alleles, comma-separated list of nucleotide substitutions (e.g. G112A) between the sequence and the closest reference gene and allele. IMGT numbering is used for V-genes, and numbering from the start of the coding sequence for D- or J-genes. |
| aa_diff | For inferred alleles, the number of amino acids that differ between this sequence and the closest reference gene and allele |
| aa_substitutions | For inferred alleles, the list of amino acid substitutions (e.g. A96N) between the sequence and the closest reference gene and allele. IMGT numbering is used for V-genes, and numbering from the start of the coding sequence for D- or J-genes. |
| unmutated_sequences | The number of sequences exactly matching this unmutated sequence |
| assigned_unmutated_frequency | The number of sequences exactly matching this allele divided by the number of sequences assigned to this allele, × 100 |
| unmutated_umis | The number of molecules (identified by Unique Molecular Identifiers) exactly matching this unmutated sequence (if UMIs were used) |
| allelic_percentage | The number of sequences exactly matching the sequence of this allele divided by the number of sequences exactly matching any allele of this specific gene, × 100 |
| unmutated_frequency | The number of sequences exactly matching this sequence divided by the number of sequences exactly matching any allele of any gene, × 100 |
| unique_vs | The number of V allele calls (i.e. unique allelic sequences) found associated with this allele |
| unique_ds | The number of D allele calls (i.e. unique allelic sequences) found associated with this allele |
| unique_js | The number of J allele calls (i.e. unique allelic sequences) found associated with this allele |
| unique_cdr3s | The number of unique CDR3s found associated with this allele |
| unique_vs_unmutated | The number of V allele calls (i.e. unique allelic sequences) associated with unmutated sequences of this allele |
| unique_ds_unmutated | The number of D allele calls (i.e. unique allelic sequences) associated with unmutated sequences of this allele |
| unique_js_unmutated | The number of J allele calls (i.e. unique allelic sequences) associated with unmutated sequences of this allele |
| unique_cdr3s_unmutated | The number of unique CDR3s associated with unmutated sequences of this allele |
| haplotyping_gene | The gene or genes from which haplotyping was inferred, where haplotyping is possible (e.g. IGHJ6) |
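
The substitution fields in Table 1 use a compact notation (e.g. G112A, A96N). As an illustration only, the following Python sketch splits such a field into its parts. It assumes the conventional reading of the notation (residue in the closest reference allele, position, residue in the inferred allele) and assumes that aa_substitutions is comma-separated in the same way as nt_substitutions; neither point is spelled out in the table. The names `Substitution` and `parse_substitutions` are hypothetical and are not part of OGRDB or any published tool.

```python
import re
from typing import List, NamedTuple


class Substitution(NamedTuple):
    ref: str       # residue in the closest reference allele (assumed order)
    position: int  # position in whatever numbering scheme the field uses
    alt: str       # residue in the inferred allele (assumed order)


# One letter, one or more digits, one letter, e.g. "G112A" or "A96N".
_SUB_PATTERN = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_substitutions(field: str) -> List[Substitution]:
    """Parse a comma-separated substitution list such as 'G112A,T154C'."""
    substitutions = []
    for token in field.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty fields and trailing commas
        match = _SUB_PATTERN.match(token)
        if match is None:
            raise ValueError(f"unrecognised substitution: {token!r}")
        ref, position, alt = match.groups()
        substitutions.append(Substitution(ref.upper(), int(position), alt.upper()))
    return substitutions
```

The parser does not interpret the position, so the same function serves V-genes (IMGT numbering) and D- or J-genes (numbering from the start of the coding sequence). Tokens that do not fit the single-residue pattern raise an error rather than being silently dropped.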

Providing statistics for every allele in the personalized genotype (both reference alleles and novel alleles) allows the novel inferences to be considered in the context of overall gene usage (usage frequency, exact unmutated matches, association with distinct CDR3s and so on), and also yields useful aggregate information on gene usage.
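
The three percentage fields defined in Table 1 (assigned_unmutated_frequency, allelic_percentage and unmutated_frequency) follow directly from the per-allele counts. The sketch below is one possible way to compute them, not the implementation used by OGRDB or by any inference tool. It assumes that each row is a mapping holding the table's `sequence_id`, `sequences` and `unmutated_sequences` fields plus a hypothetical `gene` key that is not a column of the genotype table and would have to be derived from the allele name. It also assumes that "any allele of any gene" means every row passed to the function, and it returns 0.0 where a denominator is zero.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def _percent(numerator: float, denominator: float) -> float:
    """Return numerator / denominator * 100, or 0.0 for a zero denominator."""
    return 100.0 * numerator / denominator if denominator else 0.0


def genotype_percentages(rows: List[Mapping]) -> Dict[str, Dict[str, float]]:
    """Compute the three percentage fields for every allele in `rows`."""
    # Exact (unmutated) matches summed over all alleles of all genes.
    total_unmutated = sum(row["unmutated_sequences"] for row in rows)

    # Exact (unmutated) matches summed over the alleles of each gene.
    unmutated_per_gene = defaultdict(int)
    for row in rows:
        unmutated_per_gene[row["gene"]] += row["unmutated_sequences"]

    result = {}
    for row in rows:
        exact = row["unmutated_sequences"]
        result[row["sequence_id"]] = {
            # exact matches / sequences assigned to this allele
            "assigned_unmutated_frequency": _percent(exact, row["sequences"]),
            # exact matches / exact matches to any allele of the same gene
            "allelic_percentage": _percent(exact, unmutated_per_gene[row["gene"]]),
            # exact matches / exact matches to any allele of any gene
            "unmutated_frequency": _percent(exact, total_unmutated),
        }
    return result
```

Under these assumptions, the allelic_percentage values of the alleles of one gene sum to 100, which makes a quick consistency check when comparing a novel allele with the reference alleles of the same gene.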