Table 1.
Field | Description |
---|---|
sequence_id | Identifier of the allele (either the IMGT name, or the name assigned by the submitter to an inferred gene) |
sequences | Overall number of sequences assigned to this allele |
closest_reference | For inferred alleles, the closest reference gene and allele, as inferred by the tool |
closest_host | For inferred alleles, the closest reference gene and allele that is in the subject's inferred genotype |
nt_diff | For inferred alleles, the number of nucleotides that differ between this sequence and the closest reference gene and allele |
nt_diff_host | For inferred alleles, the number of nucleotides that differ between this sequence and the closest reference gene and allele that is in the subject's inferred genotype |
nt_substitutions | For inferred alleles, comma-separated list of nucleotide substitutions (e.g. G112A) between the sequence and the closest reference gene and allele. IMGT numbering is used for V-genes, and numbering from the start of the coding sequence for D- or J-genes. |
aa_diff | For inferred alleles, the number of amino acids that differ between this sequence and the closest reference gene and allele |
aa_substitutions | For inferred alleles, the list of amino acid substitutions (e.g. A96N) between the sequence and the closest reference gene and allele. IMGT numbering is used for V-genes, and numbering from the start of the coding sequence for D- or J-genes. |
unmutated_sequences | The number of sequences exactly matching this unmutated sequence |
assigned_unmutated_frequency | The number of sequences exactly matching this allele divided by the number of sequences assigned to this allele, multiplied by 100 (i.e. a percentage) |
unmutated_umis | The number of molecules (identified by Unique Molecular Identifiers) exactly matching this unmutated sequence (if UMIs were used) |
allelic_percentage | The number of sequences exactly matching the sequence of this allele divided by the number of sequences exactly matching any allele of this specific gene, multiplied by 100 (i.e. a percentage) |
unmutated_frequency | The number of sequences exactly matching this sequence divided by the number of sequences exactly matching any allele of any gene, multiplied by 100 (i.e. a percentage) |
unique_vs | The number of V allele calls (i.e. unique allelic sequences) found associated with this allele |
unique_ds | The number of D allele calls (i.e. unique allelic sequences) found associated with this allele |
unique_js | The number of J allele calls (i.e. unique allelic sequences) found associated with this allele |
unique_cdr3s | The number of unique CDR3s found associated with this allele |
unique_vs_unmutated | The number of V allele calls (i.e. unique allelic sequences) associated with unmutated sequences of this allele |
unique_ds_unmutated | The number of D allele calls (i.e. unique allelic sequences) associated with unmutated sequences of this allele |
unique_js_unmutated | The number of J allele calls (i.e. unique allelic sequences) associated with unmutated sequences of this allele |
unique_cdr3s_unmutated | The number of unique CDR3s associated with unmutated sequences of this allele |
haplotyping_gene | The gene or genes from which haplotyping was inferred, where haplotyping is possible (e.g. IGHJ6) |

Providing statistics for each allele in the personalized genotype (both reference alleles and novel alleles) allows the novel inferences to be considered in the context of overall gene usage (usage frequency, exact unmutated matches, association with distinct CDR3s, and so on). It also provides useful aggregate information on overall gene usage.
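
The three percentage fields in Table 1 (assigned_unmutated_frequency, allelic_percentage and unmutated_frequency) share a numerator, the count of exactly matching sequences, and differ only in the denominator. The following is a minimal sketch of that arithmetic, not the implementation used by any particular tool. It assumes that each row supplies `sequence_id`, `sequences` and `unmutated_sequences` as in Table 1, that the gene name can be taken as the part of `sequence_id` before the `*`, and that "any gene" means every row passed in. The function names and the counts in the example are hypothetical.

```python
from collections import defaultdict


def pct(numerator, denominator):
    """Return numerator/denominator as a percentage, or None if undefined."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


def gene_of(sequence_id):
    """Assumption: the gene is the part of the allele name before '*'."""
    return sequence_id.split("*", 1)[0]


def allele_percentages(rows):
    """Compute the three percentage fields for each allele row.

    rows: list of dicts with keys 'sequence_id', 'sequences' and
    'unmutated_sequences' (field names as in Table 1).
    """
    # Exact (unmutated) matches summed per gene and over all rows.
    unmutated_per_gene = defaultdict(int)
    for row in rows:
        unmutated_per_gene[gene_of(row["sequence_id"])] += row["unmutated_sequences"]
    unmutated_total = sum(unmutated_per_gene.values())

    result = {}
    for row in rows:
        exact = row["unmutated_sequences"]
        gene = gene_of(row["sequence_id"])
        result[row["sequence_id"]] = {
            # exact matches / sequences assigned to this allele
            "assigned_unmutated_frequency": pct(exact, row["sequences"]),
            # exact matches / exact matches to any allele of this gene
            "allelic_percentage": pct(exact, unmutated_per_gene[gene]),
            # exact matches / exact matches to any allele of any gene
            "unmutated_frequency": pct(exact, unmutated_total),
        }
    return result


# Hypothetical counts, for illustration only.
example_rows = [
    {"sequence_id": "IGHV1-69*01", "sequences": 200, "unmutated_sequences": 50},
    {"sequence_id": "IGHV1-69*02", "sequences": 100, "unmutated_sequences": 30},
    {"sequence_id": "IGHV3-23*01", "sequences": 400, "unmutated_sequences": 120},
]
stats = allele_percentages(example_rows)
for allele, fields in stats.items():
    print(allele, fields)
```

Returning `None` for a zero denominator is a design choice of this sketch; it keeps an allele with no assigned or no exactly matching sequences from raising an error or being reported as 0%.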