A. Allele frequency in healthy subjects
1000 Genomes Project
Database of 2,504 genomes sequenced from healthy subjects
Rare variants are more likely than common ones to have a large impact or a pathogenic effect. Note that the criteria for "healthy" vary among databases.
Exome Sequencing Project 6500 (ESP6500)
Database of 6,503 exomes sequenced from healthy subjects
ExAC
Dataset of 60,706 exomes sequenced from unrelated healthy subjects
gnomAD
Dataset of 125,748 exomes (WES) and 15,708 genomes (WGS) sequenced from unrelated healthy subjects
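As a rough illustration of how these databases are used, the sketch below computes an allele frequency from an allele count and the number of called chromosomes, then flags a variant as rare. The function names and the 0.1% cutoff are assumptions made for this example; none of them come from the databases listed above.

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Return allele_count / allele_number.

    allele_number is the number of chromosomes with a genotype call,
    e.g. up to 2 * 2504 = 5008 for an autosomal site in 2,504 diploid genomes.
    """
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def is_rare(allele_count: int, allele_number: int, threshold: float = 0.001) -> bool:
    """True when the allele frequency is below the cutoff.

    The 0.1% default is a hypothetical choice for illustration only.
    """
    return allele_frequency(allele_count, allele_number) < threshold


if __name__ == "__main__":
    print(is_rare(3, 5008))    # 3 copies among 5008 chromosomes
    print(is_rare(100, 5008))  # 100 copies among 5008 chromosomes
```

Because the definition of "healthy" differs between databases, the same cutoff may behave differently depending on which dataset supplies the counts.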
B. Inheritance pattern |
De novo |
Newly arising mutations in patients. |
De novo variants are more likely to be highly penetrant than inherited ones. The impact of maternally inherited variants may be underestimated because of the female protective effect in ASD.
Inherited |
Mutations transmitted to the patient from the father or the mother
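The distinction between de novo and inherited variants can be sketched with trio genotypes. The example below is a minimal illustration, assuming genotypes are encoded as alternate-allele counts (0, 1 or 2) at a single autosomal site; the function name and return labels are hypothetical, and real de novo calling also has to account for sequencing depth and genotype errors.

```python
def classify_inheritance(child: int, father: int, mother: int) -> str:
    """Classify a variant from trio genotypes (alternate-allele counts 0, 1 or 2).

    Simplified: a homozygous child with only one carrier parent is still
    reported as inherited from that parent.
    """
    if child == 0:
        return "absent"  # the child does not carry the variant
    if father == 0 and mother == 0:
        return "de novo"  # present in the child, absent from both parents
    if father > 0 and mother > 0:
        return "inherited (parent of origin ambiguous)"
    return "paternally inherited" if father > 0 else "maternally inherited"


if __name__ == "__main__":
    print(classify_inheritance(child=1, father=0, mother=0))
    print(classify_inheritance(child=1, father=0, mother=1))
```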
C. Types of variants |
Indel |
Small insertions or deletions of bases |
Nonsense, stoploss and splice-site mutations and indels are the most likely to impact protein function. In contrast, only a subset of missense variants impact protein function. Synonymous mutations do not alter the amino acid sequence and are generally not expected to affect protein function.
Nonsense |
Mutations introducing a premature stop codon, causing protein truncation
Stoploss |
Mutations disrupting the stop codon, resulting in abnormal extension of the protein
Missense |
Mutations causing an amino acid substitution
Splicing site |
Mutations affecting splice sites, possibly causing mis-splicing
Synonymous |
Mutations that do not alter the amino acid sequence
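The coding categories above can be illustrated by translating a reference codon and an altered codon with the standard genetic code and comparing the results. This is a minimal sketch: `classify_codon_change` is a hypothetical helper, it handles single-codon substitutions only, and indels and splice-site variants are outside its scope.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon substitution by its effect on the encoded residue."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid (or stop codon retained)
    if alt_aa == "*":
        return "nonsense"    # new premature stop codon
    if ref_aa == "*":
        return "stoploss"    # stop codon replaced by an amino acid
    return "missense"        # one amino acid replaced by another


if __name__ == "__main__":
    for ref, alt in [("GAA", "GAG"), ("GAG", "GTG"), ("CGA", "TGA"), ("TAA", "CAA")]:
        print(ref, alt, classify_codon_change(ref, alt))
```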
D. Genetic intolerance
pLI
A gene-level score giving the probability of loss-of-function intolerance, determined by comparing the number of loss-of-function variants observed in a gene with the number expected.
Mutations in intolerant genes are more likely to be deleterious
RVIS |
A gene intolerance score determined by comparing the number of observed nonsynonymous variants with the number of synonymous variants
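The idea behind both scores, comparing observed variation with what would be expected, can be sketched as follows. This is a deliberately simplified illustration and not the published pLI or RVIS calculation: the function names are hypothetical, the first function is a plain observed/expected ratio, and the second fits an ordinary least-squares line across genes and returns raw residuals, on the assumption that a negative residual (fewer functional variants than predicted) indicates a more intolerant gene.

```python
from typing import List, Sequence


def obs_exp_ratio(observed: int, expected: float) -> float:
    """Observed / expected variant count; values well below 1 suggest depletion."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


def regression_residuals(total: Sequence[float], functional: Sequence[float]) -> List[float]:
    """Regress per-gene functional variant counts on total variant counts.

    Returns one raw residual per gene (observed minus fitted value).
    """
    n = len(total)
    mean_x = sum(total) / n
    mean_y = sum(functional) / n
    sxx = sum((x - mean_x) ** 2 for x in total)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(total, functional))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(total, functional)]


if __name__ == "__main__":
    # Made-up counts for three hypothetical genes, for illustration only.
    print(obs_exp_ratio(observed=2, expected=20.0))
    print(regression_residuals([10, 20, 30], [5, 10, 3]))
```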
E. In silico tools to predict the impact of SNVs
SIFT |
A tool that predicts the impact of an SNV based on the evolutionary conservation of the protein's amino acid sequence
These tools score human variants and are useful for estimating how deleterious a given variant will be to protein function. All of them can be applied to predict the impact of variants that cause amino acid substitutions. CADD can also be used for indels.
PolyPhen2 |
A tool that predicts the impact of an SNV based on protein sequence and structure.
CADD |
A tool that predicts the impact of SNVs and short indels. It is an integrative metric built from diverse genomic features such as evolutionary constraint, epigenetic status and the scores of other prediction tools, including SIFT and PolyPhen2.
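In practice, scores from these tools are often compared against conventional cutoffs. The sketch below is one possible way to combine them; the function name and the default cutoffs (SIFT below 0.05, PolyPhen2 above 0.85, scaled CADD score of at least 20) are assumptions based on commonly used conventions rather than values given here, and all of them can be overridden. Missing scores are skipped, which fits indels, where only CADD applies.

```python
from typing import Dict, Optional


def flag_deleterious(
    sift: Optional[float] = None,
    polyphen2: Optional[float] = None,
    cadd_phred: Optional[float] = None,
    sift_cutoff: float = 0.05,       # assumed convention: lower SIFT = more damaging
    polyphen2_cutoff: float = 0.85,  # assumed convention: higher = more damaging
    cadd_cutoff: float = 20.0,       # assumed convention: scaled score, higher = more damaging
) -> Dict[str, bool]:
    """Return, per tool with an available score, whether it flags the variant."""
    flags: Dict[str, bool] = {}
    if sift is not None:
        flags["SIFT"] = sift < sift_cutoff
    if polyphen2 is not None:
        flags["PolyPhen2"] = polyphen2 > polyphen2_cutoff
    if cadd_phred is not None:
        flags["CADD"] = cadd_phred >= cadd_cutoff
    return flags


if __name__ == "__main__":
    # A missense SNV scored by all three tools (made-up scores).
    print(flag_deleterious(sift=0.01, polyphen2=0.95, cadd_phred=25.0))
    # A short indel, for which only a CADD score is available.
    print(flag_deleterious(cadd_phred=12.0))
```

Returning a per-tool dictionary rather than a single verdict keeps disagreements between tools visible, which matters because the tools rely on different evidence.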