2023 Nov 28;14:7805. doi: 10.1038/s41467-023-43651-y

Fig. 1. PhenoSV workflow.

a SV annotation. A coding SV that is a deletion or a duplication, fully containing gene B and partially encompassing gene C, is segmented into a sequence of six genome segments: two affected genes, two intergenic noncoding regions, and two zero-padding segments. A noncoding SV that is a deletion or a duplication can potentially affect genes A, B, and C based on distance or TAD annotations (triangle shaded area). Its genomic segment sequence has eleven segments: three candidate target genes, five intergenic noncoding regions, a noncoding SV region, and two zero-padding segments. b SV interpretation. The annotated SV from (a), with shape 6 × 238 or 11 × 238, is fed into the PhenoSV architecture. Each MHA (multi-head attention) block has two types of attention heads to model indirect and direct effects on genes. The pathogenicity of the overall SV (PhenoSV scores, psv) and of individual genes (PhenoSV gene scores, psvgene) can be inferred from SV-level and gene-level embeddings, respectively. Prior phenotype information (HPO terms) can further be used to infer phenotype-related pathogenicity for the overall SV (phenotype-aware PhenoSV scores, psvphen) and for individual genes (phenotype-aware PhenoSV gene scores, psvgenephen).
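
The input shapes in (b) follow from the segment counts in (a): one row per genome segment and 238 features per row. The sketch below only illustrates that bookkeeping and is not the PhenoSV implementation. The function name `build_segment_matrix`, the segment labels, the segment ordering, the placement of the zero-padding segments at both ends, and the random placeholder features are all assumptions made for illustration.

```python
import numpy as np

N_FEATURES = 238  # per-segment feature width, as stated in the caption


def build_segment_matrix(segment_labels, n_features=N_FEATURES, seed=0):
    """Stack one feature row per segment (hypothetical helper).

    Zero-padding segments become all-zero rows; every other segment gets
    random placeholder features, since the real annotations are not
    described here.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for label in segment_labels:
        if label == "pad":
            rows.append(np.zeros(n_features))
        else:
            rows.append(rng.random(n_features))  # placeholder values only
    return np.stack(rows)


# Coding SV: 2 affected genes + 2 intergenic regions + 2 padding segments.
# The ordering and the padding positions are assumed.
coding_sv = ["pad", "intergenic", "gene_B", "gene_C", "intergenic", "pad"]

# Noncoding SV: 3 candidate genes + 5 intergenic regions + 1 SV region
# + 2 padding segments. The ordering is assumed.
noncoding_sv = [
    "pad", "intergenic", "gene_A", "intergenic", "gene_B", "intergenic",
    "noncoding_SV", "intergenic", "gene_C", "intergenic", "pad",
]

coding_matrix = build_segment_matrix(coding_sv)
noncoding_matrix = build_segment_matrix(noncoding_sv)
print(coding_matrix.shape, noncoding_matrix.shape)
```

Under these assumptions the two matrices have shapes (6, 238) and (11, 238), matching the two input sizes named in the caption.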