[Preprint]. 2023 Aug 3:rs.3.rs-3219092. [Version 1] doi: 10.21203/rs.3.rs-3219092/v1

Table 3.

Comparison of sequence-based, structure-based, and hybrid scoring for variant effect prediction, each using the learned log odds of the corresponding variable(s), evaluated by Spearman’s ρ, AUROC, and AUPRC. Whereas conventional pLMs use only the learned distribution over sequence (amino acids, AA), our structure-informed pLMs can use learned distributions over sequence, over structural properties (secondary structure, SS; relative solvent accessibility, RSA; and contact map, CM), or over both. By default only the mutated positions are scored; versions with the subscript ‘env’ use all neighboring positions that form the local environment of the mutated positions. The best performance in each column is in boldface.

| Type | Variable(s) | Spearman’s ρ | AUROC ↑ | AUPRC ↑ |
|---|---|---|---|---|
| Sequence | AA | .546 | .800 | **.806** |
| Structure (single property) | SS | .090 | .540 | .615 |
| | SS_env | .095 | .551 | .609 |
| | RSA | .081 | .545 | .611 |
| | RSA_env | .084 | .546 | .599 |
| | CM | .169 | .592 | .599 |
| Structure (multi) | SS+RSA+CM | .158 | .587 | .590 |
| | SS_env+RSA_env+CM | .144 | .574 | .591 |
| Sequence + Structure | AA+SS+RSA+CM | .552 | **.803** | .792 |
| | AA+CM | **.556** | .802 | .794 |
| | AA+SS_env+RSA_env+CM | .554 | .799 | .791 |
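
To make the scoring and the reported metrics concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the common convention that a sequence-based log-odds score is log p(mutant) minus log p(wild type) at the mutated position, and it computes Spearman’s ρ and AUROC from ranks. The function names (`log_odds`, `ranks`, `spearman`, `auroc`) and all numbers are hypothetical and chosen only for illustration.

```python
import math

def log_odds(probs, wild_type, mutant):
    """Log odds of mutant vs. wild-type residue at one position.

    probs: dict mapping amino-acid letters to predicted probabilities
    (a hypothetical model output, not the paper's interface).
    """
    return math.log(probs[mutant]) - math.log(probs[wild_type])

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) identity; labels are 0 or 1."""
    r = ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(ri for ri, lab in zip(r, labels) if lab == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy values, made up for illustration only.
scores = [-0.2, -1.5, -0.7, -3.0]   # predicted log odds per variant
fitness = [0.9, 0.5, 0.4, 0.1]      # measured effects (continuous)
labels = [1, 1, 0, 0]               # binarized effects (1 = functional)

print(round(spearman(scores, fitness), 3))  # 0.8
print(round(auroc(scores, labels), 3))      # 0.75
```

In this reading, a higher log-odds score means the model finds the mutant residue more plausible than the wild type. Spearman’s ρ compares the ranking of scores against continuous measurements, while AUROC and AUPRC compare it against binary labels.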