[Preprint]. 2023 Aug 3:rs.3.rs-3219092. [Version 1] doi: 10.21203/rs.3.rs-3219092/v1

Table 3.

Comparison of sequence-based, structure-based, and hybrid scoring of variant effects, each using the learned log odds of the corresponding variable(s), evaluated by Spearman’s $\rho$, AUROC, and AUPRC. Conventional pLMs use only learned distributions over sequence (amino acids, AA). Our structure-informed pLMs can use learned distributions over sequence, over structural properties (secondary structure, SS; relative solvent accessibility, RSA; and contact map, CM), or over both. By default, only the mutated positions are considered. Versions with the subscript ‘env’ instead use all neighboring positions that form the local environment of the mutated positions. The best performances are in bold.

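| Type | Variable(s) | Spearman’s $\rho$ ↑ | AUROC ↑ | AUPRC ↑ |
|---|---|---|---|---|
| Sequence | AA | .546 | .800 | **.806** |
| Structure (single property) | SS | .090 | .540 | .615 |
| | $\mathrm{SS}_{\mathrm{env}}$ | .095 | .551 | .609 |
| | RSA | .081 | .545 | .611 |
| | $\mathrm{RSA}_{\mathrm{env}}$ | .084 | .546 | .599 |
| | CM | .169 | .592 | .599 |
| Structure (multi) | SS+RSA+CM | .158 | .587 | .590 |
| | $\mathrm{SS}_{\mathrm{env}}$+$\mathrm{RSA}_{\mathrm{env}}$+CM | .144 | .574 | .591 |
| Sequence + Structure | AA+SS+RSA+CM | .552 | **.803** | .792 |
| | AA+CM | **.556** | .802 | .794 |
| | AA+$\mathrm{SS}_{\mathrm{env}}$+$\mathrm{RSA}_{\mathrm{env}}$+CM | .554 | .799 | .791 |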
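The caption for Table 3 describes three kinds of score. Default rows sum learned log odds over the mutated positions only. The ‘env’ rows sum them over the local environment of those positions. The hybrid rows combine several variables. The Python sketch below shows one way to do that bookkeeping. It rests on assumptions that the table does not state:

- Per-position log odds for each variable are already computed by the model. The input dictionary `log_odds` and its layout are hypothetical.
- The local environment is taken to be the mutated positions plus every residue marked as "in contact" with them in a boolean contact matrix.
- A multi-variable score is an unweighted sum of the per-variable scores.

The authors’ actual neighbor definition and combination rule may differ.

```python
from typing import Collection, Dict, List, Sequence, Set

# Hypothetical input layout (an assumption, not the authors' format):
# log_odds[var][i] is a precomputed log-odds score for variable `var`
# ("AA", "SS", "RSA", "CM", ...) at residue index i, e.g.
# log P(value | mutant) - log P(value | wild type) from a structure-informed pLM.


def environment(mut_positions: Sequence[int], contacts: List[List[bool]]) -> Set[int]:
    """Mutated positions plus every residue in contact with one of them (assumed neighbor rule)."""
    env = set(mut_positions)
    for i in mut_positions:
        env.update(j for j, in_contact in enumerate(contacts[i]) if in_contact)
    return env


def variable_score(values: Sequence[float], positions: Collection[int]) -> float:
    """Sum one variable's per-position log odds over the chosen positions."""
    return sum(values[i] for i in sorted(positions))


def hybrid_score(
    log_odds: Dict[str, Sequence[float]],
    mut_positions: Sequence[int],
    contacts: List[List[bool]],
    variables: Sequence[str],
    env_variables: Collection[str] = (),
) -> float:
    """Unweighted sum of per-variable scores (an assumed combination rule).

    Variables listed in `env_variables` (the 'env' versions) are summed over
    the local environment; all others only over the mutated positions.
    """
    env = environment(mut_positions, contacts)
    total = 0.0
    for var in variables:
        positions = env if var in env_variables else set(mut_positions)
        total += variable_score(log_odds[var], positions)
    return total
```

As an example, take a single mutation at index 1 that is in contact with indices 0 and 2. The call `hybrid_score(log_odds, [1], contacts, ["AA", "SS"], env_variables={"SS"})` then mirrors an AA+SS_env combination: AA is read at index 1 only, while SS is summed over indices 0, 1, and 2.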
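Each metric column in Table 3 is a standard ranking metric. The dependency-free Python sketch below shows what each one computes, with three caveats:

- **Inputs.** Spearman’s $\rho$ compares scores with continuous measured effects, while AUROC and AUPRC need binary labels. The table does not say what the measurements and labels are, so the argument names here are illustrative.
- **Score orientation.** Scores are assumed to be oriented so that a larger score means "more likely positive". A more negative log odds typically signals a more damaging substitution, so scores may need to be negated first.
- **AUPRC estimator.** AUPRC is computed as average precision, which is one common estimator. The source may have used another, such as trapezoidal interpolation.

```python
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(scores: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(scores), average_ranks(measured))


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive outscores random negative); ties count one half. O(n^2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auprc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision: mean of precision@k over the ranks k of the positives.

    Tied scores keep their input order (Python's sort is stable), so ties are
    not interpolated. Assumes at least one positive label.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / hits
```

In practice, `scipy.stats.spearmanr`, `sklearn.metrics.roc_auc_score`, and `sklearn.metrics.average_precision_score` compute these quantities.

The chance levels of the binary metrics differ. A random scorer has an AUROC of 0.5. Its expected AUPRC, by contrast, equals the fraction of positive labels. The AUPRC column is therefore best read against the class balance of the benchmark rather than against 0.5.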