Nat Struct Mol Biol. Author manuscript; available in PMC: 2025 Dec 13.
Published in final edited form as: Nat Struct Mol Biol. 2025 Jun 13;32(10):2099–2111. doi: 10.1038/s41594-025-01582-w

Multiplex and multimodal mapping of variant effects in secreted proteins via MultiSTEP

Nicholas A Popp 1,2,3, Rachel L Powell 1, Melinda K Wheelock 1,3, Kristen J Holmes 4,5,6, Brendan D Zapp 1, Kathryn M Sheldon 4,5,6, Shelley N Fletcher 7, Xiaoping Wu 7,8, Shawn Fayer 1,3, Alan F Rubin 9,10, Kerry W Lannert 4,5,6, Alexis T Chang 1, John P Sheehan 11,12, Jill M Johnsen 4,5,6,7,13,14, Douglas M Fowler 1,3,14,15
PMCID: PMC12373428  NIHMSID: NIHMS2100663  PMID: 40514537

Abstract

Despite widespread advances in DNA sequencing, the functional consequences of most genetic variants remain poorly understood. Multiplexed Assays of Variant Effect (MAVEs) can measure the function of variants at scale but cannot readily be applied to the ~10% of human genes encoding secreted proteins. Here, we develop a flexible, scalable human cell surface display method, Multiplexed Surface Tethering of Extracellular Proteins (MultiSTEP), to study the consequences of missense variation in coagulation factor IX (FIX), a serine protease where genetic variation can cause hemophilia B. We combine MultiSTEP with a panel of antibodies to detect FIX secretion and post-translational modification (PTM), measuring 44,816 variant effects for 436 synonymous variants and 8,528 of the 8,759 possible F9 missense variants. Almost half of missense variants impact secretion, PTM, or both. We also identify functional constraints on secretion within the signal peptide and for nearly all variants that caused gain or loss of cysteine. Secretion scores correlate strongly with FIX levels in hemophilia B and reveal that loss of secretion variants are more often associated with severe disease. Integration of the secretion and PTM scores enables reclassification of 63.1% of F9 variants of uncertain significance in the My Life, Our Future hemophilia genotyping project. Lastly, we show that MultiSTEP can be applied to other secreted proteins, thus demonstrating that MultiSTEP is a multiplexed, multimodal, and generalizable method for systematically assessing variant effects in secreted proteins at scale.

Introduction

Genome sequencing has revealed a rapidly expanding universe of human genetic variants, most with unknown disease consequences1. Variants with insufficient evidence to interpret pathogenicity are termed Variants of Uncertain Significance (VUS)2. Functional evidence derived from Multiplexed Assays of Variant Effect (MAVEs) can improve interpretation of VUS3. MAVEs combine a large library of genetic variants with selection for function and high-throughput DNA sequencing to characterize the functional effects of tens of thousands of variants simultaneously4. However, in MAVEs, every variant protein must remain linked with its cognate variant DNA sequence. Existing yeast, phage, and molecular display methods can satisfy this requirement but generally do not work well for human secreted proteins. Approximately 10% of human genes encode secreted proteins, and the number of VUS in these genes is rapidly growing, so a platform for applying MAVEs to human secreted proteins is urgently needed5 (Fig. 1a,b).

Figure 1: MultiSTEP enables at-scale measurement of variant effects in secreted proteins.

a. Secreted proteins (purple) make up approximately 10% of the human proteome5. b. Cumulative number of missense variants in secreted proteins deposited in ClinVar from 2016 to 2023, colored by clinical interpretation. c. MultiSTEP retains secreted proteins on the cell surface, establishing a physical link between genotype and phenotype (left panel). Cells expressing a library of variants of the target protein are sorted into bins based upon intensity of fluorescent antibody binding, followed by deep sequencing to derive a functional score for each individual variant (middle panels). The result is a variant effect map (right panel). d. MultiSTEP design. Secreted protein coding sequences (pink) are cloned into an attB-containing landing pad donor plasmid. Secreted proteins are engineered to have C-terminally fused (GGGGS)2 flexible linkers (L1 and L2, teal) attached to a single-pass transmembrane domain (TMD, blue). Between the linkers is a strep II epitope tag for surface detection (green). The construct contains an IRES (purple) that allows co-expression of an mCherry fluorophore (red) from the same transcript, which serves as a transcriptional control. e-g. Flow cytometry of known well-secreted (p.A37T, p.S220T, WT) and poorly-secreted (p.C28Y) FIX variants displayed using MultiSTEP (n ~30,000 cells per variant). Unrecombined cells do not display FIX and serve as a negative control. Fluorescent signal was generated by staining the library with either an anti-FIX heavy chain antibody (e), an anti-FIX light chain antibody (f), or an anti-strep II tag antibody (g).

Normal secretion and post-translational γ-carboxylation of coagulation factor IX (FIX), encoded by the F9 gene, are necessary to stop bleeding and achieve hemostasis6–10. Genetic variation leading to loss of FIX function causes the bleeding disorder hemophilia B11,12. F9 clinical genotyping is recommended for all individuals who have hemophilia B or are at risk to inherit a hemophilia B-causing variant13,14. Some pathogenic FIX variants are known to impact secretion, while others impact FIX γ-carboxylation or activity15. However, the effect of most of the 8,759 possible FIX missense variants is unknown. This knowledge gap is evident in the My Life, Our Future (MLOF) U.S. national hemophilia clinical genotyping program, where nearly half of missense variants suspected to cause hemophilia B are VUS12,16.

To investigate the impact of variants on FIX secretion and post-translational modification, we developed Multiplexed Surface Tethering of Extracellular Proteins (MultiSTEP), which combines multiplexed human cell surface display with a genomically-encoded landing pad to measure the effects of secreted protein variants at scale17. We generated secretion and post-translational γ-carboxylation scores for nearly all 8,759 possible FIX missense variants using a panel of antibodies. The resulting 44,816 variant effect measurements revealed that 49.6% (n = 4,234/8,528) of FIX missense variants caused loss of FIX function, with 39.5% impacting secretion and 10.1% impacting Gla-domain γ-carboxylation or conformation. In contrast to intracellular proteins, nearly all FIX missense variants resulting in loss or gain of a cysteine diminished FIX secretion, making cysteine changes the most damaging class of substitution. Secretion scores correlated strongly with FIX levels in hemophilia B and revealed that loss of secretion variants are particularly common in severe disease. Model-based integration of FIX secretion and post-translational modification scores enabled reclassification of 63.1% of F9 VUS in the MLOF hemophilia genotyping project. Lastly, we demonstrated that MultiSTEP can be applied to six other clinically important secreted proteins.

Results

Multiplexed human cell display of secreted proteins

MultiSTEP is a human cell protein display method designed to simultaneously measure the effects of a large number of secreted protein variants (Fig. 1c). MultiSTEP combines 293-F suspension cells with a genomically-integrated landing pad to drive expression of a single variant protein per cell17 (Supplementary Fig. 1). Unlike other mammalian display systems, MultiSTEP uses the target protein’s endogenous secretion signal peptide, ensuring proper processing and allowing characterization of signal peptide variants (Fig. 1d). An mCherry fluorescent transcriptional control is expressed via an internal ribosomal entry site (IRES)17. The C-terminus of the target protein is fused to a CD28 single-pass transmembrane domain via a flexible linker containing a strep II tag, orienting the target protein towards the outer surface of the cell. We chose the CD28 transmembrane domain because it has been used for cell surface expression of chimeric antigen receptors in T cells, is compatible with the strep II tag, and is thought to be inert18–20.
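The fusion order described above can be written down as a short protein-level assembly sketch. It is illustrative only: the (GGGGS)2 linkers and the position of the strep II tag follow Fig. 1d, Strep-tag II is assumed to be the standard WSHPQFEK peptide, and the target and transmembrane sequences are caller-supplied placeholders, not the authors' actual construct. The function name `multistep_fusion` is hypothetical. The IRES-driven mCherry is translated separately and is therefore not part of the fusion.

```python
# Hypothetical sketch of the MultiSTEP display fusion (Fig. 1d), protein level only.
LINKER = "GGGGS" * 2   # (GGGGS)2 flexible linker, used for both L1 and L2
STREP_II = "WSHPQFEK"  # Strep-tag II epitope (assumed standard sequence)


def multistep_fusion(target_with_signal_peptide: str, tmd: str) -> str:
    """Return target - L1 - strep II - L2 - TMD as one amino acid string.

    The target keeps its own N-terminal signal peptide, so nothing is
    prepended; every display element is appended at the C-terminus.
    Both arguments are placeholders supplied by the caller.
    """
    if not target_with_signal_peptide or not tmd:
        raise ValueError("target and TMD sequences must be non-empty")
    return target_with_signal_peptide + LINKER + STREP_II + LINKER + tmd


# Toy sequences only; these are not FIX or CD28 sequences.
fusion = multistep_fusion("MKTARGET", "TMDPLACEHOLDER")
print(fusion)
```

Keeping the endogenous signal peptide at the N-terminus is what lets signal peptide variants in the library be scored: if a variant blocks translocation, the whole fusion fails to reach the surface.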

Using MultiSTEP, we expressed and detected cell surface-displayed wildtype (WT) FIX using antibodies directed against the strep II tag, the FIX heavy chain, or the FIX light chain (Fig. 1e–g). Cell surface expression of p.C28Y, which reduces signal peptide cleavage efficiency and secretion, was profoundly decreased21 (Fig. 1e–g). p.A37T and p.S220T, neither of which impacts FIX secretion, were present on the cell surface at levels comparable to WT1,15 (Fig. 1e–g). Thus, MultiSTEP enabled FIX display and could distinguish the effect of variants on secretion.

We measured the secretion of 8,528 of the 8,759 possible missense F9 variants using these three antibodies (Supplementary Table 1). We computed secretion scores for each antibody by averaging replicates (mean Pearson’s r = 0.95), normalizing such that a score of 1 indicated wild type-like secretion and a score of 0 equaled the median of the lowest-scoring 5% of missense variants (Fig. 2a–c, Extended Data Fig. 2a, Supplementary Fig. 1, Supplementary Fig. 2). Synonymous variants had secretion scores similar to WT, and missense variants were distributed bimodally, similar to previous cytoplasmic protein variant abundance measurements22–24 (Fig. 2d,e, Extended Data Fig. 2b). For seven randomly selected variants spanning the secretion score range, antibody binding measured individually correlated strongly with MultiSTEP-derived secretion scores (Fig. 2f,g, Extended Data Fig. 2c, Extended Data Fig. 3a). We also measured secretion of p.C28Y, p.A37T, p.S220T, and five additional FIX variants without the C-terminal linker and transmembrane domain and confirmed that MultiSTEP variant secretion scores reflected conventional FIX secretion measurements (Pearson’s r = 0.93, Extended Data Fig. 3b). Lastly, MultiSTEP secretion scores correlated with displayed FIX levels in adherent HEK-293T cells following non-enzymatic cell dissociation (Pearson’s r = 0.9, Extended Data Fig. 3c).
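The two-anchor normalization described above can be sketched as follows, assuming raw per-variant scores have already been derived from the sorting bins. Two details are assumptions not stated in the text: the WT-like anchor (score 1) is taken as the median of synonymous variants, and the "lowest-scoring 5%" is the bottom 5% of missense variants with the count rounded to at least one. The function name `secretion_scores` is hypothetical.

```python
from statistics import mean, median


def secretion_scores(raw_by_variant, synonymous, missense):
    """Rescale replicate-averaged raw scores so that 1 = WT-like and 0 = low anchor.

    raw_by_variant: dict mapping variant ID -> list of replicate raw scores.
    synonymous, missense: lists of variant IDs in each class.
    Assumed anchors: 1 = median synonymous score; 0 = median of the
    lowest-scoring 5% of missense variants (count rounded, at least one).
    """
    avg = {v: mean(reps) for v, reps in raw_by_variant.items()}  # average replicates
    wt_like = median(avg[v] for v in synonymous)
    missense_sorted = sorted(avg[v] for v in missense)
    k = max(1, round(0.05 * len(missense_sorted)))  # size of the bottom 5%
    floor = median(missense_sorted[:k])
    span = wt_like - floor
    if span <= 0:
        raise ValueError("WT-like anchor must exceed the low anchor")
    return {v: (a - floor) / span for v, a in avg.items()}


# Toy data: two synonymous variants and 20 missense variants.
raw = {"syn1": [1.0, 1.0], "syn2": [0.9, 1.1]}
raw.update({f"m{i}": [i / 20, i / 20] for i in range(20)})
scores = secretion_scores(raw, ["syn1", "syn2"], [f"m{i}" for i in range(20)])
```

Because the transformation is linear, it preserves the ranking and bimodal shape of the raw distribution while placing every antibody's scores on a shared scale.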

Figure 2: 17,927 MultiSTEP-derived secretion scores for 8,964 factor IX variants.

a. Factor IX domain and chain architecture. Signal: Signal peptide. Pro: Propeptide. Gla: Gla domain. EGF1: Epidermal growth factor-like domain 1. EGF2: Epidermal growth factor-like domain 2. Activation: Activation peptide. Protease: Serine protease domain. b-c. Heatmaps showing FIX heavy chain secretion scores (n = 3 replicates) (b) or FIX light chain secretion scores (n = 2 replicates) (c) for missense FIX variants. Heatmap color indicates secretion score, from blue (0, lowest 5% of scores) through white (1, WT) to red (increased scores). Black dots indicate the WT amino acid. Missing data are colored gray. d-e. Density distributions of heavy chain (d) or light chain (e) secretion scores for FIX missense (orange) and synonymous (blue) variants. Dashed line denotes the 5th percentile of the synonymous variant distribution. f-g. Scatter plots comparing MultiSTEP-derived heavy chain (f) or light chain (g) secretion scores for seven different FIX variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.C134R, p.S220T, and p.H267L), WT, and an unrecombined negative control to the geometric mean of Alexa Fluor-647 fluorescence measured individually using flow cytometry (n = 3 replicates). Error bars show standard error of the mean. The p.E67K missense variant is not present in (g). h. Scatter plot of median MultiSTEP-derived heavy chain (n = 3 replicates) and light chain (n = 2 replicates) secretion scores at each position in FIX. Points are colored by chain architecture, using the same color scheme as (a). Black dashed line indicates perfect correlation between secretion scores. Pearson’s correlation coefficient is shown. Gray background indicates <0.3 point deviation from perfect correlation. Points with median positional scores outside the gray background are labeled with their corresponding FIX position. i. AlphaFold2 model of mature, two-chain FIX (positions 47–191 and 227–461). Putative FIX heavy chain (purple) or light chain (green) epitope positions are shown as colored surfaces. j. Magnified view of the putative light chain epitope within the EGF1 domain (orange). Points colored in concordance with (h). k. Magnified view of the putative heavy chain epitope within the FIX serine protease domain (yellow). Points colored in concordance with (h).

The heavy chain, light chain, and strep tag antibody-derived secretion scores were nearly identical at most positions, though the strep tag demonstrated a wider distribution of scores for synonymous variants (Fig. 2h, Extended Data Fig. 2d–f). However, 10 positions in the heavy chain had decreased median heavy chain secretion scores as compared to light chain scores and 10 positions in the light chain had decreased median light chain scores as compared to heavy chain scores (Fig. 2h). These two sets of positions occupy discrete, solvent-accessible sites on the corresponding FIX chains, strongly suggesting that these regions comprise each antibody’s epitope (Fig. 2i–k). We surveyed all positions outside these light and heavy chain epitopes to identify other antibody-specific effects. Light chain residues within 9.15 angstroms of our identified epitope showed a small (<0.3 secretion score difference) decrease in light chain antibody secretion scores relative to heavy chain. The same effect was found for heavy chain residues within 5.71 angstroms of the heavy chain epitope (Extended Data Fig. 4a–c). To prevent epitope effects from impacting subsequent variant effect analyses, we excluded scores from each antibody’s epitope positions.
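The epitope-flagging step can be sketched as two small functions. Assumptions: "deviation from perfect correlation" is read here as the simple difference between positional medians (the paper may measure deviation differently); the 0.3 cutoff comes from the Fig. 2h legend; and each residue is represented by a single 3D coordinate (for example, a Cα position from the AlphaFold2 model) rather than all atoms. Both function names are hypothetical.

```python
import math
from statistics import median


def flag_epitope_positions(hc_scores, lc_scores, threshold=0.3):
    """Split positions by which antibody's median score is lower by > threshold.

    hc_scores / lc_scores: dict position -> list of variant scores at that position.
    Returns (hc_low, lc_low): candidate heavy- and light-chain antibody epitopes.
    """
    hc_low, lc_low = set(), set()
    for pos in hc_scores.keys() & lc_scores.keys():
        diff = median(hc_scores[pos]) - median(lc_scores[pos])
        if diff < -threshold:      # heavy chain antibody binds worse here
            hc_low.add(pos)
        elif diff > threshold:     # light chain antibody binds worse here
            lc_low.add(pos)
    return hc_low, lc_low


def within_distance(coords, epitope, cutoff):
    """Non-epitope positions lying within `cutoff` (Å) of any epitope position.

    coords: dict position -> (x, y, z), one representative point per residue.
    """
    return {
        p for p, xyz in coords.items()
        if p not in epitope
        and any(math.dist(xyz, coords[e]) <= cutoff for e in epitope if e in coords)
    }
```

With real data, `within_distance` would be called with the 9.15 Å (light chain) or 5.71 Å (heavy chain) cutoffs quoted in the text, and positions from both sets would then be excluded from downstream analyses for that antibody.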

MultiSTEP clarifies biochemical constraints on secretion

Signal peptides direct protein secretion but vary in length and sequence25–27. Three distinct functional regions are conserved amongst human signal peptides: the N-region, which is weakly positively charged; the H-region, which contains the hydrophobic helix that binds to a signal recognition particle (SRP) to initiate translocation into the ER; and the C-region, which breaks the H-region helix and contains a canonical AxA signal peptide cleavage motif26,28. Variants had distinct effects in each region of the signal peptide (Fig. 3a). For example, positions p.A26 and p.C28 within the AxA motif tolerated substitution with small amino acids, consistent with these positions occupying shallow hydrophobic pockets near the active site of the signal peptidase28. Surprisingly, hydrophobic variants at the spacer position of the AxA motif (p.E27) caused significant loss of secretion, potentially by extending the H-region hydrophobic helix and preventing cleavage26.

Figure 3: MultiSTEP reveals biochemical constraints on secretion.

a. Predicted signal peptide regions for WT FIX from SignalP 6.0 (top). Heatmap shows FIX heavy chain secretion scores for signal peptide variants (bottom, n = 3 replicates). Heatmap color indicates secretion score. Black dots indicate the WT amino acid. Missing scores are gray. N: N-region; H: H-region; C: C-region. b. Comparison of MultiSTEP secretion scores with SignalP 6.0 (SP6) functional classification, grouped by signal peptide region. Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. Number of variants in each class is labeled above the violin plot. c. FIX cysteine positions colored by domain architecture (top). Sig: Signal peptide. Gla: Gla domain. EGF1: Epidermal growth factor-like domain 1. EGF2: Epidermal growth factor-like domain 2. Protease: Serine protease domain. Disulfide bridges in WT FIX are denoted by black connecting lines9,108–110. Heatmap of FIX heavy chain secretion scores for loss-of-cysteine substitutions, colored as in (a) (bottom, n = 3 replicates). d. Mean (point) and standard error (error bars) of variant effect scores (FIX: MultiSTEP, all others: VAMP-seq) for all loss-of-cysteine substitutions for different proteins22–24,34 (n = 1,031 variants). Bonferroni-corrected pairwise two-sided t-test p values are shown. e. Box plots representing the 25th, 50th, and 75th percentiles of secretion scores for all missense variants across all positions with the indicated WT amino acid (n = 8,528 variants). Whiskers span the range of data. f. Mean (point) and standard error (error bars) of variant effect scores (FIX: MultiSTEP, all others: VAMP-seq) for all gain-of-cysteine substitutions for different proteins (n = 1,404 variants). Bonferroni-corrected pairwise two-sided t-test p values are shown. g. Box plots representing the 5th, 25th, 50th, 75th, and 95th percentiles of secretion scores for all missense substitutions of the indicated variant amino acid across all positions (n = 8,528 variants). Whiskers span the range of data.

93.2% (n = 193/207) of variants predicted to be secreted by the signal peptide classifier SignalP 6.0 were also secreted in our assay, with few notable exceptions29 (e.g. p.E27P). However, 56.4% (n = 172/305) of variants predicted not to be secreted by SignalP were secreted normally in our assay, including p.M8T, a known benign variant12 (Fig. 3b). The weakly positively charged N-region of the FIX signal peptide was largely tolerant of substitutions aside from negatively charged D and E, consistent with poor conservation of the N-region relative to the other signal peptide regions30 (Fig. 3a). Indeed, nearly all (87.8%, n = 72/82) N-region variants predicted to be poorly secreted by SignalP had WT-like secretion scores. Similarly, 34.2% (n = 56/164) of H-region and 35.6% (n = 21/59) of C-region variants were misclassified by SignalP (Fig. 3b). Thus, at least for FIX, SignalP has poor specificity for predicting variant effects on secretion.
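The headline percentages in this paragraph follow directly from the stated counts. The sketch below recomputes them and, as an illustration only, also derives specificity and positive predictive value by treating MultiSTEP secretion scores as ground truth and loss of secretion as the positive class; that framing is an assumption of this example, not an analysis reported in the text.

```python
def pct(k: int, n: int) -> float:
    """Percentage of k out of n, rounded to one decimal place."""
    return round(100 * k / n, 1)


# Counts quoted in the text (SignalP 6.0 prediction vs. MultiSTEP outcome).
pred_secreted_total = 207
pred_secreted_and_secreted = 193      # true negatives (positive class = loss of secretion)
pred_not_secreted_total = 305
pred_not_secreted_but_secreted = 172  # false positives

true_neg = pred_secreted_and_secreted
false_pos = pred_not_secreted_but_secreted
true_pos = pred_not_secreted_total - false_pos  # loss-of-secretion calls that MultiSTEP confirmed

concordance_secreted = pct(true_neg, pred_secreted_total)          # 93.2
discordant_not_secreted = pct(false_pos, pred_not_secreted_total)  # 56.4
specificity = pct(true_neg, true_neg + false_pos)                  # 52.9
ppv = pct(true_pos, pred_not_secreted_total)                       # 43.6
n_region_discordant = pct(72, 82)                                  # 87.8
```

Under this framing, the 172 false-positive "not secreted" calls are what pull SignalP's specificity down to roughly half.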

Many secreted proteins require disulfide bonds to establish and maintain structure, and variants that remove important cysteines can cause disease31–33. 22 of the 24 native FIX cysteines are disulfide-bonded (Fig. 3c). Most variants at WT cysteine positions dramatically reduced secretion, especially in the EGF1, EGF2, and serine protease domains (Fig. 3c). This strong effect of substituting WT cysteines in FIX contrasts with the modest effects of substituting WT cysteines in cytoplasmic and transmembrane proteins (Fig. 3d, Extended Data Fig. 5a). FIX WT cysteine positions were by far the least tolerant of substitution of any positions in the protein (Fig. 3e).

Similarly, introduction of a novel cysteine was the most damaging substitution throughout FIX, unlike cytoplasmic and membrane-associated proteins where proline or tryptophan are generally most damaging22–24,34 (Fig. 2b,c; Fig. 3f,g; Extended Data Fig. 5b). Cysteine substitutions were deleterious even within the highly flexible activation peptide where other substitutions did not appreciably impact secretion (Fig. 2a–c). Thus, altering the number of cysteines in FIX is fundamentally detrimental to FIX secretion, likely through changes in disulfide bonding patterns or protein folding differences under oxidative stress.
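Identifying the most damaging substitution amounts to grouping scores by the substituted-in amino acid and ranking the groups, as in Fig. 3g. The sketch below uses the mean as the summary statistic (an assumption; the figure itself shows percentiles) and toy values rather than MultiSTEP data. The function name `rank_variant_residues` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_variant_residues(pairs):
    """Rank substituted-in amino acids by mean score, most damaging first.

    pairs: iterable of (variant_amino_acid, score) tuples.
    Returns a list of (amino_acid, mean_score) sorted by ascending mean.
    """
    by_aa = defaultdict(list)
    for aa, score in pairs:
        by_aa[aa].append(score)
    return sorted(((aa, mean(s)) for aa, s in by_aa.items()), key=lambda kv: kv[1])


# Toy values for illustration only (not MultiSTEP data).
toy = [("C", 0.1), ("C", 0.2), ("P", 0.5), ("W", 0.7), ("A", 1.0), ("A", 0.9)]
ranking = rank_variant_residues(toy)
```

Applied to VAMP-seq data from cytoplasmic proteins, the same ranking would typically place proline or tryptophan first; for FIX secretion scores, the text reports that cysteine comes first.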

Evolutionary conservation also corresponds with intolerance to substitution. While poorly conserved positions in FIX were generally tolerant of variation, variants at highly conserved positions were 5.46 times more likely to impact secretion35 (p < 2.2 × 10⁻¹⁶, two-sided Fisher’s exact test, Extended Data Fig. 6a,b). Nineteen variants at poorly conserved positions impacted secretion severely (secretion score < 0.1); of these, 16 were in the signal peptide, which varies widely in sequence and length across proteins. Conversely, 87 variants at highly conserved positions had WT-like secretion scores, many of which are associated with FIX functions not captured by our secretion assay, including PTMs, activation peptide cleavage, enzymatic activity, and partner binding. Thus, the effects of substitutions on FIX secretion are largely consistent with conservation.
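The conservation analysis rests on a two-sided Fisher's exact test on a 2×2 table (highly vs. poorly conserved positions × secretion impacted vs. not). The stdlib-only sketch below computes the p value from the hypergeometric distribution and reports the sample odds ratio (ad)/(bc). It assumes that the "5.46 times more likely" figure is an odds ratio, which the text does not state explicitly; note that R's `fisher.test` (whose p-value floor of 2.2 × 10⁻¹⁶ appears above) reports a conditional maximum-likelihood odds ratio, which can differ slightly from the sample odds ratio. The function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p value). The p value sums the hypergeometric
    probabilities of all tables with the observed margins that are no more
    likely than the observed table.
    """
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):  # P(top-left cell == x) given fixed margins
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so ties in probability are counted despite rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    if b * c:
        odds = (a * d) / (b * c)
    else:
        odds = float("inf") if a * d else float("nan")
    return odds, min(1.0, p)
```

For example, `fisher_exact_2x2(3, 1, 1, 3)` (the classic tea-tasting table) gives an odds ratio of 9.0 and a two-sided p of 34/70 ≈ 0.486.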

MultiSTEP quantifies variant effects on PTMs

Twelve glutamates in the FIX Gla domain are γ-carboxylated, which is required for FIX activity. Some of these γ-carboxylated glutamates coordinate a calcium-dependent conformational change in the Gla domain, creating a three-helix structure that exposes the ω-loop6–9,36,37. We used two γ-carboxylation-sensitive, anti-Gla domain antibodies to interrogate the effect of FIX variants on γ-carboxylation. One is a FIX-specific antibody that recognizes γ-carboxylation-dependent exposure of the ω-loop38,39. Thus, binding of this antibody depends on γ-carboxylation and Gla domain conformation. The other is an antibody that interacts with γ-carboxylated glutamates in a conserved ExxxExC motif present in the Gla domain of multiple carboxylated proteins, including FIX40 (Extended Data Fig. 7a). As expected, WT FIX-displaying cells strongly bound both antibodies. Pre-incubation with warfarin, a drug that inhibits γ-carboxylation, eliminated binding of both γ-carboxylation-sensitive antibodies, confirming that they only detect γ-carboxylated FIX41 (Extended Data Fig. 7b,c).

We used the FIX-specific and Gla-motif γ-carboxylation-dependent antibodies to generate γ-carboxylation scores for nearly all possible FIX missense variants (Fig. 4a–c). Missense variant γ-carboxylation scores were bimodally distributed and synonymous variants scored similar to WT for both antibodies (Fig. 4d,e). γ-carboxylation scores correlated with secretion scores except in the propeptide and Gla domain, where some variants impacted γ-carboxylation scores without affecting secretion (Pearson’s r = 0.85, Fig. 4f, Extended Data Fig. 7d–i). Thus, in our system, loss of FIX γ-carboxylation did not impact secretion (Extended Data Fig. 7f,i). Indeed, of the 1,154 missense variants in the propeptide and Gla domains that we assayed, 44.6% (n = 515) were associated with low carboxylation scores but normal secretion scores.
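The "low carboxylation, normal secretion" category can be expressed as a pair of threshold calls. A minimal sketch, assuming (as in the Fig. 4d,e legend) that each threshold is the 5th percentile of that assay's synonymous-variant scores, computed here with linear interpolation via `statistics.quantiles(..., method="inclusive")`. Which secretion antibody was paired with each carboxylation antibody for the 515-variant count is not specified, so the inputs are generic and the function names are hypothetical.

```python
from statistics import quantiles


def fifth_percentile(values):
    """5th percentile with linear interpolation (first of 19 cut points for n=20)."""
    return quantiles(values, n=20, method="inclusive")[0]


def classify_variants(secretion, carboxylation, syn_secretion, syn_carboxylation):
    """Label each variant scored in both assays.

    secretion / carboxylation: dict variant -> score.
    syn_secretion / syn_carboxylation: synonymous-variant scores used for thresholds.
    """
    sec_cut = fifth_percentile(syn_secretion)
    carb_cut = fifth_percentile(syn_carboxylation)
    labels = {}
    for v in secretion.keys() & carboxylation.keys():
        if secretion[v] < sec_cut:
            labels[v] = "low secretion"  # carboxylation cannot be judged on the surface
        elif carboxylation[v] < carb_cut:
            labels[v] = "low carboxylation, normal secretion"
        else:
            labels[v] = "WT-like"
    return labels


syn = [0.9, 0.95, 1.0, 1.05, 1.1]  # toy synonymous scores
labels = classify_variants(
    {"a": 0.1, "b": 1.0, "c": 1.0}, {"a": 0.1, "b": 0.1, "c": 1.0}, syn, syn
)
```

Checking secretion first reflects the logic of the assay: a variant that never reaches the cell surface cannot be bound by either γ-carboxylation-sensitive antibody, so a low carboxylation score is only informative when secretion is normal.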

Figure 4: MultiSTEP enables measurement of variant effects on FIX post-translational modification.

a. Factor IX domain and chain architecture. Signal: Signal peptide. Pro: Propeptide. Gla: Gla domain. EGF1: Epidermal growth factor-like domain 1. EGF2: Epidermal growth factor-like domain 2. Activation: Activation peptide. Protease: Serine protease domain. b-c. Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (b) or carboxylation-sensitive Gla-motif carboxylation scores (c) for nearly all missense FIX variants (n = 2 replicates). Heatmap color indicates antibody score, from blue (0, lowest 5% of scores) through white (1, WT) to red (increased antibody scores). Black dots indicate the WT amino acid. Missing scores are colored gray. Furin cleavage site (F), ω-loop (ω), ExxxExC motif (E), and aromatic stack (AS) are annotated above (b) and (c). For higher resolution heatmaps of the propeptide and Gla domains of FIX, see Extended Data Fig. 7d–i. d-e. Density distributions of carboxylation-sensitive FIX-specific (d) or carboxylation-sensitive Gla-motif (e) carboxylation scores for FIX missense variants (orange) and synonymous variants (blue). Dashed line denotes the 5th percentile of the synonymous variant distribution. f. Scatter plot of median MultiSTEP-derived carboxylation-sensitive FIX-specific carboxylation scores and light chain secretion scores at each position in FIX. Points are colored by domain architecture, using the same color scheme as (a). Black dashed lines indicate the 0.2-point deviation threshold from perfect correlation between carboxylation and secretion scores. Points with deviation greater than this threshold are labeled with their corresponding FIX position. Pearson’s correlation coefficient is shown. g. Crystal structure of FIX Gla domain (positions 47–92)9. Disulfide bridges (yellow) and γ-carboxylated glutamates are shown as sticks. Calcium ions are shown as teal spheres. Residues are colored as the ratio of the median carboxylation-sensitive FIX-specific carboxylation score to median FIX light chain secretion score. Missing positions are colored gray.

The FIX propeptide mediates binding of gamma-glutamyl carboxylase, the enzyme that γ-carboxylates the Gla domain prior to secretion42–45. Propeptide positions p.F31, p.A37, p.I40, p.L41, p.R43, p.K45, and p.R46 are important for Gla domain γ-carboxylation, though the importance of p.R43, p.K45, and p.R46 is contested15,46–48. Some substitutions at the highly conserved position p.A37 in the propeptide (e.g. p.A37D) prevent gamma-glutamyl carboxylase binding and thus Gla domain γ-carboxylation15. In our assay, 12 of the 19 substitutions at position p.A37, including p.A37D, reduced γ-carboxylation scores for both antibodies. Serine, threonine, and glutamate substitutions at position p.I40, which is conserved across all Gla-containing coagulation proteins, also reduced both γ-carboxylation scores (Fig. 4a–c, Extended Data Fig. 7d–f). Thus, like position p.A37, position p.I40 appears to be important for γ-carboxylation.

The propeptide furin cleavage motif (p.R43 to p.R46) harbors numerous severe hemophilia B-associated variants11 (Fig. 4a–c, Extended Data Fig. 7d–f). Variants in the furin cleavage motif can block propeptide removal48. The impact of propeptide retention on Gla domain γ-carboxylation remains controversial, with some data suggesting that gamma-glutamyl carboxylase is blocked by the retained propeptide and other data suggesting that the retained propeptide disrupts electrostatic interactions within the Gla domain, altering ω-loop conformation6,48,49. In our assay, variants in the furin cleavage motif had low scores for the FIX-specific γ-carboxylation antibody but WT-like scores for the Gla-motif and secretion antibodies (Fig. 2b,c; Fig. 4b,c; Extended Data Fig. 7d–f). One possible explanation is that these substitutions perturb ω-loop conformation but permit γ-carboxylation of the glutamates in the conserved Gla-motif.

We analyzed Gla domain variant scores from both γ-carboxylation-sensitive antibodies and found that many variants in the ω-loop (p.N49 to p.Q57) had WT-like scores for both antibodies (Fig. 4b,c, Extended Data Fig. 7g–i). The lowest FIX-specific Gla domain antibody γ-carboxylation scores resulted from substitutions at positions involved in the coordination of calcium ions (p.N48, p.E53, p.E54, p.E67, p.E73, and p.E76) in the Gla domain core9 (Fig. 4g, Extended Data Fig. 7g–i). Variants at positions p.F87 and p.W88, which form an aromatic stack required for divalent cation-induced folding of the Gla domain, had low FIX-specific γ-carboxylation scores50,51 (Fig. 4f, Extended Data Fig. 7g–i). Unexpectedly, variants at five of the twelve γ-carboxylated glutamate residues in the Gla domain (p.E66, p.E72, p.E79, p.E82, p.E86) only modestly reduced FIX-specific Gla domain and Gla-motif antibody γ-carboxylation scores (Fig. 4b,c,g, Extended Data Fig. 7g–i). These residues coordinate magnesium ions rather than calcium ions at physiologic concentrations52 (Fig. 4g). Many variants outside of the ExxxExC conserved Gla motif (p.E63 to p.C69) had WT-like Gla-motif antibody scores but low FIX-specific Gla-domain antibody scores (Fig. 4b,c, Extended Data Fig. 7a,d–i), while within the conserved Gla-motif, variants at positions p.R62, p.E66, p.E67, and p.C69 showed similar score reductions for both antibodies, as did variants around the propeptide furin cleavage site (p.R43, p.K45, and p.R46) (Extended Data Fig. 7d–i).
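The per-position coloring in Fig. 4g (ratio of median FIX-specific carboxylation score to median light chain secretion score) separates carboxylation- or conformation-specific effects from simple loss of secretion: a ratio well below 1 at a well-secreted position points to a Gla-domain defect. A minimal sketch with toy inputs; the `min_secretion` guard against dividing by near-zero medians is a hypothetical addition, not something the paper describes, and the function name is likewise hypothetical.

```python
from statistics import median


def carboxylation_to_secretion_ratio(carb, sec, min_secretion=0.1):
    """Per-position ratio of median carboxylation score to median secretion score.

    carb / sec: dict position -> list of variant scores at that position.
    Positions whose median secretion score is below `min_secretion` map to None,
    because a ratio against a near-zero denominator is not interpretable.
    """
    out = {}
    for pos in carb.keys() & sec.keys():
        s = median(sec[pos])
        out[pos] = None if s < min_secretion else median(carb[pos]) / s
    return out


# Toy scores for three positions (not MultiSTEP data).
carb = {48: [0.1, 0.2, 0.3], 66: [0.8, 0.9, 1.0], 100: [0.0, 0.1, 0.0]}
sec = {48: [1.0, 1.0, 1.0], 66: [1.0, 0.9, 1.0], 100: [0.0, 0.05, 0.0]}
ratios = carboxylation_to_secretion_ratio(carb, sec)
```

In this toy example, position 48 (ratio 0.2) would be flagged as a carboxylation or conformation defect, position 66 (ratio 0.9) behaves like a modest-effect residue, and position 100 is uninformative because it is not secreted.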

Secretion scores reflect circulating FIX levels

We previously established that many pathogenic missense variants in intracellular proteins act by disrupting protein abundance22–24,34. However, for secreted proteins, the prevalence of loss of secretion among pathogenic variants remains largely unexplored. Our FIX secretion scores correlated strongly with plasma FIX antigen levels reported in individuals with hemophilia B11 (n = 85, Pearson’s r = 0.76; Fig. 5a, Extended Data Fig. 8a). 90.2% (n = 37/41) of lowly secreted variants had low circulating FIX levels, defined as <40% of pooled normal plasma11. The remaining four lowly secreted variants, p.R75Q, p.R191C, p.R379G, and p.A436V, had normal plasma FIX levels in vivo. Notably, all four had secretion scores or reported FIX levels close to the assay thresholds, suggesting measurement variation as the likely explanation (Supplementary Table 2). Conversely, five variants, p.F71S, p.D93N, p.A173V, p.Q241H, and p.N393T, had markedly increased secretion scores relative to plasma FIX antigen levels (Supplementary Table 2). These variants occurred throughout the protein, outside of known binding sites for extravascular compartments or clearance binding partners53–55. These variants could be secreted normally but have a shortened half-life in vivo, which would be missed by our assay.
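The two summaries in this paragraph, Pearson's r between secretion score and plasma antigen and the fraction of low-secretion variants whose antigen is below 40% of pooled normal plasma, can be computed as below. This is a stdlib-only sketch with toy inputs; the secretion cutoff is passed in by the caller (in the text it is the 5th percentile of synonymous scores), and both function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def secretion_vs_antigen(pairs, secretion_cut, antigen_cut=40.0):
    """pairs: (secretion score, plasma FIX antigen as % of pooled normal plasma).

    Returns (Pearson's r, number of low-secretion variants with antigen below
    antigen_cut, total number of low-secretion variants).
    """
    sec = [s for s, _ in pairs]
    ag = [a for _, a in pairs]
    low = [a for s, a in pairs if s < secretion_cut]
    concordant = sum(1 for a in low if a < antigen_cut)
    return pearson_r(sec, ag), concordant, len(low)
```

With the counts in the text, the concordance step corresponds to 37 of 41 low-secretion variants falling below the 40% antigen threshold.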

Figure 5: Secretion and gamma-carboxylation scores reveal clinical features of hemophilia B and enable variant reinterpretation.

a. Scatter plot of the mean and standard error of light chain secretion scores (n = 2 replicates) and FIX plasma antigen from 457 individuals with hemophilia B in the EAHAD database11. Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. b. Comparison of hemophilia B disease severity in the EAHAD database with light chain secretion scores (n = 490 variants). Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. p values from a Kruskal–Wallis test followed by post-hoc Dunn’s test, adjusted for multiple comparisons, are shown. c. Severe hemophilia B disease-associated variants with WT-like light chain secretion scores or FIX-specific γ-carboxylation antibody scores are shown. Bars are colored by domain. d. Comparison of hemophilia B disease severity in the EAHAD database with FIX plasma antigen levels (n = 229 variants). Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is 40% FIX plasma antigen. p values from a Kruskal–Wallis test followed by post-hoc Bonferroni-corrected Dunn’s test are shown. e. Histograms of multiplexed functional scores for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF. Color indicates clinical variant interpretation. Data from four antibodies are shown. Dashed vertical line indicates the 5th percentile of synonymous variants used as a threshold for abnormal function. f. Receiver operating characteristic curve for the variant function classifier. Dot indicates final classifier performance. g. Histogram depicting F9 missense variant minor allele frequencies (MAF) in hemizygotes in gnomAD 4.1. Color indicates model-based functional classification using MultiSTEP scores. Vertical dashed line indicates estimated prevalence of hemophilia B in hemizygous individuals57. h. Sankey diagram of F9 variant reinterpretation using functional data as moderate or strong evidence. Labeled nodes represent the number of variants of each class.

Severe hemophilia B is defined by undetectable FIX activity (<1%). Individuals with severe hemophilia B receive more intensive treatment and are at higher risk for complications than individuals with non-severe hemophilia B (FIX activity 1–40%)13. MultiSTEP secretion scores correlated strongly with hemophilia B severity reported in EAHAD11 (Fig. 5b, Extended Data Fig. 8b). Of 258 severe disease-associated variants, secretion scores were low in 181 (70.2%) and WT-like in 77 (29.8%)11. Many of the normally secreted severe variants reside at positions important for FIX activation or activity including propeptide cleavage (p.R43, p.K45, and p.R46), γ-carboxylation (p.E53, p.E54, p.E72, p.E73, and p.E79), activation peptide cleavage (p.R226), or enzymatic activity (p.S411). 24 of the normally secreted, severe hemophilia B-associated variants were in the propeptide or Gla domain, of which 20 had low FIX-specific γ-carboxylation scores indicative of altered γ-carboxylation or Gla domain conformation (Fig. 5c). Three variants outside the propeptide and Gla domain also had low FIX-specific γ-carboxylation scores. The remaining 54 severe variants that scored normally in all assays are found predominantly within the catalytic heavy chain (Fig. 5c).

Because of the pervasive effect of cysteine substitutions on FIX secretion (Fig. 2b,c), we investigated their relationship to plasma FIX antigen and hemophilia B disease severity11. Secretion scores for gain-of-cysteine variants correlated strongly with plasma FIX antigen levels (Pearson’s r = 0.88, Extended Data Fig. 8c). Of the nine gain-of-cysteine variants with plasma FIX antigen data, five had undetectable antigen, suggesting severe disease. Consistent with this, gain-of-cysteine variants were less often associated with mild disease (n = 3) than with moderate (n = 6) or severe (n = 11) disease, although this difference did not reach statistical significance (two-sided Fisher’s exact test, p = 0.098; Extended Data Fig. 8d). Considering all substitutions, FIX antigen levels reported in hemophilia B were markedly decreased for the majority of severe disease variants compared to those causing non-severe disease (Fig. 5d). Thus, loss of secretion is an important mechanism by which F9 variants cause hemophilia B, and variants that cause very low or undetectable secretion are particularly likely to cause severe disease.
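Pearson's r, as reported above for gain-of-cysteine secretion scores versus plasma FIX antigen, can be computed as in the Python sketch below. The paired values are hypothetical, and how undetectable antigen levels were encoded (for example, as 0% or excluded) is not specified in the text, so the sketch simply takes whatever numeric pairs it is given.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (secretion score, % FIX plasma antigen) pairs, one pair per variant.
scores = [0.05, 0.10, 0.40, 0.75, 0.90]
antigen = [0.0, 1.0, 12.0, 30.0, 45.0]
print(round(pearson_r(scores, antigen), 3))
```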

Secretion and γ-carboxylation scores resolve FIX VUS

In the MLOF genotyping program, 48.6% (n = 107/220) of F9 missense variants clinically suspected to cause hemophilia B were classified as VUS due to lack of functional evidence12. We generated secretion and γ-carboxylation scores for 214 of these 220 MLOF variants, including 103 VUS. To resolve F9 VUS in hemophilia B, we first curated 149 F9 missense variants of known effect from ClinVar, MLOF, and gnomAD: 135 were pathogenic/likely pathogenic and 14 were benign/likely benign1,12,56. No single secretion or γ-carboxylation score set perfectly separated pathogenic and benign variants (Fig. 5e). Thus, we integrated the secretion and γ-carboxylation scores by developing a random forest model to classify variant function as either normal or abnormal and trained the model on 111 of the 149 curated F9 variants of known effect. The model correctly classified 63.4% (n = 64/101) of known pathogenic and likely pathogenic training variants and 100% (n = 10/10) of known benign and likely benign training variants. The model performed similarly well on the test set of the remaining 38 curated F9 variants, correctly classifying 61.8% (n = 21/34) of pathogenic and likely pathogenic variants and 100% (n = 4/4) of benign and likely benign variants (Fig. 5f). When applied to all 8,528 F9 missense variants with secretion and γ-carboxylation scores, the model classified 45.3% (n = 3,859) as functionally abnormal (Supplementary Table 3).
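The text does not give the implementation, hyperparameters, or exact feature set of the random forest. As an illustration of the technique only, the Python sketch below builds a small random forest from scratch: each tree is grown on a bootstrap resample of the training variants, considers a random subset of features (here, antibody scores) at each split, chooses splits by Gini impurity, and the forest predicts by majority vote. The toy two-feature data, function names, and defaults (50 trees, depth 3) are hypothetical; in practice an established implementation such as scikit-learn's `RandomForestClassifier` would be the more likely choice.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, features):
    """Return the (feature, threshold) pair that most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in features:
        values = sorted({row[f] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y, depth, max_depth, max_features, rng):
    """Grow a tree; internal nodes are (feature, threshold, left, right), leaves are labels (not tuples)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), min(max_features, len(X[0])))
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    go_left = [row[f] <= t for row in X]
    left = grow_tree([r for r, g in zip(X, go_left) if g], [l for l, g in zip(y, go_left) if g],
                     depth + 1, max_depth, max_features, rng)
    right = grow_tree([r for r, g in zip(X, go_left) if not g], [l for l, g in zip(y, go_left) if not g],
                      depth + 1, max_depth, max_features, rng)
    return (f, t, left, right)


def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_depth=3, max_features=2, seed=0):
    """Train each tree on a bootstrap resample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, max_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote across trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical (secretion, gamma-carboxylation) scores for variants of known effect.
X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25), (0.90, 0.80), (1.00, 0.95), (0.85, 0.90)]
y = ["abnormal"] * 3 + ["normal"] * 3
forest = fit_forest(X, y)
print(predict_forest(forest, (0.05, 0.10)), predict_forest(forest, (0.95, 0.90)))
```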

Of 113 F9 missense variants in the gnomAD database, 26 (23.0%) were annotated as abnormal function by our model1 (Fig. 5g). Of the predicted abnormal function variants, p.G106S, a known likely pathogenic variant with a mild phenotype, had the highest hemizygous minor allele frequency11,12 (MAF 1.29 × 10⁻⁵). These findings are consistent with the prevalence of hemophilia B, which is estimated to affect 1 in 20,000 live male births57. Our classification model correctly identified 47.8% of mild, 62.2% of moderate, and 75.2% of severe hemophilia B-associated variants found in gnomAD as abnormal function11 (Extended Data Fig. 8e). Mild and moderate disease-causing variants in the propeptide and Gla domains, which likely impact γ-carboxylation-related phenotypes, were better resolved (mild: 76.5%, moderate: 72.7%, severe: 80.7%; Extended Data Fig. 8f). Our model’s lower sensitivity for mild disease-causing variants is consistent with the observation that fewer mild and moderate disease-associated variants reduce FIX plasma levels (mild: 33.3%, moderate: 35.7%) than severe disease-associated variants (56.5%; Fig. 5d). Of gain-of-cysteine substitutions, 68.5% (n = 291) were predicted by our classification model to have abnormal function, compared to 44.0% (n = 3,568) of non-cysteine variants (odds ratio: 2.75, two-sided Fisher’s exact test, p = 5.58 × 10⁻²³).
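The odds ratio and two-sided Fisher's exact test above compare the proportion of abnormal-function calls between gain-of-cysteine and other missense variants in a 2×2 table. The Python sketch below shows both calculations on an illustrative table, not the paper's counts. It returns the sample (cross-product) odds ratio; some software, such as R's `fisher.test`, instead reports a conditional maximum-likelihood estimate, which can differ slightly, so the sketch does not assume which estimate underlies the reported value.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p value). The p value sums the hypergeometric
    probabilities of every table with the same margins that is no more likely
    than the observed one, with a small relative tolerance as in R's fisher.test.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # P(top-left cell == x) with all row and column totals held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, min(1.0, p_value)


# Illustrative table: rows = variant group, columns = (abnormal, normal) calls.
print(fisher_exact_2x2(8, 2, 1, 5))
```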

Next, we compared our secretion and γ-carboxylation scores, along with the functional data-driven model, to four variant effect predictors: EVE, AlphaMissense, REVEL, and CADD58–61. For secretion scores, EVE correlated most strongly (ρ = 0.611), followed by AlphaMissense (ρ = 0.545), REVEL (ρ = 0.514), and CADD (ρ = 0.413) (Extended Data Fig. 9a). For FIX-specific γ-carboxylation scores, AlphaMissense correlated most strongly (ρ = 0.635), followed by EVE (ρ = 0.630), REVEL (ρ = 0.601), and CADD (ρ = 0.481). A comparison of 26 deep mutational scanning datasets with 55 existing variant effect predictors found correlations of similar magnitude62.
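Spearman's ρ, used above to compare MultiSTEP scores with EVE, AlphaMissense, REVEL, and CADD, is the Pearson correlation of the two variables' ranks. The Python sketch below gives tied values their average rank. It assumes that variants lacking a predictor score (EVE, for example, does not score every variant) have already been removed pairwise; the function names and example values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical MultiSTEP scores and predictor scores for four variants.
print(spearman_rho([0.2, 0.5, 0.9, 0.4], [0.3, 0.6, 0.8, 0.1]))
```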

We quantified the ability of our functional data-driven model and the four variant effect predictors to discriminate between pathogenic/likely pathogenic and benign/likely benign variants in our test set (Extended Data Fig. 9b,c). Both MultiSTEP and AlphaMissense had perfect positive predictive value (PPV = 1), followed by CADD (PPV = 0.94), REVEL (PPV = 0.92), and EVE (PPV = 0.71). However, MultiSTEP classified more known pathogenic/likely pathogenic variants correctly (n = 21) than AlphaMissense (n = 16). CADD and REVEL had the highest negative predictive value (NPV = 0.5), followed by MultiSTEP (NPV = 0.24), AlphaMissense (NPV = 0.18), and EVE (NPV = 0). EVE’s performance was difficult to evaluate because it did not score 31.6% (n = 12/38) of variants.
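PPV and NPV follow directly from a confusion table of predictions against known classifications. The Python sketch below uses the MultiSTEP test-set tallies stated above (21 of 34 pathogenic/likely pathogenic and 4 of 4 benign/likely benign variants classified correctly), which give PPV = 1 and NPV = 4/17 ≈ 0.24, matching the reported values. Skipping unscored variants, as would apply to EVE, is an assumption about how missing predictions were handled; the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(truth, calls):
    """Tally TP/FP/TN/FN.

    truth: 'P/LP' or 'B/LB' per variant; calls: 'abnormal', 'normal', or None
    (unscored). An 'abnormal' call is treated as a prediction of pathogenicity.
    """
    counts = Counter(tp=0, fp=0, tn=0, fn=0)
    for label, call in zip(truth, calls):
        if call is None:  # the predictor did not score this variant
            continue
        pathogenic = label == "P/LP"
        if call == "abnormal":
            counts["tp" if pathogenic else "fp"] += 1
        else:
            counts["fn" if pathogenic else "tn"] += 1
    return counts


def ppv_npv(counts):
    """Positive and negative predictive values; NaN when a denominator is zero."""
    called_pos = counts["tp"] + counts["fp"]
    called_neg = counts["tn"] + counts["fn"]
    ppv = counts["tp"] / called_pos if called_pos else float("nan")
    npv = counts["tn"] / called_neg if called_neg else float("nan")
    return ppv, npv


# Test-set tallies for the MultiSTEP model as stated in the text.
truth = ["P/LP"] * 34 + ["B/LB"] * 4
calls = ["abnormal"] * 21 + ["normal"] * 13 + ["normal"] * 4
print(ppv_npv(confusion_counts(truth, calls)))
```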

We next reassessed F9 missense variants reported in MLOF, including those classified as VUS, using evidence from the model. We applied strong evidence for pathogenicity to variants predicted to have abnormal function2,12. This resulted in reclassification of 63.1% (n = 65/103) of VUS to likely pathogenic and 67.3% (n = 66/98) of likely pathogenic variants to pathogenic (Fig. 5h, Supplementary Table 4). An alternative Bayesian approach suggested that our model provides moderate evidence for pathogenicity3,63,64; applying moderate evidence led to reclassification of 61.2% (n = 63/103) of VUS as likely pathogenic and 13.3% (n = 13/98) of likely pathogenic variants as pathogenic (Fig. 5h, Supplementary Table 4). Because our classification model incorporated only secretion and γ-carboxylation functional data, it cannot identify variants that cause hemophilia B by affecting other FIX functions. Indeed, the model classified 38% of pathogenic variants as normal function, implying that these variants could be pathogenic because of a different type of defect (e.g., impaired activation or catalysis). Therefore, we did not use normal function predictions as evidence for benign classification.
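To illustrate how the strength assigned to functional evidence can change a classification, the Python sketch below uses one published Bayesian adaptation of the ACMG/AMP guidelines (Tavtigian et al., 2018). That framework assumes a prior probability of pathogenicity of 0.10 and scales the odds of each evidence level as a power of 350 (supporting 1/8, moderate 1/4, strong 1/2, very strong 1). These parameters and the posterior cutoffs come from that framework, not from this study; the sketch does not assume that the Bayesian approach cited above used exactly these values, the evidence combinations in the example are hypothetical, and only pathogenic evidence is modeled.

```python
PRIOR = 0.10              # prior probability of pathogenicity (assumed, Tavtigian et al. 2018)
ODDS_VERY_STRONG = 350.0  # odds of pathogenicity for one very strong criterion (assumed)
EXPONENT = {"supporting": 1 / 8, "moderate": 1 / 4, "strong": 1 / 2, "very_strong": 1.0}


def posterior(pathogenic_evidence):
    """Posterior probability of pathogenicity from a list of evidence strengths."""
    odds = ODDS_VERY_STRONG ** sum(EXPONENT[s] for s in pathogenic_evidence)
    return odds * PRIOR / ((odds - 1.0) * PRIOR + 1.0)


def classify(p):
    """Map a posterior probability onto the five-tier scale."""
    if p > 0.99:
        return "pathogenic"
    if p >= 0.90:
        return "likely pathogenic"
    if p >= 0.10:
        return "VUS"
    if p >= 0.001:
        return "likely benign"
    return "benign"


# Hypothetical variant with existing moderate + supporting evidence, to which
# functional evidence is added at either moderate or strong strength.
base = ["moderate", "supporting"]
for functional in ("moderate", "strong"):
    p = posterior(base + [functional])
    print(functional, round(p, 3), classify(p))
```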

MultiSTEP can be used to study diverse secreted proteins

We evaluated MultiSTEP’s generalizability by applying it to six additional secreted proteins: coagulation factors VII, VIII, and X, alpha-1 antitrypsin, plasma protease C1 inhibitor, and proinsulin. Antibody staining of the MultiSTEP Strep II tag revealed robust surface expression in each case (Fig. 6a). To provide further evidence that proteins displayed using MultiSTEP were properly folded, we focused on factor VIII. Displayed FVIII robustly bound a panel of monoclonal antibodies directed against the FVIII A1, A2, A3, C1, and C2 domains as well as two monoclonal antibodies that recognize discontinuous FVIII epitopes, confirming that displayed FVIII is folded correctly (Fig. 6b,c, Extended Data Fig. 10).

Figure 6: MultiSTEP can be applied to diverse secreted proteins.

a. Flow cytometry of cells expressing protein and control constructs in the MultiSTEP backbone following staining with an anti-strep II tag antibody (n ≈ 30,000 cells each). Unrecombined cells do not display FIX. All other constructs contain the MultiSTEP flexible linker, strep II tag, and transmembrane domain. Δstart is a FIX cDNA that lacks a start codon. TM only lacks a secreted protein of interest. FIX Δsignal peptide expresses a FIX molecule without its secretion-targeting signal peptide. b–c. Flow cytometry of B-domain deleted coagulation factor VIII (FVIII) in the MultiSTEP backbone or unrecombined negative control cells (NC) (n ≈ 30,000 cells each) stained with an anti-FVIII A1-A3 antibody, which targets the discontinuous epitope at the interface of the A1 and A3 domains (b), or an anti-FVIII A2 antibody, which targets a discontinuous epitope (positions 497–510 and 584–593) within the A2 domain (c). d–h. Flow cytometry of B-domain deleted coagulation factor VIII (FVIII) and five FVIII variants in the MultiSTEP backbone along with unrecombined negative control cells (NC) (n ≈ 10,000 cells each). Cells were stained with anti-FVIII antibodies specific to the A1 (d), A2 (e), light chain (f), C1 (g), or C2 (h) domains. i–m. Flow cytometry of cells expressing coagulation factor VII (i), coagulation factor X (j), proinsulin (k), plasma protease C1 inhibitor (l), and alpha-1 antitrypsin (m) constructs in the MultiSTEP backbone along with unrecombined negative control (NC) cells (n ≈ 10,000 cells each) stained with an anti-strep II tag antibody.

Testing five FVIII variants known to impair FVIII secretion showed that MultiSTEP can detect a wide range of secretion defects (Fig. 6d–h). For the other five secreted proteins, we identified at least one variant reported to have either clinical (plasma antigen) or in vitro evidence of decreased secretion65–68. Variants in FVII, FX, insulin, and plasma protease C1 inhibitor all demonstrated a marked decrease in surface expression, whereas variants in alpha-1 antitrypsin showed a small decrease (two-sided t-test, p = 0.046 and p = 0.049; Fig. 6i–m).

Discussion

MultiSTEP is a generalizable, multiplexed method for measuring the effects of variants on the function of human secreted proteins. MultiSTEP can be combined with diverse antibodies to quantify secretion and post-translational modifications, which until now have been beyond the reach of MAVEs. We applied MultiSTEP to coagulation factor IX, a protein critical for hemostasis, measuring 44,816 variant effects on secretion and post-translational γ-carboxylation for 8,528 missense and 436 synonymous variants using a panel of five antibodies. Nearly half of FIX missense variants caused some loss of function, mostly by reducing or eliminating secretion. Adding or removing cysteines profoundly impacted FIX secretion, likely reflecting the importance of correct disulfide bonding for FIX structure22–24,34. Propeptide and Gla domain variants with reduced scores for one or both carboxylation-sensitive antibodies generally did not impact FIX secretion. In the propeptide, variants that prevent propeptide cleavage can alter Gla-domain conformation while permitting the γ-carboxylation of at least some glutamates. In the Gla domain, γ-carboxylated glutamates that coordinate calcium or comprise the C-terminal aromatic stack were markedly more sensitive to variation than those that coordinate magnesium, consistent with the importance of calcium coordination for core Gla domain structure and function11,52.

We found that more than 40% of variants associated with severe hemophilia B had profound secretion defects, illustrating the importance of loss of secretion as a causal mechanism of hemophilia B. We also used our data-driven model predictions to reevaluate the classifications of all F9 missense variants found in the MLOF hemophilia genotyping program, enabling us to upgrade 65 of 103 F9 missense VUS to likely pathogenic and 66 likely pathogenic variants to pathogenic. These model predictions are available for 8,528 (97.4%) FIX missense variants, meaning that newly discovered variants in hemophilia B can be more accurately and quickly classified in the future. Variant effect predictors and functional data are considered independent lines of evidence when classifying a variant, and we showed that several variant effect predictors could distinguish pathogenic and benign variants2. Thus, the high-quality, comprehensive in vitro functional data we furnish, along with variant effect predictions, promise to dramatically diminish the number of VUS in F9 going forward.

MAVE-compatible E. coli and S. cerevisiae cell surface display methods have typically been applied to intracellular proteins69,70. These methods are not generally suitable for human secreted proteins, which require specialized intracellular trafficking pathways and extensive post-translational modification. Mammalian cell surface display systems that allow for the study of post-translational modifications rely on non-native signal peptides71,72. These artificial signal peptides prevent accurate assessment of variant effects on native signal peptides and subsequent protein folding, trafficking, localization, or post-translational modification processes26. Approaches to display IgG antibodies using endogenous signal peptides have been developed recently, but, unlike MultiSTEP, rely on transient transfection or lentiviral integration to express variants71,73. MultiSTEP avoids common issues with transient transfection or lentiviral integration by employing a genomically integrated landing pad to introduce a single variant per cell17. Thus, by enabling multiplexed human cell surface display of secreted protein variants, MultiSTEP meets a unique and important need.

MultiSTEP has limitations. For example, while all seven of the secreted proteins we displayed using MultiSTEP were expressed on the cell surface, the folding, secretion, or function of some other proteins may be compromised by linkage to the C-terminal transmembrane domain. Surface expression could also depend on the choice of transmembrane domain, and some proteins may require expression in a particular cell type. We chose the 293-F cell line because it has been used to produce a wide variety of human proteins, supports high protein expression, and enables large-scale culture. The 293-F cell line also grows in suspension, eliminating potential difficulties with dissociating adherent cells74,75. MultiSTEP could be implemented in the plethora of cell lines where recombinase-based landing pads work as well as in hESCs, iPSCs, and mouse embryonic fibroblasts17,22,24,34,76–93. However, while we demonstrated a MultiSTEP-compatible adherent cell approach, some displayed proteins might still be adversely impacted.

Using antibodies as detection reagents also imposes limitations. Variants that disrupt antibody epitopes can give falsely low signals, so we recommend using multiple antibodies. Some antibodies, like the FIX-specific carboxylation-sensitive antibody, require specific conformations for binding, which must be accounted for when interpreting the results. Antibody quality is also important. Our Strep II tag antibody assay replicates were less robust than the other antibody assays. Since all assays were executed with the same library and with similar parameters (e.g., number of cells sorted, depth of sequencing, and analysis pipeline), we believe the Strep II tag antibody assay itself was the cause of the quality difference. Thus, careful antibody selection and validation is critical94. Lastly, no functional assay fully recapitulates in vivo biology. For example, we did not measure FIX enzymatic activity in our system. Therefore, we discourage using our results as evidence for benign variant classification in hemophilia B.

MultiSTEP is a generalizable platform for measuring the effects of missense variation in secreted proteins, complementing the plethora of MAVEs already developed to study intracellular proteins. Because MultiSTEP is compatible with reagents that produce a fluorescent signal, we were able to probe multiple functional aspects of FIX using a panel of antibodies. Antibodies are available for numerous clinically and biologically important secreted proteins, as well as a variety of PTMs. Moreover, other fluorescently labeled reagents such as protein binding partners or labeled covalent substrates could be used to read out other functions. Lastly, cells could be interrogated for the presence of intracellular proteins, or fragments thereof, offering the opportunity to measure the abundance of variants both inside and outside the cell, potentially shedding light on mechanisms leading to poor secretion such as altered folding, stability, or trafficking. The ability to characterize the diverse effects of massive numbers of secreted human protein variants on the surface of human cells creates the opportunity to understand variant effects on secreted protein structure, function, and pathogenicity.

Methods

General reagents

Chemicals were purchased from ThermoFisher Scientific and synthetic oligonucleotides were purchased from IDT unless otherwise noted. Sequences of oligonucleotides used in this work can be found in Supplementary Table 5.

All E. coli were grown in Luria Broth (LB) at 37°C shaking at 225 rpm for 16–18 hours with 100 μg/mL carbenicillin, unless otherwise indicated. Routine cloning was performed in homemade chemically competent Top10F’ E. coli, whereas library cloning was performed in commercially available electrocompetent NEB-10β E. coli (New England Biolabs) using manufacturer’s protocols unless otherwise indicated.

Inverse PCR reactions, unless otherwise specified, were performed in 30 μL reactions with 2x Kapa HiFi ReadyMix (Kapa Biosystems) or Q5 2x master mix (New England Biolabs), with 40 ng starting plasmid and 0.15 μM each of forward and reverse primers. Reaction conditions were 95°C for 3 minutes; 8 cycles of 98°C for 20 seconds, 61°C for 30 seconds, and 72°C for 1 minute/kb; a final 72°C extension for 1 minute/kb; and a 4°C hold. PCR products were then digested with DpnI (New England Biolabs) at 37°C for 2 hours, followed by heat inactivation at 80°C for 20 minutes. PCR products were then gel extracted if needed.

For large modifications (>50 bp), PCR products were then Gibson assembled using a 3:1 molar ratio of insert(s):backbone at 50°C for 1 hour, after which 2 μL of product was transformed into homemade chemically competent Top10F’ E. coli95. For small modifications, in vivo assembly cloning (IVA cloning) of linear products was used by transforming 5 μL of PCR product directly into Top10F’ E. coli without recircularization96. Colonies were Sanger sequence confirmed and isolated using miniprep or midiprep kits according to the manufacturer’s instructions (Qiagen).
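The molar ratios used in these Methods (3:1 insert:backbone here, and 2:1, 5:1, and 7:1 below) are converted to masses by scaling for fragment length. A minimal Python sketch of that arithmetic follows; the fragment lengths and masses in the example are hypothetical, and the 650 g/mol per base pair figure is the usual approximation for double-stranded DNA rather than a value from this study.

```python
AVG_BP_MASS = 650.0  # approximate g/mol per base pair of double-stranded DNA (assumption)


def pmol_dna(ng, length_bp):
    """Convert a mass of dsDNA in ng to picomoles."""
    return ng * 1000.0 / (length_bp * AVG_BP_MASS)


def insert_mass_ng(backbone_ng, backbone_bp, insert_bp, ratio):
    """Mass of insert (ng) giving an insert:backbone molar ratio of ratio:1."""
    return backbone_ng * (insert_bp / backbone_bp) * ratio


# Hypothetical example: 50 ng of a 6 kb backbone with a 2 kb insert at 3:1.
ng = insert_mass_ng(50, 6000, 2000, 3)
print(round(ng, 1), round(pmol_dna(ng, 2000) / pmol_dna(50, 6000), 2))
```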

A Golden Gate assembly-compatible MultiSTEP vector (MultiSTEP-GG) was designed with BsmBI cut sites flanking the open reading frame. For Golden Gate assembly of each construct, 75 ng of the MultiSTEP-GG plasmid was combined with each gene fragment at a 2:1 molar ratio of fragment to vector. DNA was incubated in a 25 μL Golden Gate reaction containing 10x T4 ligase buffer, 3 μL Esp3I, and 2.5 μL T4 DNA ligase (both enzymes from NEB). The reaction was cycled 30 times between 37°C for 5 minutes and 16°C for 5 minutes on a thermocycler, followed by a final 15 minutes at each temperature and heat inactivation at 85°C. To decrease background, an additional 3 μL water, 1 μL Esp3I, and 1 μL 10x CutSmart buffer were added to each reaction mixture and incubated for four hours at 37°C. The reaction mix was cleaned using a NEB Clean and Concentrate kit, then transformed into Top10F’ as above.

Variant nomenclature

Variants are described using Human Genome Variation Society (HGVS) nomenclature, which numbers the start position (1) as the first methionine of the dominant isoform97. For FIX, other variant numbering systems have been used in other publications, particularly the legacy system (based on mature FIX protein) and the chymotrypsin system (based on evolutionary similarity of serine proteases to chymotrypsin). Direct conversions between the HGVS, legacy, and chymotrypsin numbering systems for FIX are in Supplementary Table 6.
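Supplementary Table 6 is the authoritative conversion. For the HGVS and legacy systems, the relationship reduces to an offset, assuming, as is standard for FIX, that mature FIX begins at HGVS p.Y47 (legacy Y1) and that the legacy system has no residue 0, so that signal peptide and propeptide residues run from −46 to −1. Under that assumption, the Python sketch below maps, for example, p.R226 to legacy R180 and p.S411 to legacy S365. Chymotrypsin numbering uses insertion letters and is not a simple offset, so it is not covered; the function names are hypothetical.

```python
N_PRECURSOR = 461  # length of the FIX precursor (signal peptide + propeptide + mature protein)
OFFSET = 46        # residues preceding mature FIX (HGVS p.47 = legacy 1), assumed


def hgvs_to_legacy(pos):
    """HGVS protein position (1 = initiator Met) -> legacy FIX position (no residue 0)."""
    if not 1 <= pos <= N_PRECURSOR:
        raise ValueError(f"HGVS position must be 1-{N_PRECURSOR}")
    return pos - OFFSET if pos > OFFSET else pos - OFFSET - 1


def legacy_to_hgvs(legacy):
    """Legacy FIX position -> HGVS protein position."""
    if legacy == 0 or not -OFFSET <= legacy <= N_PRECURSOR - OFFSET:
        raise ValueError("no such legacy position")
    return legacy + OFFSET if legacy > 0 else legacy + OFFSET + 1


for pos in (43, 46, 47, 226, 411):
    print(f"p.{pos} -> {hgvs_to_legacy(pos)}")
```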

Cloning into the MultiSTEP landing pad donor plasmid

To clone attB-F9–10L-strepII-10L-CD28-IRES-mCherry (pNP0001), first an inverse PCR was performed on attB-EGFP-PTEN-IRES-mCherry-562bgl with primers NP0207 and NP0325 to remove EGFP-PTEN and create compatible Gibson overhangs using Kapa HiFi polymerase (Kapa Biosystems). A gBlock (NPg0007) containing human F9 cDNA and a second gBlock (NPg0012) containing a GC-optimized 10 amino acid (GGGGS)₂ flexible linker, a strep II protein tag, and the single-pass transmembrane domain of CD28 were then assembled following the Gibson assembly cloning protocol above. After sequence confirmation, a second round of inverse PCR was performed to insert a second (GGGGS)₂ flexible linker after the strep II protein tag using primers NP0334 and NP0356 following the IVA cloning protocol above. All other MultiSTEP construct plasmids were created using the IVA, Gibson assembly, or Golden Gate cloning protocols above (see Supplementary Table 7 for descriptions and primers). cDNA constructs for human F7, F10, SERPINA1, SERPING1, and INS were ordered from the Mammalian Gene Collection (Horizon Discovery) and cloned into the landing pad donor backbone (pNP0079) using Gibson assembly. pcDNA4/Full Length FVIII was a gift from Robert Peters (Addgene #41036). We also cloned a B-domain deleted version of F8 shared by Dr. Steven Pipe (University of Michigan) into the MultiSTEP construct98–100.

Site-saturation mutagenesis library cloning

Site-saturation mutagenesis oligonucleotide pools were ordered from Twist Biosciences for each position in F9 (Supplementary Table 1). Each position contained one codon for each synonymous or missense variant. The library was designed to include all 8,759 possible missense variants and 461 synonymous variants, one for each position (Supplementary Table 1). 50 ng of each oligonucleotide were resuspended in 10 μL water, and then pooled in equal volumes into three tiled sublibraries encompassing the entire length of F9, including 20 positions of overlap between adjacent sublibraries. Tile 1 sublibrary: positions p.Q2-p.K164; Tile 2 sublibrary: positions p.A146-p.I318; Tile 3 sublibrary: positions p.L299-p.T461.

pNP0079 was inverse PCR amplified using NP0295 and NP0325. The backbone PCR product was then gel extracted and Gibson assembled with each of the three pooled sublibraries at a 5:1 molar ratio of insert:backbone at 50°C for 1 hour. Gibson assembled products were then cleaned and eluted in 10 μL water (Zymo Clean and Concentrate). Two replicates of 1 μL of cleaned product were then added to 25 μL NEB-10β E. coli in pre-chilled cuvettes, allowed to rest on ice for 30 minutes, electroporated at 2 kV for 6 milliseconds, immediately resuspended in 100 μL pre-warmed SOC, and transferred to a culture tube. Identical replicates were pooled. Pre-warmed SOC was added to a final volume of 1 mL, and cells were allowed to recover at 37°C, shaking at 225 rpm, for 1 hour. The entire recovery volume was added to 49 mL of LB containing 100 μg/mL carbenicillin. After 2–3 minutes of shaking, a 200 μL sample was taken and used for serial dilutions to estimate library coverage. After 16 hours at 37°C, each 50 mL culture was spun down for 30 minutes at 4,300 x g and plasmid DNA was isolated using a midiprep kit according to the manufacturer’s instructions (Qiagen).
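The serial-dilution estimate of library coverage scales colony counts back to the whole culture. The Python sketch below shows that arithmetic for a 50 mL culture like the one described above; the colony count, dilution factor, plated volume, and library size in the example are hypothetical, and the function names are illustrative.

```python
def total_transformants(colonies, plated_ul, dilution_factor, culture_ul):
    """Scale a plate count back to the number of transformants in the whole culture."""
    return colonies * dilution_factor * culture_ul / plated_ul


def fold_coverage(transformants, library_size):
    """Average number of transformants per designed variant."""
    return transformants / library_size


# Hypothetical plate: 120 colonies from 100 uL of a 1:100 dilution of a 50 mL culture.
n = total_transformants(colonies=120, plated_ul=100, dilution_factor=100, culture_ul=50_000)
print(n, fold_coverage(n, library_size=3_000))
```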

Barcoding site-saturation mutagenesis libraries

To barcode each sublibrary, 1 μg of each sublibrary plasmid was digested at 37°C for 5 hours with NheI-HF and SacI-HF (New England Biolabs), incubated with rSAP for 30 minutes at 37°C, then heat-inactivated at 65°C for 20 minutes. Digested product was gel extracted (Qiagen).

An IDT Ultramer (NP0490) with 18 degenerate nucleotides was resuspended at 10 μM. 1 μL NP0490 was then annealed with 1 μL of 10 μM NP0397 primer, 4 μL CutSmart buffer, and 34 μL water by running at 98°C for 3 minutes, followed by ramping down to 25°C at −0.1°C per second. After annealing, 1.35 μL of 1 mM dNTPs and 0.8 μL Klenow exo- polymerase (New England Biolabs) were added to fill in the barcode oligo. The cycling conditions were 25°C for 15 minutes, 70°C for 20 minutes, and then ramped down to 37°C in −0.1°C per second increments. Once at 37°C, 1 μL each NheI-HF and SacI-HF were added and digested for 1 hour. Digested product was then gel extracted (Qiagen).

Both gel extracted sublibrary and barcode oligonucleotide were cleaned and eluted in 10 μL and 30 μL water, respectively (Zymo Clean and Concentrate). A 7:1 molar ratio of barcode oligonucleotide to sublibrary was ligated overnight at 16°C with T4 DNA ligase (New England Biolabs).

Ligated product was then cleaned and eluted in 10 μL water (Zymo). 1 μL of ligation product, ligation controls, or pUC19 was electroporated into NEB-10β E. coli following the same procedure as above. For sublibrary ligation products, 2 independent replicates were pooled before recovery. Each pooled ligation product was bottlenecked by diluting various recovery volumes (500 μL, 250 μL, 125 μL, and 50 μL) into 50 mL cultures grown for 16 hours. Colony counts were used to estimate the barcoded variant coverage in each sublibrary as above. Each 50 mL midiprep culture was spun down for 30 minutes at 4,300 x g and plasmid DNA was isolated using a midiprep kit according to the manufacturer’s instructions (Qiagen).
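Bottlenecking trades barcode diversity for depth of barcodes per variant. Under the simplifying assumption that transformants fall uniformly and independently (Poisson) across variants, which real libraries only approximate, the expected fraction of variants captured by at least one barcode is 1 − e^(−N/V) for N barcoded transformants and V variants. A Python sketch with hypothetical numbers:

```python
import math


def fraction_represented(n_transformants, n_variants):
    """Expected fraction of variants with at least one barcode under a Poisson model."""
    return 1.0 - math.exp(-n_transformants / n_variants)


def transformants_needed(n_variants, target_fraction):
    """Smallest N whose expected represented fraction is at least target_fraction."""
    return math.ceil(-n_variants * math.log(1.0 - target_fraction))


# Hypothetical sublibrary of 3,000 variants.
for n in (3_000, 9_000, 30_000):
    print(n, round(fraction_represented(n, 3_000), 3))
print(transformants_needed(3_000, 0.99))
```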

Estimation of barcoded variant coverage by Illumina sequencing

Each barcoded and bottlenecked plasmid sublibrary was diluted to 10 ng/μL and amplified for Illumina sequencing. Briefly, adapters were added using NP0492 and NP0493 at a final concentration of 0.5 μM with 10 ng plasmid DNA, 25 μL Q5 polymerase (New England Biolabs), and 19 μL water. Cycling conditions were 98°C for 30 seconds, followed by 5 cycles of 98°C for 10 seconds, 61°C for 30 seconds, and 72°C for 30 seconds, followed by a final 72°C extension for 2 minutes. PCR products were then cleaned using 0.8x AmpureXP beads (Beckman Coulter) and eluted in 16 μL water following the manufacturer’s instructions.

The entire elution volume for each sample was then mixed with 25 μL Q5 polymerase, 0.25 μL of 100x SYBR Green I, 4.75 μL water, and one uniquely indexed forward (NP0565-NP0578) and reverse (NP0551-NP0564) primer at a final concentration of 0.5 μM. Samples were run on the CFX Connect (Bio-Rad) for a maximum of 15 cycles or until all samples were above 3,000 relative fluorescence units. Reactions were denatured at 98°C for 30 seconds, and cycled at 98°C for 10 seconds, 65°C for 30 seconds, and 72°C for 30 seconds, with a final extension at 72°C for 2 minutes. Samples were gel extracted with a Freeze ‘N Squeeze column (Bio-Rad) and quantified using the Qubit dsDNA HS Assay Kit (ThermoFisher Scientific), pooled in equimolar concentrations, and sequenced on a NextSeq 550 using a NextSeq 500/550 High Output v2.5 75 cycle kit (Illumina) using custom sequencing primers NP0494-NP0497. Using a custom script, sequencing reads were converted to FASTQ format and demultiplexed using bcl2fastq (v2.20), forward and reverse barcode reads were paired using PEAR (v0.9.11), and unique barcodes were counted and filtered101.
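The barcode-counting step can be sketched in Python as below. This is not the authors' custom script: it assumes forward and reverse reads have already been merged (the role PEAR plays above) into one barcode string per read, uses an 18-nt length to match the 18 degenerate nucleotides of the barcode oligonucleotide, and treats the minimum-count filter of 2 as a hypothetical placeholder for whatever filter was actually applied.

```python
from collections import Counter

VALID = set("ACGT")


def count_barcodes(reads, barcode_len=18, min_count=2):
    """Count exact barcode sequences, dropping malformed reads and rare barcodes."""
    counts = Counter(r for r in reads if len(r) == barcode_len and set(r) <= VALID)
    return {bc: n for bc, n in counts.items() if n >= min_count}


# Hypothetical merged reads: one barcode seen 3 times, one seen once, one containing N.
reads = ["ACGT" * 4 + "AC"] * 3 + ["TTTT" * 4 + "GG"] + ["ACGTN" + "A" * 13]
print(count_barcodes(reads))
```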

PacBio sequencing of FIX sublibraries for variant-barcode mapping

2.5 μg of each library subtile was digested with AflII (New England Biolabs) in CutSmart buffer for 4 hours at 37°C, heat inactivated at 65°C for 20 minutes and purified with AMPure PB beads (Pacific Biosciences, 100–265-900). SMRTbell sequencing libraries were prepared using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, 100–938-900) with barcoded adapters (Pacific Biosciences, 101–628-400) following the manufacturer’s protocol.

After library preparation, the barcoded libraries were pooled and sequenced on two SMRT Cells 8M using Sequencing Plate v2.0, diffusion loading, 1.5 hour pre-extension, and 30-hour movie time. Additional data were collected after treatment with SMRTbell Cleanup Kit v2 to remove imperfect and damaged templates, using Sequel Polymerase v2.2, adaptive loading with a target of 0.85, and a 1.3 hour pre-extension time. CCS consensus and demultiplexing were calculated using SMRT Link version 10.2 with default settings and reads that passed an estimated quality filter of ≥Q20 were selected as “HiFi” reads and used to map barcodes to variants.

“HiFi” PacBio reads were subjected to a custom analysis pipeline, AssemblyByPacBio. Each consensus CCS sequence was aligned to the WT F9 cDNA sequence using BWA-MEM v0.7.10-r789, generating CIGAR and MD strings, which were used to extract barcodes and the variable region containing F9102. The output from AssemblyByPacBio was then passed through PacRAT; barcodes with a variant agreement below 0.6 or supported by fewer than 3 independent CCS reads were filtered from further analysis, resulting in 260,224 unique barcodes across all three sublibraries103. A custom R script was used to generate a final barcode-variant map (Supplementary Table 1).
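The filtering rule described above (discard barcodes supported by fewer than 3 CCS reads, or whose reads agree on a single variant less than 60% of the time) can be expressed as in the Python sketch below. It is a simplified illustration, not PacRAT's implementation: it assumes each CCS read has already been reduced to a (barcode, variant) pair by the alignment step, and the barcodes and variant labels in the example are hypothetical.

```python
from collections import Counter, defaultdict


def map_barcodes(reads, min_reads=3, min_agreement=0.6):
    """reads: iterable of (barcode, variant) pairs, one per CCS read.

    Returns {barcode: consensus variant} for barcodes passing both filters.
    """
    by_barcode = defaultdict(list)
    for barcode, variant in reads:
        by_barcode[barcode].append(variant)
    mapping = {}
    for barcode, variants in by_barcode.items():
        top_variant, top_n = Counter(variants).most_common(1)[0]
        agreement = top_n / len(variants)
        if len(variants) >= min_reads and agreement >= min_agreement:
            mapping[barcode] = top_variant
    return mapping


reads = [("bc1", "v1"), ("bc1", "v1"), ("bc1", "v2"),   # 3 reads, 67% agreement: kept
         ("bc2", "v1"), ("bc2", "v2"), ("bc2", "v3"),   # 33% agreement: dropped
         ("bc3", "v1"), ("bc3", "v1")]                  # only 2 reads: dropped
print(map_barcodes(reads))
```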

General cell culture conditions

HEK293T cells (ATCC CRL-3216) were grown at 37°C and 5% CO₂ in Dulbecco’s modified Eagle’s medium (ThermoFisher Scientific) supplemented with 10% fetal bovine serum (ThermoFisher Scientific), 100 U/mL penicillin, and 100 μg/mL streptomycin (ThermoFisher Scientific). Cells were passaged every 2–3 days by detachment with 0.05% trypsin-EDTA (ThermoFisher Scientific). Freestyle 293-F cells were grown in Freestyle 293 Expression Medium (ThermoFisher Scientific) at 37°C and 8% CO₂ while shaking at 135 rpm. Cells were regularly passaged by dilution to 3 × 10⁵ cells/mL once reaching a concentration between 1 × 10⁶ and 2 × 10⁶ cells/mL, unless otherwise stated. All Freestyle 293-F cells containing a landing pad were induced with 2 μg/mL doxycycline (Sigma-Aldrich). All cells were purchased directly from the vendor but were not further authenticated.

Lentiviral transduction to generate suspension Freestyle 293-F landing pad line

2.5 × 10⁵ HEK293T cells were passaged into 6 well plates and transfected with 500 ng pMD-VSVg (Addgene #12259), 1,750 ng psPax2 (Addgene #12260), and 1,750 ng landing pad G384A vector template using 6 μL Fugene 6 (Promega) following the manufacturer’s protocol. The next day, media was exchanged and supernatant was collected every 12 hours for 72 hours to harvest lentivirus. The supernatant was centrifuged at 300 x g for 10 minutes, passed through a 0.45 μm filter, and stored at −80°C17.

1 × 10⁷ Freestyle 293-F cells were plated in 20 mL media and then incubated with volumes of lentivirus-containing supernatant ranging from 1 μL to 1 mL. 24 hours later, media was removed and cells were washed before replating into 30 mL media. On day 4 post-transduction, 2 μg/mL doxycycline was added to the cells, which were then grown for 10 days with regular passaging.

Cells were then washed with PBS + 1% bovine serum albumin (BSA, Sigma-Aldrich) before assessing mTagBFP2 fluorescence on an LSR II (BD Biosciences). mTagBFP2+ cells were sorted from samples with a multiplicity of infection (MOI) <1 by FACS on an Aria III. A total of 17,114 recovered mTagBFP2+ cells were replated into a half deep 96 well plate (Applikon Biotechnology) and expanded with 2 μg/mL doxycycline (Sigma-Aldrich) and 100 μg/mL blasticidin (Invivogen) to select for landing pad function. Single landing pad integration was confirmed by co-transfection of EGFP and mCherry recombination vectors.
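Sorting from transductions with MOI < 1 favors single landing pad integrations. Assuming that integrations per cell follow a Poisson distribution and that the mTagBFP2+ fraction reflects the fraction of transduced cells (ignoring silencing and detection limits), the MOI can be estimated from the positive fraction, and the expected share of transduced cells carrying exactly one integration follows from it. A Python sketch, with hypothetical input fractions:

```python
import math


def moi_from_positive_fraction(frac_positive):
    """Poisson MOI estimate m from the fraction of transduced cells f, using f = 1 - exp(-m)."""
    if not 0.0 <= frac_positive < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - frac_positive)


def single_integration_fraction(moi):
    """Among transduced cells, the expected fraction with exactly one integration."""
    if moi <= 0:
        raise ValueError("MOI must be positive")
    return moi * math.exp(-moi) / (1.0 - math.exp(-moi))


for frac in (0.05, 0.30, 0.60):
    m = moi_from_positive_fraction(frac)
    print(frac, round(m, 3), round(single_integration_fraction(m), 3))
```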

Fluorescence-activated cell sorting parameters

Live cells were identified using FSC-A vs SSC-A, then gated for single cells using two sequential gates: FSC-A vs FSC-H, followed by SSC-A vs SSC-H. mTagBFP2 expression was excited using the 405 nm laser and captured using a 450/50 nm bandpass filter. A 410 nm long pass filter preceded the 450/50 nm bandpass filter on the Symphony A3. mCherry expressed from the recombined landing pad was excited using the 561 nm laser and captured using a 595 nm (LSR II) or 600 nm (Aria III and Symphony A3) long pass and 610/20 nm bandpass filters. EGFP or Alexa488-labeled antibodies were excited using the 488 nm laser and captured using a 505 nm long pass and 530/30 nm bandpass filters (LSR II and Aria III) or a 515/20 nm bandpass filter (Symphony A3). Alexa647-labeled antibodies were excited using a 637 nm (LSR II and Symphony A3) or 640 nm laser (Aria III) and captured using a 670/30 nm bandpass filter (LSR II and Aria III) with an additional 650 nm long pass filter for the Symphony A3. All flow cytometry data were collected with FACSDiva v.8.0.1 and analyzed using FlowJo v.10.7.1.

Recombination of Freestyle 293-F cells

Freestyle 293-F cells were transfected at 1 × 10⁶ cells/mL with 293Fectin following the manufacturer’s protocol, with the following alterations. For every 1 mL of cells transfected, 2 μL 293Fectin was mixed with 31.5 μL OPTI-MEM in one tube, and 1 μg of plasmid DNA was added to OPTI-MEM for a final volume of 33.5 μL in a second tube. For recombination experiments, the 1 μg of total DNA was split in a 1:15 ratio of pCAG-NLS-Bxb1 (Addgene #51271) to recombination vector. After 5 minutes at room temperature, the two tubes were gently mixed, incubated at room temperature for 20 minutes, and added to the cells. For single variants or controls, 1 × 10⁷ cells in 10 mL were transfected. For libraries, 3 × 10⁷ cells in 30 mL were transfected.
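Because the recipe above is given per millilitre of culture, the reagent amounts scale linearly with volume. The following is a minimal sketch of that arithmetic, assuming cells are at 1 × 10⁶ cells/mL and that "1:15" means 1 part pCAG-NLS-Bxb1 to 15 parts recombination vector. The helper `transfection_mix` is hypothetical and is not part of the authors' code.

```python
def transfection_mix(culture_ml: float, bxb1_parts: int = 1, vector_parts: int = 15) -> dict:
    """Scale the per-mL 293Fectin recipe to a given culture volume (hypothetical helper).

    Assumes cells at 1e6 cells/mL. Per 1 mL of culture:
      tube A: 2 uL 293Fectin + 31.5 uL OPTI-MEM
      tube B: 1 ug total plasmid DNA brought to 33.5 uL with OPTI-MEM
    Total DNA is split bxb1_parts : vector_parts (default 1:15).
    """
    total_dna_ug = 1.0 * culture_ml
    parts = bxb1_parts + vector_parts
    return {
        "cells": 1e6 * culture_ml,
        "fectin_ul": 2.0 * culture_ml,
        "optimem_tube_a_ul": 31.5 * culture_ml,
        "tube_b_final_ul": 33.5 * culture_ml,
        "bxb1_ug": total_dna_ug * bxb1_parts / parts,
        "vector_ug": total_dna_ug * vector_parts / parts,
    }


# Library-scale transfection: 3 x 10^7 cells in 30 mL.
library = transfection_mix(30)
print(library["bxb1_ug"], library["vector_ug"])  # 1.875 28.125
```

For the 10 mL single-variant transfections, `transfection_mix(10)` gives 20 μL 293Fectin and 10 μg of total DNA.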

Forty-eight hours after transfection, cells were split 1:9 into two separate flasks with 2 μg/mL doxycycline. The flask containing 9 parts was additionally treated with 10 nM rimiducid (AP1903) to kill unrecombined cells. Two days after rimiducid treatment, live cells were separated from dead cells using Histopaque-1077 (Sigma-Aldrich). Cells were diluted to 35 mL and layered slowly on top of 15 mL Histopaque-1077 in a 50 mL conical tube. Cells were centrifuged at 400 × g for 30 minutes with no acceleration or brake. Cells at the interface between Histopaque-1077 and media were collected and resuspended in 30 mL of media. Cells were re-centrifuged at 300 × g for 5 minutes, resuspended in 30 mL of fresh media with 2 μg/mL doxycycline, and allowed to grow for one week prior to experiments.

ELISA

Freestyle 293-F cells were recombined and isolated as above. Supernatant samples were collected 48 hours after doxycycline induction. FIX abundance in WT-expressing cell supernatants was calculated for each time point relative to pooled normal plasma (Precision Biologic) using a FIX matched-pair polyclonal antibody EIA kit (Enzyme Research Labs). The 48-hour supernatants were assayed for FIX abundance relative to time point-matched WT FIX supernatant over multiple assays, each containing four dilutions in duplicate for each supernatant. The FIX concentration for each well was interpolated from the standard curve for that plate, and values outside the linear range of the curve were discarded. Mean FIX abundance in supernatant was calculated from the pool of in-range values across all assays and replicates for each sample and time point.
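The quantification logic above (interpolate each well from a standard curve, drop out-of-range wells, pool the rest, then express the variant relative to WT) can be sketched as follows. This is an illustrative sketch only: it assumes piecewise-linear interpolation between standards and treats the span of the standards as the usable range, which simplifies whatever curve fit the kit or the authors actually used. All function names are hypothetical.

```python
from statistics import mean


def interpolate_conc(signal, standards):
    """Piecewise-linear interpolation of concentration from one plate's standard curve.

    standards: (concentration, signal) pairs; signal is assumed to rise
    monotonically with concentration. Returns None if `signal` falls outside
    the range covered by the standards, so that the well can be discarded.
    """
    pts = sorted(standards, key=lambda p: p[1])
    if signal < pts[0][1] or signal > pts[-1][1]:
        return None
    for (c0, s0), (c1, s1) in zip(pts, pts[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return c0
            return c0 + (signal - s0) * (c1 - c0) / (s1 - s0)
    return None


def sample_concentration(wells, standards):
    """Mean dilution-corrected concentration over the in-range wells.

    wells: (signal, dilution_factor) pairs pooled across assays and replicates.
    """
    values = []
    for signal, dilution in wells:
        conc = interpolate_conc(signal, standards)
        if conc is not None:  # out-of-range wells are dropped
            values.append(conc * dilution)
    return mean(values) if values else None


def percent_of_wt(variant_wells, wt_wells, standards):
    """FIX abundance of a variant supernatant as a percentage of time point-matched WT."""
    return 100 * sample_concentration(variant_wells, standards) / sample_concentration(wt_wells, standards)
```

In practice each plate carries its own standard curve, so `standards` would be looked up per plate rather than shared as in this sketch.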

Antibody staining for surface-displayed proteins

For secretion antibodies, cold PBS + 1% BSA was used as a staining and dilution buffer. For γ-carboxylation antibodies, a 1:10 dilution of cold PBS + Ca/Mg + 1% BSA into PBS + 1% BSA was used. Supplementary Table 8 contains additional staining details.

Cells were plated at 3 × 10⁵ cells/mL in either 10 mL (single variants and controls) or 30 mL (libraries) of Freestyle media with 50 nM vitamin K1 (Sigma-Aldrich). For γ-carboxylation antibodies, samples with and without 100 nM warfarin (Sigma-Aldrich) were used. After 24 hours, cells were induced with 2 μg/mL doxycycline and grown for 48 hours.

On the day of staining, flasks of cells were split into 4 mL volumes (6 per 30 mL flask, 1 per 10 mL single variant or control) and spun at 300 × g for 3 minutes. Media was aspirated, and cells were washed 3 times with 1 mL cold staining buffer, with a 300 × g spin and supernatant aspiration after each wash. Cells were resuspended in 100 μL of diluted primary antibodies and incubated at room temperature for 30 minutes, with vortexing at 10-minute intervals. After primary antibody staining, cells were washed three times with 1 mL staining buffer and spun at 300 × g for 3 minutes. Cells were resuspended in 100 μL of diluted secondary antibodies and incubated at room temperature for 30 minutes in the dark, with vortexing at 10-minute intervals. Cells were washed three times with 1 mL cold staining buffer and spun at 300 × g for 3 minutes before resuspension in 500 μL staining buffer and pooling of identical tubes.

Sorting was performed on an Aria III (BD Biosciences), dividing the library into four approximately equally sized quartile bins based on the ratio of fluorescent antibody signal to mCherry signal. At least 2 million cells were sorted into each bin. After sorting, each bin was spun at 300 × g for 5 minutes and resuspended in 10 mL of Freestyle 293 Expression media supplemented with 100 U/mL penicillin and 100 ng/mL streptomycin. Recovered cells were expanded until 20 million cells could be harvested. Cells were centrifuged at 300 × g for 10 minutes, and the pellet was flash frozen in liquid nitrogen before storage at −20°C.
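Dividing the antibody signal by mCherry, which is co-expressed from the same recombined construct, presumably normalizes surface signal for cell-to-cell differences in expression. The sketch below mimics quartile gating on that ratio in software. It is an illustration only, since on the sorter the gates are drawn on the instrument; `assign_quartile_bins` is a hypothetical name.

```python
from statistics import quantiles


def assign_quartile_bins(antibody, mcherry):
    """Assign each cell to sort bin 1-4 by its antibody:mCherry ratio.

    Bin edges are the 25th, 50th and 75th percentiles of the ratio, so each
    bin receives roughly a quarter of the cells (bin 1 = lowest ratio).
    """
    ratios = [a / m for a, m in zip(antibody, mcherry)]
    q1, q2, q3 = quantiles(ratios, n=4)
    bins = []
    for r in ratios:
        if r <= q1:
            bins.append(1)
        elif r <= q2:
            bins.append(2)
        elif r <= q3:
            bins.append(3)
        else:
            bins.append(4)
    return bins
```

For example, eight cells with antibody signals 1 to 8 and equal mCherry signal fall two per bin.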

Genomic DNA prep, barcode amplification, and sequencing

Genomic DNA was prepared from each cell pellet using six DNeasy Blood and Tissue columns (Qiagen) following the manufacturer’s instructions, with the addition of a 30-minute RNase digestion at 56°C during the resuspension step, as described previously22,24,34.

Two technical PCR replicates were performed on each cell pellet (Supplementary Table 9). Barcode amplicons were generated as described above with the following changes: 1) for each replicate, 8 identical 50 μL first-round PCRs were prepared with 2,500 ng genomic DNA, 25 μL Q5 polymerase, and 0.5 μM each of the NP0492 and NP0546 primers; 2) during Ampure XP cleanup, samples were eluted into 21 μL water, and 40% (8 μL) was used in the second-round PCRs; 3) second-round PCRs were amplified for a maximum of 20 cycles or until all samples were above 3,000 relative fluorescence units; and 4) samples were sequenced using custom sequencing primers NP0494, NP0495, NP0497, and NP0550, either on a NextSeq 550 using a NextSeq 500/550 High Output v2.5 75 cycle kit or on a NextSeq 2000 using a NextSeq 1000/2000 P3 50 cycle kit (Illumina).
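A rough check on how many templates each technical replicate samples: eight first-round reactions of 2,500 ng each load 20 μg of genomic DNA. The sketch below converts that mass to genome equivalents, assuming about 6.6 pg per diploid human genome. That constant is an assumption, not a value from the source, and 293-derived cells are hypotriploid, so the true number of cell equivalents is somewhat lower. The result, roughly 3 × 10⁶, is on the same order as the ≥2 million cells sorted into each bin.

```python
# Assumed mass of one diploid human genome (~6.6 pg). 293-derived cells are
# hypotriploid, so the true number of cell equivalents is somewhat lower.
PG_PER_DIPLOID_GENOME = 6.6


def genome_equivalents(ng_per_reaction: float, n_reactions: int) -> float:
    """Approximate diploid genome copies loaded across the first-round PCRs."""
    return ng_per_reaction * 1_000 * n_reactions / PG_PER_DIPLOID_GENOME


per_replicate = genome_equivalents(2_500, 8)
print(f"{per_replicate:.2e}")  # 3.03e+06
```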

Calculating scores and classifications

Using a custom script, forward and reverse barcode sequencing reads were converted to FASTQ format and demultiplexed using bcl2fastq (v2.20), paired using PEAR (v0.9.11), and the unique barcodes were counted101. Variants were then assigned to barcodes using the barcode-variant map described above. Any barcode associated with insertions, deletions, or multiple amino acid substitutions in FIX was removed. Variants with a frequency below 1 × 10⁻⁶ or observed in fewer than two replicates were removed (Supplementary Table 1).
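The filtering rules above can be sketched as follows. The sketch makes several assumptions. The frequency cutoff is applied within each replicate. Frequencies are computed over retained barcodes only. The barcode-variant map stores a list of protein changes per barcode, where an empty list means WT and indels are written with "del" or "ins". This encoding and the function names are hypothetical; the authors' custom script is not reproduced here.

```python
from collections import defaultdict


def variant_frequencies(counts, barcode_map):
    """Collapse one replicate's barcode read counts into variant frequencies.

    counts: barcode -> read count.
    barcode_map: barcode -> list of protein changes ([] = WT, ["G58E"] = one
    substitution; synonymous variants would carry their own hypothetical
    labels such as "G58="). Barcodes missing from the map, carrying an indel,
    or carrying more than one substitution are discarded.
    """
    per_variant = defaultdict(int)
    for barcode, n in counts.items():
        changes = barcode_map.get(barcode)
        if changes is None or len(changes) > 1:
            continue
        if any("del" in c or "ins" in c for c in changes):
            continue
        per_variant[changes[0] if changes else "WT"] += n
    total = sum(per_variant.values())
    return {v: n / total for v, n in per_variant.items()} if total else {}


def retained_variants(replicates, barcode_map, min_freq=1e-6, min_reps=2):
    """Variants at or above `min_freq` in at least `min_reps` replicates."""
    hits = defaultdict(int)
    for counts in replicates:
        for variant, freq in variant_frequencies(counts, barcode_map).items():
            if freq >= min_freq:
                hits[variant] += 1
    return {v for v, k in hits.items() if k >= min_reps}
```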

Secretion and γ-carboxylation scores were calculated using a modified analysis pipeline22,24,34. Briefly, for each experiment, a weighted average of every variant’s frequency across the four bins was calculated, using weights of 0.25, 0.5, 0.75, and 1 for bins 1, 2, 3, and 4, respectively. Each variant’s weighted average was then min-max normalized such that the median score of WT barcodes was 1 and the median score of the lowest-scoring 5% of missense variants was 0. Final secretion and γ-carboxylation scores and standard errors for each variant were computed by averaging across all replicates.
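Written out, the raw score of a variant is s = Σᵢ wᵢfᵢ / Σᵢ fᵢ over bins i = 1 to 4. Here fᵢ is the variant's frequency in bin i and wᵢ is the bin weight. The normalized score is (s − s_low) / (s_WT − s_low). In that expression, s_WT is the median raw score of WT barcodes and s_low is the median raw score of the lowest-scoring 5% of missense variants.

The Python sketch below follows that formula. It assumes per-bin frequencies have already been computed for each replicate. It also assumes the weighted average divides by the variant's summed frequency across bins, which is the convention of earlier VAMP-seq analyses and should be treated as an assumption here. Function names are illustrative.

```python
from math import sqrt
from statistics import mean, median, stdev

BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)  # bins 1-4, lowest to highest signal


def raw_score(bin_freqs):
    """Weighted average of one variant's frequencies in the four sort bins."""
    return sum(w * f for w, f in zip(BIN_WEIGHTS, bin_freqs)) / sum(bin_freqs)


def normalize(raw, wt_raw, missense_raw, low_frac=0.05):
    """Min-max normalize raw scores: median WT -> 1, median of the lowest 5% of missense -> 0."""
    ranked = sorted(missense_raw)
    k = max(1, round(len(ranked) * low_frac))
    low = median(ranked[:k])
    high = median(wt_raw)
    return {variant: (s - low) / (high - low) for variant, s in raw.items()}


def combine_replicates(scores):
    """Mean and standard error of one variant's normalized scores across replicates."""
    se = stdev(scores) / sqrt(len(scores)) if len(scores) > 1 else float("nan")
    return mean(scores), se
```

A variant found only in bin 4 gets a raw score of 1.0. A variant spread evenly across all four bins gets 0.625.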

Clinical variant curation

A control set of pathogenic and benign FIX missense variants was collected from ClinVar (accessed 1/10/2023), MLOF, and gnomAD v4.1 (accessed 12/3/2024)1,12,56 and re-evaluated using current publications. ClinVar variants with conflicting classifications or zero stars were removed (Supplementary Table 10). As the incidence of hemophilia B is approximately 3.8 to 5 per 100,000 male births, we deemed any gnomAD FIX variant with a minor allele frequency in hemizygotes greater than 1 per 1,000 to be benign22,57,104,105. F9 variants from MLOF were deemed benign if normal FIX activity was observed in a hemizygous male.
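The frequency rule rests on a simple argument. Because F9 is X-linked, a hemizygous male carrying a fully penetrant pathogenic allele is affected. A single pathogenic variant should therefore not be more common among hemizygotes than the disease itself. The sketch below makes the margin explicit. It assumes full penetrance, and the constant names are illustrative.

```python
# Incidence estimates quoted in the text, per male birth.
INCIDENCE_LOW = 3.8 / 100_000
INCIDENCE_HIGH = 5 / 100_000
BENIGN_HEMI_AF = 1 / 1_000  # hemizygote allele-frequency cutoff for a benign call


def benign_by_frequency(hemizygote_af: float) -> bool:
    """True if a variant is too common in hemizygous males to cause hemophilia B."""
    return hemizygote_af > BENIGN_HEMI_AF


# How far the cutoff sits above the quoted disease incidence.
print(round(BENIGN_HEMI_AF / INCIDENCE_HIGH), round(BENIGN_HEMI_AF / INCIDENCE_LOW))  # 20 26
```

The 1 per 1,000 cutoff therefore sits roughly 20- to 26-fold above the incidence of the disease, which leaves a wide safety margin.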

Data on FIX antigen levels, FIX activity, and disease severity reported in individuals with hemophilia B were collected from EAHAD (accessed 10/9/2023), yielding 594 variants with at least one of these three clinical data points11 (Supplementary Table 11). FIX antigen and activity values reported as “<1%”, “<1 IU/dL”, or “undetectable” were assigned a value of 0.1% in our analyses. For each variant, FIX antigen and FIX activity values were averaged across individuals, and a consensus of disease severity across individuals was used. If a variant’s antigen, activity, or severity was reported in only a single individual, that variant was removed from further analysis.
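The aggregation rules above can be sketched per variant as follows. The sketch rests on several assumptions. The "single individual" rule is applied to each measure separately; dropping the variant entirely is an equally plausible reading. "Consensus" severity is taken as the most common reported category. An IU/dL value is treated as equal to the same percentage of normal. All names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

BELOW_DETECTION = {"<1%", "<1 IU/dL", "undetectable"}


def parse_level(value: str) -> float:
    """Convert a reported FIX antigen/activity value to % of normal."""
    v = value.strip()
    if v in BELOW_DETECTION:
        return 0.1
    # Assumption: 1 IU/dL is treated as equivalent to 1% of normal.
    return float(v.replace("IU/dL", "").replace("%", "").strip())


def summarize_variants(records, min_individuals=2):
    """records: (variant, antigen, activity, severity) per individual; None = not reported.

    Each measure is summarized only if reported in >= min_individuals people.
    """
    antigen, activity, severity = defaultdict(list), defaultdict(list), defaultdict(list)
    for variant, ag, act, sev in records:
        if ag is not None:
            antigen[variant].append(parse_level(ag))
        if act is not None:
            activity[variant].append(parse_level(act))
        if sev is not None:
            severity[variant].append(sev)

    summary = {}
    for variant in set(antigen) | set(activity) | set(severity):
        ag, act, sev = antigen[variant], activity[variant], severity[variant]
        summary[variant] = {
            "antigen": mean(ag) if len(ag) >= min_individuals else None,
            "activity": mean(act) if len(act) >= min_individuals else None,
            # Consensus taken here as the most common severity (ties -> first seen).
            "severity": Counter(sev).most_common(1)[0][0] if len(sev) >= min_individuals else None,
        }
    return summary
```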

Variant reclassification

We built a random forest classifier to distinguish functionally normal from functionally abnormal variants using ranger and tidymodels in R. We pooled curated benign and likely benign variants into one class, and pathogenic and likely pathogenic variants into another. We split the curated variants into training (75%) and testing (25%) sets. To account for imbalanced class sizes in our training dataset (8.4% benign/likely benign and 90.6% pathogenic/likely pathogenic), we performed class-based random oversampling (ROSE) and trained the model using 5-fold cross-validation106. Model performance was evaluated using an ROC curve generated with the test set. We used the ACMG rules-based guidelines to reinterpret variant classifications, applying in vitro functional data as strong evidence of pathogenicity2,12. We also employed a Bayesian framework, under which the functional data provided moderate evidence of pathogenicity3,63,64. All other evidence codes were applied according to ACMG guidelines adapted for an X-linked inherited monogenic Mendelian disorder12. All 103 VUS and their updated classifications are provided in Supplementary Table 4.
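The data handling around the classifier, a stratified 75/25 split followed by oversampling of the minority class in the training set only, can be sketched in Python. This is a simplified stand-in, not the authors' R workflow. It uses plain duplication-based random oversampling, whereas ROSE generates synthetic examples by a smoothed bootstrap. The random forest itself (ranger via tidymodels) and the 5-fold cross-validation are not reproduced. Function names are illustrative.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_frac=0.25, seed=0):
    """Split indices into train/test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class examples (with replacement) until classes are balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(idx) for idx in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced.extend(idx)
        balanced.extend(rng.choice(idx) for _ in range(target - len(idx)))
    return balanced


labels = ["benign"] * 8 + ["pathogenic"] * 92
train, test = stratified_split(labels)
balanced = random_oversample(train, labels)  # oversample training data only
print(len(test), Counter(labels[i] for i in balanced))
```

Oversampling is applied after the split so that duplicated benign examples cannot leak into the held-out test set used for the ROC curve.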

Extended Data

Extended Data Figure 1: MultiSTEP is based on a flexible genomically integrated approach for expressing secreted protein variants.

a. Cartoon depicting integration of a MultiSTEP plasmid construct into a genomically integrated landing pad cassette17. (Top): Lentivirally integrated landing pad cassette expressing mTagBFP2 (royal blue) from a tetON inducible promoter with an attP serine recombinase recognition site (black). mTagBFP2 is fused via 2A sequences (dark pink) to an inducible caspase-9 (iCasp9, orange) and a blasticidin resistance gene (dark yellow), yielding mTagBFP2-2A-iCasp9-2A-BlastR. Downstream are a terminator sequence (Term, brown) and the tet repressor (tetR, salmon). Bxb1 serine recombinase, expressed from another plasmid, is shown in grey. (Middle): MultiSTEP plasmid construct. The secreted protein coding sequence (pink) is C-terminally fused to flexible linkers (teal), a strep II epitope tag (green), and the CD28 transmembrane domain (medium blue). An IRES (purple) drives co-expression of mCherry (red) from the same transcript. Upstream are an attB serine recombinase recognition sequence (goldenrod) and a unique 18-nucleotide degenerate barcode (BC, light yellow). (Bottom): Landing pad following plasmid integration. The attP and attB sequences have been recombined, forming attL and attR sequences. b. Sequential flow cytometry gating scheme for detecting and isolating landing pad cells with an integrated MultiSTEP construct. Dot pseudocolor indicates density of cells. FSC: forward scatter; SSC: side scatter. c. Comparison of negative control 293-F cells (top) with 293-F cells incubated with lentivirus encoding the landing pad cassette (bottom, n > 10,000 cells). d. Comparison of unrecombined landing pad cells (top) with cells transfected with a MultiSTEP plasmid encoding WT FIX (bottom, n > 10,000 cells). e. Comparison of cells transfected with a MultiSTEP construct encoding WT FIX treated with doxycycline (top) or doxycycline and 10 nM AP1903 (bottom, n > 10,000 cells). f. Design iterations of MultiSTEP construct plasmid in (a, top). The L1-Strep MultiSTEP construct does not contain an L2 linker. Flow cytometry of MultiSTEP constructs using an anti-Strep II tag antibody (n ~30,000 cells).

Extended Data Figure 2: A flexible tag-based approach to assessing variant effects on secretion.

a. Heatmap showing strep tag secretion scores for missense FIX variants. Color indicates MultiSTEP score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray. b. Density distributions of strep tag secretion scores for FIX missense variants (orange) and synonymous variants (blue). Dashed line denotes the 5th percentile of the synonymous variant distribution. c. Scatter plot comparing MultiSTEP-derived strep tag secretion scores for seven different FIX variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.C134R, p.S220T, and p.H267L), WT, and an unrecombined negative control to the geometric mean of Alexa Fluor-647 fluorescence measured by flow cytometry on cells expressing each variant individually (n = 3 replicates). Error bars show standard error of the mean. d-e. Scatter plots of median MultiSTEP-derived strep tag secretion scores versus heavy chain (d) or light chain (e) secretion scores at each position in FIX (n = 3 replicates). Points are colored by chain architecture, using the same color scheme as Fig. 2a. Black dashed line indicates the line of perfect correlation between secretion scores. Gray background indicates <0.3 point deviation from perfect correlation. f. Density plots of MultiSTEP-derived synonymous variant scores generated with the indicated antibody. The dashed vertical line shows the WT score.

Extended Data Figure 3: MultiSTEP-derived FIX secretion scores correlate with orthogonal measures of FIX secretion.

a. Flow cytometry of p.C28Y and WT controls (n = 10,000 cells) with the FIX library (n = 100,000 cells). b. Comparison of ELISA measurements of eight untethered FIX missense variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.G125V, p.C134R, p.S220T, and p.H267L) expressed from 293-F cells with heavy chain secretion scores (n = 3 replicates). Error bars show the standard error of the mean. Pearson’s correlation coefficient is shown. c. Scatter plot comparing MultiSTEP-derived heavy chain secretion scores for 20 different FIX missense variants, WT, and an unrecombined negative control (n = 3 replicates) to the geometric mean of Alexa Fluor-647 fluorescence measured by flow cytometry on cells expressing each variant individually. Error bars show standard error of the mean (n = 10,000 cells). Pearson’s correlation coefficient is shown.

Extended Data Figure 4: Variants near antibody epitopes demonstrate minor effects on secretion scores.

a. Scatter plot of the difference in heavy chain and light chain secretion scores and the distance in angstroms between all α-carbons in the light chain and the nearest light chain epitope α-carbon using the AlphaFold2 model of mature, two-chain FIX. Low-confidence positions with predicted local distance difference test score (pLDDT) of <70 were removed from analysis. Color indicates whether a position was identified in the light chain epitope in Fig. 2h. Horizontal dashed line indicates no difference in secretion scores. Vertical dashed line indicates boundary of likely epitope-adjacent effects on secretion scores by changepoint analysis (9.15 angstroms). b. Scatter plot of the difference in heavy chain and light chain secretion scores and the distance in angstroms between all α-carbons in the heavy chain and the nearest heavy chain epitope α-carbon using the AlphaFold2 model of mature, two-chain FIX. Low-confidence positions with pLDDT of <70 were removed from analysis. Color indicates whether a position was identified in the heavy chain epitope in Fig. 2h. Horizontal dashed line indicates no difference in secretion scores. Vertical dashed line indicates boundary of likely epitope-adjacent effects on secretion scores by changepoint analysis (5.71 angstroms). c. Scatter plot of median MultiSTEP-derived heavy chain and light chain secretion scores at each position in FIX. Points are colored by epitope (Fig. 2h) or epitope-adjacent position as in (a) and (b). Black dashed line indicates the line of perfect correlation between secretion scores. Gray background indicates <0.3 point deviation from perfect correlation.

Extended Data Figure 5: Effect of missense FIX variation on secretion compared to missense variant effects on abundance in cytosolic or transmembrane proteins.

a. Box plots of the 25th, 50th, and 75th percentiles of secretion (FIX, MultiSTEP) or abundance (all others, VAMP-seq) scores for all nonsynonymous variants across all positions with the indicated WT amino acid for six different proteins22–24,34 (n = 29,287 variants). Whiskers span the range of data. b. Box plots of the 25th, 50th, and 75th percentiles of secretion (FIX, MultiSTEP) or abundance (all others, VAMP-seq) scores for all nonsynonymous amino acid substitutions across all positions for six different proteins (n = 29,287 variants).

Extended Data Figure 6: Sequence conservation strongly influences the effect of variation on FIX secretion.

a. Comparison of light chain secretion scores with ConSurf conservation grades (1: least conserved, 9: most conserved)35. Violin plot shows distribution of points (n = 8,528 variants) with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. b. Comparison of median light chain secretion scores (n = 8,528 variants) with ConSurf conservation grades. Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution.

Extended Data Figure 7: Carboxylation-sensitive antibodies identify functional motifs.

a. Multiple sequence alignment of Gla domain-containing proteins (UniProt) that bind the carboxylation-sensitive Gla-motif (ExxxExC) antibody, generated using MUSCLE110,111. Antibody epitopes for both the carboxylation-sensitive FIX-specific antibody (ω-loop) and the carboxylation-sensitive Gla-motif antibody are shown. hF9: human coagulation factor IX (P00740); hF2: human prothrombin (coagulation factor II, P00734); hF7: human coagulation factor VII (P08709); hF10: human coagulation factor X (P00742); hPC: human protein C (P04070); hPS: human protein S (P07225); hBGP: human osteocalcin (P02818); bBGP: bovine osteocalcin (P02820); hGAS6: human growth arrest-specific protein 6 (P14393); ppVPA: Pseudechis porphyriacus venom prothrombin activator porpharin-D (P58L93); nsVPA: Notechis scutatus venom prothrombin activator notecarin-D1 (P82807); osVPA: Oxyuranus scutellatus venom prothrombin activator oscutarin-C (P58L96). b-c. Fluorescence of unrecombined negative control and WT FIX-expressing cells with and without warfarin pretreatment, generated by staining cells with a carboxylation-sensitive FIX-specific (b) or carboxylation-sensitive Gla-motif antibody (c). d-f. Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (d), carboxylation-sensitive Gla-motif carboxylation scores (e), or light chain secretion scores (f) for FIX propeptide variants. Furin cleavage site (Furin CS), ω-loop, ExxxExC motif, and aromatic stack (AS) are annotated above (d). Heatmap color indicates antibody score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray. g-i. Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (g), carboxylation-sensitive Gla-motif carboxylation scores (h), or light chain secretion scores (i) for FIX Gla domain variants. ω-loop, ExxxExC motif, and aromatic stack (AS) are annotated above (g). Heatmap color indicates antibody score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray.

Extended Data Figure 8: Clinical correlates of secretion and gamma-carboxylation scores map to FIX biochemical features.

a. Scatter plot of the mean and standard error of light chain secretion scores (n = 2 replicates) and FIX plasma antigen from individuals with hemophilia B in the EAHAD database (n = 416 variants). Light chain epitope-adjacent positions identified in Extended Data Fig. 4a are removed (n = 19 variants across 38 individuals)11. Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. b. Comparison of hemophilia B severity from individuals with hemophilia B in the EAHAD database (n = 1,781 variants) with light chain secretion scores. Light chain epitope-adjacent positions identified in Extended Data Fig. 4a are removed (n = 40 variants). Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. p values from a Kruskal–Wallis test adjusted for multiple comparisons by post-hoc Dunn’s test are shown. c. Scatter plot of the mean and standard error of light chain secretion scores (n = 2 replicates) and FIX plasma antigen from individuals harboring gain-of-cysteine variants in the EAHAD database (n = 9 variants across 27 individuals)11. Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. d. Bar plot of hemophilia B disease severity in the EAHAD database for individuals harboring gain-of-cysteine variants. e. Bar plot of the number of FIX variants in the EAHAD database and their classification using the random forest model trained on MultiSTEP functional data, by disease severity. Color indicates model prediction. f. Bar plot of the number of FIX propeptide and Gla domain variants in the EAHAD database and their classification using the random forest model trained on MultiSTEP functional data, by disease severity. 
Color indicates model prediction.

Extended Data Figure 9: Random forest model predictions for FIX variants in the EAHAD FIX Variant Database associated with hemophilia B.

a. Spearman correlation of MultiSTEP functional scores with EVE, AlphaMissense, REVEL, and CADD variant effect predictors. b. Histograms of scores from four variant effect predictors for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF. Color indicates clinical variant interpretation. Black dashed vertical lines indicate the thresholds for each predictor. For AlphaMissense, we used the thresholds recommended in the original publication for 90% precision on existing ClinVar-annotated variants (≤0.34: benign, 0.34–0.564: uncertain, ≥0.564: pathogenic). For REVEL, we used the thresholds from the initial publication’s assessment of REVEL’s precision in ClinVar (<0.5: benign, 0.5: uncertain, >0.5: pathogenic). For EVE, we used the thresholds recommended in the original publication for the 75% most confident classifications (≤0.359: benign, 0.359–0.641: uncertain, ≥0.641: pathogenic). For CADD, we used the same thresholds as the MLOF clinical laboratory (<10: benign, 10–20: uncertain, >20: pathogenic). The number of variants scored by each predictor is annotated. c. Classification accuracy for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF in our test set (benign/likely benign, n = 4 variants; pathogenic/likely pathogenic, n = 34 variants) by the MultiSTEP variant function classifier and the four variant effect predictors, using the thresholds defined in (b). True benign/likely benign and pathogenic/likely pathogenic labels are denoted on the x-axis, and columns are colored according to the classification made by each method. Solid colors indicate correct classification, whereas striped colors indicate incorrect classification. For variant effect predictors, missing variants are colored gray with stripes and uncertain predictions are colored yellow with stripes. PPV: positive predictive value; NPV: negative predictive value; Spec: specificity; Sens: sensitivity.
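The four summary metrics in panel c follow from a confusion matrix. The sketch below computes them under one assumption about how missing or uncertain calls are handled, which the legend implies but does not state. Because those calls are drawn as incorrect, they count against sensitivity and specificity. They are excluded from PPV and NPV, which consider only definite calls. The function name and label strings are hypothetical.

```python
def classification_metrics(truth, predicted, positive="P", negative="B"):
    """PPV, NPV, specificity and sensitivity for a binary classifier.

    `predicted` may also contain other labels (e.g. "uncertain", or None for a
    missing score). Such calls count against sensitivity/specificity but are
    excluded from PPV/NPV, which only consider definite calls.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t == negative and p == positive for t, p in pairs)
    tn = sum(t == negative and p == negative for t, p in pairs)
    fn = sum(t == positive and p == negative for t, p in pairs)
    n_pos = sum(t == positive for t in truth)
    n_neg = sum(t == negative for t in truth)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "Spec": ratio(tn, n_neg),
        "Sens": ratio(tp, n_pos),
    }
```

With a small test set such as the four benign/likely benign variants here, each misclassified benign variant shifts specificity by 25 percentage points, so these estimates are coarse.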

Extended Data Figure 10: Detection of cell-surface displayed FVIII.

Flow cytometry of B-domain deleted coagulation factor VIII (FVIII) in the MultiSTEP backbone (n ≈ 30,000 cells per variant). Unrecombined cells (NC) do not display FVIII and serve as a negative control. Fluorescent signal was generated by staining cells with anti-FVIII antibodies specific to each of the five FVIII domains in the heavy chain [A1 (a) and A2 (b)] and light chain [A3 (c), C1 (d), and C2 (e)].

Supplementary Material

Supplementary Table 1
Supplementary Table 2
Supplementary Table 3
Supplementary Table 4
Supplementary Table 5
Supplementary Table 6
Supplementary Table 7
Supplementary Table 8
Supplementary Table 9
Supplementary Table 10
Supplementary Table 11
Supplementary Table 12
Supplementary Figures 1 and 2
Source Data for Supplementary Figures 1 and 2
Source Data for Main and Extended Data Figures

Acknowledgements

We thank A.P. Leith, C. Lee, D.E. Prunkard, and A. Silvestroni of the UW Foege Flow Lab and the UW Pathology Flow Cytometry Core for their assistance with cell analysis, staining, and sorting; K.M. Munson of the UW PacBio Sequencing Service for assistance with long-read PacBio sequencing; and D.A. Nickerson, L.M. Starita, D.J. Maly, S. Nariya, J.J. Stephany, and A.E. McEwen for advice on analyzing data and feedback on the manuscript. We thank S.W. Pipe and A. Scheller of the University of Michigan Department of Pediatrics and Department of Hematology for providing FVIII constructs and advice on FVIII expression. We thank J. Kulman for discussions of FIX carboxylation. We thank R. Kruse-Jarres for her commitment to supporting research that improves the lives of people living with bleeding disorders. We thank and acknowledge B.A. Konkle, the PI of MLOF; the MLOF partners at Bloodworks, the American Thrombosis and Hemostasis Network, and the National Hemophilia Foundation (now the National Bleeding Disorders Foundation); funding from Biogen/Bioverativ; providers and staff at HTC sites; and the 11,341 participants who made MLOF a success. This work was supported by the National Heart, Lung, and Blood Institute (R01HL152066 to J.M.J. and D.M.F., F30HL151075 to N.A.P., and R01HL149855 to J.P.S.), the National Human Genome Research Institute (RM1HG010461 and UM1HG011969 to D.M.F.), the National Institute of General Medical Sciences (R01GM109110 to D.M.F.), and the Washington Center for Bleeding Disorders (to J.M.J.).

Footnotes

Competing Interests

The authors declare the following competing interests: J.P.S. was an expert witness for Genentech and Paul, Weiss, Rifkind, Wharton and Garrison LLP. The funders of this work had no role in the study design, data collection, analysis, decision to publish, or preparation of this manuscript. All other authors declare no competing interests.

Data availability

VAMP-seq abundance scores for PTEN (urn:mavedb:00000013-a-1), TPMT (urn:mavedb:00000013-b-1), VKOR (urn:mavedb:00000078-b-1), CYP2C9 (urn:mavedb:00000095-b-1), and NUDT15 (urn:mavedb:00000055-a-1) were downloaded from MaveDB107. Gla domain protein sequences for human coagulation factor IX (P00740), human prothrombin (coagulation factor II, P00734), human coagulation factor VII (P08709), human coagulation factor X (P00742), human protein C (P04070), human protein S (P07225), human osteocalcin (bone Gla-protein, P02818), bovine osteocalcin (bone Gla-protein, P02820), human growth arrest-specific protein 6 (P14393), Pseudechis porphyriacus venom prothrombin activator porpharin-D (P58L93), Notechis scutatus venom prothrombin activator notecarin-D1 (P82807), and Oxyuranus scutellatus venom prothrombin activator oscutarin-C (P58L96) were downloaded from UniProt.

ClinVar variants are publicly available at https://www.ncbi.nlm.nih.gov/clinvar/. gnomAD v4.1 variants are available at https://gnomad.broadinstitute.org/. MLOF variants have been previously deposited into the EAHAD FIX clinical database (https://dbs.eahad.org/FIX) and the CDC CHBMP database (https://www.cdc.gov/hemophilia/mutation-project/index.html), and published11,12. A complete set of the MLOF variants used in this study, along with relevant information about these variants, is provided in Supplementary Table 4.

F9 variant scores are available in Supplementary Table 12 and at MaveDB (https://www.mavedb.org/urn:mavedb:00001200). Raw sequencing, barcode-variant maps, and scores are available in the NCBI Gene Expression Omnibus (GEO) repository (GSE242805). All other data files are provided at https://github.com/FowlerLab/2024_multistep. Source data are provided with this paper. All other data supporting the findings of this study are available from the corresponding author on reasonable request.

Code availability

All code to reproduce the analyses and figures presented in this work is available on GitHub at https://github.com/FowlerLab/2024_multistep. Versions of the R packages used for analyses are described in the code file on GitHub.

References

  • 1. Karczewski KJ et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
  • 2. Richards S et al. Standards and guidelines for the interpretation of sequence variants: A joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).
  • 3. Fayer S et al. Closing the gap: Systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 108, 2248–2258 (2021).
  • 4. Tabet D, Parikh V, Mali P, Roth FP & Claussnitzer M Scalable Functional Assays for the Interpretation of Human Genetic Variation. Annu. Rev. Genet. 56, 441–465 (2022).
  • 5. Uhlén M et al. The human secretome. Sci. Signal. 12, (2019).
  • 6. Freedman SJ, Furie BC, Furie B & Baleja JD Structure of the Metal-free γ-Carboxyglutamic Acid-rich Membrane Binding Region of Factor IX by Two-dimensional NMR Spectroscopy. J. Biol. Chem. 270, 7980–7987 (1995).
  • 7. Freedman SJ et al. Identification of the phospholipid binding site in the vitamin K-dependent blood coagulation protein factor IX. J. Biol. Chem. 271, 16227–16236 (1996).
  • 8. Shikamoto Y, Morita T, Fujimoto Z & Mizuno H Crystal structure of Mg2+- and Ca2+-bound Gla domain of factor IX complexed with binding protein. J. Biol. Chem. 278, 24090–24094 (2003).
  • 9. Huang M, Furie BC & Furie B Crystal Structure of the Calcium-stabilized Human Factor IX Gla Domain Bound to a Conformation-specific Anti-factor IX Antibody. J. Biol. Chem. 279, 14338–14346 (2004).
  • 10. Zacchi LF et al. Coagulation factor IX analysis in bioreactor cell culture supernatant predicts quality of the purified product. Commun. Biol. 4, 390 (2021).
  • 11. Rallapalli PM, Kemball-Cook G, Tuddenham EG, Gomez K & Perkins SJ An interactive mutation database for human coagulation Factor IX provides novel insights into the phenotypes and genetics of hemophilia B. J. Thromb. Haemost. 11, 1329–1340 (2013).
  • 12. Johnsen JM et al. Results of genetic analysis of 11 341 participants enrolled in the My Life, Our Future hemophilia genotyping initiative in the United States. J. Thromb. Haemost. 20, 2022–2034 (2022).
  • 13. Konkle BA, Josephson NC & Nakaya Fletcher S Hemophilia B. in GeneReviews (eds. Pagon RA et al.) (University of Washington, Seattle, Seattle (WA), 2023).
  • 14. MASAC Document 273 - Recommendations on Genotyping for Persons with Hemophilia. National Hemophilia Foundation; https://www.hemophilia.org/healthcare-professionals/guidelines-on-care/masac-documents/masac-document-273-recommendations-on-genotyping-for-persons-with-hemophilia.
  • 15. Gao W et al. Characterization of missense mutations in the signal peptide and propeptide of FIX in hemophilia B by a cell-based assay. Blood Adv. 4, 3659–3667 (2020).
  • 16. Johnsen JM et al. Novel approach to genetic analysis and results in 3000 hemophilia patients enrolled in the My Life, Our Future initiative. Blood Adv. 1, 824–834 (2017).
  • 17. Matreyek KA, Stephany JJ, Chiasson MA, Hasle N & Fowler DM An improved platform for functional assessment of large protein libraries in mammalian cells. Nucleic Acids Res. 48, e1 (2020).
  • 18. Savoldo B et al. CD28 costimulation improves expansion and persistence of chimeric antigen receptor-modified T cells in lymphoma patients. J. Clin. Invest. 121, 1822–1826 (2011).
  • 19. Esensten JH, Helou YA, Chopra G, Weiss A & Bluestone JA CD28 costimulation: From mechanism to therapy. Immunity 44, 973–988 (2016).
  • 20. Liu L et al. Inclusion of Strep-tag II in design of antigen receptors for T-cell immunotherapy. Nat. Biotechnol. 34, 430–434 (2016).
  • 21. Bicocchi MP et al. Insight into molecular changes of the FIX protein in a series of Italian patients with haemophilia B. Haemophilia 12, 263–270 (2006).
  • 22. Matreyek KA et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50, 874–882 (2018).
  • 23. Suiter CC et al. Massively parallel variant characterization identifies NUDT15 alleles associated with thiopurine toxicity. Proc. Natl. Acad. Sci. U. S. A. 117, 5394–5401 (2020).
  • 24. Amorosi CJ et al. Massively parallel characterization of CYP2C9 variant enzyme activity and abundance. Am. J. Hum. Genet. 108, 1735–1751 (2021).
  • 25.Kurys G, Tagaya Y, Bamford R, Hanover JA & Waldmann TA The long signal peptide isoform and its alternative processing direct the intracellular trafficking of interleukin-15. J. Biol. Chem. 275, 30653–30659 (2000). [DOI] [PubMed] [Google Scholar]
  • 26. Owji H, Nezafat N, Negahdaripour M, Hajiebrahimi A & Ghasemi Y A comprehensive review of signal peptides: Structure, roles, and applications. Eur. J. Cell Biol. 97, 422–441 (2018).
  • 27. Tikhonova EB, Karamysheva ZN, von Heijne G & Karamyshev AL Silencing of Aberrant Secretory Protein Expression by Disease-Associated Mutations. J. Mol. Biol. 431, 2567–2580 (2019).
  • 28. Liaci AM et al. Structure of the human signal peptidase complex reveals the determinants for signal peptide cleavage. Mol. Cell 81, 3934–3948.e11 (2021).
  • 29. Teufel F et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).
  • 30. Gutierrez Guarnizo SA et al. Pathogenic signal peptide variants in the human genome. NAR Genom. Bioinform. 5, lqad093 (2023).
  • 31. Braakman I & Hebert DN Protein folding in the endoplasmic reticulum. Cold Spring Harb. Perspect. Biol. 5, a013201 (2013).
  • 32. Zhang H et al. Unpaired Extracellular Cysteine Mutations of CSF3R Mediate Gain or Loss of Function. Cancer Res. 77, 4258–4267 (2017).
  • 33. Woodard DR et al. A loss-of-function cysteine mutant in fibulin-3 (EFEMP1) forms aberrant extracellular disulfide-linked homodimers and alters extracellular matrix composition. Hum. Mutat. 43, 1945–1955 (2022).
  • 34. Chiasson MA et al. Multiplexed measurement of variant abundance and activity reveals VKOR topology, active site and human variant impact. eLife 9 (2020).
  • 35. Yariv B et al. Using evolutionary data to make sense of macromolecules with a ‘face-lifted’ ConSurf. Protein Sci. 32, e4582 (2023).
  • 36. Huang M et al. Structural basis of membrane binding by Gla domains of vitamin K-dependent proteins. Nat. Struct. Biol. 10, 751–756 (2003).
  • 37. Grant MA, Baikeev RF, Gilbert GE & Rigby AC Lysine 5 and phenylalanine 9 of the factor IX omega-loop interact with phosphatidylserine in a membrane-mimetic environment. Biochemistry 43, 15367–15378 (2004).
  • 38. Feuerstein GZ et al. Antithrombotic efficacy of a novel murine antihuman factor IX antibody in rats. Arterioscler. Thromb. Vasc. Biol. 19, 2554–2562 (1999).
  • 39. Aktimur A, Gabriel MA, Gailani D & Toomey JR The Factor IX γ-Carboxyglutamic Acid (Gla) Domain Is Involved in Interactions between Factor IX and Factor XIa. J. Biol. Chem. 278, 7981–7987 (2003).
  • 40. Brown MA, Stenberg LM, Persson U & Stenflo J Identification and Purification of Vitamin K-dependent Proteins and Peptides with Monoclonal Antibodies Specific for γ-Carboxyglutamyl (Gla) Residues. J. Biol. Chem. 275, 19795–19802 (2000).
  • 41. Whitlon DS, Sadowski JA & Suttie JW Mechanism of coumarin action: significance of vitamin K epoxide reductase inhibition. Biochemistry 17, 1371–1377 (1978).
  • 42. Rabiet MJ, Jorgensen MJ, Furie B & Furie BC Effect of propeptide mutations on post-translational processing of Factor IX. Evidence that beta-hydroxylation and gamma-carboxylation are independent events. J. Biol. Chem. 262, 14895–14898 (1987).
  • 43. Furie B & Furie BC Molecular basis of vitamin K-dependent gamma-carboxylation. Blood 75, 1753–1762 (1990).
  • 44. Gillis S et al. γ-Carboxyglutamic acids 36 and 40 do not contribute to human Factor IX function. Protein Sci. 6, 185–196 (1997).
  • 45. Stenina O, Pudota BN, McNally BA, Hommema EL & Berkner KL Tethered processivity of the vitamin K-dependent carboxylase: factor IX is efficiently modified in a mechanism which distinguishes Gla’s from Glu’s and which accounts for comprehensive carboxylation in vivo. Biochemistry 40, 10301–10309 (2001).
  • 46. Bristol JA, Freedman SJ, Furie BC & Furie B Profactor IX: The propeptide inhibits binding to membrane surfaces and activation by factor XIA. Biochemistry 33, 14136–14143 (1994).
  • 47. Wolberg AS et al. Characterization of γ-carboxyglutamic acid residue 21 of human Factor IX. Biochemistry 35, 10321–10327 (1996).
  • 48. Wojcik EG, Van Den Berg M, Poort SR & Bertina RM Modification of the N-terminus of human factor IX by defective propeptide cleavage or acetylation results in a destabilized calcium-induced conformation: effects on phospholipid binding and activation by factor XIa. Biochem. J. 323 (Pt 3), 629–636 (1997).
  • 49. Ware J et al. Factor IX San Dimas. Substitution of glutamine for Arg-4 in the propeptide leads to incomplete γ-carboxylation and altered phospholipid binding properties. J. Biol. Chem. 264, 11401–11406 (1989).
  • 50. Liebman HA The metal-dependent conformational changes in factor IX associated with phospholipid binding. Studies using antibodies against a synthetic peptide and chemical modification of factor IX. Eur. J. Biochem. 212, 339–345 (1993).
  • 51. Jacobs M, Freedman SJ, Furie BC & Furie B Membrane binding properties of the factor IX gamma-carboxyglutamic acid-rich domain prepared by chemical synthesis. J. Biol. Chem. 269, 25494–25501 (1994).
  • 52. Agah S & Bajaj SP Role of magnesium in factor XIa catalyzed activation of factor IX: calcium binding to factor IX under physiologic magnesium. J. Thromb. Haemost. 7, 1426–1428 (2009).
  • 53. Westmark PR, Tanratana P & Sheehan JP Selective disruption of heparin and antithrombin-mediated regulation of human factor IX. J. Thromb. Haemost. 13, 1053–1063 (2015).
  • 54. Plautz WE et al. Anticoagulant Protein S Targets the Factor IXa Heparin-Binding Exosite to Prevent Thrombosis. Arterioscler. Thromb. Vasc. Biol. 38, 816–828 (2018).
  • 55. Cooley B et al. Dysfunctional endogenous FIX impairs prophylaxis in a mouse hemophilia B model. Blood 133, 2445–2451 (2019).
  • 56. Landrum MJ et al. ClinVar: Public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
  • 57. Iorio A et al. Establishing the Prevalence and Prevalence at Birth of Hemophilia in Males: A Meta-analytic Approach Using National Registries. Ann. Intern. Med. 171, 540–546 (2019).
  • 58. Ioannidis NM et al. REVEL: An ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).
  • 59. Frazer J et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).
  • 60. Rentzsch P, Schubach M, Shendure J & Kircher M CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 13, 31 (2021).
  • 61. Cheng J et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
  • 62. Livesey BJ & Marsh JA Updated benchmarking of variant effect predictors using deep mutational scanning. Mol. Syst. Biol. 19, e11474 (2023).
  • 63. Tavtigian SV et al. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet. Med. 20, 1054–1060 (2018).
  • 64. Brnich SE et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 12, 1–12 (2020).
  • 65. Fokkema IFAC et al. LOVD v.2.0: the next generation in gene variant databases. Hum. Mutat. 32, 557–563 (2011).
  • 66. Matamala N et al. Characterization of novel missense variants of SERPINA1 gene causing alpha-1 antitrypsin deficiency. Am. J. Respir. Cell Mol. Biol. 58, 706–716 (2018).
  • 67. McVey JH et al. The European Association for Haemophilia and Allied Disorders (EAHAD) Coagulation Factor Variant Databases: Important resources for haemostasis clinicians and researchers. Haemophilia 26, 306–313 (2020).
  • 68. Seixas S & Marques PI Known mutations as the cause of alpha-1 antitrypsin deficiency: an updated overview of SERPINA1 variation spectrum. Appl. Clin. Genet. 14, 173–194 (2021).
  • 69. Gai SA & Wittrup KD Yeast surface display for protein engineering and characterization. Curr. Opin. Struct. Biol. 17, 467–473 (2007).
  • 70. Salema V & Fernández LÁ Escherichia coli surface display for the selection of nanobodies. Microb. Biotechnol. 10, 1468–1484 (2017).
  • 71. Ho M & Pastan I Mammalian cell display for antibody engineering. in Therapeutic Antibodies: Methods and Protocols (ed. Dimitrov AS) 337–352 (Humana Press, Totowa, NJ, 2009).
  • 72. Frank F et al. Deep mutational scanning identifies SARS-CoV-2 Nucleocapsid escape mutations of currently available rapid antigen tests. Cell 185, 3603–3616.e13 (2022).
  • 73. Parthiban K et al. A comprehensive search of functional sequence space using large mammalian display libraries created by gene editing. MAbs 11, 884–898 (2019).
  • 74. Vink T, Oudshoorn-Dickmann M, Roza M, Reitsma J-J & de Jong RN A simple, robust and highly efficient transient expression system for producing antibodies. Methods 65, 5–10 (2014).
  • 75. do Amaral RLF et al. Approaches for recombinant human factor IX production in serum-free suspension cultures. Biotechnol. Lett. 38, 385–394 (2016).
  • 76. Duportet X et al. A platform for rapid prototyping of synthetic gene networks in mammalian cells. Nucleic Acids Res. 42, 13440–13451 (2014).
  • 77. Zhu F et al. DICE, an efficient system for iterative genomic editing in human pluripotent stem cells. Nucleic Acids Res. 42, e34 (2014).
  • 78. Matreyek KA, Stephany JJ & Fowler DM A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. e102 (2017).
  • 79. Starita LM et al. A Multiplex Homology-Directed DNA Repair Assay Reveals the Impact of More Than 1,000 BRCA1 Missense Substitution Variants on Protein Function. Am. J. Hum. Genet. 103, 498–508 (2018).
  • 80. Hasle N et al. High-throughput, microscope-based sorting to dissect cellular heterogeneity. Mol. Syst. Biol. 16, e9442 (2020).
  • 81. Low BE, Hosur V, Lesbirel S & Wiles MV Efficient targeted transgenesis of large donor DNA into multiple mouse genetic backgrounds using bacteriophage Bxb1 integrase. Sci. Rep. 12, 5424 (2022).
  • 82. Durrant MG et al. Systematic discovery of recombinases for efficient integration of large DNA sequences into the human genome. Nat. Biotechnol. 41, 488–499 (2023).
  • 83. Zhang M et al. SHIELD: a platform for high-throughput screening of barrier-type DNA elements in human cells. Nat. Commun. 14, 5616 (2023).
  • 84. Aslanzadeh V et al. Deep mutational scanning of the human insulin receptor ectodomain to inform precision therapy for insulin resistance. bioRxiv 2024.09.07.611782 (2024) doi: 10.1101/2024.09.07.611782.
  • 85. Blanch-Asensio A et al. STRAIGHT-IN Dual: a platform for dual, single-copy integrations of DNA payloads and gene circuits into human induced pluripotent stem cells. bioRxiv 2024.10.17.616637 (2024) doi: 10.1101/2024.10.17.616637.
  • 86. Boyle GE et al. Deep mutational scanning of CYP2C19 in human cells reveals a substrate specificity-abundance tradeoff. Genetics 228, iyae156 (2024).
  • 87. Hew BE et al. Directed evolution of hyperactive integrases for site specific insertion of transgenes. Nucleic Acids Res. 52, e64 (2024).
  • 88. Hong CKY et al. Massively parallel characterization of insulator activity across the genome. Nat. Commun. 15, 8350 (2024).
  • 89. Huhtinen O, Prince S, Lamminmäki U, Salbo R & Kulmala A Increased stable integration efficiency in CHO cells through enhanced nuclear localization of Bxb1 serine integrase. BMC Biotechnol. 24, 44 (2024).
  • 90. Kent JD, Klug LR & Heinrich MC A novel human SDHA-knockout cell line model for the functional analysis of clinically relevant SDHA variants. Clin. Cancer Res. 30, 5399–5412 (2024).
  • 91. Kim J, Muller RY, Bondra ER & Ingolia NT CRISPRi with barcoded expression reporters dissects regulatory networks in human cells. bioRxiv 2024.09.06.611573 (2024) doi: 10.1101/2024.09.06.611573.
  • 92. Pandey S et al. Efficient site-specific integration of large genes in mammalian cells via continuously evolved recombinases and prime editing. Nat. Biomed. Eng. 1–18 (2024).
  • 93. Wang Z, Sarkar A & Ge X De novo functional discovery of peptide-MHC restricted CARs from recombinase-constructed large-diversity monoclonal T cell libraries. bioRxiv 2024.11.27.625413 (2024) doi: 10.1101/2024.11.27.625413.
  • 94. Acharya P, Quinlan A & Neumeister V The ABCs of finding a good antibody: How to find a good antibody, validate it, and publish meaningful data. F1000Res. 6, 851 (2017).

Methods-only references

  • 95. Gibson DG et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
  • 96. García-Nafría J, Watson JF & Greger IH IVA cloning: A single-tube universal cloning system exploiting bacterial In Vivo Assembly. Sci. Rep. 6, 1–12 (2016).
  • 97. den Dunnen JT et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 37, 564–569 (2016).
  • 98. Miao HZ et al. Bioengineering of coagulation factor VIII for improved secretion. Blood 103, 3412–3419 (2004).
  • 99. Kessler CM et al. B-domain deleted recombinant factor VIII preparations are bioequivalent to a monoclonal antibody purified plasma-derived factor VIII concentrate: a randomized, three-way crossover study. Haemophilia 11, 84–91 (2005).
  • 100. Ward NJ et al. Codon optimization of human factor VIII cDNAs leads to high-level expression. Blood 117, 798–807 (2011).
  • 101. Zhang J, Kobert K, Flouri T & Stamatakis A PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).
  • 102. Li H Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN] (2013).
  • 103. Yeh C-LC, Amorosi CJ, Showman S & Dunham MJ PacRAT: a program to improve barcode-variant mapping from PacBio long reads using multiple sequence alignment. Bioinformatics 38, 2927–2929 (2022).
  • 104. Kulkarni R et al. Sites of initial bleeding episodes, mode of delivery and age of diagnosis in babies with haemophilia diagnosed before the age of 2 years: A report from the Centers for Disease Control and Prevention’s (CDC) Universal Data Collection (UDC) project. Haemophilia 15, 1281–1290 (2009).
  • 105. Majithia AR et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).
  • 106. Menardi G & Torelli N Training and assessing classification rules with imbalanced data. Data Min. Knowl. Discov. 28, 92–122 (2014).
  • 107. Esposito D et al. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 20, 223 (2019).
  • 108. Baron M et al. The three-dimensional structure of the first EGF-like module of human factor IX: comparison with EGF and TGF-alpha. Protein Sci. 1, 81–90 (1992).
  • 109. Johnson DJD, Langdown J & Huntington JA Molecular basis of factor IXa recognition by heparin-activated antithrombin revealed by a 1.7-Å structure of the ternary complex. Proc. Natl. Acad. Sci. U. S. A. 107, 645–650 (2010).
  • 110. UniProt Consortium. UniProt: The universal protein knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617 (2025).
  • 111. Edgar RC MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).

Associated Data


Supplementary Materials

Supplementary Table 1
Supplementary Table 2
Supplementary Table 3
Supplementary Table 4
Supplementary Table 5
Supplementary Table 6
Supplementary Table 7
Supplementary Table 8
Supplementary Table 9
Supplementary Table 10
Supplementary Table 11
Supplementary Table 12
Supplementary Figures 1 and 2
Source Data for Supplementary Figures 1 and 2
Source Data for Main and Extended Data Figures

Data Availability Statement

VAMP-seq abundance scores for PTEN (urn:mavedb:00000013-a-1), TPMT (urn:mavedb:00000013-b-1), VKOR (urn:mavedb:00000078-b-1), CYP2C9 (urn:mavedb:00000095-b-1), and NUDT15 (urn:mavedb:00000055-a-1) were downloaded from MaveDB107. Gla domain protein sequences for human coagulation factor IX (P00740), human prothrombin (coagulation factor II, P00734), human coagulation factor VII (P08709), human coagulation factor X (P00742), human protein C (P04070), human protein S (P07225), human osteocalcin (bone Gla protein, P02818), bovine osteocalcin (bone Gla protein, P02820), human growth arrest-specific protein 6 (P14393), Pseudechis porphyriacus venom prothrombin activator porpharin-D (P58L93), Notechis scutatus venom prothrombin activator notecarin-D1 (P82807), and Oxyuranus scutellatus venom prothrombin activator oscutarin-C (P58L96) were downloaded from UniProt.
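The UniProt accessions listed above can be retrieved programmatically. The sketch below is a minimal illustration and not the authors' pipeline: it assumes UniProt's public REST endpoint of the form `https://rest.uniprot.org/uniprotkb/{accession}.fasta`, and the names `GLA_ACCESSIONS`, `fasta_url`, `parse_fasta`, and `fetch_sequence` are hypothetical helpers introduced here. It returns full-length precursor sequences; trimming each record to its Gla domain and aligning the domains would be separate steps whose exact boundaries are not specified in this statement.

```python
import urllib.request

# Accessions as listed in the Data Availability Statement (labels shortened).
GLA_ACCESSIONS = {
    "P00740": "human coagulation factor IX",
    "P00734": "human prothrombin (factor II)",
    "P08709": "human coagulation factor VII",
    "P00742": "human coagulation factor X",
    "P04070": "human protein C",
    "P07225": "human protein S",
    "P02818": "human osteocalcin",
    "P02820": "bovine osteocalcin",
    "P14393": "human growth arrest-specific protein 6",
    "P58L93": "Pseudechis porphyriacus porpharin-D",
    "P82807": "Notechis scutatus notecarin-D1",
    "P58L96": "Oxyuranus scutellatus oscutarin-C",
}

# Assumption: UniProt's REST API serves one FASTA record per accession at this URL.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{acc}.fasta"


def fasta_url(acc: str) -> str:
    """Build the (assumed) UniProt REST URL for a single accession."""
    return UNIPROT_FASTA_URL.format(acc=acc)


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header line without '>': concatenated sequence}."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are wrapped at a fixed width
    if header is not None:
        records[header] = "".join(chunks)
    return records


def fetch_sequence(acc: str, timeout: float = 30.0) -> str:
    """Download one accession and return its full-length sequence (needs network)."""
    with urllib.request.urlopen(fasta_url(acc), timeout=timeout) as resp:
        records = parse_fasta(resp.read().decode("utf-8"))
    if len(records) != 1:
        raise ValueError(f"expected one FASTA record for {acc}, got {len(records)}")
    return next(iter(records.values()))
```

For example, `{acc: fetch_sequence(acc) for acc in GLA_ACCESSIONS}` would download all twelve records. The parser is kept separate from the download so it can be checked offline; sequence lines that appear before any header are ignored.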

ClinVar variants are publicly available at https://www.ncbi.nlm.nih.gov/clinvar/. gnomAD v4.1 variants are available at https://gnomad.broadinstitute.org/. MLOF variants have previously been deposited in the EAHAD FIX clinical database (https://dbs.eahad.org/FIX) and the CDC CHBMP database (https://www.cdc.gov/hemophilia/mutation-project/index.html), and have been published11,12. The complete set of MLOF variants used in this study, along with relevant information about each variant, is provided in Supplementary Table 4.

F9 variant scores are available in Supplementary Table 12 and at MaveDB (https://www.mavedb.org/urn:mavedb:00001200). Raw sequencing data, barcode-variant maps, and scores are available in the NCBI Gene Expression Omnibus (GEO) repository (GSE242805). Additional data files are provided at https://github.com/FowlerLab/2024_multistep. Source data are provided with this paper. All other data supporting the findings of this study are available from the corresponding author on reasonable request.
