Abstract
Protein-Protein Interactions (PPIs) are a key interface between virus and host, and these interactions are important both to viral reprogramming of the host and to host restriction of viral infection. In particular, viral-host PPI networks can be used to further our understanding of the molecular mechanisms of tissue specificity, host range, and virulence. At higher scales, viral-host PPI screening could also be used to screen for small-molecule antivirals that interfere with essential viral-host interactions, or to explore how the PPI networks between interacting viral and host genomes co-evolve. Current high-throughput PPI assays have screened entire viral-host PPI networks. However, these studies are time-consuming, often require specialized equipment, and are difficult to further scale. Here, we develop methods that make larger-scale viral-host PPI screening more accessible. This approach combines the mDHFR split-tag reporter with the iSeq2 interaction-barcoding system to permit massively-multiplexed PPI quantification by simple pooled engineering of barcoded constructs, integration of these constructs into budding yeast, and fitness measurements by pooled cell competitions and barcode-sequencing. We applied this method to screen for PPIs between SARS-CoV-2 proteins and human proteins, screening in triplicate >180,000 ORF-ORF combinations represented by >1,000,000 barcoded lineages. Our results complement previous screens by identifying 74 putative PPIs, including interactions between ORF7A and the taste receptors TAS2R41 and TAS2R7, and between NSP4 and the transmembrane proteins KDELR2 and KDELR3. We show that this PPI screening method is highly scalable, enabling larger studies aimed at generating a broad understanding of how viral effector proteins converge on cellular targets to effect replication.
Introduction
Emerging and endemic viral pathogens are a persistent threat to human health. These obligate parasites infect susceptible cells and redirect human cellular processes towards the replication and release of transmissible viral particles. Studying how this process functions at the molecular scale is important for understanding the mechanisms of pathogenesis and the determinants of cell-type susceptibility and permissibility. Protein-Protein Interactions (PPIs) are an important mechanistic factor in a viral infection, with PPIs contributing to both viral replication and host restriction of viral replication [1–4]. Thus viral-human PPIs (vhPPIs) are a complex and critical interface between virus and host where genetic variation can partly determine viral host range and virulence [5–9]. Mapping these vhPPIs will establish systematic insight into how interacting viral proteins converge on common human protein targets, cellular functions, or pathways, and identify interactions leading to therapeutic strategies [10].
A variety of techniques have been used to assay PPIs at proteome-scale. Mass-spectrometry after affinity-purification [11–13] or proximity-labeling [14] methods are perhaps most easily scalable to screening virus-host PPIs as they only require engineering a tag fusion to the "bait" viral open reading frame (ORF), but these approaches are also less sensitive to transient PPIs that may be common for regulatory interactions [15–19] or to PPIs involving membrane-associated proteins critical for the complex morphogenesis of viral replication centers or viral particles [20]. Split-reporter constructs assayed in yeast, canonically exemplified by the yeast two-hybrid assay [21–23], are another class of approaches that fuse each "tag" of a split reporter gene to a protein pair of interest such that interaction of the chosen proteins reconstitutes the reporter’s function. One such split-reporter assay is the murine dihydrofolate reductase (mDHFR) Protein-fragment Complementation Assay (PCA) [24, 25]. Although the proteins of interest are assayed in the transgenic context of budding yeast, the mDHFR PCA has the advantages of being sensitive to membrane-bound proteins, reporting on transient interactions, and not requiring that the proteins of interest both localize to the nucleus and avoid interfering with transcription of the assay’s reporter gene. Yeast-based split-reporters have been quantified in the past with colony growth [26–28], but can also leverage barcode-sequencing to multiplex PPI assays within a single pooled culture and sequencing library [29–33]. Each method for screening PPI networks introduces its own biases in detection; therefore, multiple complementary approaches are often compared to generate a more thorough functional and biochemical understanding [17, 34]. Machine-learning prediction of protein 3D structures [35, 36] has also recently been used to predict PPIs [37].
However, these models currently have limited capabilities to generalize to dynamic and disordered protein ensembles (~30% of the human proteome, [38]). These dynamic and disordered vhPPIs are especially important for understanding viral cell biology [8, 39–43], suggesting that empirical PPI screens remain important in exploring cell biology.
A challenge for scaling any approach that relies on a synthetic tag-fusion is the engineering of these DNA constructs. One advance that accelerates high-throughput construct engineering is the Gateway system [44], which was developed to easily move ORF sequences from "Entry" plasmids into "Destination" plasmids that bear the appropriate expression and assay systems. Efforts like the ORFeome Collaboration have generated a collection of Entry plasmids bearing ORFs that encode human proteins (hORFs) [45], and this collection can be easily integrated into Destination plasmids for multiple PPI screening approaches. Thus the ORFeome Collection has been key to mapping PPI networks at proteome-scale [46, 47], and efforts to generate viral ORF (vORF) collections also offer the opportunity to expand our systematic understanding of pan-viral-human PPI networks [48–50]. Current PPI assay platforms require generating or isolating each assay construct as clones. While methods have been developed to perform this cloning in high throughput [33], the overhead and resources required limits proteome-scale PPI network screening to a handful of well-resourced and specialized research groups. Modern pooled DNA engineering methods can be leveraged to integrate (or synthesize de novo) ORFs from more viruses and natural genetic variants. When combined with highly scalable barcode-based PPI assay quantification, pooled DNA engineering may permit a wider range of research groups to address research questions at unprecedented scales.
We previously reported an approach called PPIseq [29, 51] that combined the yeast mDHFR split-tag PPI assay collection [28] with our double-barcode iSeq2 approach [52] to screen ~9% of the potential yeast PPI network in 9 growth conditions. The mDHFR assay works by rescuing the growth defect of yeast upon methotrexate (MTX) treatment, and the mDHFR split-tags F[1,2] and F[3] can be fused to target proteins to assay the target protein’s proximity as a reconstitution of the full mDHFR and thus a quantitative rescue of growth [24, 28, 53–56]. We were able to leverage an existing collection of yeast strains, with mDHFR split-tags fused at the native protein locus [28], by introducing our iSeq2 barcoding system by mating and selection. This permits us to use the Cre/lox system and an artificial-intron split-marker selection to generate recombinants where previously known barcodes can be cheaply quantified in pools by amplicon short-read Illumina sequencing. However, this approach required that tagged-protein haploid strains are generated and maintained in clonal arrays where each position’s genotype is previously known. This approach bears both the assay tagged-protein construct and the double-barcode locus on the chromosome, and, although these may have advantages over assay constructs expressed from an episome [57, 58], chromosomally-integrated reporters and barcodes are more challenging to engineer and assay with amplicon sequencing. Plasmid-based engineering and quantification has proven successful for various reporters in yeast expression, including the mDHFR split-tag [33], and we sought to offer an alternative strategy of pooled library generation and assay using the iSeq2 interaction-barcoding system.
Here, we expand the utility of the PPIseq system by leveraging pooled engineering techniques to screen for PPIs amongst libraries of heterologous ORFs from Gateway ORFeome plasmid pools. We develop and demonstrate this approach by generating and annotating a pool of mDHFR-tagged and DNA-barcoded ORF expression plasmids bearing ORFs encoding human or SARS-CoV-2 proteins. We then assayed this PPIseq strain collection to screen for PPIs between ~9,000 human ORFs and ~30 SARS-CoV-2 ORFs in biological triplicate. We found that this method largely agrees with, but is, as expected, complementary to other PPI screening approaches, and finds potential novel interactions between proteins of SARS-CoV-2 and human proteins involved in, for example, ER retention and taste receptors.
Results
Scaling PPIseq to efficiently screen new ORF libraries
A major bottleneck to PPI screening assays is the engineering of the DNA constructs to be tested. To overcome this challenge, we aimed to develop a PPI screening platform that utilizes pooled DNA engineering techniques. We generate each mDHFR-tagged ORF expression construct on a plasmid, integrate a random barcode into this plasmid, and use long-read sequencing to annotate an expression construct with the physically-linked barcodes (Fig 1A). We use our iSeq 2.0 system [52] to integrate these plasmids into genomic landing pads of two complementary yeast strains, and then combine them onto the same chromosome via yeast mating and recombination between homologous chromosomes. Once combined on the same chromosome, the barcodes from each plasmid are in close enough proximity to be PCR amplified and sequenced on a paired-end Illumina read (Fig 2A). Because all steps of this process are performed in pools, it is, in theory, highly scalable.
Fig 1. Complex PPIseq libraries were generated from plasmid pools using a scalable high-throughput workflow.
A) Diagram of the workflow. 1) An in vitro recombinase integrates ORFs from an Entry vector pool into Gateway Destination plasmids, 2) the plasmids are digested at an I-SceI restriction site, 3) in vitro recombination and cloning repairs the plasmid with a lox site, barcode, and priming site, 4) linearized plasmid pools are sequenced by long-read sequencing to determine the identity of the barcode and ORF integrated in each plasmid, 5) plasmids are "landed" into the yeast iSeq 2.0 genomic "landing pads" using transformation and Cre recombinase, 6) haploid pools of yeast are mated and Cre recombinase recombines the loci to form a double-barcode locus, and 7) the reconstituted split-URA3 marker is selected for to obtain the final double-barcoded diploid pool, ready to assay. B) A scanned image of double-barcoded diploid strains grown on media selective for the mDHFR reporter (SC-URA+100ug/mL MTX, raw image S1 Fig). ORF combinations are indicated on the margins, with F[1,2]-tagged ORF named along columns and F[3]-tagged ORF named along rows. TRIM15, AKT1, Bim, Bad, Noxa, UBE2S, and Mcl-1 are from the human ORFeome v8.1 collection, LEU2 from the yeast ORFeome collection, and the mDHFR F[1,2] and mDHFR F[3] tagged ORFs were generated here (Methods). C) A complex library of barcoded and mDHFR-tagged ORFs can be generated by transformation. The distribution of the number of unique barcoded plasmids detected per ORF, for human and viral ORF plasmid libraries, is plotted as lines and points, with each distribution’s mean and median denoted by vertical lines. D) A complex library of ORF combinations can be generated by mating. A Sankey diagram illustrates that most (~60%) of the possible (~350k) diploid double-barcode PPI-reporter ORF-ORF combinations that can be generated are observed to be represented by at least three barcode-replicate lineages in the initial library sample.
Fig 2. PPIseq enables reproducible screening of a complex PPI reporter library.
A) Schematic of the assay workflow. B) Examples of growth trajectories of each type of control strain generated in this pool, where points and lines denote the relative abundance of a lineage. For each control strain, a randomly selected lineage is shown from each of the three replicate cultures (line style). X-axis denotes the time point when the sample was taken. C) Violin plots of the relative fitness for experimental (vORF + hORF) and control (all other) lineages, separated by lineage type. Dot plots of the underlying data are shown for some lineage types. Lineages with low counts and outliers are excluded (Results, Methods). Horizontal line is the median of each lineage type’s fitnesses, with linetype corresponding to the lineage type. D) The mean fitness of the population as estimated by FitSeq. E) Heatmap of the fitness of each PPI assayed, averaged across all three replicates. Note that NSP2 fitness is adjusted to mitigate bias (see Results). F,G,H) Fitness of lineages across assay replicates. High fitness lineages (putative PPIs) are correlated, but low fitness lineages (no PPI) are not. I) The Pearson R^2 (y-axis) of the fitness of a PPIseq lineage across growth replicates, calculated for all lineages with fitnesses above a particular threshold (x-axis). Low fitness lineages (no PPI) appear uncorrelated, while high fitness lineages (putative PPIs) are well correlated. J) A Sankey diagram showing how the lineage fitness results were filtered to detect PPIs. We required a fitness call in at least three lineages representing the same ORF combination to call a PPI.
To develop this pooled construction and screening approach, we adapted the iSeq 2.0 pBAR4 and pBAR5 plasmids [52] to express an integrated ORF as an in-frame fusion with the mDHFR F[1,2] or F[3] tags [24, 28, 53, 59] from a TDH3 promoter. This design uses the Gateway system [44] to insert ORFs lacking a stop codon from compatible Gateway Entry ORF libraries [45, 48]. We then developed methods to insert a random DNA barcode (N30) from a highly complex pool of oligos after this step. Inserting barcodes by this method minimizes the "collision" of one barcode being associated with multiple ORFs. To enable future studies screening small-molecule perturbations of PPIs (not described here), we modified our base yeast landing-pad strains [52] to delete four pleiotropic drug resistance genes that are associated with the sensitivity of yeast to small molecules: PDR1, PDR3, PDR5, and SNQ2 [60]. To enable adequate sampling of chromosomally integrated double-barcodes during amplification from high-complexity barcoded cell libraries, we developed a genomic DNA extraction (S2 File) and library-prep PCR (S3 File) protocol that permits cheap and efficient PCR of double-barcodes from large input quantities of genomic DNA. This amplicon incorporates a four-barcode design [61, 62] to allow scalable sample multiplexing while bioinformatically detecting chimeras generated by the Illumina sequencing workflow [63, 64].
To determine whether PPIs could still be detected in this modified platform, we constructed and tested the following controls: 1) a positive control mDHFR F[1,2]-F[3] fusion such that the covalent linkage between tags maximally rescues growth of yeast upon MTX addition, 2) a negative control F[1,2]-F[1,2] fusion that is incapable of such a growth rescue, and 3) each negative control mDHFR F[1,2] or F[3] tag fused to only a methionine codon (a NULL construct). These negative control NULL-F[1,2] and NULL-F[3] constructs serve to control for spurious interaction between proteins and the mDHFR tags [34, 65]. We tested the PPI assay (Fig 1B) for these controls as well as all pairwise combinations amongst several human ORFs (TRIM15, AKT1, BCL2L11 (also known as Bim), Bad, Noxa, UBE2S, Mcl-1) and the ORF from yeast LEU2 (Methods). As expected, none of the negative controls showed growth in the presence of MTX. However, we observed growth, indicating a likely PPI, for the Leu2p homodimer [66], the TRIM15 homodimer [67], and the BCL2L11 and Mcl-1 pair [68]. Five other potential interactions indicated in the IntAct PPI database did not result in growth detectable in this plate-based assay (S15 Table), although the occurrence of false-negatives is consistent with other high-throughput PPI assays [17].
Pooled generation of PPIseq libraries
The SARS-CoV-2 pandemic spurred the application of a variety of high-throughput PPI assays to map out vhPPIs of SARS-CoV-2 and human proteins. To test our approach and compare it with other genome-wide surveys, we screened for PPIs between human and SARS-CoV-2 proteins. We used Gateway Entry plasmids as sources of human ORFs (hORFs) and viral ORFs (vORFs) for the screen. We grew and pooled the human ORFeome v8.1 collection, and for SARS-CoV-2 ORFs we purchased the plasmid collection generated by Kim et al. [69]. This collection was generated from synthetic DNA and included 28 complete annotated coding sequences and 8 additional fragments or mutants thereof (S2 Table). hORFs and vORFs were moved from their pooled Gateway Entry vectors into our base PPIseq assay plasmids, the appropriate lox cis-element (loxP/66 or lox5171/66) and a 30 bp random DNA barcode were integrated, and this library was cloned at ~3-6x coverage of colonies per ORF. vORFs were inserted into the F[3]-tagged vector (pSL737, S5 File) and hORFs into the F[1,2]-tagged vector (pSL51, S4 File).
A crucial aspect of a barcode-sequencing approach is to efficiently map which barcode represents which genetic variant, so we sought to determine the barcode-ORF map by using long-read sequencing. We linearized plasmid pools using restriction digest (I-SceI) and generated sequencing datasets using both PacBio and Nanopore technologies. We devised a bioinformatic Nextflow pipeline to trim and extract the 30 bp barcode from these raw reads, separate out reads that contain each barcode, and use these reads to generate and polish a consensus sequence of the ORF integrated into the plasmid with this barcode. We generated a mapping of 9,191 F[1,2]-tagged hORFs represented by 35,191 barcodes and 36 F[3]-tagged vORFs represented by 608 barcodes, each with the ORF sequence verified as correct or carrying only a synonymous codon change (S4 and S5 Tables). By comparing the small PacBio dataset and the more extensive Nanopore dataset for the barcoded hORF plasmids, we found that only four barcodes mapped to multiple ORFs amongst the >16,000 barcoded plasmids seen by both annotation runs, and excluded these four barcodes from downstream analysis.
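The barcode-to-ORF mapping logic described above can be illustrated with a minimal Python sketch. This is not the Nextflow pipeline itself: the flanking sequences, the function names, and the assumption that each read already carries an ORF assignment (standing in for consensus building and alignment) are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical flanking sequences; the flanks of the real constructs are not given here.
LEFT_FLANK = "GGATCC"
RIGHT_FLANK = "GAATTC"
BARCODE_RE = re.compile(LEFT_FLANK + r"([ACGT]{30})" + RIGHT_FLANK)


def extract_barcode(read):
    """Return the 30 bp barcode found between the assumed flanks, or None."""
    match = BARCODE_RE.search(read)
    return match.group(1) if match else None


def build_barcode_map(reads_with_orf):
    """Map each barcode to one ORF and set aside barcodes seen with several ORFs.

    reads_with_orf: iterable of (read_sequence, orf_name) pairs, where orf_name
    is a stand-in for the ORF identified from the reads sharing that barcode.
    """
    seen = defaultdict(set)
    for read, orf in reads_with_orf:
        barcode = extract_barcode(read)
        if barcode is not None:
            seen[barcode].add(orf)
    # Barcodes supported by exactly one ORF are kept for downstream analysis.
    unique = {bc: next(iter(orfs)) for bc, orfs in seen.items() if len(orfs) == 1}
    # Barcodes associated with more than one ORF are reported for exclusion.
    ambiguous = sorted(bc for bc, orfs in seen.items() if len(orfs) > 1)
    return unique, ambiguous
```

In this sketch, a barcode observed with more than one ORF is returned in the `ambiguous` list rather than silently assigned, mirroring the exclusion of the four multiply-mapped barcodes.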
We then integrated each plasmid pool into the appropriate yeast strain, choosing the MATa ySL507 for the vORFs tagged with F[3] and the MATalpha ySL508 for the hORFs tagged with F[1,2]. While our routine use of the published PEG/LiAc protocol [70] for yeast transformation was sufficient to integrate the small vORF library at high coverage (estimated as 25-196x yeast transformant colonies per barcoded vORF plasmid), we found that the efficiency of iSeq plasmid integration would be limiting for transforming the larger hORF plasmid library of >35,000 barcoded plasmids. We systematically tried variations on the published PEG/LiAc transformation protocol and found that recovering the heat-stressed cells in glucose-containing media without a nitrogen source was associated with a noticeable increase in transformants per ug of plasmid (S12 Fig), theoretically consistent with the role of nutrient shifts in triggering transporter endocytosis as part of the endocytosis hypothesis for the mechanism of PEG/LiAc transformation [71–75]. Using this modified protocol (S1 File) we found an approximately 2.5-fold improvement in transformant colonies per ug of plasmid, and were able to generate a library of approximately 337,000 genomic integrants of the barcoded F[1,2]-tagged hORF plasmids.
Screening for human and SARS-CoV-2 ORF protein-protein interactions
With complementary pools of yeast with tagged hORFs or tagged vORFs integrated into yeast genomic landing pads, we generated all-by-all combinations by pooled mating. We also generated pools of positive and negative controls by mating small (8–10 distinct barcoded isolates) haploid yeast pools with control constructs (S6 Table). Mating the pools of NULL-F[3] or NULL-F[1,2] haploid strains together generated double-negative controls (NULL-F[3] x NULL-F[1,2]) with no resistance to MTX, while a positive control was generated by integrating the F[1,2] tag into the F[3]-tagged expression context, generating a covalent F[1,2]-F[3] fusion with 100% co-localization of the mDHFR tags. The hORF and vORF pools were similarly mated to the NULL-F[1,2] or NULL-F[3] construct pool to detect promiscuous proteins that interact with the complementary mDHFR tag directly. We induced recombination for each of these mated pools and selected for the double-barcode amplicon using the split URA3 marker (Methods). We then combined the pools of diploid PPIseq assay strains for quantification of the selection. A challenge for barcode-sequencing screens is in balancing the library such that sufficient read depth is available to adequately quantify individual barcode abundance in highly complex libraries, even as those barcoded lineages change in abundance in response to selection-based assays [76]. Thus, we first cultured the negative control, positive control, and test strain pools separately for one cycle of MTX selection before combining the pools with a relatively small abundance of positive controls. We sampled this initial double-barcode lineage pool, then continued cycles of MTX selection and sampling (Methods). The selection culturing and sampling of the lineage pool was done in triplicate.
From these samples, we used our modified DNA extraction and amplicon library generation protocols (S3 File) to generate double-barcode amplicon libraries for sequencing. We found quantification of the double-barcode lineages (Fig 2B) to be reliable, with two independent biological samples of the initial lineage pool being in good agreement (R^2 = 0.976, S4 Fig). In this first time point, 211,688 possible combinations of ORFs were detected with at least three double-barcode replicate lineages each, representing at least 60.5% of the possible ORF combinations (Fig 1D, for counts table see Methods, Data access). We analyzed these abundance estimates with a fork of FitSeq [77, 78] to calculate per-lineage fitness estimates while accounting for the change in mean population fitness (Fig 2D, S7, S16 and S17 Tables). Comparing the fitness of each lineage across replicates (Fig 2F, 2G and 2H, scaled by the average fitnesses of the positive and negative control lineages, Fig 2C) showed that positive PPIs are reproducibly quantified (R^2 > 0.9 for lineages with >0.2 fitness, Fig 2I). In total we obtained ~1.5 million unique double-barcode lineages per replicate that we analyzed to call PPIs.
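As a minimal sketch of the control-based scaling mentioned above (and not of the FitSeq fork itself), per-lineage fitness estimates could be rescaled so that negative controls average 0 and positive controls average 1. The linear form of the rescaling and the function name are assumptions made for illustration; the exact scaling used in the analysis is described in the Methods rather than here.

```python
from statistics import mean


def scale_fitness(fitness, negative_controls, positive_controls):
    """Linearly rescale fitness so control means map to 0 (negative) and 1 (positive).

    fitness: dict of lineage identifier -> raw fitness estimate.
    negative_controls, positive_controls: raw fitness values of control lineages.
    The linear rescaling is an illustrative assumption, not the published method.
    """
    neg = mean(negative_controls)
    pos = mean(positive_controls)
    if pos == neg:
        raise ValueError("positive and negative control means are identical")
    return {lineage: (value - neg) / (pos - neg) for lineage, value in fitness.items()}
```

On such a scale, a threshold like the >0.2 fitness cutoff reads as a fraction of the distance between the negative and positive control means.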
Using double-barcoded lineage replicates to detect promiscuity, toxicity, and dropouts
The control lineages largely behaved as expected (Fig 2B and 2C), with the F[1,2]-F[3] (covalent tag fusion) construct showing the highest fitness and sweeping to high relative abundance, and the double-negative controls (NULL-F[1,2] x NULL-F[3]) showing low relative fitness. We tested for promiscuous proteins by analyzing the NULL-F[1,2] x vORF-F[3] or the hORF-F[1,2] x NULL-F[3] strains, and did not find any SARS-CoV-2 vORFs that mediate escape from the MTX selection by themselves. However, the F[1,2]-tagged human ORF of CSNK1G2 robustly escaped MTX selection when expressed along with a NULL-F[3] tag, which may reflect some interaction of this human protein with the yeast metabolic regulation of the native DFR1 or a direct interaction of CSNK1G2 with the PPIseq assay constructs. We were also able to detect that expression of the F[1,2]-tagged human DHFR rescued growth from the mild (1ug/mL) MTX treatment we use here (Fig 2B and 2C), and this served as an additional positive control. We did not find any specific inhibition of yeast growth for particular vORFs (S7 Fig). However, we did not obtain any barcoded plasmid lineages for either wildtype or mutant NSP3 or the C-truncated version of ORF7B, but this may be explained by the fact that they are the longest and shortest ORFs in the collection (5835bp and 60bp, respectively). Examining the DHFR-F[1,2] positive controls, we unexpectedly found that a few lineages of positive controls would drop out, meaning that the lineage decreased in abundance and produced a low fitness (S5A Fig). Consistent dropouts of the same lineages across replicates (S5B Fig) indicated that the pattern was not simply due to sampling noise from culturing. Manual inspection ruled out a bioinformatics error as the explanation, so it became apparent that some lineages simply yielded a fitness discordant with the other lineages representing that specific biological ORF combination. This was also true for some positive hit PPIs (S2 Fig).
We suspect this problem is due to the frequent generation of petite-mutants in yeast, which will be addressed in the Discussion. To address these technical artifacts in our analysis pipeline, we took advantage of our platform’s approach of generating many barcoded replicates for each genotype-of-interest (each ORF-ORF combination). We used a heuristic to drop lineages that were greater than 1 standard-deviation away from the mean for a particular ORF-ORF combination in at least two biological replicates, as long as this did not reduce the number of lineages to below three.
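The lineage-dropping heuristic described above can be sketched in Python for a single ORF-ORF combination. Several details are assumptions made for illustration: the use of the sample standard deviation, the choice to keep all lineages when filtering would leave fewer than three, and the function and parameter names.

```python
from statistics import mean, stdev


def drop_discordant_lineages(fitness_by_lineage, n_sd=1.0, min_flagged_reps=2, min_kept=3):
    """Return the lineages retained for one ORF-ORF combination.

    fitness_by_lineage: dict of lineage id -> list of fitness values, one per
    biological replicate (all lists the same length).
    A lineage is flagged in a replicate when it lies more than n_sd sample
    standard deviations from that replicate's mean across lineages.
    """
    if not fitness_by_lineage:
        return []
    lineages = list(fitness_by_lineage)
    n_reps = len(fitness_by_lineage[lineages[0]])
    flagged = {lineage: 0 for lineage in lineages}
    for rep in range(n_reps):
        values = [fitness_by_lineage[lineage][rep] for lineage in lineages]
        if len(values) < 2:
            continue  # a standard deviation needs at least two lineages
        center, spread = mean(values), stdev(values)
        for lineage in lineages:
            if abs(fitness_by_lineage[lineage][rep] - center) > n_sd * spread:
                flagged[lineage] += 1
    kept = [lineage for lineage in lineages if flagged[lineage] < min_flagged_reps]
    # Assumption: if filtering would leave fewer than min_kept lineages, keep all.
    if len(kept) < min_kept:
        return lineages
    return kept
```

With five lineages where one consistently drops out, only the discordant lineage is removed; with three lineages, nothing is removed because two lineages would be too few to test.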
Another unexpected finding was that the positive-controls for the ORF expressing NSP2 (DHFR-F[1,2] x NSP2-F[3]) had a slightly higher fitness than other positive-controls, consistent across the replicate selections (S7 Fig), yet we did not detect any NSP2-specific fitness difference for the NULL-F[1,2] x NSP2-F[3] negative controls. This may reflect some subtle effect that NSP2 expression has on folate metabolism [79] in budding yeast or the mDHFR tags directly, so we mitigated the effect of this by subtracting the average NSP2-specific advantage from all NSP2-containing lineage fitnesses. While this effect appears to be limited to NSP2-containing strains, this adjustment may affect the False Discovery Rate correction for other ORF-ORF combinations.
Detecting PPIs from a massively-multiplexed sequencing assay readout
To detect PPIs, we looked for sets of double-barcode lineages representing the same ORF-ORF combination with a higher than expected fitness. We unexpectedly found that NULL-F[1,2] x NULL-F[3] double-negative controls were on average more fit than either the hORF (hORF-F[1,2] x NULL-F[3]) or vORF (NULL-F[1,2] x vORF-F[3]) negative controls. We also found that the median fitness of the fitness distribution of all hORF-F[1,2] x vORF-F[3] test strains is similar to that of either hORF (hORF-F[1,2] x NULL-F[3]) or vORF (NULL-F[1,2] x vORF-F[3]) negative controls (Fig 2C). This appeared to be a general effect, as we did not find good evidence of any specific fitness cost associated with any individual hORF or vORF. Because of this effect, we chose to compare the fitnesses for each set of lineages (grouped by hORF and vORF) to the average fitness of the NULL-F[1,2] x vORF-F[3] negative control for that particular vORF.
We tested 184,491 groups of hORF-F[1,2] x vORF-F[3] lineages, requiring first that the group contains at least three lineages with fitness estimates (Fig 2J). We then used Welch’s one-sided t-test with a False Discovery Rate (FDR, Benjamini-Hochberg) correction to detect PPI hits (S18 Table). Across the three replicates at an FDR < 0.05 threshold we found 51 ORF combinations that produced hits in only one replicate, 13 that produced hits in only two replicates, and 10 combinations that produced hits in all three replicates, for a total of 74 hits. As shown in Fig 2J, ~53% (184,491) of the ~350k possible ORF combinations (~330k when counting unique genes) are able to be assayed in at least one replicate. Thus, we detect putative PPIs for 0.04% of the testable ORF combinations.
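The statistical steps above can be sketched with the Python standard library. The sketch computes Welch's t statistic with its Welch-Satterthwaite degrees of freedom and applies the Benjamini-Hochberg adjustment to a list of p-values; converting t to a one-sided p-value requires a t-distribution function that the standard library lacks, so that step is left out. Function names are illustrative and this is not the analysis code used for the screen. The last lines only recompute the hit tally from the counts reported above.

```python
from statistics import mean, variance


def welch_t(sample, control):
    """Welch's t statistic and degrees of freedom for mean(sample) vs mean(control).

    Both groups need at least two values and nonzero combined variance.
    """
    n1, n2 = len(sample), len(control)
    v1, v2 = variance(sample) / n1, variance(control) / n2
    t = (mean(sample) - mean(control)) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hit tally from the counts reported in the text.
n_hits = 51 + 13 + 10                    # hits in one, two, or three replicates
percent_hits = 100 * n_hits / 184_491    # share of testable ORF combinations
```

An ORF combination would be called a hit in a replicate when its adjusted p-value falls below 0.05.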
PPIseq complements previous human-SARS-CoV-2 PPI screens
We next compared our results to previously observed PPIs in the IMEx Coronavirus dataset [80] to determine the overlap with our 74 PPI hits or the 184,417 ORF combinations for which we did not detect a PPI (Fig 3A). We found a significant overlap (p-value < 0.05) of 8.1% of hits being previously observed compared with 0.75% of the combinations not called as a PPI. This confirms that the mDHFR split-tag assay reproduces detections by some other testing modalities (two-hybrid, GFP split-tag, or affinity purification), but also suggests that it has complementary sensitivities or biases to help complete a better understanding of PPI networks [17]. To validate the replicability of PPIseq in quantifying the mDHFR split-tag assay we repeated the mDHFR assay using a low-throughput clonal measurement of colony growth on agar. We selected human and viral ORFs that were constituents of several detected PPIs and regenerated 11 clones of haploid yeast with each human ORF integrated in a mDHFR-tagged expression locus. To obtain the 7 complementary vORF-F[3] haploid yeast strains we used REDIseq [81] (Methods), then we generated each combination of hORF-F[1,2] x vORF-F[3] diploid strains by pairwise mating and selection as before. Scans of the resulting MTX growth assay of the diploids (S19 Table) were analyzed to quantify colony size. Despite assaying the strains on agar media and with 100-fold larger concentrations of MTX (necessary to generate an effect large enough for visual analysis), we found 11 of 13 PPIs amongst these ORFs were detected again (Fig 3B). By contrast, 8 of 64 background combinations produced detectable growth and are examples of false negatives in the PPIseq assay—likely due to insufficient coverage of lineages per PPI in the selection pool (Discussion). 
Although false negatives are pervasive in PPI measurement assays, this shows that our PPIseq quantification of the mDHFR split-tag assay is consistent with the performance of other platforms, detects a fraction of the PPIs that have been previously observed, and also detects PPIs that may be inaccessible by other techniques. These new detections are likely by virtue of the mDHFR assay’s compatibility with membrane-bound proteins, diverse organelle localization, and proteins that are poorly expressed in human cell culture lines or poorly purified.
Fig 3. PPIseq confirms previously detected human-SARS-CoV-2 PPIs and finds new interactions involving taste receptors and ER-retention.
A) The fraction of protein pairs screened by PPIseq that have been previously reported as PPIs in the IntAct Coronavirus dataset. Protein pairs are partitioned into those that have not (left) and have (right) been called as PPIs by PPIseq. B) The fraction of protein pairs that were validated as a PPI in a clonal growth assay when 0, 1, 2, or 3 of the PPIseq growth replicates detected a PPI. C) GO term enrichment analysis (using clusterProfiler [82]) of human proteins that interact with the products of viral genes NSP4, ORF7B, and ORF7A. The color of each tile is the negative log10 of the adjusted p-value for the GO term enrichment; all terms shown have an adjusted p-value < 0.05. The number in each tile is the number of human ORFs with that annotation in the set.
To generate hypotheses of how these interactions might contribute to the functional roles of each viral ORF, we analyzed the properties of the human ORFs detected as PPI hits with each viral ORF. We calculated enrichment for GO terms, transmembrane domains, PROSITE domain annotations, and Reactome pathway annotations using either a direct hypergeometric test or the hypergeometric-distribution-based enrichment test implemented in clusterProfiler [82]. Here we find that the PPI hits for two viral proteins (NSP4 and ORF7B) are significantly enriched (p-value < 0.05) for human proteins containing transmembrane domains (by Uniprot annotation, 5/6 and 11/12, respectively), while ORF6 and ORF7A were only detected with transmembrane proteins. This is consistent with ORF7B partners being enriched for GO terms related to lipid metabolism and endoplasmic reticulum membrane (Fig 3C), although this may also reflect a hypothesized promiscuity of the proposed leucine-zipper motif in binding other transmembrane proteins [83]. ORF7A was only found to interact with the taste receptors TAS2R41 and TAS2R7, which is intriguing considering the genetic evidence of variant-of-concern-specific ORF7AB alleles in a hamster-model of anosmia [84] and TAS2R41 being identified as a potential host-factor in a loss-of-function screen in cell-culture [85]. We confirmed the previous detection of ORF6 interacting with ABHD16A, a mechanism that may supplement ORF6’s antagonism of cellular innate immunity [86–89]. NSP4 was found here to interact with KDEL-receptors (both KDELR2 and KDELR3), and this interaction with the ER-retention retrograde signaling pathway might shed light on the role this protein plays in shaping ER membrane into the double membrane vesicles important for viral replication [90–92].
NSP2 was enriched (p-value < 0.05) for interactions with proteins annotated with the Reactome pathway of Smooth Muscle Contraction, including the Calmodulin CALM3 and Calmodulin-binding protein CALD1. These analyses show that the complementary sensitivity of the mDHFR split-tag, specifically to PPIs of transmembrane proteins involved in the cellular membrane biology critical for some viral processes, can help us identify more complete hypotheses of how viral proteins interact with host factors.
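The hypergeometric enrichment calculation described above can be illustrated with a minimal sketch. This is not the clusterProfiler implementation, and the function name and the background counts in the usage note are illustrative assumptions; only the 11-of-12 count for ORF7B comes from the text.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: number of background proteins (assumed: all human ORFs screened)
    K: number of background proteins carrying the annotation
    n: number of PPI hits for one viral ORF
    k: number of those hits carrying the annotation
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible draws contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For example, `hypergeom_enrichment_p(11, 12, 2500, 10000)` would test 11 transmembrane partners among 12 hits against a hypothetical background of 2,500 transmembrane proteins in 10,000 screened ORFs. In practice the p-values for many annotations would also need multiple-testing adjustment, as in the adjusted p-values shown in Fig 3C.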
Discussion
We extend the capabilities of the PPIseq platform to efficiently screen for protein-protein interactions (PPIs) using pooled methods for plasmid engineering, DNA barcoding, annotation, library generation, and selection. In this work, one researcher was able to use the PPIseq platform to screen at least 55.8% of the possible PPIs between the SARS-CoV-2 and human proteins in a library of >330,000 protein pairs, finding 74 hits. Six of these hits were previously observed in the IMEx Coronavirus dataset and 68 are new. PPIseq uses the mDHFR split-tag as its PPI selection system, which has high sensitivity to transmembrane protein PPIs and serves to complement previous SARS-CoV-2-human PPI screens. In particular, we found interactions that suggest a potential role for ORF7A in mediating interactions with human taste receptors and NSP4 in mediating ER retention via the KDEL-receptors, interactions that might be critical for function during infection but are likely challenging to detect using other split-tag reporters or mass-spectrometry-based approaches.
A key advantage to PPIseq is scale. We used a pooled approach whenever possible, which lowers the cost and hands-on time per PPI assayed relative to array-based alternatives. Another advantage is our use of whole-plasmid sequencing of barcode-ORF pairs, which detects any errors in the ORF sequence. ORF sequence errors can be a problem for libraries of arrayed constructs that can accumulate mutations or be cross-contaminated between positions. Another advantage of the all-pooled approach is that the methods here do not require specialized equipment. High-throughput sequencing is available commercially, and the provided protocols (Supplement) and code (Methods) can be used with typical microbiology lab equipment to screen other Gateway ORF collections.
However, one major caveat of our pooled approach is that we are unable to balance the abundance of each species of the pool, resulting in a skewed abundance distribution (S11 Fig). Future use of this workflow may benefit from isolating arrays of clones to rebalance the relative abundance of haploid strains before mating them to generate the diploid double-barcode PPIseq assay library. While isolating arrays of clones is generally a laborious undertaking, the elaboration on the PPIseq platform described here yields a PPIseq haploid library that is compatible with the iSeq 2.0 platform [51] and thus can use REDI-seq to rapidly identify the identities of randomly arrayed clones [81]. We tested this indexing capability here by randomly arraying the pool of barcoded haploid tagged-vORF yeast into an arrayed library and using REDI-seq to annotate the identity of each position (S10 Fig). The strains from this array were used for the PPI verification shown in Fig 3B. This approach, or alternative technologies for “indexing” or "demultiplexing" a mixed pool of plasmids [93], offer opportunities for efficiently separating and balancing the abundance of constructs before combining again in pools. Such an approach may be valuable when reducing the all-by-all test pool to smaller subsets of test strains that are of interest, such as when screening a set of PPIs for small-molecule disruption.
There are also limitations to our split-tag reporter assay, and split-tag reporter assays in general, that could contribute to a high false-positive or false-negative rate. We have not verified that each of the thousands of ORFs integrated into the tag-fusion construct are faithfully and accurately expressed in each strain, that the expression of each tagged-ORF is similar, or that they fold to generate all functional domains relevant for an interaction. Each of these proteins may also be missing post-translational modifications from host or pathogen factors, or subject to aberrant post-translational modification when expressed in budding yeast. Obligatory tertiary binding partners may be absent, or promiscuous binding could be mediated by a partner present in yeast but not the natural host cells [94]. Additionally, the mDHFR split-tag may interfere with a functional binding interface or prevent proper co-localization. The seemingly low overlap with previously reported PPIs (6 of 74) is consistent with the difficulty of reproducible detection while limiting false-positives in the massive search space of PPI screens [89], but this also likely reflects that the mDHFR split-tag suffers from the same kinds of limitation and bias as other assay modalities. While the mDHFR split-tag reporter may be more sensitive to membrane-associated proteins it is also likely biased against detecting PPIs of nuclear proteins [17], for example. The expression of ORFs may also modify yeast physiology to prevent a faithful increase in batch culture fitness as a result of the mDHFR assay. For example, 5.6% of approximately 10,000 human cDNAs were found to be toxic when over-expressed in yeast [95]. Despite these limitations, the significant overlap with previous SARS-CoV-2-human screens and the established performance of the mDHFR-tag suggests that PPIseq is a viable platform for scaling the mDHFR split-tag assay.
Besides the above limitations that are common to most split-reporter screens, other technical limitations of the method as realized here are more readily addressable. For example, we observed lineage-specific "dropouts" where a few lineages that represent a particular protein pair would decrease in relative abundance (despite an average high fitness), but never the converse where rare lineages would have an increased fitness. The consistency of lineage fitness for dropouts across replicates suggests this is not a function of noisy genetic drift due to limited cell bottlenecks, and instead may be caused by a mutation that reduces growth rate. A likely candidate is "petite" mutations, a common class of mutation in lab strains of budding yeast that results in the loss of mitochondrial function and slow growth. If this is the cause, future work could pre-culture strains on a non-fermentable carbon source or use a "repaired" lab strain with a lower rate of petite formation [81]. There are also opportunities to improve statistical analysis. Here we used a statistical approach similar to our previous work [51] wherein we test if the sample of fitnesses for the group of lineages representing a particular ORF-ORF combination is greater than a null distribution. More sophisticated statistical methods could confidently detect per-lineage or per-group fitness changes while moderating the variance ubiquitous to the use of sequencing platforms for quantification. For example, the FitSeq-derived population average fitness can be used to adjust library scaling parameters in a tool such as the limma/voom differential expression analysis pipeline [96], and rigorous development and testing of these approaches could enable detecting changes in single lineage fitnesses between treatments. Moreover, non-linear modeling of the signal could be pursued to enable more biophysically interpretable results [97].
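To make the idea of comparing a group of lineage fitnesses against a null distribution concrete, a minimal sketch follows. It uses a resampling-based empirical p-value, which is an assumption chosen for illustration and is not necessarily the test used in this work or in [51]; the function name and parameters are hypothetical.

```python
import random
from statistics import mean

def empirical_group_p(group, null, n_draws=10000, seed=0):
    """One-sided empirical p-value for a group of lineage fitnesses.

    group: fitness estimates for lineages sharing one ORF-ORF combination
    null:  fitness estimates for negative-control lineages (assumed input)
    Draws same-sized samples (with replacement) from the null and counts how
    often their mean reaches the observed group mean.
    """
    rng = random.Random(seed)
    observed = mean(group)
    size = len(group)
    hits = sum(
        mean(rng.choices(null, k=size)) >= observed
        for _ in range(n_draws)
    )
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_draws + 1)
```

A sketch like this ignores count-dependent measurement noise, which is the variance that the more sophisticated approaches discussed above aim to moderate.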
Decades of technology development have enabled foundational genome-scale studies of PPI networks and how they evolve [28, 32, 46, 47, 80, 89, 98, 99]. This paper builds on that work by enabling an all-pooled library-generation and measurement workflow for PPIseq, contributing to the diversity of genome-scale PPI assays [100] with a workflow accessible to non-specialist research groups. Applying these methods to characterize virus-host PPI networks for many virus species and strains will help better describe the dynamics of PPI network evolution, and the use of pooled tagged-ORF plasmid libraries as inputs can leverage high-throughput pooled reverse genetic tools to dissect the genetic contributions to altered PPIs [101]. PPIseq is based on an interaction-barcoding platform that can also be readily adapted to screen alternative split-tags [102] and tag-orientations, or to assay toxicity or extracellular interactions [103, 104], and together these technologies can help test foundational concepts of the mutational supply of PPI alterations, their effects on fitness, and the resulting dynamics of PPI network evolution. This path of combining PPI network surveys with measurements of the cause and consequence of altered PPIs will permit an evolutionary systems biology analysis of how the accessibility or "roughness" of mutational paths across a fitness landscape and the ecology of viral/host systems interact to manifest biological complexity in the structure of PPI networks [105].
Materials and methods
Code access
Computer code used to analyze data and generate graphics is available in several git repositories. For the analysis of the barcode annotation datasets, see https://gitlab.com/darachm/ppiseq_dme353/ for analysis of the first PacBio dataset as part of Fig 1, then see https://gitlab.com/darachm/ppiseq_d368/ for analysis that extends this mapping using the Nanopore dataset to generate the barcode annotation map finally used for Fig 1. For analysis of the PPIseq selection data and/or REDIseq array identification data (three lanes of Illumina HiSeq) for Fig 2, see https://gitlab.com/darachm/ppiseq_d359d360. For analysis of the PPIseq hits used to generate the work surrounding Fig 3, see the analysis at https://gitlab.com/darachm/ppiseq_d359bio. For analysis of the plate images for the re-assay on agar media (Fig 3), see https://gitlab.com/darachm/ppiseq_dme383/. For each, additional associated "data" files are available at OSF (doi.org/10.17605/OSF.IO/B8G3H), as well as ZIP-ed archives of each git repository in case the hosted repositories are unavailable. Direct any technical questions to Darach Miller using the ’Issues’ feature on the GitLab website.
Data access
Sequencing datasets are available on the Sequence Read Archive. For long-read sequencing (PacBio and Nanopore) used to annotate DNA barcodes with the linked ORF they represent, see PRJNA1073210. For Illumina sequencing of the double-barcode amplicon used to quantify the fitness of each diploid lineage, see PRJNA1073201. See the supplemental table (S8 Table) for more metadata to contextualize these. For other intermediate files see the supplemental files for this paper, or retrieve them as a sqlite3 database file from this OSF repository (doi.org/10.17605/OSF.IO/B8G3H). For example, because of the particularly large size of the counts table, that table is available as a separate sqlite3 database in the linked OSF repository. Raw TIFF scans of the agar plates used for validation of the modified mDHFR assay (Fig 1) or PPI re-testing on agar media (analyzed for Fig 3) are available in the above OSF repository in an appropriate folder.
Yeast media and strain construction
Yeast media was made as described in the 2015 edition of the CSHL Yeast Course manual [106], specifically YPD, SC (synthetic with complete amino-acid supplementation), and SD (synthetic dropout with only specific amino-acids supplemented back). Dextrose, yeast-extract, peptone, and yeast nitrogen base (without amino-acids or ammonium sulfate) were supplied by BD, and supplemented by ammonium sulfate from Sigma. Amino-acid additive stocks and selective plates with 5-FOA (GoldBio), hygromycin, or G418 were made as in the CSHL Yeast Course manual.
Previously engineered BY4741/4742-background Saccharomyces cerevisiae strains containing iSeq Gal-Cre landing pads ySL167/XLY5 (MATa) and ySL173/XLY11 (MATalpha) [52] were engineered further to delete the coding sequence of PDR1, PDR3, PDR5, and SNQ2. We used the 50:50 method [107] several times to do this. Briefly, we designed primers that amplify a URA3MX3 cassette from pRS426 [108] with flanking sequence-identity that mediates homologous-recombination into the yeast genome. This insertion cassette is selected on SC-URA for the presence of the URA3MX3 marker, but the cassette is designed such that it also introduces an identical sequence repeat that spans both the URA3MX3 cassette and the target coding sequence. We expand these URA+ colonies in YPD liquid media overnight, then plate this population on SC+5-FOA to select for clones where the identical repeat has promoted excision of both the marker and the target coding sequence. Colonies are restruck on SC+5-FOA and clones from this are screened by colony PCR. Deletions were carried out in series, then qualitatively checked for function (growth on non-fermentable carbon source, integration of iSeq plasmids, mating, and function of MTX selection using positive/negative control constructs as described in the control strains section). A clone of each type was saved as strains ySL507 (MATa, loxP/71) and ySL508 (MATalpha, lox5171/71).
Construction of PPiSeq base plasmids
Previously published plasmids pBAR4 and pBAR5 [52] were further engineered (using typical methods of digestion, ligation, Gibson assembly, and cloning in E. coli) to integrate a Gateway cassette (attR1-ccdB-CmR-attR2) for expressing a Gateway-integrated ORF as a fusion to either a F[1,2] or F[3] mDHFR split-reporter tag [28]. Gateway recombination from an attL1-attL2 Gateway Entry vector results in the ORF being fused in frame with the standard Gateway scar, then a 4xGGGGS linker [53], and then either of the two mDHFR reporter tags. The promoter of this expression construct is the common TDH3pr / pGPD promoter, and the terminator is a 35bp minimal terminator (Synthetic 7) as designed and tested by Curran et al 2015 [59]. Immediately downstream of the terminator is an I-SceI site. The rest of the plasmid is derived from pBAR4/pBAR5, and so each of the plasmids has a complementary intron donor or branch site [52, 109], adjacent to that either ura3ΔC or ura3ΔN with a promoter and terminator as appropriate, either a KanMX or HygMX drug-selectable yeast marker, and a ColE1 origin and AmpR marker for propagation in E. coli. Importantly, these base plasmids do not contain any lox sites, barcodes, or amplicon-sequencing priming sites. These plasmids were saved as plasmid-bearing clonal strains of E. coli, with plasmids pSL51 bearing the F[1,2] mDHFR tag and pSL737 bearing the F[3] tag.
Entry vector (Gateway) plasmid pool sources
The human ORFeome (hORF) v8.1 collection (purchased from Dharmacon, https://horizondiscovery.com/en/gene-modulation/overexpression/cdna-and-orfs/products/horfeome-v8-1-collection) was pinned from stocks onto LB+Spectinomycin agar, and colonies were collected by scraping into 9 sub-pools of up to 16 96-position plates. These pellets were resuspended and frozen in 15% glycerol, and a frozen aliquot of each was later collected and plasmids were extracted via miniprep kit (Qiagen).
Vectors with ORFs that encode proteins similar to those encoded by SARS-CoV-2 (vORFs) were obtained as single clonal E. coli plasmid-propagating strains, a generous gift from the Roth laboratory by way of Addgene [69]. The identity of each clone was verified with partial Sanger sequencing. ORF-containing plasmids were pooled into 3 subpools, with membership in each pool being arranged such that no pool contains more than half of the genome (by total length). These ORFs were handled in separate pools at all stages until the MTX-selection competition assay of diploid yeast libraries, or after preparing barcode-indexed libraries for PacBio or ONT Nanopore sequencing.
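One way to arrange subpool membership under a length constraint like the one above is a greedy longest-first assignment, sketched below. This is an illustrative assumption, not the procedure used here, and the ORF names and lengths in the usage note are hypothetical placeholders.

```python
def assign_subpools(orf_lengths, n_pools=3):
    """Greedy assignment: place each ORF (longest first) in the currently
    shortest pool. Returns (pools, totals) where totals are summed lengths."""
    pools = [[] for _ in range(n_pools)]
    totals = [0] * n_pools
    ordered = sorted(orf_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, length in ordered:
        i = totals.index(min(totals))  # first pool with the smallest total
        pools[i].append(name)
        totals[i] += length
    return pools, totals

def satisfies_half_rule(totals):
    """True if no pool holds more than half of the summed ORF length."""
    return all(2 * t <= sum(totals) for t in totals)
```

For example, `assign_subpools({"orfA": 900, "orfB": 600, "orfC": 500, "orfD": 400, "orfE": 300, "orfF": 300})` returns three pools whose totals can then be checked with `satisfies_half_rule`. The greedy heuristic does not guarantee the constraint for every input (a single very long ORF can violate it), so the check should always be run.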
ORF integration and tagged ORF plasmid linearization
Each subpool (9 hORF or 3 vORF subpools) of plasmids was combined with either pSL51 (hORF) or pSL737 (vORF), TE (Tris-EDTA buffer), 5x reaction buffer, and LR recombination enzymes from the Gateway LR Clonase kit (Thermo) according to the manufacturer's instructions. Reactions for hORFs were 20uL total volume with ~300ng each plasmid pool / destination vector, and 10uL total volume with ~150ng each plasmid pool / destination vector for vORFs. The vORF reaction was left at RT for 1 hour, digested with proteinase K, 2uL was transformed into 10beta Mix-and-Go chemicompetent cells (Zymo), and cells were plated to select on LB+Carb plates. The hORF reaction was left at 25°C for 24 hours, treated with proteinase K, concentrated with an ethanol precipitation, electroporated into 10beta electrocompetent cells (NEB), then the 1-hour-recovered cells were plated to select on LB+Carb plates. Cell counts were estimated from a plated dilution and the rest of the colonies were scraped to collect and freeze cells at -80°C. An aliquot of each subpool was miniprepped to extract out the LR-recombined tagged-ORF plasmid pools.
Integration of the lox site, random barcode, and priming site cassette
Each "loxcode" cassette is a 130-138bp dsDNA fragment that contains: sequence-identity to the terminator, loxP/66 or lox5171/66 sites (cis-elements for Cre/lox recombination), a 30bp random barcode (IDT, N_30 machine mix), priming sites for amplicon sequencing, and an I-SceI site. For the lox5171 loxcode cassette we re-used the iSeq reverse priming site [52] with some modifications (here calling the priming site "iseq2R"), but for the loxP fragment we devised a novel priming site based on a synthetic spike-in [110] that was manually adjusted based on personal experience (here calling the priming site "mjtF").
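As a rough illustration of why a 30bp random barcode is long enough for libraries of this size, the expected number of coincidentally identical barcode pairs can be estimated. The sketch below assumes every base is drawn uniformly and independently, which a machine-mixed oligo pool only approximates; the function name is hypothetical.

```python
def expected_collisions(n_barcodes, length=30, alphabet=4):
    """Expected number of barcode pairs that are identical by chance,
    assuming uniform, independent base composition (an idealization)."""
    space = alphabet ** length  # number of possible barcode sequences
    n_pairs = n_barcodes * (n_barcodes - 1) / 2
    return n_pairs / space
```

Under this idealized assumption, a library of one million barcodes gives `expected_collisions(1_000_000)` well below one colliding pair, so shared barcodes would be expected to arise from cloning or sequencing artifacts rather than from chance.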
The dsDNA loxcode cassettes were formed by combining two purchased oligos (IDT) in a 50uL Q5 (NEB) polymerase reaction with the Q5 polymerase, 200mM dNTPs, 500nM primers, and 1x Q5 buffer. Each reaction is run on a thermocycler to melt at 98°C for 30s, brought to 70°C for 1s, then cooled to 40°C at 0.1°C/s rate, then brought to 72°C for 20s. Each reaction is then cleaned using a DNA Clean and Concentration kit (Zymo) at a 5:1 buffer:sample ratio, and eluted with 20uL EB and frozen. oSL1186 was combined with oSL1187 to generate the loxP/66-barcode30-mjtF-ISceI loxcode cassette, and oSL1159 was combined with oSL1158 to generate the lox5171/66-barcode30-iseq2R-ISceI loxcode cassette.
Each tagged-ORF plasmid subpool was treated with I-SceI (NEB) in the provided buffer (at a volume of 50uL for ~1ug hORF plasmids and 200uL for ~4ug vORF plasmids). This reaction was incubated at 37°C at least overnight, then was treated at 65°C for 30min to denature the I-SceI (a critical step). This was cooled and treated with phosphatase rSAP (NEB) at 37°C for 30min, then denatured again at 65°C for 10min before ethanol purification. With this approach test samples appeared to yield complete linearization as viewed on an agarose gel and this procedure was important to minimize self-closure in the next cloning step.
Approximately 6ng of each loxcode cassette was combined with approximately 75ng of the linear digested plasmid pool in a final 10uL of 1x HiFi assembly reaction (NEB). This was incubated on a thermocycler at 50°C for 15min, then put on ice. The reaction was ethanol precipitated, then resuspended in water. A dilution of each reaction, as determined based on empirical tests of colony yields, was electroporated into 10beta electrocompetent cells (NEB). After recovery, these were plated, grown overnight, and approximately 2-4x coverage of the ORFs expected to be in the pool were scraped into a cell pool, yielding each E. coli pool containing barcoded tagged-ORF PPiSeq assay plasmids. Aliquots were frozen in 15% glycerol, and one aliquot of each pool was extracted using a Qiagen miniprep kit.
After the barcode-ORF map was determined (see below), we thawed frozen aliquots of selected pools of E. coli containing barcoded-ORF plasmids, the same pools from which the barcode-ORF map data was generated, and diluted these in LB+Carb liquid media. For the three vORF pools these back dilutions were grown to approximately exponential phase growth and Qiagen miniprepped. For the hORF pools, these were each diluted 1/20 and incubated with shaking for approximately 3 hours at 37°C and confirmed to reach about saturation (by OD). These starters were then back diluted again approximately 1 in 200 into LB+Carb liquid media. After 2 hours, OD for each pool was used to combine the hORF pools with volume determined by the library complexity (i.e. as the number of unique plasmids found in the PacBio barcode-ORF sequencing) and this pool was cultured at 37°C overnight (15 hours) after the addition of chloramphenicol to a concentration of 170ug/mL. In the morning, the culture was collected by centrifugation and plasmid pool was extracted with a Qiagen maxiprep kit in two batches. During this entire process, each vORF pool was kept separate, but all 9 of the hORF subpools were combined before maxi prep.
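The pooling step above can be sketched as a small calculation. The sketch assumes one plausible reading of the text, namely that each subpool's volume is made proportional to its library complexity divided by its OD, so that each unique plasmid contributes a similar number of cells. The function name and example values are hypothetical.

```python
def pooling_volumes(complexity, od, total_volume_ml):
    """Volumes (mL) of each subpool culture to combine.

    complexity: dict of subpool -> number of unique plasmids
    od:         dict of subpool -> measured optical density
    Assumes cell density scales linearly with OD in the measured range.
    """
    weights = {pool: complexity[pool] / od[pool] for pool in complexity}
    scale = total_volume_ml / sum(weights.values())
    return {pool: weight * scale for pool, weight in weights.items()}
```

For example, `pooling_volumes({"h1": 100, "h2": 100}, {"h1": 0.5, "h2": 1.0}, 3.0)` assigns twice the volume to the subpool at half the OD.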
Testing the plasmid-based mDHFR-tag PPiSeq system and constructing control strains
To facilitate the construction process we first took the base expression plasmids (pSL51 and pSL737) and integrated the lox cis-element, random barcode, and priming site cassette as described above. Importantly, these plasmids still contained the Gateway ccdB counter-selection landing site but the pool of generated plasmids also contained a complex barcode library at the barcode locus. This is the opposite of the order used in the large-scale experiment (where ORF is integrated first, then barcode), but is easier for engineering of single clones of plasmids. We isolated ORF-bearing Entry plasmids from clones in the arrayed collection of the Gateway entry vectors for the human ORFeome v8.1 collection (purchased from Dharmacon). We used a Gateway LR Clonase mix (Invitrogen) to move the ORFs from these plasmids into the pre-barcoded pSL51 and pSL737 plasmid pools, and cloned these into 10beta cells made competent via the Mix-and-go kit (Zymo Research). Clones from these were arrayed in a 96-well plate and screened for chloramphenicol resistance that would indicate presence of the Gateway counterselection cassette, then screened using colony PCR of oligos oSL231 against oSL261 on the pSL737-based plasmid and oSL231 against oSL260 on pSL51-based plasmid. These were then Sanger sequenced to confirm the identity of the ORF integrated; for example, we found that two attempts at isolating plasmid from position GDEh81037@B04 identified the ORF UBE2S instead of the ORF BCL2L1. These plasmids were all transformed using a lithium-acetate transformation protocol, as described in CSHL Yeast Manual [106], into the yeast strains ySL167 and ySL173, as appropriate given the loxP or lox5171 landing pad present in the genome, and selected using the drug markers on the integrated plasmids. Clones of these strains were picked into a 96-well plate and crossed by mixing in YPD overnight. 
Diploids were selected with YPD+G418+Hyg, then these were pinned onto YPGal agar plates for overnight induction before recombinant diploids were selected on SC-URA agar plates. These were pinned and grown again in 96-well SC-URA liquid media, then pinned to SC-URA+MTX (100ug/mL) agar media. After two days of growth this plate was imaged with a flat-plate document scanner.
We generated two types of control plasmids for this tagged-ORF expression system, a covalently-fused co-localization positive control and a NULL negative control of passive co-localization. For the first type of control, we generated a Gateway entry vector with attL1/2 sites flanking a F[1,2] mDHFR tag fused on its C-terminus to a 2xGGGGS linker, and this was integrated into the pre-barcoded assay plasmids (barcoded pSL51 and barcoded pSL737). Integration into the pre-barcoded pSL51 yields a mDHFR F[1,2]-linker-F[1,2] fusion that does not grow readily upon MTX treatment (a negative control), while integration into pre-barcoded pSL737 yields a mDHFR F[1,2]-linker-F[3] fusion that rescues growth upon MTX treatment (positive control). For the second type of NULL negative control, we directly engineered the pre-barcoded pSL51 and pre-barcoded pSL737 plasmid pools using PCR and HiFi in vitro recombinational assembly (NEB reagents) to generate a construct that should express a single methionine start-codon fused to the Gateway attL2/attR2 "scar" sequence (YPAFLYKVV), the 4xGGGGS linker, and each of the two mDHFR tags. Even when combined, these "NULL" mDHFR tags untethered to any ORF did not rescue growth on MTX that we could detect given our experimental design, thus serving as a control for the subtle confounding effects of passive co-localization [65].
Mapping the ORF-barcode annotation
To generate the barcode-to-ORF annotation mapping vital for using a barcode-sequencing approach, we digested ~2ug of each of 24 barcoded ORF plasmid pools (2 replicate pools of each of 9 hORFs and 3 vORFs) overnight with I-SceI (NEB) according to manufacturer’s recommendations, then denatured the reactions at 65°C for 30min before purifying each individual reaction using a phenol-chloroform extraction and ethanol precipitation. Each digested plasmid pool was subject to PacBio library prep using the SMRTbell Express Template Prep Kit 2.0 and SMRTbell Enzyme Clean Up Kit, as per manufacturer instructions (101-646-100 v7 and 101-800-100). Plasmid pools were inspected with a Bioanalyzer DNA 12,000 chip to confirm the absence of primers and approximate library size. These libraries were pooled as a multiplexed library and submitted to the University of Arizona core facility for PacBio sequencing on the Sequel II platform.
This PacBio run yielded a high fraction of truncated reads, so we tried another platform. We pooled a set of 12 digested plasmids that had been miniprepped as part of their isolation, then prepared this pool using the ONT Nanopore Minion system with the ONT Native Barcoding (24) and Ligation Sequencing kits as per manufacturer instructions for a small-fragment sample and sequenced it on a R9.4.1 Minion flow cell. We observed qualitatively the same pattern of read truncation as had been seen on the PacBio platform, so we revised our methods to include an Ampure XP bead cleanup step (0.6x) after the I-SceI digestion, then prepared libraries for ONT using the FFPE treatment and UltraII End Repair (NEB) as described in Nanopore library preparation protocols for genomic DNA. These performed well on a R9.4.1 Flongle, so we also sequenced these libraries using an entire R10.4 Minion flowcell.
To analyze these datasets, we devised a bioinformatics pipeline that (1) extracts the barcodes from the plasmid read, (2) trims away the backbone, (3) does a Multiple Sequence Alignment-based assembly of a draft consensus, and (4) uses Racon and Medaka to polish the assembly. This was done for both the PacBio (fewer reads, high quality) and Nanopore data (many reads, low quality), and the map of barcode to ORF, as well as any mutations in the ORFs detected, was compared (see Results). These maps were unified to determine what ORF and mutational-status is represented by a barcode (S4 and S5 Tables).
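Step (1) of the pipeline above, extracting the barcode from a plasmid read, can be illustrated with a simplified sketch. It assumes exact matches to two fixed flanking sequences, which are hypothetical placeholders here and not the actual loxcode sequences; real long reads contain errors, so a practical pipeline such as the one in the linked repositories would need error-tolerant matching.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def extract_barcode(read, left_flank, right_flank, length=30):
    """Return the barcode found between the two flanks, searching the read
    and then its reverse complement, or None if no exact match is found."""
    pattern = re.compile(
        re.escape(left_flank) + "([ACGT]{%d})" % length + re.escape(right_flank)
    )
    for candidate in (read, revcomp(read)):
        match = pattern.search(candidate)
        if match:
            return match.group(1)
    return None
```

Reads grouped by the extracted barcode could then be passed to the trimming, consensus, and polishing steps.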
Transformation and integration of barcoded tagged-ORF plasmids into landing pads
To rapidly express the Cre recombinase without a non-glucose preculture that would inhibit the transformability of the yeast, we used a transient transformation approach. Briefly, we amplified a TEF promoter and Cre recombinase coding sequence cassette using primers oSL331 and oSL702 from a template of pSL57 (55°C annealing and 2min extension), using OneTaq polymerase (NEB). This was purified on a McGeachy PCR Purification column.
For the barcoded vORF plasmid library the assay yeast strain ySL507 (MATa, loxP) was grown overnight in YPD, then back diluted and grown to ~0.4 OD in YPD. 300mL of this was collected in 50mL conicals with spins at 1000g for 3min, and 1mL of ice cold water was used to pool these into three tubes. These were collected by centrifugation (at 800g 1min) and washed with 500uL cold 0.1M LiAc, before resuspending with 20uL cold water and pooling into one tube. By hemacytometer this was 1.84e8 cells per 20uL. 20uL of these cells were put onto 180uL of a transformation mix (6uL 10mg/mL ssDNA, 140uL 50% PEG 3350, 21uL 1M LiAc, 2.1uL 1M DTT, ~500ng of TEF-Cre expression cassette, 5ug of each of three vORF plasmid pools, and water to bring to 180uL total volume). These were vortexed to mix, put at 30°C for 30min, 42°C for 30min, and collected by centrifugation. This was resuspended with 1mL of YPGal to recover, then spun and resuspended in 4mL YPGal after 1 hour and put at 30°C on a roller drum overnight. By hemacytometer, the induction cultures about doubled in the number of cells. The induced cells were then plated, and after growth, colonies were counted and scraped into water suspension. By colony count, these had between 25x and 196x coverage of colonies per unique plasmid in that subpool. This cell pool was brought to 15% glycerol and frozen at -80°C for later use.
The much more complex barcoded hORF plasmid library posed a challenge, as the yield achievable by the above protocol would require a laborious and expensive amount of plasmid and repetition. So, we devised adaptations of the lithium acetate and PEG protocol of Gietz et al. [70] to increase transformation efficiency, and used this to transform and integrate the barcoded tagged-ORF plasmids. An important modification is the gentle recovery of yeast, after the 65°C heat-stress, using YNB+Dex, a medium that contains "Yeast Nitrogen Base" and 2% glucose, critically without any nitrogen source. Using this protocol (S1 File) we were able to generate a library of ~337,000 genomic integrants of the barcoded F[1,2]-tagged hORF plasmids. To do this, the assay yeast strain ySL508 (MATalpha, lox5171/71) was grown overnight in 2x YPD. In the morning, this was diluted with 2x YPD and grown for approximately 5 hours until reaching 0.8–0.9 OD. 400mL of this yeast culture was collected in 50mL conicals (3000g 5min) by centrifugation. These were washed twice, resuspending in 25mL and 10mL water and centrifuging, before another centrifugation and removal of all possible water from the pellets. During these steps the yeast in the eight conicals were pooled to two conicals. These pellets were loosened by vigorous tapping and vortexing, before adding 10 volumes of the TRAFO mix (1 volume is: 240uL PEG 3350, 36uL 1M LiAc, 50uL of freshly denatured 10mg/mL salmon sperm DNA, 2ug of barcoded tagged-hORF plasmid pool, 500ng of the TEF-Cre transient expression cassette, and water to bring to a total of 360uL). Cell pellets were vigorously resuspended in this with a serological pipette and vortexing, then put in a 42°C waterbath for 70 minutes with agitation of the waterbath in the first 5min and inversion of the conical tubes to mix at 15min, 35min, and 55min. Cells were pelleted at 3000g for 5min at room temperature, then the supernatant was aspirated completely, using a second brief spin to remove all of it. 
The pellet was loosened again and then resuspended in 10mL (per conical tube) of YNB+Dex ("Yeast Nitrogen Base" without ammonium sulfate or any amino acids, supplemented with 2% glucose) and incubated at 30°C shaking for 2 hours. This was pelleted again at 3000g for 5min, with supernatant aspirated, washed once with 5mL of YNB+Gal ("Yeast Nitrogen Base" without ammonium sulfate or any amino acids, supplemented with 2% galactose) before resuspension in 10mL of YNB+Gal and incubated overnight (approximately 16 hours) at 30°C. During this step, no statistically significant increase in cell numbers was detected by hemacytometer counts. The cells were collected by centrifugation and resuspended in YNB+Dex before further dilution and plating on YPD+G418 (400mg/L) agar plates. This library spanned 39 plates, in addition to the dilution plates used for counting. By this method, the library was estimated to have approximately 337,000 integrations of plasmids into the ySL508 background. All 39 plates were scraped and collected into a pool at 15% glycerol with approximately 1.8e9 cells per mL, and frozen at -80°C.
Mating haploid yeast to generate a diploid assay library
We combined haploid yeast pools to mate and form double-barcoded diploid yeast lineages expressing two tagged-ORF constructs each. To do this, we first thawed and cultured each haploid yeast pool in YPD with the appropriate drug, using 100ug/mL hygromycin for yLSL47 (mDHFR F[1,2] positive control), yLSL48 (NULL negative control), yLSL52, yLSL53, and yLSL54 (the three SARS-CoV-2-like ORF pools) and 400ug/mL G418 for yLSL49 (NULL negative control) and yLSL55 (human ORF pool). We collected these overnight 30°C cultures by centrifugation, then washed the cells with YPD and resuspended them at a density of approximately 2e8 cells per mL. We then mixed pools for each of the matings. Briefly, we aimed to combine 1,000 times more cells of each pool than the number of unique combinations we expected in the mating library. These were combined and brought to a density of approximately 4e7 cells per mL (2e7 cells of each pool per mL) in YPD at 10% glucose content. These were shaken at 30°C for 4 hours, or incubated on the roller drum for small-volume, low-complexity libraries. Cells were collected by centrifugation and washed once with 1x YP (no glucose) media, then resuspended in nitrogen-source-less 1x YNB ("Yeast Nitrogen Base" without the nitrogen) supplemented with 2% galactose. These were incubated overnight at 30°C with shaking or roller-drum agitation, then plated on SC-URA plates at a density of approximately 1e9 cells per plate. After allowing the lawns to grow for 2 days, these were scraped and collected into liquid SD+His+Leu media and expanded to culture approximately 200 cells for each lineage expected in the pool. These were grown for two days at 30°C with shaking, then diluted 1/10 again into SD+His+Leu media and grown for two more days to select for URA+ double-barcode diploids.
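The mating scale follows from two numbers given above: roughly 1,000 cells of each pool per expected unique combination, and 2e7 cells of each pool per mL in the mating mix. A minimal sketch of that calculation is given here; the function name and the pool sizes in the example are hypothetical and are not the library sizes used in this work.

```python
def mating_plan(n_barcodes_a, n_barcodes_b,
                fold_coverage=1000, cells_per_pool_per_ml=2e7):
    """Estimate cells per pool and mating volume for a pairwise mating.

    fold_coverage and cells_per_pool_per_ml default to the values in the text;
    the barcode numbers are whatever the two haploid pools contain.
    """
    unique_combinations = n_barcodes_a * n_barcodes_b
    cells_per_pool = unique_combinations * fold_coverage
    mating_volume_ml = cells_per_pool / cells_per_pool_per_ml
    return unique_combinations, cells_per_pool, mating_volume_ml


if __name__ == "__main__":
    # Hypothetical pools: 100 barcodes crossed against 1,000 barcodes.
    combos, cells, volume = mating_plan(100, 1000)
    print(f"{combos} combinations, {cells:.1e} cells per pool, {volume:.1f} mL")
```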
Culturing for the methotrexate (MTX) selection of diploid strain pools
To select on the MTX-treated growth phenotype of the pool of lineages, we back-diluted the diploid cell pools (still separated into three pools of vORF-hORF test strains, one pool of hORF-NULL negative strains, three separate NULL-vORF negative strains, and separate pools of NULL-NULL and positive controls) 1 in 8 by volume into SD+His+Leu+1ug/mL MTX (methotrexate). These were grown for two days at 30°C with shaking. The pools were then combined into one test pool and samples were taken for the first timepoint. This single pool of strains was split into three, back-diluting 1 in 8 into SD+His+Leu+1ug/mL MTX, and grown at 30°C with shaking in three separate flasks. This back-dilution, sampling, and culturing procedure was repeated several more times for each of the three separate culture replicates, ultimately collecting more samples than were processed for sequencing. Samples were taken by centrifuging 50mL of saturated liquid culture (grown in SD+His+Leu+1ug/mL MTX), completely removing the supernatant, dislodging the pellet by tapping, and freezing at -20°C.
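The back-dilution factor sets how many population doublings separate consecutive samples. As a rough sketch, assuming that "1 in 8" denotes an eight-fold dilution and that each culture regrows to the same saturation density before the next sample (both are assumptions, not statements from the protocol), the number of generations per passage is log2 of the dilution factor:

```python
import math


def generations_per_passage(dilution_factor=8):
    """Population doublings needed to regrow to the pre-dilution density."""
    return math.log2(dilution_factor)


def cumulative_generations(n_passages, dilution_factor=8):
    """Generations elapsed at each sampled timepoint, starting from 0."""
    step = generations_per_passage(dilution_factor)
    return [i * step for i in range(n_passages + 1)]


if __name__ == "__main__":
    print(cumulative_generations(4))
```

Individual lineages will deviate from this population average, which is the signal the fitness estimation exploits.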
DNA extraction for amplicon sequencing
We devised a protocol (S2 File) for DNA extraction that permitted large inputs of cells without excessive inhibition of the downstream PCR, and used this to process the samples for double-barcode amplicon sequencing. We prepared Xbuffer by dissolving into 30mL water (at 65°C) 1g polyvinylpyrrolidone, 1g CTAB, 4.09g NaCl, 5mL of 1M Tris-HCl (pH 7.5), and 1mL of 0.5M EDTA, then bringing to a final volume of 50mL with water. We used 400uL of this Xbuffer to resuspend the cell pellet of approximately 2.5e9 diploid stationary-phase yeast cells, transferred this into a screw-top bead-beating tube with ~0.3mL of glass beads, added ~20ug of RNAseA (Monarch NEB), and agitated using a Biospec 607 bead beater for 5min. After this, tubes were incubated at 65°C in a waterbath for 30min. 400uL of 24:1 chloroform:isoamyl-alcohol was added to each tube, and these were closed and inverted vigorously to mix, then returned to the 65°C waterbath for 15min. Tubes were centrifuged for 2 minutes at room temperature at max speed in a benchtop centrifuge, then the supernatant was removed to a new eppendorf tube. To this, 600uL more 24:1 chloroform:isoamyl-alcohol was added, and the tube was vortexed to mix well before centrifuging for 2 minutes at max speed. The supernatant (translucent yellow at this stage) from this was carefully aspirated to a new tube. Approximately 0.7x volumes of isopropanol was added, inverted to mix, and spun for 2 minutes at max speed. The resulting pellet (large, white, chalky) was washed with 1mL of 70% ethanol, then the supernatant was removed and the pellet dried for 10 minutes at room temperature. To this pellet, we added 200uL of TE (10mM Tris 1mM EDTA) and incubated on the bench for 30min to dissolve, with two bouts of vortexing during the incubation. To this 200uL of resuspended pellet we added 1 volume (200uL) of the QX1 buffer from the Qiaex2 DNA extraction kit (Qiagen) [56]. This was immediately spun for 30 seconds at max speed, and the supernatant aspirated to a new tube.
To this supernatant we added 30uL of the Qiaex2 DNA-binding solution, and this was vortexed to mix at 2 minute intervals across a 10 minute room temperature incubation. This was then spun 30 seconds at 16,000g and supernatant aspirated and discarded. The pellet was washed twice by completely resuspending it in 0.5mL of PE (Qiaex2 kit) then spinning again to pellet and discard supernatant. After this, the supernatant was completely removed and pellet dried 10-15min at room temperature. The pellet was resuspended in 30uL of EB (10mM Tris) and incubated at 50°C for 10min, then spun 30s as before. Supernatant was aspirated to a new tube, then the pellet was again resuspended with 30uL of EB and incubated at 50°C for 10 minutes. This was spun again, the supernatant was pooled with the previous eluted supernatant, and the genomic DNA content was estimated with a Qubit fluorimetric Broad Range dsDNA kit. Yield was typically about 15ug by this metric, with a Nanodrop returning a 260/280nm ratio of about 1.9 and a 260/230 ratio of about 1.8–2.3.
PCR of double-barcode amplicons for Illumina sequencing
To amplify the double-barcode loci for amplicon sequencing we devised a protocol that robustly amplifies from >1ug (Qubit-estimated) of genomic dsDNA from the previous extraction step, with minimal UMI-primer carryover (S8 Fig). We set up 50uL PCR reactions with 1x OneTaq GC Buffer (NEB), 400uM of additional MgSO_4, 200uM each dNTP, 500nM each primer, 0.25uL of OneTaq polymerase (NEB), and 1ug of genomic DNA sample from the previous extraction step. For this first round we used primers as denoted in S6 File, S10 and S11 Tables, which prime from the engineered priming sites to amplify each double-barcode locus while adding a random barcode (UMI) and known barcode (sample index) outside of the amplified locus. These were run on a BioRad T100 thermocycler to melt at 94°C for 4 minutes, before going through 4 cycles of 94°C for 1 minute, 52°C for 30 seconds, and 68°C for 1 minute. Eight 50uL reactions were run for each timepoint sampled in this work, so the products of this first reaction were pooled together and ethanol precipitated to concentrate. These were resuspended in 25uL of water and immediately purified using 50uL of Ampure XP beads, washing twice with freshly prepared 80% ethanol. After drying, we added 20uL water, resuspended the beads, incubated for 5 minutes at room temperature, then added 11uL of a 20% PEG 8000 2.5M NaCl mixture and mixed the solution well with a pipette. After 5 minutes at room temperature, this was separated using a magnetic rack for 5 minutes, then the supernatant was removed to a fresh tube. Presumably the large genomic DNA remains bound to the Ampure beads in this buffer, while the short amplicons are in the supernatant. So, the supernatant was ethanol precipitated, resuspended in 17uL distilled water, and quantified using a Qubit High-Sensitivity dsDNA assay.
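The size-selective elution above depends on the final PEG and NaCl concentrations once 11uL of the 20% PEG 8000, 2.5M NaCl mixture is added to the 20uL of water. A minimal sketch of that dilution arithmetic follows, assuming the dried beads contribute negligible volume and no residual PEG (an assumption; the function name is hypothetical):

```python
def final_concentration(stock_concentration, stock_volume_ul, diluent_volume_ul):
    """Concentration after mixing a stock into a diluent (same units as the stock)."""
    total_volume_ul = stock_volume_ul + diluent_volume_ul
    return stock_concentration * stock_volume_ul / total_volume_ul


if __name__ == "__main__":
    peg_percent = final_concentration(20.0, 11.0, 20.0)   # % PEG 8000
    nacl_molar = final_concentration(2.5, 11.0, 20.0)     # M NaCl
    print(f"PEG 8000: {peg_percent:.1f}%  NaCl: {nacl_molar:.2f} M")
```

Under these assumptions the mixture is about 7.1% PEG and 0.89M NaCl. In bead-based cleanups generally, lower PEG concentrations retain only longer DNA on the beads, which is consistent with the presumption stated above that genomic DNA stays bound while short amplicons remain in the supernatant.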
15uL of the first-round products were mixed into a second PCR reaction that contained 1x KAPA HiFi GC Buffer (Roche), 300uM dNTPs, 300nM each primer, and 0.02 U/uL KAPA HiFi DNA Polymerase (Roche) in a 30uL volume. For this round 2 PCR, we used sample multiplexing primers as denoted in S11 Table that amplify the double-barcode locus but add the p5/p7 Illumina sequences as well as additional sample indices in the index-read position. This was run on a BioRad T100 thermocycler as 95°C for 1 minute to melt, then 16 cycles of 98°C for 20 seconds, 55°C for 15 seconds, and 72°C for 15 seconds. The products were gel purified (Zymo Gel Clean and Concentrator), then quantified using a Qubit High-Sensitivity dsDNA fluorimeter kit before submission for sequencing on Illumina HiSeq runs at Novogene (S8 Table).
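The first-round PCR described above uses 1ug of genomic DNA in each of eight reactions per timepoint, and it can be useful to translate that mass into template copies per lineage. The sketch below is a back-of-the-envelope estimate only. It assumes a haploid S. cerevisiae genome of about 12.1 Mb, an average of 650 g/mol per base pair, one double-barcode locus per diploid genome, and equal lineage abundances; none of these values are taken from the protocol, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23
HAPLOID_GENOME_BP = 12.1e6       # assumed approximate S. cerevisiae genome size
GRAMS_PER_MOL_PER_BP = 650.0     # assumed average mass of one dsDNA base pair


def genomes_per_ug(ploidy=2):
    """Approximate number of genome copies in 1ug of genomic DNA."""
    grams_per_genome = ploidy * HAPLOID_GENOME_BP * GRAMS_PER_MOL_PER_BP / AVOGADRO
    return 1e-6 / grams_per_genome


def templates_per_lineage(ug_per_reaction, n_reactions, n_lineages):
    """Average template copies per lineage, assuming equal lineage abundance."""
    return genomes_per_ug() * ug_per_reaction * n_reactions / n_lineages


if __name__ == "__main__":
    print(f"{genomes_per_ug():.2e} diploid genomes per ug")
    print(f"{templates_per_lineage(1.0, 8, 1_000_000):.0f} templates per lineage")
```

With roughly a million lineages, this works out to a few hundred template molecules per lineage per timepoint, which gives a sense of the sampling depth available before UMI tagging.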
Illumina double-barcode amplicon-sequencing data analysis pipeline
This Illumina sequencing data was processed to generate per-lineage fitnesses for each selection. This pipeline was written in Nextflow and is available at the git repos linked at the start of Methods. Three lanes of Illumina HiSeq data were generated by Novogene (S8 Table), with sample multiplexing using the primer indices as indicated above. We first used `itermae` (https://gitlab.com/darachm/itermae/) to apply fuzzy regular expressions to robustly extract the sample index, UMI, and lineage barcodes from each set of forward/reverse reads and to filter on adequate read quality. Each different type of barcode on each read was clustered using `starcode` (https://github.com/gui11aume/starcode) [111], with the lineage barcodes generated from the long-read barcode-annotation step being clustered along with the lineage barcodes from the Illumina data to correct any small systematic misannotations. Counts per sample for each unique (distinct UMI) pair of lineage barcodes were analyzed to estimate chimera rates by fitting a model of how each sample and lineage barcode on either the forward or reverse read predicts the observation of these two specific sample and lineage barcodes together. We fit this model to lineage barcode combinations that we did not mate together (thus known chimeras), and found that it fit the abundances of known chimera combinations well (R^2 of 0.98, 0.94, and 0.95 for replicates A, B, and C respectively). We then used this model to subtract the expected contribution of chimeras from each lineage barcode combination, and the total chimera adjustment (1.8%, 4.9%, and 4.8% for replicates A, B, and C respectively) appears reasonable given the stated lower limit (~1.5%) on chimera rate using Illumina’s ExAmp chemistry. These chimera-adjusted counts were then filtered for lineages that appear in at least three timepoints or are seen with at least five counts in only one timepoint.
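The chimera adjustment described above can be illustrated with a deliberately simplified stand-in. The sketch below assumes that the expected chimeric count for a barcode pair is proportional to the product of the marginal abundances of its forward and reverse barcodes, fits that single proportionality constant by least squares on pairs known to be chimeras, and subtracts the prediction from every pair. This is an assumption made for illustration and is not the model implemented in the pipeline; all names are hypothetical.

```python
from collections import defaultdict


def fit_chimera_rate(counts, known_chimeras):
    """Fit one chimera rate from pairs that were never mated together.

    counts: dict mapping (forward_barcode, reverse_barcode) -> UMI-collapsed count.
    Returns (rate, product_term), where product_term(pair) is the
    marginal-product predictor used by this sketch.
    """
    forward_totals = defaultdict(float)
    reverse_totals = defaultdict(float)
    total = 0.0
    for (fwd, rev), n in counts.items():
        forward_totals[fwd] += n
        reverse_totals[rev] += n
        total += n

    def product_term(pair):
        fwd, rev = pair
        return forward_totals[fwd] * reverse_totals[rev] / total

    # Least squares through the origin: observed ~ rate * product_term.
    numerator = sum(product_term(p) * counts.get(p, 0) for p in known_chimeras)
    denominator = sum(product_term(p) ** 2 for p in known_chimeras)
    rate = numerator / denominator if denominator else 0.0
    return rate, product_term


def adjust_counts(counts, known_chimeras):
    """Subtract the expected chimera contribution, flooring at zero."""
    rate, product_term = fit_chimera_rate(counts, known_chimeras)
    return {pair: max(0.0, n - rate * product_term(pair))
            for pair, n in counts.items()}
```

In a toy example with two true pairs at 900 counts each and two unmated pairs at 9 counts each, the fitted rate removes the 9 chimeric counts and trims the true pairs by about the same amount.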
These counts were then analyzed with FitSeq [77], software that iteratively fits a model of Wrightian fitness to the counts data while modeling the effect of a rising population average fitness on each individual lineage’s relative proportions (i.e. abundance counts). We adapted the reference Python version of this tool (https://github.com/FangfeiLi05/PyFitSeq) [78] to provide finer control of the model fitting process and user interface, and packaged it on PyPI (https://github.com/darachm/fitseq, https://pypi.org/project/fitseq/) [112]. Various GNU tools, such as parallel [113], were instrumental to this pipeline. The resulting fitnesses per double-barcode lineage in each replicate were then analyzed in R.
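For intuition about what a per-lineage fitness means, a much cruder estimator than FitSeq can be sketched. The code below takes the least-squares slope of the log ratio of a lineage's counts to a reference lineage's counts against generations. This gives a Malthusian (log-scale, per-generation) fitness difference relative to the reference and does not model the rising population mean fitness that FitSeq iterates on. It is an illustration with hypothetical names, not the method used in this work.

```python
import math


def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def relative_log_fitness(lineage_counts, reference_counts, generations,
                         pseudocount=0.5):
    """Per-generation log-fitness of a lineage relative to a reference lineage.

    Taking the ratio within each timepoint cancels differences in sequencing
    depth between samples. The pseudocount guards against zero counts.
    """
    log_ratios = [math.log((a + pseudocount) / (b + pseudocount))
                  for a, b in zip(lineage_counts, reference_counts)]
    return least_squares_slope(generations, log_ratios)
```

A Wrightian fitness ratio per generation can be recovered from such a slope m as exp(m), which is one way to relate this sketch to the quantity FitSeq reports.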
Analysis of lineage fitnesses into ORF-ORF PPI calls
Using R [114] we ran several scripts (available in the git repo linked at the start of the Methods section). First we evaluated some quality-control metrics and filtered out double-barcode lineages that (1) were not fit by the FitSeq model with an R^2 of at least 0.5 or (2) had fewer than 5 counts in the initial timepoint sample. In the data that passed this filter, we excluded putative dropout lineages (as described in the text). For each group of lineage barcodes representing an ORF-ORF combination, we identified the lineages that were observed to be at least one standard deviation (of the distribution of all fitnesses for all lineages) away from the specific median fitness of this ORF-ORF combination. We removed these putative dropouts as long as at least three lineages remained, starting with the most extreme absolute deviation in fitness. This conservative heuristic removed some outlier lineages (S6 Fig), but there is room for further technical development to identify and mitigate these dropouts before measurement (Discussion). To facilitate interpretation we re-scaled each replicate’s lineage fitnesses such that 0 was the median fitness of the NULL-F[1,2] x NULL-F[3] double-negative controls and 1 was the median fitness of the NULL-F[1,2] x F[1,2]-F[3] covalent fusion positive control. We believe this should not affect the calls of PPIs, as each scaling and statistical test is performed within each replicate.
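The dropout heuristic and the re-scaling described above can be sketched as follows. This is one reading of the text rather than a transcription of the R scripts: the group median is computed once before any removal (an assumption, since the text does not say whether it is recomputed), and the function names are hypothetical.

```python
from statistics import median


def remove_dropouts(fitnesses, global_sd, min_keep=3):
    """Drop lineages at least one global SD from the group median.

    Removal starts with the most extreme deviation and stops as soon as only
    min_keep lineages remain or no remaining lineage exceeds the threshold.
    """
    group_median = median(fitnesses)
    ranked = sorted(fitnesses, key=lambda f: abs(f - group_median), reverse=True)
    kept = list(ranked)
    for fitness in ranked:
        if abs(fitness - group_median) < global_sd or len(kept) <= min_keep:
            break
        kept.remove(fitness)
    return sorted(kept)


def rescale(fitnesses, negative_controls, positive_controls):
    """Map the negative-control median to 0 and the positive-control median to 1."""
    low = median(negative_controls)
    high = median(positive_controls)
    return [(f - low) / (high - low) for f in fitnesses]
```

Because the re-scaling is an affine transformation applied within a replicate, it changes neither the ordering of lineages nor t-statistics computed within that replicate.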
We used a t-test to detect groups of lineages with higher than expected fitnesses, but false positives can arise if the group variance of fitnesses is very small by random sampling. To make the analysis more robust to this, we moderated the variance of each group of ORF-ORF combinations towards the variance for that set of NULL-F[1,2] x vORF-F[3] controls (S9 Fig). This concept is widely used in gene expression analysis [96]. We used this moderated variance approach to test whether the fitnesses of each ORF-ORF combination group of lineages are higher than the fitnesses of the group of NULL-F[1,2] x vORF-F[3] control lineages. We then used the Benjamini and Hochberg method [115] on the distribution of calculated p-values to generate a False Discovery Rate, and called a hit in a replicate as an ORF-ORF combination with an FDR < 0.05. See the analysis scripts in the pipeline as linked from the beginning of the Methods section.
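Two ingredients of this step lend themselves to a short illustration. The first function below shrinks a group variance towards a prior variance using the degrees-of-freedom-weighted average familiar from moderated t-statistics in gene expression analysis. Here the prior would be the control-set variance, and the prior weight `prior_df` is a hypothetical tuning value that is not specified in this section. The second function is a standard Benjamini and Hochberg adjustment. Computing the t-test p-values themselves is omitted.

```python
def moderated_variance(group_var, group_df, prior_var, prior_df):
    """Shrink a group variance towards a prior variance.

    Weighted average of the two variances, weighted by degrees of freedom.
    """
    return (prior_df * prior_var + group_df * group_var) / (prior_df + group_df)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print([p < 0.05 for p in fdr])
```

The shrinkage means that a group whose few lineages happen to agree very closely no longer yields an extreme t-statistic from a near-zero variance.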
Detecting potential biological significance of PPI calls
To interpret the hits and shed light on how these might help generate biological hypotheses for future in vivo or in vitro assays, we analyzed the list of 74 PPIs where a hit had been detected in at least one of three replicates. Looking at the IMEX Coronavirus dataset provided by IntAct [80], we filtered this table for interactions between human and SARS-CoV-2 proteins of types "physical association", "direct interaction", "proximity", or "association", then used a hypergeometric test to determine how likely it was that we would call these 74 ORF-ORF combinations as PPIs, given that list of other putative PPIs. We then used the `clusterProfiler` package [82] to test for significant enrichment of PROSITE domain annotations, annotations of "membrane" or "intramembrane" domains in Uniprot, Pathway annotations from Pathway Commons (v12, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013659/), and GO terms obtained from Uniprot (GO version 2022-03-22). Reported p-values are all adjusted using the Benjamini and Hochberg method [115].
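The overlap test above is a one-sided hypergeometric test. A minimal sketch using exact integer arithmetic is given here; the function name is hypothetical and the numbers in the example are invented for illustration, not the values from this study.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric draw without replacement.

    population: number of ORF-ORF combinations tested
    successes:  how many of those are previously reported interactions
    draws:      number of combinations called as hits
    k:          number of hits that are also previously reported
    """
    upper = min(successes, draws)
    favourable = sum(comb(successes, i) * comb(population - successes, draws - i)
                     for i in range(k, upper + 1))
    return favourable / comb(population, draws)


if __name__ == "__main__":
    # Invented example: 3 of 50 hits overlap 200 known pairs among 100,000 tested.
    print(hypergeom_upper_tail(3, 100_000, 200, 50))
```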
Using REDIseq to efficiently parse the PPIseq haploid viral-ORF strain library
Cell pools of haploid yeast containing mDHFR tagged-ORF expression constructs were generated using methods described above. The three pools of haploid yeast expressing ORFs similar to ORFs in the SARS-CoV-2 genome were inoculated and grown overnight in 5mL of YPD+Hygromycin. In the morning, these were collected by centrifugation and resuspended in the same volume of distilled sterile water, then diluted 1:100 in the same. These were counted by hemacytometer and adjusted to a density of approximately 5e5 cells per mL. Then, we used a Labcyte Echo 550 to dispense 7.5nL or 5nL into all wells of four 384-well Labcyte-approved plates containing 30uL of YPD+Hygromycin+Carbenicillin in each well. These were grown overnight at 30°C, and for two of the three pools these were agitated and OD was measured before re-inoculating into empty wells using Labcyte dispensing of 7.5nL or 5nL of the pre-cultured and diluted cell pool prepared similarly as above. pLSL53 was the exception to this, and instead we dispensed droplets into all wells under the logic of the established colony competitively excluding the potentially invasive new cell. All of these underwent four rounds of re-inoculation with dilute cell suspension and overnight culture at 30°C, before 30uL of 30% glycerol was added to each well. This resuspended array was pinned to YPD before the primary liquid culture (at 15% glycerol) was frozen at -80°C. A complementary array of iSeq2 alpha strains [52] was pinned out from frozen stocks, and the subsequent day the two arrays of MATa (vORF PPIseq strains) and MATalpha (iSeq2 interaction barcoders) were pinned together and grown overnight on YPD. These matings were pinned to SC-met-lys media to select diploids, then to YPGal to induce recombination of the barcodes and split-URA3 marker.
The next day, arrays were pinned to SC-URA to select for these faithful double-ORF double-barcode diploids, and these resulting arrays were scraped into pools and DNA was extracted using the same procedure as described above for the pool-selection samples. PCR was performed similarly to the procedure described above, but with less input genomic DNA (200ng), more cycles during PCR (8 for first round and 20 for second round), and different primers for amplification of the iSeq2 side of the amplicon with appropriate sample indices (S11 Table). Products were gel-purified (Zymo Gel Clean and Concentrate kit), quantified with Qubit, and pooled for sequencing (S8 Table). This sequencing was analyzed to determine colonies that appeared to be pure colonies of yeast containing particular known vORF barcodes. Colonies with barcodes annotated (above) as mapping to a clearly correct ORF were identified. With these results we identified colonies that comprise replicate sets of yeast vORF colonies. With this information we used a Singer PIXL colony picking robot to rearray the desired colonies to liquid media in a 384-well plate with 75uL YPD+Hyg+Carb. After a day of growth, obviously empty wells were re-picked by the PIXL, and after another night of growth the collection was brought to 15% glycerol and frozen.
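Dispensing nanolitre droplets of a dilute suspension, as in the arraying step at the start of this section, distributes cells across wells roughly at random. The following is a minimal Poisson sketch of that process. It assumes a perfectly mixed suspension and that every counted cell is transferred and goes on to grow; acoustic dispensing of yeast may not satisfy these assumptions, and the fraction of wells actually observed to be empty is the better guide. Function names are hypothetical.

```python
import math


def mean_cells_per_droplet(cells_per_ml, droplet_nl):
    """Expected number of cells in one droplet (1 nL = 1e-6 mL)."""
    return cells_per_ml * droplet_nl * 1e-6


def poisson_pmf(k, mean):
    """Probability of exactly k cells in a well under Poisson loading."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def well_occupancy(cells_per_ml, droplet_nl):
    """Return (P(empty), P(exactly one cell), P(two or more cells))."""
    mean = mean_cells_per_droplet(cells_per_ml, droplet_nl)
    p_empty = poisson_pmf(0, mean)
    p_single = poisson_pmf(1, mean)
    return p_empty, p_single, 1.0 - p_empty - p_single
```

Such a calculation can help in choosing a density and droplet volume that balance empty wells against mixed wells, which the sequencing-based purity check described above then screens out.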
Confirming hits of the PPIseq screen
To verify that the PPIseq approach to scaling up the established mDHFR PPI assay works, we repeated the assay with clonal arrays on agar. Colonies containing expression constructs of mDHFR-tagged vORFs were arrayed as previously described (REDIseq), so we picked colonies representing several vORFs that were participants in detected PPIs. We then also re-generated clonal strains expressing mDHFR-tagged hORFs. To do this, we used Gateway LR Clonase as per the manufacturer’s instructions to integrate the ORFs from entry vectors (picked from the human ORFeome collection v8.1) into a pre-barcoded pool of pSL51-derived plasmids, in vitro. These pools were each transformed into competent 10beta cells prepared with a Mix and Go kit (Zymo Research), selected with LB+Carb media, and clones were picked into a 96-well plate for screening. These were screened first for growth on LB+Chloramphenicol to eliminate colonies that retained the Gateway negative-selection cassette, then screened by PCR across the ORF (primers oSL269 and oSL26). Plasmids with the right ORF inserted were transformed and integrated into the landing pad in yeast strain ySL508 using a similar transformation procedure as previously described, and clones were again screened with colony PCR using oSL231 against oSL10 (for pSL51-derivatives) and oSL1160 against oSL450 (for pSL737-derivatives). Yeast colony PCR products were Sanger sequenced, and a clone of each ORF construct was picked into the margin columns and rows of a 96-well plate (7 vORFs and 11 hORFs, as in S19 Table). These were then crossed in the same multi-well plate in liquid YPD for several hours before being pinned to YPD+G418+Hyg agar media. Colonies were then pinned to YPGal, and the resulting colonies after growth were pinned to SD+His+Leu to select for faithful double-ORF double-barcode recombinants and frozen at -80°C with 15% glycerol.
The same procedure was carried out for the controls, wherein the pooled libraries pLSL47 (F[1,2]-F[3] positive control), pLSL48 (F[1,2]-NULL negative control), and pLSL49 (NULL-F[3] negative control) were grown in liquid media overnight and then crossed against each other and the relevant complementary ORF assay strains, i.e. pLSL48 (F[1,2]-NULL) against all F[3]-tagged vORFs and pLSL49 (NULL-F[3]) against all F[1,2]-tagged hORF strains. The final mated diploid assay strains were grown overnight in SD+His+Leu, then back-diluted 1/20 into SD+His+Leu+1ug/mL MTX liquid media and grown until saturation, then pinned in quadruplicate to SD+His+Leu+MTX media with concentrations of 100, 10, 1, and 0 ug/mL MTX. These were grown for two days, then scanned. The images were analyzed using a custom R script (see Code Access at beginning of Methods section) using EBImage to segment colony from background, filter out noise, and register colonies to positions before total colony size was used as a proxy for growth during this assay. The 100ug/mL MTX analysis was reported because it showed the clearest effect and the effects appeared completely consistent between different MTX concentrations.
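The idea of using total colony size as a growth proxy can be illustrated with a toy version of the image analysis. The sketch below thresholds a grayscale image (a list of rows of pixel intensities), assigns each foreground pixel to a cell of a regular plate grid, and counts foreground pixels per grid position. The actual analysis used EBImage in R with noise filtering and colony registration, which this sketch does not attempt; the function name and the fixed threshold are assumptions.

```python
def colony_sizes(image, n_rows, n_cols, threshold):
    """Count above-threshold pixels within each cell of a regular grid.

    image: list of equal-length rows of grayscale pixel intensities.
    Returns an n_rows x n_cols nested list of foreground pixel counts.
    """
    height = len(image)
    width = len(image[0])
    sizes = [[0] * n_cols for _ in range(n_rows)]
    for y, row in enumerate(image):
        grid_row = y * n_rows // height
        for x, value in enumerate(row):
            if value > threshold:
                grid_col = x * n_cols // width
                sizes[grid_row][grid_col] += 1
    return sizes


if __name__ == "__main__":
    # A 4x4 toy scan with one bright 2x2 colony in the upper-left grid cell.
    scan = [
        [200, 210, 10, 12],
        [205, 220, 11, 9],
        [8, 10, 12, 11],
        [9, 11, 10, 10],
    ]
    print(colony_sizes(scan, n_rows=2, n_cols=2, threshold=100))
```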
Supporting information
Raw scans of plates used for initial verification that modified plasmids and modified strains still reproduce the mDHFR assay results. Layout as described in S13 Table.
(PDF)
For each timepoint (x-axis), the relative frequency of each lineage (y-axis) is shown for each replicate and each ORF-ORF combination (panel faceting). The ’Hit’ or ’non-significant’ label indicates whether this data was called as a significant PPI hit, and line color indicates the lineage fitness.
(PDF)
Raw scans of some regenerated strains assaying PPIseq hits were grown using clonal growth on agar MTX plates (see Methods). Layout as described in S19 Table.
(TXT)
PPIseq double-barcode counts are well correlated. Two biological samples of the initial timepoint’s lineage pool were subject to independent DNA extraction, PCR, and sequencing. A) Chimera-adjusted counts from each sample (Methods). B) R^2 for a subset of the dataset (y-axis), calculated for all observations with both counts greater than the minimum counts threshold (x-axis).
(TIF)
Dropout lineages are consistently less fit and not explained by ORF mutations. A) Examples of lineages for tagged human DHFR and tagged vORF positive controls that are expected to have high fitness. Proportion of counts (y-axis) per lineage is shown for each timepoint (x-axis). Some lineages are not detected in some replicates. B) The un-scaled fitness of each lineage for all tagged human DHFR and tagged vORF positive control lineages are plotted for each replicate (x and y axes). Color indicates if the lineage contains an ORF that was annotated by the long-read plasmid annotation to contain a non-synonymous mutation.
(TIF)
Examples of outlier filtering. For each panel, the un-scaled fitness of each lineage for all tagged human DHFR and a certain tagged vORF positive control lineages is plotted, for replicate A (x-axis) against replicate B (y-axis). Color indicates whether the lineage passed the outlier filter or not (FALSE indicates exclusion).
(TIF)
NSP2 tends to have a slight fitness advantage with human DHFR positive control. For each vORF (x-axis) for each replicate (color), the scaled fitness of each lineage of tagged human DHFR and tagged vORF lineages are plotted (y-axis), and the mean of each grouping is indicated by the solid line.
(TIF)
New PPIseq priming sites and PCR protocol are efficient and effective at generating UMI-containing libraries. A) qPCR of Round 1 PCR (genomic DNA template and genomic priming sites) and Round 2 PCR (purified Round 1 template with p5/p7-containing primers) were set up with Kapa HiFi polymerase and SybrGreen dye (Methods). Plotted is the "relative concentration" of the template as estimated from 2^{-C_t} where C_t is the maximum second-derivative cycle threshold calculated by the AB HT7900 qPCR machine. Dilution is from a serial dilution of the template. B) A similar experiment was performed, but the Round 2 PCR was carried out with template either from a normal Round 1 PCR (primers added, thermocycled, then template purified), or from a reaction where the primers were withheld but a purified product of a previous Round 1 template was added, or from a reaction where the primers were withheld until after the cycling but primers and a purified Round 1 template were added just before purification. Three different purification strategies (raw with no purification, column purification, or ExoI treatment) show different efficacy at repressing the primer-after reaction to the same level as the no-primer reaction.
(TIF)
A summary of the variance shrinking approach to moderate false positives. A), B), C) show for each replicate the observed variance of fitnesses within an ORF-ORF lineage group (x-axis) and the moderated, or shrunk group fitness variance used for the one-sided t-test. D) The distribution (y-axis) of the variance in fitnesses per ORF-ORF group (x-axis) is shown for the observed and moderated variance (linetype) for each replicate (color).
(TIF)
REDIseq is a viable strategy for converting pooled PPIseq haploid precursor yeast strain pools into arrayed libraries. We used dilution to separate pools of PPIseq haploids (yLSL52, yLSL53, yLSL54, yeast that contained integrated F[3]-tagged vORFs) into nine 384-well plates. Many wells were left empty by this random dispersion process, but the entire plates were subject to REDIseq, wherein these were crossed to plates of compatible iSeq 2.0 strains with known barcodes. For each tagged vORF (x-axis), we recovered some number of uniquely-barcoded haploid clones (y-axis).
(TIF)
The number of lineages per ORF-ORF combination is well dispersed. The distribution (y-axis) of the number of double-barcoded lineages (x-axis) that assay each ORF-ORF pair is shown. Linetype indicates different replicates.
(TIF)
This table shows which vORF (viral ORF) a particular barcode corresponds to.
(CSV)
This table shows which hORF (human ORF) a particular barcode corresponds to.
(CSV)
(CSV)
This table shows which pair of barcodes corresponds to which lineage tracked in the double-barcode sequencing.
(TXT)
(CSV)
(CSV)
This table lists the sequences of the sample multiplexing primers used for the PPIseq screen, which makes up the bulk of the paper (as shown in the second main figure). Nomenclature is as in the S6 File diagram.
(CSV)
This table lists the sequences of the sample multiplexing primers used for REDIseq (isolation of the single vORF haploid strains). Nomenclature is as in the S6 File diagram.
(CSV)
(CSV)
The plate layout for the experiment to initially verify that modified plasmids and modified strains still reproduce the same mDHFR assay results (S1 Fig).
(CSV)
The sources of each human ORF used for initial verification that modified plasmids and modified strains still reproduce the mDHFR assay results.
(CSV)
A table of the PPIs amongst the ORFs used for initial verification that were previously reported in IntAct.
(CSV)
(CSV)
A table of the fitness of each double-barcoded lineage in each replicate, essentially our best estimate of the PPIseq signal for that lineage in that replicate of selection.
(TXT)
A table of, for each combination of ORFs, if each combination was detected as a PPI, the FDR calculated, variances, effect sizes, lineages, mutation flags, and other features described in the table header.
(CSV)
Layout of strains in the plate regenerated to confirm PPI hits using a plate-growth assay. See S3 Fig.
(CSV)
Acknowledgments
We thank the members of Sasha Levy’s lab at SLAC at Stanford, Gavin Sherlock’s lab at Stanford Genetics, and the Steinmetz and Davis groups at the Stanford Genome Technology Center. In particular we thank Zhimin Liu, FangFei Li, Fabiana Gnatta, David Catoe, Michi Henri Tai, Angela Chu, Joe Horecka, Kevin Roy, Katja Schwartz, Weiyi Li, and Takeshi Matsui. Thanks to the University of Arizona Arizona Genomics Institute (AGI) core sequencing team for the paid PacBio sequencing service. We thank Frederick Roth’s group and others for quickly generating and freely sharing their collection of SARS-CoV-2 sequences, as well as the ORFeome Collaboration for generating the human ORFeome collection. The BioIcon of a genome sequencer in Fig 2 panel A is titled "genomesequencer-2" and is by DBCLS, licensed under CC-BY 4.0 Unported https://creativecommons.org/licenses/by/4.0/. We thank the American people for the continued investment, by means of the National Institutes of Health, that enabled the training and sharing of the body of knowledge, techniques, and vision expressed here.
Data Availability
Please see the "Data Access" section of the manuscript, copied here: "Sequencing datasets are available on the Sequence Read Archive. For long-read sequencing (PacBio and Nanopore) used to annotate DNA barcodes with the linked ORF they represent, see PRJNA1073210. For Illumina sequencing of the double-barcode amplicon used to quantify the fitness of each diploid lineage, see PRJNA1073201. See the supplemental table (S13) for more metadata to contextualize these. For other intermediate files see the supplemental files for this paper, or retrieve them as a sqlite3 database file from this OSF repository (doi.org/10.17605/OSF.IO/B8G3H). For example, because of the particularly large size of the counts table, that table is available as a separate sqlite3 database in the linked OSF repository. Raw TIFF scans of the agar plates used for validation of the modified mDHFR assay (Fig 1) or PPI re-testing on agar media (analyzed for Fig 3) are available in the above OSF repository in an appropriate folder."
Funding Statement
This work was solely funded by the National Institutes of Health National Institute of Allergy and Infectious Diseases (https://www.niaid.nih.gov/) grant R01 AI164530 awarded to Dr. Sasha Levy (SL) and carried out within Dr. Sasha Levy’s research group at Stanford University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Rawlinson SM, Moseley GW. The nucleolar interface of RNA viruses. Cell Microbiol. 2015;17: 1108–1120. doi: 10.1111/cmi.12465
- 2. Van Vliet K, Mohamed MR, Zhang L, Villa NY, Werden SJ, Liu J, et al. Poxvirus Proteomics and Virus-Host Protein Interactions. Microbiol Mol Biol Rev. 2009;73: 730–749. doi: 10.1128/MMBR.00026-09
- 3. Mänz B, Dornfeld D, Götz V, Zell R, Zimmermann P, Haller O, et al. Pandemic Influenza A Viruses Escape from Restriction by Human MxA through Adaptive Mutations in the Nucleoprotein. PLOS Pathog. 2013;9: e1003279. doi: 10.1371/journal.ppat.1003279
- 4. Jureka AS, Kleinpeter AB, Tipper JL, Harrod KS, Petit CM. The influenza NS1 protein modulates RIG-I activation via a strain-specific direct interaction with the second CARD of RIG-I. J Biol Chem. 2020;295: 1153–1164. doi: 10.1074/jbc.RA119.011410
- 5. Jureka AS, Kleinpeter AB, Cornilescu G, Cornilescu CC, Petit CM. Structural Basis for a Novel Interaction between the NS1 Protein Derived from the 1918 Influenza Virus and RIG-I. Structure. 2015;23: 2001–2010. doi: 10.1016/j.str.2015.08.007
- 6. Riegger D, Hai R, Dornfeld D, Mänz B, Leyva-Grado V, Sánchez-Aparicio MT, et al. The nucleoprotein of newly emerged H7N9 influenza A virus harbors a unique motif conferring resistance to antiviral human MxA. J Virol. 2015;89: 2241–2252. doi: 10.1128/JVI.02406-14
- 7. Rebsamen M, Kandasamy RK, Superti-Furga G. Protein interaction networks in innate immunity. Trends Immunol. 2013;34: 610–619. doi: 10.1016/j.it.2013.05.002
- 8. Brito AF, Pinney JW. Protein–Protein Interactions in Virus–Host Systems. Front Microbiol. 2017;8. doi: 10.3389/fmicb.2017.01557
- 9. Goodacre N, Devkota P, Bae E, Wuchty S, Uetz P. Protein-protein interactions of human viruses. Semin Cell Dev Biol. 2020;99: 31–39. doi: 10.1016/j.semcdb.2018.07.018
- 10. von Brunn A, Teepe C, Simpson JC, Pepperkok R, Friedel CC, Zimmer R, et al. Analysis of Intraviral Protein-Protein Interactions of the SARS Coronavirus ORFeome. PLoS ONE. 2007;2. doi: 10.1371/journal.pone.0000459
- 11. Gavin A-C, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415: 141–147. doi: 10.1038/415141a
- 12. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440: 637–643. doi: 10.1038/nature04670
- 13. Gordon DE, Jang GM, Bouhaddou M, Xu J, Obernier K, White KM, et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature. 2020;583: 459–468. doi: 10.1038/s41586-020-2286-9
- 14. May DG, Martin-Sancho L, Anschau V, Liu S, Chrisopulos RJ, Scott KL, et al. A BioID-Derived Proximity Interactome for SARS-CoV-2 Proteins. Viruses. 2022;14: 611. doi: 10.3390/v14030611
- 15. Nooren IMA, Thornton JM. Structural Characterisation and Functional Significance of Transient Protein–Protein Interactions. J Mol Biol. 2003;325: 991–1018. doi: 10.1016/s0022-2836(02)01281-0
- 16. Nooren IMA, Thornton JM. Diversity of protein–protein interactions. EMBO J. 2003;22: 3486–3492. doi: 10.1093/emboj/cdg359
- 17. Jensen LJ, Bork P. Not Comparable, But Complementary. Science. 2008;322: 56–57. doi: 10.1126/science.1164801
- 18. Yu X, Ivanic J, Memišević V, Wallqvist A, Reifman J. Categorizing Biases in High-Confidence High-Throughput Protein-Protein Interaction Data Sets. Mol Cell Proteomics. 2011;10: M111.012500. doi: 10.1074/mcp.M111.012500
- 19. Richards AL, Eckhardt M, Krogan NJ. Mass spectrometry-based protein–protein interaction networks for the study of human diseases. Mol Syst Biol. 2021;17: e8792. doi: 10.15252/msb.20188792
- 20. Altan-Bonnet N. Lipid Tales of Viral Replication and Transmission. Trends Cell Biol. 2017;27: 201–213. doi: 10.1016/j.tcb.2016.09.011
- 21. Fields S, Song O. A novel genetic system to detect protein–protein interactions. Nature. 1989;340: 245. doi: 10.1038/340245a0
- 22. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature. 2000;403: 623. doi: 10.1038/35001009
- 23. Uetz P, Dong Y-A, Zeretzke C, Atzler C, Baiker A, Berger B, et al. Herpesviral Protein Networks and Their Interaction with the Human Proteome. Science. 2006;311: 239–242. doi: 10.1126/science.1116804
- 24. Remy I, Michnick SW. Clonal selection and in vivo quantitation of protein interactions with protein-fragment complementation assays. Proc Natl Acad Sci. 1999;96: 5394–5399. doi: 10.1073/pnas.96.10.5394
- 25. Pelletier JN, Campbell-Valois F-X, Michnick SW. Oligomerization domain-directed reassembly of active dihydrofolate reductase from rationally designed fragments. Proc Natl Acad Sci. 1998;95: 12141–12146. doi: 10.1073/pnas.95.21.12141
- 26. Zhang L, Villa NY, Rahman MM, Smallwood S, Shattuck D, Neff C, et al. Analysis of Vaccinia Virus−Host Protein−Protein Interactions: Validations of Yeast Two-Hybrid Screenings. J Proteome Res. 2009;8: 4311–4318. doi: 10.1021/pr900491n
- 27. McCraith S, Holtzman T, Moss B, Fields S. Genome-wide analysis of vaccinia virus protein–protein interactions. Proc Natl Acad Sci. 2000;97: 4879–4884. doi: 10.1073/pnas.080078197
- 28. Tarassov K, Messier V, Landry CR, Radinovic S, Molina MMS, Shames I, et al. An in Vivo Map of the Yeast Protein Interactome. Science. 2008;320: 1465–1470. doi: 10.1126/science.1153878
- 29. Schlecht U, Liu Z, Blundell JR, St.Onge RP, Levy SF. A scalable double-barcode sequencing platform for characterization of dynamic protein-protein interactions. Nat Commun. 2017;8: 15586. doi: 10.1038/ncomms15586
- 30. Schlecht U, Miranda M, Suresh S, Davis RW, St.Onge RP. Multiplex assay for condition-dependent changes in protein–protein interactions. Proc Natl Acad Sci. 2012;109: 9213–9218. doi: 10.1073/pnas.1204952109
- 31. Celaj A, Schlecht U, Smith JD, Xu W, Suresh S, Miranda M, et al. Quantitative analysis of protein interaction network dynamics in yeast. Mol Syst Biol. 2017;13: 934. doi: 10.15252/msb.20177532
- 32. Yachie N, Petsalaki E, Mellor JC, Weile J, Jacob Y, Verby M, et al. Pooled-matrix protein interaction screens using Barcode Fusion Genetics. Mol Syst Biol. 2016;12: 863. doi: 10.15252/msb.20156660
- 33. Evans-Yamamoto D, Rouleau FD, Nanda P, Makanae K, Liu Y, Després PC, et al. Barcode fusion genetics-protein-fragment complementation assay (BFG-PCA): tools and resources that expand the potential for binary protein interaction discovery. Nucleic Acids Res. 2022; gkac045. doi: 10.1093/nar/gkac045
- 34. Levy ED, Landry CR, Michnick SW. How Perfect Can Protein Interactomes Be? Sci Signal. 2009;2: pe11–pe11. doi: 10.1126/scisignal.260pe11
- 35. Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021. [cited 15 Jul 2021]. doi: 10.1126/science.abj8754
- 36. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596: 583–589. doi: 10.1038/s41586-021-03819-2
- 37. Humphreys IR, Pei J, Baek M, Krishnakumar A, Anishchenko I, Ovchinnikov S, et al. Computed structures of core eukaryotic protein complexes. Science. 2021;374: eabm4805. doi: 10.1126/science.abm4805
- 38. Ruff KM, Pappu RV. AlphaFold and Implications for Intrinsically Disordered Proteins. J Mol Biol. 2021;433: 167208. doi: 10.1016/j.jmb.2021.167208
- 39. Xue B, Uversky VN. Intrinsic Disorder in Proteins Involved in the Innate Antiviral Immunity: Another Flexible Side of a Molecular Arms Race. J Mol Biol. 2014;426: 1322–1350. doi: 10.1016/j.jmb.2013.10.030
- 40. Uversky VN. Intrinsically Disordered Proteins and Their “Mysterious” (Meta)Physics. Front Phys. 2019;7. doi: 10.3389/fphy.2019.00010
- 41. Kumar N, Kaushik R, Tennakoon C, Uversky VN, Longhi S, Zhang KYJ, et al. Comprehensive Intrinsic Disorder Analysis of 6108 Viral Proteomes: From the Extent of Intrinsic Disorder Penetrance to Functional Annotation of Disordered Viral Proteins. J Proteome Res. 2021;20: 2704–2713. doi: 10.1021/acs.jproteome.1c00011
- 42. Tokuriki N, Oldfield CJ, Uversky VN, Berezovsky IN, Tawfik DS. Do viral proteins possess unique biophysical features? Trends Biochem Sci. 2009;34: 53–59. doi: 10.1016/j.tibs.2008.10.009
- 43. Halehalli RR, Nagarajaram HA. Molecular principles of human virus protein–protein interactions. Bioinformatics. 2015;31: 1025–1033. doi: 10.1093/bioinformatics/btu763
- 44. Walhout AJM, Temple GF, Brasch MA, Hartley JL, Lorson MA, van den Heuvel S, et al. [34] GATEWAY recombinational cloning: Application to the cloning of large numbers of open reading frames or ORFeomes. In: Thorner J, Emr SD, Abelson JN, editors. Methods in Enzymology. Academic Press; 2000. pp. 575–IN7.
- 45. Wiemann S, Pennacchio C, Hu Y, Hunter P, Harbers M, Amiet A, et al. The ORFeome Collaboration: a genome-scale human ORF-clone resource. Nat Methods. 2016;13: 191–192. doi: 10.1038/nmeth.3776
- 46. Huttlin EL, Bruckner RJ, Navarrete-Perea J, Cannon JR, Baltier K, Gebreab F, et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. Cell. 2021;184: 3022–3040.e28. doi: 10.1016/j.cell.2021.04.011
- 47. Luck K, Kim D-K, Lambourne L, Spirohn K, Begg BE, Bian W, et al. A reference map of the human binary protein interactome. Nature. 2020;580: 402–408. doi: 10.1038/s41586-020-2188-x
- 48. Yu X, Bian X, Throop A, Song L, Moral LD, Park J, et al. Exploration of Panviral Proteome: High-Throughput Cloning and Functional Implications in Virus-host Interactions. Theranostics. 2014;4: 808–822. doi: 10.7150/thno.8255
- 49. Pellet J, Tafforeau L, Lucas-Hourani M, Navratil V, Meyniel L, Achaz G, et al. ViralORFeome: an integrated database to generate a versatile collection of viral ORFs. Nucleic Acids Res. 2010;38: D371–D378. doi: 10.1093/nar/gkp1000
- 50. Uetz P, Rajagopala SV, Dong Y-A, Haas J. From ORFeomes to Protein Interaction Maps in Viruses. Genome Res. 2004;14: 2029–2033. doi: 10.1101/gr.2583304
- 51. Liu Z, Miller D, Li F, Liu X, Levy SF. A large accessory protein interactome is rewired across environments. Landry CR, editor. eLife. 2020;9: e62365. doi: 10.7554/eLife.62365
- 52. Liu X, Liu Z, Dziulko AK, Li F, Miller D, Morabito RD, et al. iSeq 2.0: A Modular and Interchangeable Toolkit for Interaction Screening in Yeast. Cell Syst. 2019;8: 338–344.e8. doi: 10.1016/j.cels.2019.03.005
- 53. Chretien A-E, Gagnon-Arsenault I, Dubé AK, Barbeau X, Després PC, Lamothe C, et al. Extended linkers improve the detection of PPIs by DHFR PCA in living cells. Mol Cell Proteomics. 2017; mcp.TIR117.000385. doi: 10.1074/mcp.TIR117.000385
- 54. Freschi L, Torres-Quiroz F, Dubé AK, Landry CR. qPCA: a scalable assay to measure the perturbation of protein–protein interactions in living cells. Mol Biosyst. 2013;9: 36–43. doi: 10.1039/c2mb25265a
- 55. Diss G, Landry CR. Combining the Dihydrofolate Reductase Protein-Fragment Complementation Assay with Gene Deletions to Establish Genotype-to-Phenotype Maps of Protein Complexes and Interaction Networks. Cold Spring Harb Protoc. 2016;2016: pdb.prot090035. doi: 10.1101/pdb.prot090035
- 56. Diss G, Lehner B. The genetic landscape of a physical interaction. Barkai N, editor. eLife. 2018;7: e32472. doi: 10.7554/eLife.32472
- 57. Inoue F, Kircher M, Martin B, Cooper GM, Witten DM, McManus MT, et al. A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. Genome Res. 2017;27: 38–52. doi: 10.1101/gr.212092.116
- 58. Davis JE, Insigne KD, Jones EM, Hastings QA, Boldridge WC, Kosuri S. Dissection of c-AMP Response Element Architecture by Using Genomic and Episomal Massively Parallel Reporter Assays. Cell Syst. 2020. [cited 6 Jul 2020]. doi: 10.1016/j.cels.2020.05.011
- 59. Curran KA, Morse NJ, Markham KA, Wagman AM, Gupta A, Alper HS. Short Synthetic Terminators for Improved Heterologous Gene Expression in Yeast. ACS Synth Biol. 2015;4: 824–832. doi: 10.1021/sb5003357
- 60. Celaj A, Gebbia M, Musa L, Cote AG, Snider J, Wong V, et al. Highly Combinatorial Genetic Interaction Analysis Reveals a Multi-Drug Transporter Influence Network. Cell Syst. 2020;10: 25–38.e10. doi: 10.1016/j.cels.2019.09.009
- 61. Yelagandula R, Bykov A, Vogt A, Heinen R, Özkan E, Strobl MM, et al. Multiplexed detection of SARS-CoV-2 and other respiratory infections in high throughput by SARSeq. Nat Commun. 2021;12: 3132. doi: 10.1038/s41467-021-22664-5
- 62. Bendixsen DP, Roberts J, Townshend B, Hayden EJ. Phased nucleotide inserts for sequencing low-diversity RNA samples from in vitro selection experiments. RNA. 2020; rna.072413.119. doi: 10.1261/rna.072413.119
- 63. Illumina. Effects of Index Misassignment on Multiplexing and Downstream Analysis. Illumina; 2018. https://www.illumina.com/content/dam/illumina-marketing/documents/products/whitepapers/index-hopping-white-paper-770-2017-004.pdf?linkId=36607862
- 64. Kircher M, Sawyer S, Meyer M. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 2012;40: e3–e3. doi: 10.1093/nar/gkr771
- 65. Levy ED, Kowarzyk J, Michnick SW. High-Resolution Mapping of Protein Concentration Reveals Principles of Proteome Architecture and Adaptation. Cell Rep. 2014;7: 1333–1340. doi: 10.1016/j.celrep.2014.04.009
- 66. Hsu YP, Kohlhaw GB. Leucine biosynthesis in Saccharomyces cerevisiae. Purification and characterization of beta-isopropylmalate dehydrogenase. J Biol Chem. 1980;255: 7255–7260.
- 67. Lee O-H, Lee J, Lee KH, Woo YM, Kang J-H, Yoon H-G, et al. Role of the focal adhesion protein TRIM15 in colon cancer development. Biochim Biophys Acta BBA—Mol Cell Res. 2015;1853: 409–421. doi: 10.1016/j.bbamcr.2014.11.007
- 68. Chen L, Willis SN, Wei A, Smith BJ, Fletcher JI, Hinds MG, et al. Differential Targeting of Prosurvival Bcl-2 Proteins by Their BH3-Only Ligands Allows Complementary Apoptotic Function. Mol Cell. 2005;17: 393–403. doi: 10.1016/j.molcel.2004.12.030
- 69. Kim D-K, Knapp JJ, Kuang D, Cassonnet P, Samavarchi-Tehrani P, Abdouni H, et al. A Flexible Genome-Scale Resource of SARS-CoV-2 Coding Sequence Clones. 2020. [cited 7 Apr 2020]. doi: 10.20944/preprints202004.0009.v1
- 70. Gietz RD, Schiestl RH. Large-scale high-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat Protoc. 2007;2: 38–41. doi: 10.1038/nprot.2007.15
- 71. Schiestl RH, Gietz RD. High efficiency transformation of intact yeast cells using single stranded nucleic acids as a carrier. Curr Genet. 1989;16: 339–346. doi: 10.1007/BF00340712
- 72. Gietz RD, Schiestl RH, Willems AR, Woods RA. Studies on the transformation of intact yeast cells by the LiAc/SS-DNA/PEG procedure. Yeast. 1995;11: 355–360. doi: 10.1002/yea.320110408
- 73. Mitrikeski PT. Yeast competence for exogenous DNA uptake: towards understanding its genetic component. Antonie Van Leeuwenhoek. 2013;103: 1181–1207. doi: 10.1007/s10482-013-9905-5
- 74. Hein C, Springael J-Y, Volland C, Haguenauer-Tsapis R, André B. NPI1, an essential yeast gene involved in induced degradation of Gap1 and Fur4 permeases, encodes the Rsp5 ubiquitin—protein ligase. Mol Microbiol. 1995;18: 77–87. doi: 10.1111/j.1365-2958.1995.mmi_18010077.x
- 75. Truong D. A Library Screen for Yeast Mutants Defective in Transformation by the Lithium Acetate/single Stranded DNA/polyethylene Glycol Method. University of Manitoba. 2008. https://library-archives.canada.ca/eng/services/services-libraries/theses/Pages/item.aspx?idNumber=669240867
- 76. Bendel AM, Skendo K, Klein D, Schimada K, Kauneckaite-Griguole K, Diss G. Optimization of a deep mutational scanning workflow to improve quantification of mutation effects on protein-protein interactions. bioRxiv; 2023. p. 2023.10.23.563542. doi: 10.1101/2023.10.23.563542
- 77. Li F, Salit ML, Levy SF. Unbiased Fitness Estimation of Pooled Barcode or Amplicon Sequencing Studies. Cell Syst. 2018;7: 521–525.e4. doi: 10.1016/j.cels.2018.09.004
- 78. Hale JJ, Matsui T, Goldstein I, Mullis MN, Roy KR, Ville CN, et al. Genome-scale analysis of interactions between genetic perturbations and natural variation. bioRxiv; 2023. p. 2023.05.06.539663. doi: 10.1101/2023.05.06.539663
- 79. Zhang Y, Guo R, Kim SH, Shah H, Zhang S, Liang JH, et al. SARS-CoV-2 hijacks folate and one-carbon metabolism for viral replication. Nat Commun. 2021;12: 1676. doi: 10.1038/s41467-021-21903-z
- 80. Perfetto L, Pastrello C, del-Toro N, Duesbury M, Iannuccelli M, Kotlyar M, et al. The IMEx coronavirus interactome: an evolving map of Coronaviridae–host molecular interactions. Database. 2020;2020: baaa096. doi: 10.1093/database/baaa096
- 81. Smith JD, Schlecht U, Xu W, Suresh S, Horecka J, Proctor MJ, et al. A method for high-throughput production of sequence-verified DNA libraries and strain collections. Mol Syst Biol. 2017;13: 913. doi: 10.15252/msb.20167233
- 82. Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS J Integr Biol. 2012;16: 284–287. doi: 10.1089/omi.2011.0118
- 83. Fogeron M-L, Montserret R, Zehnder J, Nguyen M-H, Dujardin M, Brigandat L, et al. SARS-CoV-2 ORF7b: is a bat virus protein homologue a major cause of COVID-19 symptoms? bioRxiv; 2021. p. 2021.02.05.428650. doi: 10.1101/2021.02.05.428650
- 84. de Melo GD, Perraud V, Alvarez F, Vieites-Prado A, Kim S, Kergoat L, et al. Neuroinvasion and anosmia are independent phenomena upon infection with SARS-CoV-2 and its variants. Nat Commun. 2023;14: 4485. doi: 10.1038/s41467-023-40228-7
- 85. Wang R, Simoneau CR, Kulsuptrakul J, Bouhaddou M, Travisano KA, Hayashi JM, et al. Genetic Screens Identify Host Factors for SARS-CoV-2 and Common Cold Coronaviruses. Cell. 2021;184: 106–119.e14. doi: 10.1016/j.cell.2020.12.004
- 86. Xu J, Gu W, Ji K, Xu Z, Zhu H, Zheng W. Sequence analysis and structure prediction of ABHD16A and the roles of the ABHD family members in human disease. Open Biol. 2018;8: 180017. doi: 10.1098/rsob.180017
- 87. Shi X, Li X, Xu Z, Shen L, Ding Y, Chen S, et al. ABHD16A Negatively Regulates the Palmitoylation and Antiviral Function of IFITM Proteins. mBio. 2022;13: e02289–22. doi: 10.1128/mbio.02289-22
- 88. Miyamoto Y, Itoh Y, Suzuki T, Tanaka T, Sakai Y, Koido M, et al. SARS-CoV-2 ORF6 disrupts nucleocytoplasmic trafficking to advance viral replication. Commun Biol. 2022;5: 1–15. doi: 10.1038/s42003-022-03427-4
- 89. Kim D-K, Weller B, Lin C-W, Sheykhkarimli D, Knapp JJ, Dugied G, et al. A proteome-scale map of the SARS-CoV-2–human contactome. Nat Biotechnol. 2023;41: 140–149. doi: 10.1038/s41587-022-01475-z
- 90. Angelini MM, Akhlaghpour M, Neuman BW, Buchmeier MJ. Severe Acute Respiratory Syndrome Coronavirus Nonstructural Proteins 3, 4, and 6 Induce Double-Membrane Vesicles. mBio. 2013;4. doi: 10.1128/mBio.00524-13
- 91. Roingeard P, Eymieux S, Burlaud-Gaillard J, Hourioux C, Patient R, Blanchard E. The double-membrane vesicle (DMV): a virus-induced organelle dedicated to the replication of SARS-CoV-2 and other positive-sense single-stranded RNA viruses. Cell Mol Life Sci. 2022;79: 425. doi: 10.1007/s00018-022-04469-x
- 92. Santerre M, Arjona SP, Allen CN, Shcherbik N, Sawaya BE. Why do SARS-CoV-2 NSPs rush to the ER? J Neurol. 2021;268: 2013–2022. doi: 10.1007/s00415-020-10197-8
- 93. Li W, Miller D, Liu X, Tosi L, Chkaiban L, Mei H, et al. Arrayed in vivo barcoding for multiplexed sequence verification of plasmid DNA and demultiplexing of pooled libraries. bioRxiv; 2023. p. 2023.10.13.562064. doi: 10.1101/2023.10.13.562064
- 94. Zhong Q, Pevzner SJ, Hao T, Wang Y, Mosca R, Menche J, et al. An inter-species protein–protein interaction network across vast evolutionary distance. Mol Syst Biol. 2016;12: 865. doi: 10.15252/msb.20156484
- 95. Sekigawa M, Kunoh T, Wada S-I, Mukai Y, Ohshima K, Ohta S, et al. Comprehensive Screening of Human Genes with Inhibitory Effects on Yeast Growth and Validation of a Yeast Cell-Based System for Screening Chemicals. J Biomol Screen. 2010;15: 368–378. doi: 10.1177/1087057110363822
- 96. Law CW, Chen Y, Shi W, Smyth GK. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15: R29. doi: 10.1186/gb-2014-15-2-r29
- 97. Faure AJ, Schmiedel JM, Baeza-Centurion P, Lehner B. DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol. 2020;21: 207. doi: 10.1186/s13059-020-02091-3
- 98. Choi SG, Olivet J, Cassonnet P, Vidalain P-O, Luck K, Lambourne L, et al. Towards an ″assayome″ for binary interactome mapping. bioRxiv. 2019; 530790. doi: 10.1101/530790
- 99. Diss G, Filteau M, Freschi L, Leducq J-B, Rochette S, Torres-Quiroz F, et al. Integrative avenues for exploring the dynamics and evolution of protein interaction networks. Curr Opin Biotechnol. 2013;24: 775–783. doi: 10.1016/j.copbio.2013.02.023
- 100. Choi SG, Olivet J, Cassonnet P, Vidalain P-O, Luck K, Lambourne L, et al. Maximizing binary interactome mapping with a minimal number of assays. Nat Commun. 2019;10: 1–13. doi: 10.1038/s41467-019-11809-2
- 101. Starr TN, Greaney AJ, Addetia A, Hannon WW, Choudhary MC, Dingens AS, et al. Prospective mapping of viral mutations that escape antibodies used to treat COVID-19. Science. 2021;371: 850–854. doi: 10.1126/science.abf9302
- 102. Michnick SW, Landry CR, Levy ED, Diss G, Ear PH, Kowarzyk J, et al. Protein-Fragment Complementation Assays for Large-Scale Analysis, Functional Dissection, and Spatiotemporal Dynamic Studies of Protein–Protein Interactions in Living Cells. Cold Spring Harb Protoc. 2016;2016: pdb.top083543. doi: 10.1101/pdb.top083543
- 103. Younger D, Berger S, Baker D, Klavins E. High-throughput characterization of protein–protein interactions by reprogramming yeast mating. Proc Natl Acad Sci. 2017;114: 12166–12171. doi: 10.1073/pnas.1705867114
- 104. Starr TN, Greaney AJ, Hilton SK, Ellis D, Crawford KHD, Dingens AS, et al. Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell. 2020;182: 1295–1310.e20. doi: 10.1016/j.cell.2020.08.012
- 105. Hinegardner R, Engelberg J. Biological complexity. J Theor Biol. 1983;104: 7–20. doi: 10.1016/0022-5193(83)90398-3
- 106. Dunham MJ, Gartenberg MR, Brown GW. Methods in Yeast Genetics and Genomics. Cold Spring Harbor Laboratory Press; 2015.
- 107. Horecka J, Davis RW. The 50:50 method for PCR-based seamless genome editing in yeast. Yeast. 2014;31: 103–112. doi: 10.1002/yea.2992
- 108. Christianson TW, Sikorski RS, Dante M, Shero JH, Hieter P. Multifunctional yeast high-copy-number shuttle vectors. Gene. 1992;110: 119–122. doi: 10.1016/0378-1119(92)90454-w
- 109. Yoshimatsu T, Nagawa F. Control of gene expression by artificial introns in Saccharomyces cerevisiae. Science. 1989;244: 1346–1348. doi: 10.1126/science.2544026
- 110. Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, et al. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 2011;21: 1543–1551. doi: 10.1101/gr.121095.111
- 111. Zorita E, Cuscó P, Filion GJ. Starcode: sequence clustering based on all-pairs search. Bioinformatics. 2015;31: 1913–1919. doi: 10.1093/bioinformatics/btv053
- 112. Hale JJ, Matsui T, Goldstein I, Mullis MN, Roy KR, Ville CN, et al. Genome-scale analysis of interactions between genetic perturbations and natural variation. Nat Commun. 2024;15: 4234. doi: 10.1038/s41467-024-48626-1
- 113. Tange O. GNU Parallel—The Command-Line Power Tool. USENIX Mag. 2011;1: 42–47.
- 114. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2023. https://www.R-project.org
- 115. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B Methodol. 1995;57: 289–300. doi: 10.1111/j.2517-6161.1995.tb02031.x