SUMMARY
While knowledge of protein-protein interactions (PPIs) is critical for understanding virus-host relationships, limitations on the scalability of high-throughput methods have hampered their identification beyond a small number of well-studied viruses. Here, we implement an in-silico computational framework (termed P-HIPSTer; Pathogen Host Interactome Prediction using STructurE similaRity) that employs structural information to predict ~282,000 pan viral-human PPIs with an experimental validation rate of ~76%. In addition to rediscovering known biology, P-HIPSTer has yielded a series of new findings: the discovery of shared and unique machinery employed across human-infecting viruses; a likely role for ZIKV-ESR1 interactions in modulating viral replication; the identification of PPIs that discriminate between HPVs with high and low oncogenic potential; and a structure-enabled history of evolutionary selective pressure imposed on the human proteome. Further, P-HIPSTer enables discovery of previously unappreciated cellular circuits that act on human-infecting viruses and provides insight into experimentally intractable viruses.
Graphical Abstract

INTRODUCTION
While viruses employ a complex network of protein-protein interactions (PPIs) to coopt cellular processes (such as endocytosis, transcription and capping, nuclear transport, protein translation and secretion), host cells respond by initiating a complex transcriptional program aimed at activating innate antiviral defenses that control viral replication and activate the adaptive immune system. Knowledge of virus-host PPIs is therefore critical for understanding the precise series of events that control cellular responses to infection and mediate viral replicative cycles. Yet our knowledge of the machinery that mediates and controls the interaction between virus and host remains exceedingly sparse. Considerable effort involving a multitude of methods (including yeast two-hybrid assays and affinity purification) has been invested in delineating physical interactions between viral and human proteins (Nicod et al., 2017). These approaches have yielded critical insights into virus-human relationships, identified key mediators of immunity, and uncovered cellular factors that control viral replication. However, limitations on scalability have hampered identification of PPIs en masse. Indeed, although a modest collection of PPIs exists for a handful of well-studied viruses, virtually nothing beyond genome sequence is known for the great majority of the ~1,000 human-infecting viruses that have been identified, whose public health importance is unquestionable.
Here, we report a systematic interrogation of PPIs predicted for a compiled set of 1,001 fully sequenced human-infecting viruses, represented by 12,237 proteins from Virus-Host DB, a repository of viral genomes and curated host information (Mihara et al., 2016). Based on an adaptation of the extensively validated PrePPI (Predicting Protein-Protein Interactions) algorithm (Garzon et al., 2016; Zhang et al., 2012), P-HIPSTer (Pathogen-Host Interactome Prediction using STructurE similaRity) exploits protein structural information, taken from the Protein Data Bank (PDB) and from homology modeling, to predict viral-host PPIs by accounting for both domain-domain and peptide-domain interactions (Figure 1A).
Figure 1. P-HIPSTer enables human-virus interactome mapping and interrogation.
A) P-HIPSTer uses protein structure homology modeling to evaluate viral-human PPIs mediated by domain-domain or peptide-domain contacts. B) Empirical validation by co-IP of 65 predicted viral-human PPIs. Positive and negative interactions are shown in yellow and purple, respectively (TP: true positive rate; TN: true negative rate). C) Subsequent analyses leveraging the P-HIPSTer human-virus protein interactome.
The P-HIPSTer database, comprising about 282,000 PPIs, represents a comprehensive catalog of virus-human PPIs that spans the Baltimore classification system and is a major expansion of previously available or reported pathogen-host interactions. We have subjected P-HIPSTer's in-silico predictions to extensive statistical and empirical validation and demonstrate its ability to rediscover viral-human PPIs and cellular determinants of viral replication identified by orthogonal approaches and genome-wide screens. We report five applications of P-HIPSTer that highlight both its ability to capture known biology and its ability to provide biological insights not available from existing tools (Figure 1C). These include: 1) validation and analysis of Zika virus (ZIKV)-human PPIs and discovery of Estrogen Receptor 1 (ESR1) as a major rheostat of viral replication; 2) analysis of human papillomavirus (HPV) determinants of oncogenicity and development of an interaction-based classifier of high- and low-risk viruses that may be deployed in clinical settings; 3) identification of cellular pathways coopted across the human virome, many of which were not previously recognized; 4) identification of evolutionary and functional relationships across viral families and nucleic acid types that add a new layer of biological complexity to the Baltimore Classification System; and 5) discovery of selection imposed by viruses on the human-primate lineage. P-HIPSTer provides a unique tool both for structure-based functional interrogation of viral proteins and for the generation of testable hypotheses that have the potential to uncover new biology. Moreover, as will be shown, the unprecedented scale of the P-HIPSTer database provides a framework for discovering shared and unique cellular pathways across 1,001 viruses representing 28 viral families known to infect humans.
RESULTS
THE P-HIPSTER ALGORITHM, DATABASE AND VALIDATION
P-HIPSTer uses three sources of evidence to predict whether two proteins interact (Figure 1A). Briefly, the integrated likelihood ratio (LR) for a given PPI is based on: a) the LR that two structured domains interact, based on evaluation of a model derived from a known complex comprised of their “structural neighbors”, that is, proteins with similar three-dimensional folds; b) the LR that an unstructured peptide binds a given structured domain, based on known binding motifs in the peptide sequence; and c) the LR based on evidence that multiple structural neighbors of a query protein interact with a target protein. The three LRs are then combined to yield a final LR score that quantifies the evidence that a candidate pair of proteins forms a complex.
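How the three LRs are combined is not detailed here. As a minimal sketch, assuming a naive Bayes combination in which the three evidence sources are treated as conditionally independent (so their LRs multiply), the snippet below also shows why an LR is not itself a probability: converting it into one requires prior odds, and the prior used here is purely hypothetical. The function names are illustrative.

```python
def combined_lr(lr_domain: float = 1.0,
                lr_peptide: float = 1.0,
                lr_neighbors: float = 1.0) -> float:
    """Combine the three evidence sources, assuming conditional independence.

    Under a naive Bayes assumption the likelihood ratios multiply; a source
    with no evidence contributes LR = 1 and leaves the product unchanged.
    """
    lrs = (lr_domain, lr_peptide, lr_neighbors)
    if any(lr <= 0 for lr in lrs):
        raise ValueError("likelihood ratios must be positive")
    return lr_domain * lr_peptide * lr_neighbors


def posterior_probability(lr: float, prior_odds: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    posterior_odds = prior_odds * lr
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values: structural evidence LR = 20, peptide-motif LR = 5.
lr = combined_lr(lr_domain=20.0, lr_peptide=5.0)
print(lr)  # 100.0
# Hypothetical prior odds of 1:1000 that a random viral-human pair interacts.
print(round(posterior_probability(lr, prior_odds=0.001), 3))  # 0.091
```

Under these assumed odds, an LR of 100 raises the probability of interaction to only about 9%, which is why the LR is best read as the strength of evidence for a pair rather than as a probability.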
We applied the P-HIPSTer algorithm to 12,237 viral proteins from 1,001 viruses spanning all viral families known to infect humans, and to 20,113 human proteins (Figure S1A and Table S1). Of the 12,237 viral proteins interrogated, we compiled structural information for 7,593 (62%), represented by a total of 799 experimentally determined structures from the PDB and 109,888 homology models. We generated multiple models per viral protein to account for possible conformational heterogeneity and variable modes of protein binding. Structural coverage ranges from 73% in (+)ssRNA viruses to 33% in dsDNA-RT (retro-transcribing) viruses (Figures S1B and S1C) and represents a significant expansion of the limited knowledge base for viral protein structures.
The PrePPI algorithm, on which P-HIPSTer is based, combines structural and non-structural clues to yield a final LR for PPIs within a given genome. For PrePPI, a physical interaction was found to occur with high confidence when the final LR ≥ 600 and the combined structural evidence yielded an LR ≥ 100. Since P-HIPSTer is based entirely on structural evidence, we empirically tested the extent to which an LR cutoff of 100 indicates a true physical interaction. Sixty-five pairwise predictions spanning LR values from 1 to 1106 were tested by co-immunoprecipitation (co-IP). Of the 34 predictions with an LR > 100, 26 were confirmed to be positive (Figures 1B, S2 and Table S2). Conversely, of the 31 predictions with an LR < 100, only 7 were positive. These data correspond to a true positive rate of 76% (26/34) and a true negative rate of 77% (24/31), similar to results obtained with the original PrePPI algorithm (Zhang et al., 2012). They provide a striking validation of the P-HIPSTer algorithm and indicate that an LR cutoff of 100 is indeed reasonable for identifying true direct interactions.
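The validation rates follow directly from the co-IP counts reported above. As we read the text, the “true positive rate” is the fraction of above-cutoff predictions confirmed by co-IP (the positive predictive value in standard terminology), and the “true negative rate” is the fraction of below-cutoff predictions that did not co-IP (the negative predictive value). A short check of that arithmetic, with a hypothetical helper name:

```python
def validation_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confirmation rates above and below the LR cutoff, plus overall agreement."""
    return {
        # confirmed positives among above-cutoff predictions
        "true_positive_rate": tp / (tp + fp),
        # confirmed negatives among below-cutoff predictions
        "true_negative_rate": tn / (tn + fn),
        "overall_agreement": (tp + tn) / (tp + fp + tn + fn),
    }


# Counts from the text: 26 of 34 above-cutoff pairs and 7 of 31 below-cutoff
# pairs were positive by co-IP.
rates = validation_rates(tp=26, fp=34 - 26, tn=31 - 7, fn=7)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

This prints 76.5%, 77.4% and 76.9%, respectively; the overall agreement of 50 out of 65 tested pairs is close to the ~76% validation rate quoted in the Summary.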
Of the 246,122,781 potential viral-human PPIs (12,237 viral × 20,113 human proteins), P-HIPSTer reports LR values for 99,639,431 (Table S1). Of these, 282,528 predictions (~0.3% of scored pairs, or ~0.1% of all potential pairs) have an LR ≥ 100 (data are available for detailed browsing at phipster.org). These are mediated primarily through interactions between structured domains (64.1%) and involve 7,463 viral proteins from 990 viruses and 5,749 human proteins. The majority (76.8%) of predicted viral-human PPIs mediated by domain-domain contacts are based on experimentally solved interaction complexes that do not involve a viral protein. On average, 38 and 285 PPIs are predicted per viral protein and per virus, respectively, with variability due to proteome size and structural coverage (Table S1). By comparison, structure-informed predictions from the PrePPI pipeline identify an average of ~6 interactions per protein in the human proteome, in line with observations that viral proteins tend to mediate a greater number of interactions than human proteins (Garamszegi et al., 2013).
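The summary counts in this paragraph can be cross-checked with simple arithmetic. This sketch assumes that “potential” pairs means every combination of the 12,237 viral and 20,113 human proteins, and that the per-protein and per-virus averages are taken over the 7,463 viral proteins and 990 viruses with at least one prediction at LR ≥ 100.

```python
# Counts quoted in the text
viral_proteins, human_proteins = 12_237, 20_113
scored_pairs, high_conf_pairs = 99_639_431, 282_528
viral_partners, viruses_with_ppis = 7_463, 990

# Every viral protein paired with every human protein
potential_pairs = viral_proteins * human_proteins

print(f"potential pairs: {potential_pairs:,}")
print(f"LR >= 100 among all potential pairs: {high_conf_pairs / potential_pairs:.2%}")
print(f"LR >= 100 among scored pairs: {high_conf_pairs / scored_pairs:.2%}")
print(f"PPIs per viral protein: {high_conf_pairs / viral_partners:.0f}")
print(f"PPIs per virus: {high_conf_pairs / viruses_with_ppis:.0f}")
```

The output gives 246,122,781 potential pairs, 0.11% and 0.28% for the two fractions, and averages of 38 and 285, matching the per-protein and per-virus figures above.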
We compared P-HIPSTer predictions with results from high-throughput experimental approaches (yeast two-hybrid and mass spectrometry; Table 1). While the extent of overlap is modest, in most cases it is statistically significant and comparable to that found between different studies of the same virus. As noted by others, limited overlap between independent experimental maps of PPIs is common and has largely been attributed to fundamental differences in experimental and statistical approaches that influence rates of false positives and false negatives (Luck et al., 2017; Shah et al., 2018; von Mering et al., 2002). This may explain, at least in part, why P-HIPSTer results overlap well with some high-throughput studies and not with others.
Table 1. Evaluation of the overlap between high-throughput methods and P-HIPSTer.
The number of viral-interacting human proteins or screen ‘hits’ (RNAi or CRISPR/Cas9) corresponds to those reported in the original publication and considered in the P-HIPSTer dataset. Values in parentheses in the Overlap column are p-values. The set of influenza A PPIs was obtained from publicly available databases. Asterisk: PPIs involving ZIKV Capsid, NS3 or NS5 proteins.
| Dataset | Technique | Virus | PPIs or screen ‘hits’ considered by P-HIPSTer | Predicted PPIs | Overlap (p-value) |
|---|---|---|---|---|---|
| Shah et al., 2018 | AP-MS | DENV2 | 190 | 181 | 4 (9×10⁻²) |
| Batra et al., 2018 | AP-MS | EBOV | 170 | 209 | 3 (0.26) |
| Calderwood et al., 2007 | Y2H | EBV | 111 | 956 | 12 (7×10⁻³) |
| Ramage et al., 2015 | AP-MS | HCV | 134 | 170 | 1 (0.68) |
| Germain et al., 2014 | AP-MS | HCV | 98 | 170 | 6 (1.9×10⁻⁴) |
| de Chassey et al., 2008 | Y2H+literature | HCV | 481 | 170 | 21 (6.8×10⁻¹⁰) |
| Union of Ramage, Germain, de Chassey | Multiple | HCV | 606 | 170 | 25 (6.4×10⁻¹¹) |
| Jager et al., 2011 | AP-MS | HIV | 422 | 509 | 33 (1.3×10⁻⁸) |
| PPI-DBs | Multiple | Influenza A | 326 | 132 | 11 (1.1×10⁻⁵) |
| Uetz et al., 2006 | Y2H | KSHV | 13 | 937 | 4 (2×10⁻²) |
| Davis et al., 2015 | AP-MS | KSHV | 555 | 937 | 40 (5×10⁻³) |
| Shah et al., 2018 | AP-MS | ZIKV | 189* | 89* | 0 (0.42) |
| Scaturro et al., 2018 | AP-LC-MS | ZIKV | 383 | 97 | 3 (0.28) |
| Ma et al., 2017 | CRISPR/Cas9 | EBV | 142 | 956 | 14 (8.1×10⁻³) |
| Tripathi et al., 2015 | RNAi | Influenza A | 846 | 469 | 56 (2.4×10⁻¹²) |
| Jager et al., 2011 | RNAi | HIV | 1031 | 509 | 38 (9×10⁻³) |
We also compared P-HIPSTer predictions to results obtained from genome-wide loss-of-function screens used to identify cellular modulators of viral replication. The overlap between P-HIPSTer predictions and cellular factors previously identified as ‘hits’ in genome-wide screens for EBV (Ma et al., 2017), HIV (Jager et al., 2011), and influenza (Tripathi et al., 2015) was statistically significant (p-values 8.1×10⁻³, 9×10⁻³, and 2.4×10⁻¹², respectively; Table 1). Importantly, while overlap between independent RNAi screens has been relatively poor (largely attributed to disparities in the cell types and assays used to determine the phenotypic consequences of knockdown), functional analysis of “hits” has demonstrated high concordance at the ontological and pathway level. Indeed, integrating virus-host PPIs with pathway analysis of independent RNAi screens enables discovery of critical host factors that modulate viral replication (Tripathi et al., 2015). Since P-HIPSTer leverages structural homology to generate high-confidence predictions of viral-host PPIs, it captures interactions independently of the cell type or assay used and can therefore be leveraged to explore and discover previously unappreciated circuitry that controls virus-human relationships.
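The text reports p-values for these overlaps without naming the test. One common choice for the significance of an overlap between two protein sets drawn from a shared background is a one-sided hypergeometric test, sketched below with Python's exact integer `math.comb`. The background size (here, the 20,113 human proteins considered by P-HIPSTer) is an assumption, so this sketch is not expected to reproduce the exact p-values in Table 1.

```python
from math import comb


def overlap_p_value(background: int, set_a: int, set_b: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(background, set_a, set_b).

    background: number of proteins that could appear in either set
    set_a: size of the first set (e.g. experimental hits)
    set_b: size of the second set (e.g. predicted partners)
    overlap: observed number of proteins present in both sets
    """
    upper = min(set_a, set_b)
    # Sum the exact probabilities of observing `overlap` or more shared proteins.
    tail = sum(comb(set_a, k) * comb(background - set_a, set_b - k)
               for k in range(overlap, upper + 1))
    return tail / comb(background, set_b)


# Hypothetical usage with the HIV AP-MS row of Table 1 and an assumed background.
p = overlap_p_value(background=20_113, set_a=422, set_b=509, overlap=33)
print(f"p = {p:.1e}")
```

Exact integer arithmetic avoids the underflow that naive floating-point factorials would hit at proteome scale; the result depends strongly on the chosen background, which is why the background definition matters as much as the overlap count.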
NOVEL MEDIATORS OF ZIKA VIRUS INFECTION
Zika virus (ZIKV), an arthropod-borne flavivirus, has raised major health alarms due to devastating complications associated with infection, including Guillain-Barré syndrome and birth defects such as microcephaly, as well as spontaneous abortion and stillbirth (Musso and Gubler, 2016). Recent concerted experimental efforts have mapped ZIKV-human PPIs and identified cellular factors that mediate immune function and regulate viral replication (Grant et al., 2016; Li et al., 2016; Musso and Gubler, 2016; Scaturro et al., 2018; Shah et al., 2018; Tang et al., 2016; Wu et al., 2016). We leveraged P-HIPSTer predictions to augment these advances and identified 159 ZIKV-human PPIs, involving six viral and 97 human proteins, at an LR ≥ 100 (Figure 2). While the majority of these were not identified through previous experimental efforts, they are consistent with observations and functions associated with ZIKV infection (Grant et al., 2016; Li et al., 2016; Wu et al., 2016). Specifically, we find that the predicted ZIKV interactome is enriched for: a) genes expressed in fetal brain cortex (p-value 1.16×10⁻³); b) genes known to be part of the transcriptional program in response to ZIKV infection in embryonic mouse brains (p-value 1.2×10⁻³); and c) immune-related pathways such as antigen processing-cross presentation, IFN-γ signaling and NF-κB signaling (Figure 2B). We experimentally confirmed 12 of 16 tested novel ZIKV-human interactions and modeled the interaction mode for 7 of them (Figure 2C). The prediction and validation of interactions not previously observed through traditional experimental efforts underscores the sparsity of relationships reported in the literature, the susceptibility of such approaches to incomplete mapping of PPIs, and the value of combining experimental techniques (such as MS and Y2H) with computational predictions provided by P-HIPSTer.
Figure 2. Discovery of novel ZIKV-human protein interactions.
A) Combining P-HIPSTer predictions with known human PPIs followed by topological analysis identifies connectivity-based modules that are subjected to functional interrogation. B) ZIKV-human PPI network with enriched biological pathways and phenotypes for each topological module. C) Interaction models and experimental validation for 7 predicted ZIKV-human PPIs (blue and red models correspond to human and viral proteins, respectively).
We combined predicted ZIKV-human PPIs with known human PPIs from IntAct into a single protein interaction network and identified nine topological modules: four composed of proteins predicted to interact with more than one viral protein, and five defined by cellular proteins with only a single ZIKV protein partner (Figure 2B and Table S3). Modules predicted to interact with more than one viral protein are enriched for functions associated with apoptosis, embryonic lethality, cell cycle, abnormal neuron proliferation and conjunctivitis, consistent with known biological pathways and phenotypes associated with ZIKV infection (Li et al., 2016; Musso and Gubler, 2016; Scaturro et al., 2018; Shah et al., 2018; Tang et al., 2016). In addition to recapitulating known biology, we find that the NS3 sub-network is enriched for regulators of peptidase activity, suggesting that these regulators play a critical role in NS3 function. Modules specific to ZIKV envelope (Env) or capsid (C) are associated with immune-related pathways. The data implicate capsid in modulating Toll-like receptor and RIG-I mediated signaling, two important modes of pattern recognition that play indispensable roles in innate immunity and initiate cellular responses to ZIKV infection (Hamel et al., 2015). Env function, by contrast, is associated with T-cell responses (Figure 2B), which are critical for control of, and long-term protection from, ZIKV infection (Lima et al., 2017).
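The module detection itself is only summarized in the paragraph above. As a deliberately simplified, hypothetical illustration (not the topological method used in the study, which is not specified here), human partners can be grouped by whether they bind more than one ZIKV protein or a single one, and each group then split into connected components of the human PPI network. Protein names in the usage example are placeholders.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Connected components of the subgraph induced by `nodes`."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in adjacency and b in adjacency:  # keep only edges inside the subgraph
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                stack.append(neighbor)
        components.append(component)
    return components


def partner_modules(virus_host_ppis, human_ppis):
    """Group human proteins as 'shared' (>1 viral partner) or by their single viral
    partner, then split each group into connected components of the human network."""
    partners = defaultdict(set)
    for viral, human in virus_host_ppis:
        partners[human].add(viral)
    groups = defaultdict(list)
    for human in sorted(partners):
        viral = partners[human]
        groups["shared" if len(viral) > 1 else next(iter(viral))].append(human)
    return {key: connected_components(members, human_ppis)
            for key, members in groups.items()}


# Hypothetical toy network: H1/H2 bind NS3 and C; H3/H4 bind Env only; H5 binds C only.
virus_host = [("NS3", "H1"), ("C", "H1"), ("NS3", "H2"), ("C", "H2"),
              ("Env", "H3"), ("Env", "H4"), ("C", "H5")]
human = [("H1", "H2"), ("H3", "H4"), ("H4", "H5")]
print(partner_modules(virus_host, human))
```

Edges that cross groups (H4-H5 above) are ignored, so each module stays specific to one partner class; a real analysis would use a proper community-detection algorithm rather than plain connected components.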
We selected six cellular factors with validated ZIKV-human PPIs (Table S2) for experimental interrogation through gain- and loss-of-function experiments (Figure 3). We observe that while overexpression of human amyloid β precursor protein (APP) results in a modest (yet significant) 2-fold enhancement of ZIKV replication, it dramatically affects the cellular response to IFNβ stimulation (Figure 3A). Whereas priming cells with IFNβ normally induces a refractory state (resulting in a 10-fold reduction in viral titers), priming with IFNβ has no effect on cells exogenously expressing APP. APP is a known partner of Musashi-1 (MSI1), a cellular protein recently shown to regulate ZIKV replication through interaction with the viral genome and repression of genes involved in neural stem cell function (Chavali et al., 2017). This suggests a potential role for the experimentally confirmed PPIs between APP and ZIKV C and NS3 in modulating MSI1-mediated control of viral replication and cellular responsiveness to IFNβ.
Figure 3. Functional interrogation of ZIKV cellular partners and identification of ESR1 as an inhibitor of viral replication.
A) Effect of overexpressing cellular factors on ZIKV infection in 293T cells with or without IFNβ priming. Stars indicate a significant difference in viral titer (determined by focus forming assay) or in cellular response to IFNβ. B-C) Effect of ESR1 knockdown (siRNA) on ZIKV replication. B) Focus forming assay (red staining indicates ZIKV foci). C) qPCR of ZIKV mRNA; data representative of 3 independent experiments. One and two stars indicate p-values <0.05 and <0.0001, respectively. EV, empty vector.
In contrast to APP, we find that while SFN (14-3-3σ) overexpression results in a ~10-fold increase in viral titers, it sensitizes cells to IFNβ priming and enhances the cellular antiviral state (Figure 3A). 14-3-3ε, which is related to SFN, has recently been shown to regulate RIG-I localization (Liu et al., 2012) and to be a target of the dengue virus NS3 protein, an interaction that antagonizes RIG-I signaling and results in enhanced viral replication (Chan and Gack, 2016). Similarly, SFN and 14-3-3ε have been shown to inhibit Toll-like receptor (TLR) mediated sensing of viral RNA (Butt et al., 2012), and 14-3-3ζ was demonstrated to interact with STAT3 and promote downstream signaling (Han et al., 2015). Our findings suggest a broader role for this protein family in controlling innate immune responses to viral infection and implicate SFN in regulating both cell-intrinsic antiviral programs and those induced by IFNβ.
The most dramatic impact on viral replication resulted from overexpression of ESR1, which led to a ~2000-fold reduction in ZIKV replication (Figure 3A). Conversely, siRNA knockdown of ESR1 (which depleted ESR1 mRNA by 70%) potentiated ZIKV replication, as measured by both viral titer and viral mRNA in infected cells (Figures 3B and 3C). Though the precise molecular process through which ESR1 regulates ZIKV replication is yet to be defined, the finding that FDA-approved selective estrogen receptor modulators inhibit Ebola virus replication (Johansen et al., 2013) suggests a broader role for estrogen signaling in modulating viral replicative potential. Moreover, recent epidemiological data suggest different ZIKV incidence rates between men and women, as well as between pregnant and non-pregnant women (Lozier et al., 2016). While potential explanations include exposure to Aedes mosquitoes, increased severity of symptoms in women versus men, reporting biases, and sexual transmission, our findings reveal an additional point of viral vulnerability that may be leveraged by therapeutically targeting the estrogen pathway. In short, functional interrogation of experimentally validated ZIKV-interacting human proteins reveals novel roles for three human proteins (APP, SFN and ESR1) in modulating viral replication and/or cellular responsiveness to IFNβ. Thus, P-HIPSTer predictions for a virus of interest can be leveraged to uncover novel PPIs, identify interactions underlying clinical phenotypes of viral infection, expose regulators that control inflammatory responses and viral replication, and identify targets with wide therapeutic potential.
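The 70% ESR1 depletion is reported without describing how it was quantified. A standard way to express siRNA knockdown from qPCR is the comparative Ct (2^−ΔΔCt) method, sketched below under its usual assumptions (about 100% amplification efficiency and normalization to a reference transcript); the function name and Ct values are hypothetical.

```python
def knockdown_fraction(ct_target_kd: float, ct_ref_kd: float,
                       ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fraction of target mRNA depleted by siRNA, via the comparative Ct (2^-ddCt) method.

    Assumes ~100% amplification efficiency (one doubling per cycle) for both assays.
    """
    delta_kd = ct_target_kd - ct_ref_kd        # target normalized to reference gene (siRNA)
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # same normalization in control cells
    remaining = 2.0 ** -(delta_kd - delta_ctrl)
    return 1.0 - remaining


# Hypothetical Ct values: the target appears ~1.74 cycles later after knockdown.
print(round(knockdown_fraction(26.74, 18.0, 25.0, 18.0), 2))  # 0.7
```

Because each cycle corresponds to a doubling, a 70% depletion (30% remaining) corresponds to a ΔΔCt of log2(1/0.3), roughly 1.74 cycles.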
CLASSIFYING HPVS BASED ON PPIS
Human papillomaviruses (HPVs), non-enveloped dsDNA viruses that preferentially infect epithelial tissues, can cause lesions with various degrees of severity. HPVs encode two sets of proteins defined by their temporal expression during the viral replicative cycle: i) late proteins (L1-L2) form the icosahedral capsid; ii) early proteins (E1-E7) serve regulatory functions including replication control (E1 and E2), cell-cycle regulation, immune evasion, and virus release (E4-E7). E6 and E7 are transcriptionally regulated by E2 and have been implicated in dictating viral oncogenicity (Doorbar et al., 2012). Low-risk HPVs (LR-HPVs, not to be confused with the likelihood ratio, LR) have been linked primarily with benign warts, while high-risk HPVs (HR-HPVs) can lead to neoplasias and carcinomas such as cervical cancer (Cubie, 2013). HR and LR oncogenic classifications have largely relied on epidemiologic and phylogenetic information, which is essential for prevention planning and screening programs (Munoz et al., 2003). While vaccines against HPV exist, they have no therapeutic value against existing infections and must therefore be administered before individuals are infected. Identifying viable therapeutic targets is thus critical, and delineating cellular proteins that are differentially targeted by HR- and LR-HPVs may be particularly useful for identifying circuits that mediate disease.
We applied a supervised feature selection method, using PPIs with LR ≥ 100 as input, to five LR-HPVs and five HR-HPVs associated with cervical cancer, with the goal of identifying PPIs that discriminate HR and LR infections (Figure 4 and Table S4). Evaluation of the Bayesian classifier by five-fold and leave-one-out cross-validation yielded an accuracy of 90% in predicting cervical cancer risk for a given HPV and resolved ten human proteins whose interactions differentiate LR- and HR-HPVs (Figure 4B). The E7 and E2 proteins of HR-HPVs have a propensity to interact with CCNB1, SMAD3, PCNA, HNRNPM and HNRNPDL (Group I), while the E1 and E2 proteins of LR-HPVs preferentially interact with LIRB1, TERF1, FUBP1, BRDT and TK1 (Group II). In both groups, LR values clearly distinguish HR- from LR-HPVs.
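The classifier is described only as Bayesian. As a hedged illustration of the setup (binary PPI features, LR ≥ 100 versus < 100, evaluated by leave-one-out cross-validation), the sketch below uses a Bernoulli naive Bayes classifier with Laplace smoothing. Naive Bayes is the simplest special case of a Bayesian network classifier and is not necessarily the model used in the study; the function names and toy profiles are hypothetical.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit class priors and per-feature P(feature = 1 | class) with Laplace smoothing."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        probs = [(sum(row[j] for row in rows) + alpha) / (n + 2 * alpha)
                 for j in range(len(X[0]))]
        model[label] = (math.log(n / len(y)), probs)
    return model


def predict(model, x):
    """Return the class with the highest log posterior score for binary profile x."""
    def log_score(label):
        log_prior, probs = model[label]
        return log_prior + sum(math.log(p if xi else 1.0 - p) for xi, p in zip(x, probs))
    return max(model, key=log_score)


def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: train on n - 1 viruses, test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        model = train_bernoulli_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


# Hypothetical binary profiles: 1 if the PPI with a given human protein has LR >= 100.
# Columns 0-1 stand in for Group I-like partners, columns 2-3 for Group II-like ones.
X = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
y = ["HR", "HR", "HR", "LR", "LR", "LR"]
print(loocv_accuracy(X, y))  # 1.0
```

With only ten labeled viruses, leave-one-out cross-validation uses nearly all the data for training in each fold, which is why it is a natural companion to five-fold validation at this sample size.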
Figure 4. P-HIPSTer derived Bayesian Network classifier discriminates high- and low-risk HPVs.
A) Machine learning on P-HIPSTer interactomes for HR and LR HPVs (highlighted in ‘B’ with red and blue rectangles, respectively) is used to identify features associated with viral oncogenic potential. B) Hierarchical clustering of alpha HPVs based on a constellation of 10 viral-host PPIs (5 associated with HR, 5 associated with LR; Group I and Group II, respectively) discriminates HR and LR HPVs. Group III describes 18 viral-host PPIs shared across alpha HPVs. Dark and open circles denote binding profiles (LR ≥ 100: dark circle; LR < 100: open circle). Human proteins with known roles during HPV infection are highlighted in red.
We also identified 18 proteins that interact with both HR-HPVs and LR-HPVs (Group III), 10 of which have been shown to have direct roles in the HPV life cycle (Buitrago-Perez et al., 2009; Collier et al., 1998; Dietrich-Goetz et al., 1997; Grinstein et al., 2002; Kajitani and Schwartz, 2015; Katzenellenbogen et al., 2010). Among these, P-HIPSTer recapitulates previously reported interactions between HPVs and RB1, RBL1, RBL2, and p53 (Buitrago-Perez et al., 2009). Indeed, p53 and RB family members play well-accepted roles in HPV oncogenesis. The results discussed below suggest that the interaction of viral proteins with Group I and II proteins may modulate the established roles of Group III proteins in determining HPV oncogenic potentials.
Multiple lines of evidence connect Group I proteins, including CCNB1, SMAD3, and PCNA, with an augmented risk of cervical intraepithelial neoplasia (CIN) and cervical cancer (Cho et al., 2006; Lee et al., 2002; Tjalma et al., 2001). CCNB1, predicted to preferentially interact with the E7 protein of HR-HPVs (Table S4), plays an important pro-mitotic role by promoting the G2/M transition (Takizawa and Morgan, 2000), and elevated levels have been demonstrated in multiple cancers, including cervical cancer (Cho et al., 2006). While previous reports illustrate that E7 promotes mitosis and cellular proliferation through interactions with cellular B-Myb-MuvB (Pang et al., 2014), our results suggest that an E7-CCNB1 interaction may also contribute to this pro-mitotic role. In agreement with previous reports, we also identify SMAD3, a signal transducer and transcriptional modulator of TGF-β signaling that promotes tumor suppression, as a partner of E7. HR-HPV16 E7 has been shown to interfere with TGF-β-mediated regulation of cell growth by binding SMAD3 and blocking its transcriptional activity (Lee et al., 2002); our results extend this regulatory potential to E7 proteins of other HR-HPVs. Indeed, recent reports suggest that E7 may modulate the oncogenic potential of HPVs (Mirabello et al., 2017; White and Munger, 2017). Thus, P-HIPSTer provides a basis for further understanding the role of E7 in carcinogenesis and points to additional layers of regulation through which HPVs modulate viral replication and initiation of oncogenic states in infected tissues.
In addition to CCNB1 and SMAD3, we identified PCNA as a cellular partner of the E2 protein of HR-HPVs but not of LR-HPVs (Table S4). Expression of PCNA has been associated with HR-HPV and correlates with increasing grade of CIN and cervical cancer (Branca et al., 2007; Eissenberg et al., 1997; Kelman, 1997; Tjalma et al., 2001). The large difference in LRs observed for this interaction resides in the peptide-domain component, where E2 sequences from HR-HPVs match the peptide motif of an ELM class while those from LR-HPVs do not (Table S4). During HPV infection, viral proteins recruit DNA polymerases and members of the DNA replication machinery, including PCNA, to trigger viral replication (Berg and Stenlund, 1997; Chojnacki and Melendy, 2018; Fuss and Linn, 2002; Melendy et al., 1995; Mohr et al., 1990). Indeed, HPV is known to indirectly regulate the activity of PCNA through an interaction between viral E7 and human p21 protein (Funk et al., 1997). While the precise mode of action remains to be elucidated, our results suggest that the E2-PCNA interaction contributes to HPV-mediated modulation of genome replication and initiation of oncogenic states.
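The peptide-domain evidence rests on matching viral sequences against ELM motif classes, which ELM defines as regular expressions. The ELM class involved is not named here, so the sketch below uses the canonical PCNA-interacting PIP-box consensus, Q-x-x-[ILM]-x-x-[FY]-[FY], as a simplified stand-in; it is not the exact ELM pattern, and the sequences are made up.

```python
import re

# Simplified PIP-box consensus (illustrative stand-in, not the exact ELM regex).
# The lookahead lets overlapping motif occurrences be reported.
PIP_BOX = re.compile(r"(?=(Q..[ILM]..[FY][FY]))")


def find_motif(sequence: str, pattern: re.Pattern = PIP_BOX):
    """Return (start, matched peptide) for every, possibly overlapping, motif hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical peptide sequences
print(find_motif("MKQTSLDGFFAE"))  # [(2, 'QTSLDGFF')]
print(find_motif("MKAAAAAA"))      # []
```

A regex match only says that a sequence is compatible with a motif class; in a structure-based pipeline, a hit like this would still need to be paired with a domain that binds that class before contributing to an LR.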
The remaining two human proteins preferentially targeted by HR-HPVs are HNRNPM and HNRNPDL (via the viral E2 protein). In addition to acting as a transcriptional repressor of the viral E6 and E7 genes, E2 contributes to RNA processing and metabolism (Kajitani and Schwartz, 2015). Notably, HNRNPM and HNRNPDL belong to the hnRNP family, which participates in various aspects of RNA metabolism including alternative splicing, mRNA stabilization and regulation of transcription and translation (Geuens et al., 2016), and their expression is associated with various cancer types (including cervical carcinoma for HNRNPDL) (Chen et al., 2014; Sun et al., 2017; Tsuchiya et al., 1998). While the role of HNRNPM, a component of the spliceosome complex, remains to be determined, HNRNPDL has been implicated in inhibiting the production of spliced HPV-16 L1 (the HPV major capsid protein) mRNA and in promoting immune system evasion and establishment of long-term persistent infections with enhanced risk of carcinogenesis (Li et al., 2013).
Group II proteins point to unappreciated biological processes underlying HPV pathogenesis. While none are known to play a direct role in the HPV life cycle, changes in TK1 and FUBP1 expression have been associated with cervical cancer (Buitrago-Perez et al., 2009; Chen et al., 2013; Pyeon et al., 2007). The relative sparsity in prior knowledge likely reflects the fact that LR-HPVs are far less studied than HR-HPVs. Our results invite new studies into Group I and II proteins in cancer progression and underline the importance of evaluating host factors in the context of both LR- and HR-HPV.
Having established the Bayesian classifier on five known HR-HPVs and five known LR-HPVs, we clustered the remaining 28 HPVs based on predicted PPIs with the 10 human proteins in Groups I and II. The resulting dendrogram separates HR-HPVs and LR-HPVs into two branches that are supported by known biology (Figure 4B). Of the seven HPVs that co-cluster with HR-HPVs, four have been described as potentially carcinogenic (HPV-53, HPV-69, HPV-82, HPV-97) (Cubie, 2013; IARC); two (HPV-68a and HPV-68b) are subtypes of high-risk HPV-68; and one (HPV-62), though considered an LR-HPV, is commonly found in neoplastic tissue, is detected in coinfection with HR-HPVs, and is among the most prevalent LR-HPVs found in women with abnormal cervical cytology or cervical cancer (Artaza-Irigaray et al., 2017). Of the 21 HPVs that cluster with LR-HPVs, only one has been described as potentially carcinogenic (HPV-67) (IARC).
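The distance and linkage behind the dendrogram are not stated in the text. A minimal sketch, assuming Hamming distance between binary PPI profiles (LR ≥ 100 versus < 100 for the Group I and II proteins) and average-linkage agglomerative clustering cut into two clusters; the HPV labels and profiles are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions where two binary PPI profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(profiles, n_clusters=2):
    """Agglomerative clustering with average linkage; returns lists of names."""
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        total = sum(hamming(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (ties resolved by first occurrence).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical profiles over four human proteins (two Group I-like, two Group II-like).
profiles = {"hpv_hr_a": [1, 1, 0, 0], "hpv_hr_b": [1, 1, 1, 0],
            "hpv_lr_a": [0, 0, 1, 1], "hpv_lr_b": [0, 1, 1, 1]}
print(average_linkage(profiles))  # [['hpv_hr_a', 'hpv_hr_b'], ['hpv_lr_a', 'hpv_lr_b']]
```

Cutting at two clusters mirrors the two-branch split described above; stopping the merge loop at one cluster and recording each merge would instead yield the full dendrogram.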
The classifier presented here is orthogonal to the standard one based on sequence alone and can be used in parallel when new viruses are discovered. Moreover, in addition to demonstrating P-HIPSTer's performance, the results identify previously unappreciated PPIs that discriminate between HR- and LR-HPVs. To our knowledge, this has not been accomplished previously, and it offers a new perspective on HPV pathogenesis, oncogenic potential, and cellular factors that may serve as viable therapeutic targets.
CELLULAR PATHWAYS COOPTED BY VIRUSES
Beyond utilizing P-HIPSTer to gain insights into specific viruses of interest, its comprehensive nature also offers a unique opportunity to illuminate a broad picture of viral infection and identify features that are shared across human viruses. For example, Gene Ontology (GO) analysis across the 5,749 cellular factors predicted as viral-interacting proteins indicates significant overrepresentation of biological processes related to signal transduction, immune response and viral infection (Figures S3A and S3B). We also identified 173 human proteins predicted to interact with at least 100 viruses. These are enriched for processes related to immunity, virus infection and functions known to be important during viral replication (e.g., regulation of the cell cycle and membrane organization), providing insights into cellular processes broadly relevant to virus-human dynamics (Figures S3C and S3D). In agreement with recent findings (Chen and Xia, 2019), we find that these proteins are also enriched for virally implicated genetic diseases. The identification of pan-viral factors that converge around common cellular pathways is made possible both by the comprehensive nature of the P-HIPSTer database and by the large number of interactions detected by structural homology.
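Identifying proteins “predicted to interact with at least 100 viruses” amounts to counting distinct viruses, rather than viral proteins, per human protein. A minimal sketch over a hypothetical flat list of LR ≥ 100 predictions; the function name and input layout are assumptions.

```python
from collections import defaultdict


def broadly_targeted(predictions, min_viruses=100):
    """Human proteins predicted to interact with at least `min_viruses` distinct viruses.

    predictions: iterable of (virus, viral_protein, human_protein) tuples,
                 already filtered to LR >= 100.
    """
    viruses_per_human = defaultdict(set)
    for virus, _viral_protein, human_protein in predictions:
        viruses_per_human[human_protein].add(virus)  # count viruses, not viral proteins
    return sorted(h for h, viruses in viruses_per_human.items()
                  if len(viruses) >= min_viruses)


# Hypothetical toy input with a lowered threshold for illustration.
toy = [("V1", "p1", "H1"), ("V2", "p2", "H1"), ("V1", "p1", "H2"), ("V1", "p3", "H2")]
print(broadly_targeted(toy, min_viruses=2))  # ['H1']
```

Because viruses are collected in a set, a human protein bound by several proteins of the same virus (H2 above) is counted once.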
In addition to the standard analysis used to identify overrepresented biological themes and ontologically related gene groups, we previously demonstrated that Gene Set Enrichment Analysis (GSEA) (Subramanian et al., 2005) can be used to provide functional annotation of a given protein based on the functions of its predicted binding partners (Garzon et al., 2016). We applied this analysis to the LR-based rank order of all predictions for a given viral protein and represented the pathways and functions targeted by a given virus as the union of enrichments across its proteins (Figure S4). This exposed 190 pathways that are recurrently targeted by DNA, RNA and RT viruses, 92 of which are associated with regulation of cellular metabolism and point to specific metabolic processes underpinning viral requirements during infection (Table S5). For example, glucose metabolism and fatty acid synthesis are universally targeted across viruses, reflecting the requirement for increased membrane production necessary for viral packaging, vesicular transport and protein production, as well as the need for energy through glycolysis and beta-oxidation of fatty acids (Gonzalez Plaza et al., 2016). We also find that the majority of glycan-related metabolic pathways are targeted by viruses across all nucleic acid types (Table S5), reflecting their central roles in immune evasion (through modulation of adaptive immune responses to infected cells) and in supporting pathogen recognition by antibodies (Raman et al., 2016).
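GSEA's core statistic is a running-sum enrichment score over a ranked list. A minimal sketch of the standard weighted score from Subramanian et al. (2005), applied to a hypothetical LR ranking of one viral protein's predicted human partners. Whether P-HIPSTer ranked by raw LR or by a transformed score is not stated, and the permutation test that GSEA uses to assess significance is omitted; the function name is illustrative.

```python
def enrichment_score(scores: dict, gene_set: set, p: float = 1.0) -> float:
    """GSEA running-sum enrichment score.

    scores: human protein -> ranking metric (here, hypothetically, the LR of its
            predicted interaction with one viral protein)
    gene_set: proteins annotated to one pathway
    p: weight exponent; p = 0 gives the unweighted Kolmogorov-Smirnov-like statistic
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [gene in gene_set for gene in ranked]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == len(ranked):
        raise ValueError("gene set must contain some, but not all, ranked proteins")
    hit_norm = sum(abs(scores[g]) ** p for g, h in zip(ranked, hits) if h)
    miss_step = 1.0 / (len(ranked) - n_hits)

    running, best = 0.0, 0.0
    for gene, is_hit in zip(ranked, hits):
        # Hits raise the running sum in proportion to their weight; misses lower it.
        running += abs(scores[gene]) ** p / hit_norm if is_hit else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero, with its sign
    return best


# Hypothetical LR ranking of six predicted partners and a three-protein pathway.
lrs = {"a": 6.0, "b": 5.0, "c": 4.0, "d": 3.0, "e": 2.0, "f": 1.0}
print(enrichment_score(lrs, {"a", "b", "c"}))
```

A positive score means the pathway's proteins concentrate among the highest-LR partners, which is the sense in which a viral protein is said to target that pathway.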
Underscoring the universal role that Jak-Stat, NF-κB, and type-I IFN dependent pathways play in initiating innate immune responses to infection, we find that they too are recurrently targeted across all viruses (Figure S4B and Table S5). This is further emphasized by the observation that among the 173 human proteins predicted to interact with ≥100 viruses, 55 contain a sushi domain, which is known to be involved in the complement cascade, innate immune cell trafficking and viral entry and endocytosis (Figure S3D) (Ley, 2003; Tanner et al., 1988). Moreover, targeting of the complement pathway and of the complement and coagulation cascades is enriched among RNA and DNA viruses. Yet, while viruses converge around response-initiating pathways and share metabolic requirements for infection, replication and pathogenesis, they have diversified in their targeting of other arms of the immune system (only 16 out of 182 immune-related pathways are commonly targeted by all viruses). For example, poxviruses widely target immune-related pathways, while noroviruses specialize in targeting innate immune and type I IFN responses but do not widely interact with proteins involved in nucleic acid sensing. Conversely, our results indicate that members of other viral families have diversified in their targeting of immune system components. For example, flaviviruses utilize one of two distinct strategies in targeting immune-related pathways: while the majority of flaviviruses (including dengue, Zika and West Nile viruses) interact with pathways involved in T-cell differentiation, DAP12-mediated natural killer (NK) cell responses, and RIG-I/MDA5 signaling, hepaciviruses target eicosanoid ligand binding receptors and IL-23 signaling (Figure S4B). These observations invite questions about the selective pressures that gave rise to such diversification in adaptive strategies within a family of related viruses.
Furthermore, these data highlight the power of P-HIPSTer to capture general and specialized strategies employed by viruses to rewire cellular responses to infection and to uncover principles that govern human-virus relationships that were otherwise hidden or missing from other PPI discovery methods.
RECONSTRUCTING FUNCTIONAL AND EVOLUTIONARY RELATIONSHIPS ACROSS HUMAN-INFECTING VIRUSES
In addition to providing insight into shared and unique strategies exploited by human viruses, knowledge of PPIs informs our understanding of functional and evolutionary relationships that cannot be discerned through sequence alone. For instance, sequence and structural comparison of viral proteins illustrates that while sequence homology occurs between proteins from viruses of the same nucleic acid type, high structural similarity is often found for protein pairs belonging to viruses of different nucleic acid types (Figure S5). We employed an unsupervised clustering strategy (where viruses are described by their set of enriched biological pathways, given their corresponding set of predicted viral-human PPIs) to uncover functional relationships that are masked by sequence divergence (Figure 5A). We identified eight distinct clusters, each representing a unique constellation of pathways targeted by member viruses (Figure 5 and Table S6), revealing novel insights into shared and unique infection strategies employed across virus families.
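The text does not state the distance measure or linkage used for this clustering, so the snippet below is only a sketch under explicit assumptions: each virus becomes a binary vector over pathways, viruses are compared with the Jaccard distance, and SciPy average-linkage hierarchical clustering is cut into a fixed number of clusters. It illustrates the kind of unsupervised strategy described, not the procedure behind Figure 5.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_viruses(enriched, n_clusters):
    """Cluster viruses by their sets of enriched pathways.

    enriched: dict virus -> non-empty set of enriched pathway names.
    Returns dict virus -> flat cluster label (1..n_clusters).
    Assumptions: Jaccard distance and average linkage (not specified in the source).
    """
    viruses = sorted(enriched)
    pathways = sorted(set().union(*enriched.values()))
    column = {p: j for j, p in enumerate(pathways)}

    # Binary virus-by-pathway presence/absence matrix.
    X = np.zeros((len(viruses), len(pathways)), dtype=bool)
    for i, v in enumerate(viruses):
        for p in enriched[v]:
            X[i, column[p]] = True

    distances = pdist(X, metric="jaccard")          # condensed distance matrix
    tree = linkage(distances, method="average")     # hierarchical clustering
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(viruses, labels.tolist()))
```

Jaccard distance ignores pathways enriched in neither virus, which suits sparse presence/absence profiles; the dendrogram in Figure 5B corresponds to the `tree` object before it is cut.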
Figure 5. P-HIPSTer reveals functional and evolutionary relationships across the human virome.
A) Strategy to cluster viruses, described by their set of enriched pathways. B) Dendrogram of 568 viruses clustered based on their enriched pathways. Inner color ring specifies Baltimore category; outer ring specifies viral family for each virus. Schematic representations of the Interleukin 10 (IL-10; enriched in cluster 8), Rho GTPase signaling (enriched in cluster 7), and RNA metabolism (enriched in cluster 5) pathways. Pathway components predicted to be targeted by viruses within each cluster are highlighted in blue.
Three clusters are dominated by viruses belonging to one of three families (cluster 1: Picornaviridae, cluster 5: Caliciviridae, cluster 8: Poxviridae), suggesting specialized infection strategies that distinguish them from other human viruses. Poxviruses are characterized by pathways related to signaling, immunity, transcription and cell growth. In particular, we observe that Poxviridae family members target Interleukin-10 (IL-10), an anti-inflammatory cytokine that regulates immune responses to mitigate tissue damage and is known to be modulated by viruses to evade immunity (Ouyang et al., 2014) (Figure 5B and Table S6). While several poxviruses have been shown to encode an IL-10 ortholog (Ouyang et al., 2014), we find that this family of viruses repeatedly targets the IL-10 pathway at multiple levels of regulation, including production and downstream signaling, reflecting a particular importance of this pathway during infection.
P-HIPSTer can also help contextualize findings from mouse model systems and offer insight into experimentally intractable viruses or viruses for which little more than genome sequence is known (Karst et al., 2014). Noroviruses (Caliciviridae family) have emerged as human pathogens of significant public health importance. These viruses do not replicate well in vitro, and much of what is known about them comes from studies of related mouse caliciviruses, resulting in limited understanding of the cellular factors involved in human infection (Wobus et al., 2006). As (+)ssRNA viruses, noroviruses regulate the mRNA translation machinery to facilitate viral replication and modulate translation of certain host mRNAs, including interferon-stimulated genes (ISGs) (Emmott et al., 2017). We find that in addition to pathways related to immunity and regulation of the extracellular matrix, all biological pathways related to mRNA processing are enriched within noroviruses (cluster 5; Figure 5B and Table S6), suggesting a pervasive and systematic cooption of cellular machinery involved in RNA metabolism by this family of viruses.
Clusters 2, 4, 6 and 7 bring together viruses from different families and Baltimore categories. Strategies employed by viruses within these clusters appear to have converged around biological pathways of shared importance during infection (by interacting either with the same host proteins or with host proteins of the same pathway). A similar observation, albeit at a smaller scale, was described when clustering viral proteins based on their proteomic profiling and the network proximities of their interacting partners (Pichlmair et al., 2012). Cluster 7, which contains all Ebola and measles viruses as well as evolutionarily distant viruses, converges on a variety of pathways related to signaling, immunity, apoptosis and cell growth (Table S6). Of these, Rho GTPase signaling (p-value 5.1 × 10^-36) regulates actin and microtubule dynamics and biological processes like phagocytosis, cellular transport and intracellular communication by linking membrane receptors to the actin cytoskeleton (Van den Broeke et al., 2014) (Figure 5). While Rho GTPases facilitate Ebola virus entry through macropinocytosis, measles virus interferes with this pathway to perturb T cell function and long-term T-cell mediated immunity (Muller et al., 2006; Quinn et al., 2009). Our observations extend a role for Rho GTPase signaling to other members of this cluster and illustrate that although some viruses converge around particular cellular pathways, utilization of such machinery depends on cellular context and on viral requirements for their replicative cycle.
We also find functional divergence among some viral families. For example, flaviviruses appear in multiple clusters, including clusters 2, 4 and 6. Viruses in cluster 2 converge around pathways related to transcriptional and translational regulation (e.g. “transport of mature transcript to cytoplasm” and “mRNA capping”). While Zika virus falls into cluster 2, phylogenetically related dengue viruses fall into cluster 4, where signaling and immune related pathways are among the most enriched pathways (e.g. “Immune response IFN α/β signaling”). Instead, viruses in cluster 6 (e.g. Japanese encephalitis virus) are enriched for a different set of immune pathways, cell growth and apoptosis-related pathways. In short, clustering viruses based on pathway enrichments derived from P-HIPSTer inferred PPIs enables discovery of convergent functional relationships between unrelated viruses and divergent relationships within viral families. Moreover, clusters that bring together disparate viruses demonstrate that while sequence divergence precludes any possibility of using traditional methods to go beyond ontological classification, structure-based PPIs afforded by P-HIPSTer add a level of cross-stratification to virus taxonomy by identifying functional relationships that are otherwise masked. Finally, the data open the possibility of discovering molecular processes that underlie pathological states and clinical outcomes across unrelated viral infections.
HISTORY OF SELECTION IMPOSED BY VIRUSES
In the context of viral infection, as with other pathogens, there exists an “arms race” between the virus and infected cells, signs of which can be found in the genomes of both virus and host. We interrogated genetic divergence among primates as a proxy for the selection pressures imposed by viruses on the human proteome (Figure 6). As noted by many others (Bustamante et al., 2005; Daugherty and Malik, 2012; Hughes et al., 2003), the primate lineage is dominated by purifying selection. We found that proteins predicted to interact with viral proteins display significantly lower ΔN/ΔS values than non-interacting proteins (avg. of 0.23 vs 0.31; Wilcoxon tests: p-value < 0.0001, W > 10^6). This suggests that virus-interacting proteins are under stronger purifying selection than non-interacting ones, in agreement with previous results using orthogonal, albeit smaller, datasets (Enard et al., 2016; Halehalli and Nagarajaram, 2015). We also find that proteins with the lowest ΔN/ΔS values (5th percentile) that are predicted to interact with viruses are enriched for regulators of gene expression, chromatin modification and cell cycle, in addition to modulators of innate and adaptive immune responses (Table S7). Though such increased conservation may reflect an over-representation of essential genes targeted by viruses, it points to critical proteins at the interface of viral-human interactions that span millions of years of primate evolution.
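The comparison above can be sketched with a two-sided Wilcoxon rank-sum test (equivalently, the Mann-Whitney U test in SciPy; R's `wilcox.test` reports the same statistic as W). This is a minimal illustration under assumptions: per-gene ΔN/ΔS values are supplied as a dict, and the "5th percentile" is computed within the virus-interacting set, a reading the text leaves open.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_dnds(dnds, interacting):
    """Compare dN/dS of virus-interacting vs non-interacting genes.

    dnds: dict gene -> dN/dS value.
    interacting: set of genes predicted to interact with viral proteins.
    Assumption: the 5th-percentile cutoff is taken over interacting genes only.
    """
    inter = np.array([v for g, v in dnds.items() if g in interacting])
    non = np.array([v for g, v in dnds.items() if g not in interacting])
    stat, p = mannwhitneyu(inter, non, alternative="two-sided")
    cutoff = np.percentile(inter, 5)
    most_constrained = sorted(g for g in interacting if g in dnds and dnds[g] <= cutoff)
    return {
        "mean_interacting": float(inter.mean()),
        "mean_non_interacting": float(non.mean()),
        "U": float(stat),   # rank-sum statistic for the interacting sample
        "p": float(p),
        "low_5pct": most_constrained,
    }
```

A rank-based test is a natural fit here because ΔN/ΔS distributions are strongly skewed, with most genes well below 1.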
Figure 6. P-HIPSTer reveals history of selection imposed by human viruses.
A) Evolutionary tree derived from 12 whole genome primate sequences. B) ΔN/ΔS values for each of 14,974 aligned genes were plotted across human-virus interacting and non-interacting proteins. Listed in the inset are virus-interacting proteins with ΔN/ΔS values >1. Shown are functional enrichments (p-value < 0.05 and FDR < 20%) for interacting (dark circle) and non-interacting (light circle) proteins with ΔN/ΔS values >1. Red dot denotes p-value that did not pass FDR correction. Stars indicate Wilcoxon p-value < 0.0001, W > 10^6.
Positively selected genes are of particular interest, as they represent key adaptive sweeps that have been shaped by environmental pressures, including those imposed by viruses. Analysis of the 45 virus-interacting proteins under positive selection (with mean ΔN/ΔS values >1) reveals that they are highly enriched for functions and diseases related to viral infection, as well as adaptive and innate immune responses (Figure 6B); a similar relationship was recently described (Chen and Xia, 2019). Reflecting their central role in initiating and mediating responses to a variety of pathogens, regulators of immunity are also enriched among the 329 positively selected genes not predicted to interact with viruses. However, these genes are enriched for functions and diseases related to bacterial, fungal, and parasitic infection, demonstrating that P-HIPSTer can distinguish among proteins that do and do not interact with viral proteomes and have functions associated with viral infection. Notably, TRIM5, previously identified as a host restriction factor for HIV that determines human vs macaque susceptibility (Sawyer et al., 2005), is among the 45 positively selected genes targeted by viruses. As noted by others (Barreiro and Quintana-Murci, 2010), genome-wide scans for molecular “footprints” of natural selection imposed by infections would benefit greatly from integrative approaches that combine genetics, epidemiology, immunological phenotyping, and knowledge of key pathogen-host molecular interactions. So, while the notion that human genetics has been shaped by pathogens is not new, P-HIPSTer data extend the list of human proteins shaped by their interactions with viruses and highlight its ability to capture evolutionary histories that have determined pathogen and disease susceptibility.
P-HIPSTer thus adds to the toolbox of resources used to investigate evolutionary pressures imposed on the human genome and to interrogate evolutionary relationships between humans and their virome.
DISCUSSION
As viruses exploit host cells through molecular interactions involving diverse molecules including nucleic acids, lipids, sugars, and proteins, their study has resulted in the discovery of many fundamental aspects of cell biology. The work presented here represents the largest initiative to model and map a cross-species protein-protein interactome and, as such, should serve as a valuable resource for studying viral replication strategies, evolutionary and functional relationships and the history of human adaptation. Furthermore, this compendium may accelerate the discovery of drugs with potential pan-viral effects by targeting host pathways coopted across multiple viruses. By providing structural models for the majority of viral proteins included in the study, and models for interactions mediated by them, P-HIPSTer invites in-depth exploration of interfacial residues, examination of the structure space utilized by viruses, and analysis of the evolutionary constraints imposed on viral protein functions.
To facilitate convenient access to P-HIPSTer derived interactions, we have made the results available through an interactive webserver that enables both searchable queries and data download (phipster.org). Specifically, the database includes: all interactions with LR ≥ 100 (representing 282,528 PPIs between 12,237 viral and 20,113 human proteins), structure models for all modeled viral proteins, structure models for predicted complexes with a domain-based LR ≥ 100, and navigational links to external databases. In addition, annotations for all viral proteins are provided based on known functions of structure neighbors and sequence homologs as well as functional analysis of P-HIPSTer predicted interaction partners. In total, the repository provides over 100,000 atomic models corresponding to ~7,600 viral proteins from 970 viruses. Finally, as the structural information in the PDB continues to grow and provides critical sources of evidence for PPI predictions, the P-HIPSTer database will undergo annual updates. Because P-HIPSTer relies on homology modeling based on experimentally determined crystal structures, its predictive power and scope will grow as the number and diversity of solved structures increase. While some human infecting viruses and viral strains may be missing from the current compendium, PPI discovery and structural analysis for individual viruses of interest are possible upon request.
P-HIPSTer, like PrePPI and other high-throughput computational and experimental methods, is ultimately intended to generate testable hypotheses. The reliability of predictions contained in the database can be assessed in a number of ways. Along with in-silico recapitulation of results obtained by orthogonal methods, and true positive and true negative rates above 75%, the applications discussed above demonstrate that P-HIPSTer yields predictions that either rediscover known biology or point to PPIs that are consistent with the expected effects of viral infection. Moreover, even in cases of well-studied viruses like ZIKV that have been subject to repeated experimental efforts to comprehensively identify viral-human PPIs, P-HIPSTer led to discovery and validation of novel and functionally relevant interactions, including ESR1 which we demonstrate to have a critical role in modulating ZIKV replication.
While P-HIPSTer rediscovers multiple previously identified interactions between viral and cellular proteins, others are not reported with an LR ≥ 100. For example, HPV PPIs including E7-UBR4 and E6-MAML1 were not identified. However, there is no evidence to suggest that these interactions are direct. Indeed, methods like affinity purification and chromatography, co-IP, and fluorescence microscopy, used to identify these and other interactions, often do not discriminate between direct and indirect interactions. Since P-HIPSTer identifies direct interactions, these examples highlight the orthogonal and complementary nature of the approach in mapping the extended network of cellular partners and complexes of viral proteins. Similarly, some false positive P-HIPSTer predictions, notably HPV E7-p53 (known experimentally not to occur), will inevitably arise. Nevertheless, the high validation rate, along with the discoveries highlighted in each of our applications of the pipeline, engenders confidence in P-HIPSTer’s reliability.
Though P-HIPSTer is orthogonal to traditional experimental methods of identifying PPIs, it offers several key benefits. First, while high-throughput methods can be influenced by experimental settings and generally fail to provide information about the nature of an interaction, P-HIPSTer provides context-independent evidence of a direct physical interaction. Moreover, access to structure models for all interactions with a domain-domain LR ≥ 100 (85,175 in total) offers the ability, not available by other methods, to experimentally probe each prediction with site-directed mutants. In addition, such models are extremely useful in generating functional hypotheses and interpreting genetic data on susceptibility to viral infection. While the current sparsity of data on human genetic diversity does not provide the statistical power to analyze differential selection pressure on interfacial versus non-interfacial residues, the proliferation of genome sequencing projects may soon make such an inquiry tractable. Second, P-HIPSTer exploits the fact that protein structure is far better conserved than sequence and provides new functional annotations for ~7,600 viral proteins, representing a massive expansion on the incomplete annotations provided by current computational tools that rely on sequence identity. Most importantly, the new findings described in this work demonstrate the facility with which P-HIPSTer’s comprehensive nature enables fundamentally new biological discoveries and insights that could not have been realized with existing resources.
P-HIPSTer’s scalability is an important step towards the goal of defining all pathogen-host interactions across any and all species. Indeed, future developments will include exploration of endogenous retroviruses, eukaryotic and bacterial pathogens of humans and agricultural crops and livestock, as well as bacterial viruses that are part of the human enteric microbiome. In addition, efforts to leverage whole-genome sequencing together with P-HIPSTer predictions may systematically map the landscape of human genetic variation with endemic viral infection across geographic regions and inform on adaptive costs incurred during human evolution. Integration of P-HIPSTer data with other large-scale experimental mapping of pathogen-host interactions will augment and accelerate discovery of basic cell-biological machinery and targeted interrogation of both experimentally tractable and intractable viruses. Finally, affordable gene synthesis pipelines, coupled with tunable expression systems and low input profiling of cellular transcriptional states, may be used to bridge PPIs and reconstruct signaling pathways modulated by viral proteins of interest.
STAR METHODS
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for reagents may be directed to, and will be fulfilled by the corresponding author Sagi D. Shapira (ss4197@cumc.columbia.edu).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Cell lines, virus strains
293T, Vero and MCF7 cells were maintained in Dulbecco’s modified Eagle’s medium (DMEM; GIBCO, Life Technologies) supplemented with 10% fetal bovine serum (FBS; Hyclone Laboratories) and 1% penicillin and streptomycin (GIBCO, Life Technologies) at 37°C and 5% CO2. All cell lines have been authenticated by ATCC. Additionally, each cell line was inspected through microscopy for morphology and growth characteristics. Cell lines were tested for mycoplasma contamination and found to be negative. MCF7 cells are female cells as described by ATCC. Zika virus (ZIKV) strain MR 766, obtained from BEI Resources and kindly provided by Dr. Vincent Racaniello (Columbia University Medical Center), was amplified once on Vero cells. Virus titer was determined by focus-forming assay on Vero cells.
METHOD DETAILS
Assembly of viral protein and human protein dataset
We compiled a viral protein dataset containing 12,237 viral proteins corresponding to 1,001 completely sequenced human viruses from virus-hostDB (Mihara et al., 2016) as of October 2016 (Supplementary Table 1; https://www.genome.jp/virushostdb). Originally, the dataset contained 12,520 viral proteins and 1,028 human viruses. The dataset was manually curated by cross-referencing viral taxonomy IDs against other viral databases, and viruses with poorly annotated genomes (viruses with < 4 proteins and a genomic coverage < 80%) as well as viruses wrongly annotated as human viruses were discarded (Federhen, 2012; Hulo et al., 2011; Pickett et al., 2012). Information for each virus (nucleic acid type, taxonomic classification, taxonomic identifier) and viral protein (protein database identifiers, description and amino acid sequence) was retrieved from virus-hostDB (Mihara et al., 2016). Additionally, proteins annotated as polyproteins in virus-hostDB were parsed into individual proteins, whenever possible, using the annotation of proteins and mature peptides as described in the NCBI protein database (Coordinators, 2017). We assigned Uniprot accession codes (ACs) to viral proteins by running Blast searches of each protein sequence against the Uniprot database (Altschul et al., 1990; Apweiler et al., 2004), considering only those hits with a maximum E-value of 10^-20, a minimum alignment coverage of 70% of the shorter of the two aligned sequences and a minimum sequence identity of 90%. The human protein dataset contains 20,113 non-immunoglobulin human proteins obtained from the Uniprot database as described by Garzon et al. (Apweiler et al., 2004; Garzon et al., 2016).
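The three Uniprot mapping criteria can be expressed as a filter over parsed Blast hits. The sketch below assumes a hypothetical `BlastHit` record (fields similar to Blast tabular output) and approximates coverage as alignment length over the shorter sequence length; the rule for choosing among several passing hits (lowest E-value, then highest identity) is our assumption, since the text does not specify one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlastHit:
    subject_ac: str   # Uniprot accession of the hit
    evalue: float
    pident: float     # percent identity, 0-100
    aln_len: int      # alignment length (includes gaps, so coverage is approximate)
    query_len: int
    subject_len: int


def assign_uniprot_ac(hits, max_evalue=1e-20, min_cov=0.70, min_pident=90.0) -> Optional[str]:
    """Return the accession of the best hit passing all three cutoffs, or None."""
    passing = [
        h for h in hits
        if h.evalue <= max_evalue
        and h.aln_len / min(h.query_len, h.subject_len) >= min_cov  # coverage of shorter sequence
        and h.pident >= min_pident
    ]
    if not passing:
        return None
    # Tie-breaking rule is a hypothetical choice: lowest E-value, then highest identity.
    best = min(passing, key=lambda h: (h.evalue, -h.pident))
    return best.subject_ac
```

Proteins for which the function returns `None` would simply remain without a Uniprot AC.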
Protein modeling, structural neighbor search
The P-HIPSTer (Pathogen-Host Interactome Prediction using STructurE similaRity) algorithm enables systematic interrogation of pan-human virus interactions by exploiting both sequence- and structure-based information from atomic structures taken from the PDB, and from homology models, to account for both domain-domain and peptide-domain interactions. Three-dimensional models for full-length viral proteins and protein domains, as defined by the Conserved Domain Database (Marchler-Bauer et al., 2017), are either taken directly from the PDB (Berman et al., 2000) or built by homology modeling. The homology pipeline described in Garzon et al. (Garzon et al., 2016) was adapted to model large polyproteins and to account for potential conformational variability whenever possible by considering multiple templates for homology modeling rather than only the best scoring template. Template search is performed in three steps, where each step is run only if the preceding step reports no templates. Templates are identified based on the significance of the sequence alignment between the query protein and the matching protein structure, requiring an E-value < 10^-12 plus additional criteria specific to each step. The first step runs one iteration of Blast (Altschul et al., 1990) against the PDB database and identifies the set of non-overlapping templates (matching different segments of the query protein sequence) with the lowest E-value. Overlapping templates with higher E-values are also considered for modeling only if their E-values are < 10^-12 and lie within 10^-10 of the E-value reported for the best hit found in that particular protein segment.
The second and third steps run HHblits (Remmert et al., 2011) and five iterations of Blast (Altschul et al., 1990), respectively, against the PDB database, identifying templates in a similar manner, except that overlapping templates with higher E-values are considered only if their E-values are < 10^-12 and lie within 10^-2 of the E-value reported for the best hit found in that particular protein segment. Atomic models were built with NEST (Petrey et al., 2003; Xiang and Honig, 2001) based on the alignments provided by either Blast or HHblits. Homology modeling of human proteins was carried out as described by Garzon et al. (Garzon et al., 2016). In modeling the set of viral proteins used in this study, we found that the greatest structural coverage was obtained for proteomes of (+)ssRNA and (-)ssRNA viruses, which may reflect the tendency of their proteomes to have fewer disordered regions (Figure S1C) or the overrepresentation of viral proteins from RNA viruses in the PDB (which serve as templates for structural modeling; 4,791, 3,104 and 2,638 viral proteins from RNA, DNA and RT viruses, respectively, as of April 2017).
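The three-step template cascade can be sketched as follows. This is a minimal illustration, not the P-HIPSTer pipeline: the search functions are hypothetical callables standing in for Blast or HHblits runs, and we read "within 10^-10" (step 1) and "within 10^-2" (steps 2 and 3) of the best E-value as a maximum E-value ratio of 10^10 and 10^2, respectively. That interpretation is our assumption, since an additive difference would be meaningless at E-values below 10^-12.

```python
from dataclasses import dataclass

E_MAX = 1e-12  # every template must have E-value < 10^-12


@dataclass(frozen=True)
class Hit:
    template: str  # PDB chain identifier
    start: int     # first aligned query residue (inclusive)
    end: int       # last aligned query residue (inclusive)
    evalue: float


def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end


def select_templates(hits, max_ratio):
    """Best non-overlapping hits plus overlapping alternatives close in E-value.

    max_ratio: assumed reading of "within 10^-k of the best E-value" as E <= E_best * 10^k.
    """
    hits = sorted((h for h in hits if h.evalue < E_MAX), key=lambda h: h.evalue)
    best = []
    for h in hits:  # greedy: lowest E-value hit per non-overlapping query segment
        if not any(overlaps(h, b) for b in best):
            best.append(h)
    extra = []
    for h in hits:
        if h in best:
            continue
        anchors = [b for b in best if overlaps(h, b)]  # best hit(s) in this segment
        if anchors and h.evalue <= min(b.evalue for b in anchors) * max_ratio:
            extra.append(h)
    return best + extra


def find_templates(query, searches):
    """searches: ordered (search_fn, max_ratio) pairs; stop at the first step with templates."""
    for search_fn, max_ratio in searches:
        templates = select_templates(search_fn(query), max_ratio)
        if templates:
            return templates
    return []
```

In use, `searches` would hold something like a one-iteration Blast wrapper with ratio 1e10, followed by HHblits and five-iteration Blast wrappers with ratio 1e2 (all hypothetical wrappers).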
The structures of the protein complexes used as templates for interaction models are defined by i) the PDB ‘biounit’, likely representing the biologically relevant quaternary structure for interacting proteins; ii) the PISA database of predicted biounits (Krissinel and Henrick, 2007); and iii) the PDB file (Garzon et al., 2016). In order to identify structurally similar proteins, a structural neighbor search is carried out for each human and viral protein or domain (either taken directly from the PDB or modelled by homology) against the PDB database with Ska (Petrey et al., 2003; Yang and Honig, 2000b), using a Protein Structural Distance (PSD) ≤ 0.6.
Disorder prediction and structural coverage
The fraction of disordered residues and the structural coverage for each Baltimore category (Baltimore, 1971) are calculated at the residue level for each corresponding virus. Disorder prediction is calculated with IUPred (Dosztanyi et al., 2005), where each residue is predicted as disordered if its IUPred score is > 0.5. For each virus within a Baltimore category we calculate the fraction of residues predicted as disordered. Similarly, we define structural coverage for each virus as the fraction of residues within a virus present in an X-ray structure or homology model.
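Both per-virus quantities are residue-level fractions over the whole proteome. The sketch below assumes IUPred scores have already been computed (one list per protein) and that modeled regions are given as hypothetical 1-based, inclusive residue intervals; overlapping models are merged so that each residue is counted once.

```python
def disorder_fraction(iupred_scores_per_protein, cutoff=0.5):
    """Fraction of a virus's residues with IUPred score > cutoff (strictly greater)."""
    total = sum(len(scores) for scores in iupred_scores_per_protein)
    disordered = sum(1 for scores in iupred_scores_per_protein for x in scores if x > cutoff)
    return disordered / total if total else 0.0


def structural_coverage(protein_lengths, covered_intervals):
    """Fraction of a virus's residues present in an X-ray structure or homology model.

    protein_lengths: dict protein -> sequence length.
    covered_intervals: dict protein -> list of (start, end) intervals, 1-based inclusive
    (hypothetical input format); overlapping intervals are merged via a residue set.
    """
    total = sum(protein_lengths.values())
    covered = 0
    for protein, length in protein_lengths.items():
        residues = set()
        for start, end in covered_intervals.get(protein, []):
            residues.update(range(max(1, start), min(length, end) + 1))
        covered += len(residues)
    return covered / total if total else 0.0
```

Per-category values (Figure S1C) would then be the distributions of these per-virus fractions within each Baltimore class.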
Structural comparison of viral proteins
All-against-all structural comparison of atomic structures (either modelled or derived from the PDB), corresponding to viral capsid proteins or to a subset of viral proteins with low pairwise sequence identity, was performed using Ska (Yang and Honig, 2000b). Structural similarity between viral capsids has been used to infer functional and evolutionary relationships where genomic similarities are no longer observable (Abrescia et al., 2012). Here, we extend the same principle to a subset of viral capsids whose structures are modelled by homology. Similarly, structural comparison of viral proteins with low pairwise sequence identity permits us to identify potential functional relationships between pairs of viral proteins that are not detectable using sequence alone. A low sequence similarity subset was assembled at 40% sequence identity with CD-HIT (Li and Godzik, 2006). Considering that most viral proteins are described by several protein structures (corresponding to either the full sequence or domains and additionally modelled using multiple templates), we define the structural distance between two proteins as the shortest distance reported between any two protein structures, where each structure corresponds to one of the two proteins. We use both the protein structural distance (PSD) and the structural alignment score (SAS) to measure distance between any two atomic structures (Subbiah et al., 1993; Yang and Honig, 2000a). PSD reflects local and/or global structural similarity and can capture structural and functional relationships involving only a small number of secondary structure elements (>3). SAS, on the other hand, takes into account the length of the alignment and better reflects global structural similarity. By combining both PSD and SAS we expect to capture both local and global structural relationships between pairs of proteins.
Two atomic structures are considered structurally similar when their PSD is ≤ 0.6 and their SAS is ≤ 3.5 Å: while the PSD cutoff of 0.6 indicates that there is good structural similarity over at least part of both structures, an SAS cutoff of 3.5 Å better reflects a global similarity (Budowski-Tal et al., 2010; Zhang et al., 2012). Pairwise sequence identities described in Figure S5 were computed with the Needleman-Wunsch algorithm implemented in the EMBOSS package (Rice et al., 2000).
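The protein-level comparison combines two ideas from the text: a protein-pair distance taken as the minimum over all pairs of their structures, and a similarity call requiring PSD ≤ 0.6 and SAS ≤ 3.5 Å. The sketch below assumes a hypothetical `compare(s, t)` callable returning `(psd, sas)` for two structures, standing in for a Ska comparison; it does not reproduce Ska itself.

```python
from itertools import product

PSD_MAX = 0.6  # good similarity over at least part of both structures
SAS_MAX = 3.5  # Å, reflects global similarity


def structurally_similar(psd, sas):
    return psd <= PSD_MAX and sas <= SAS_MAX


def protein_distance(structs_a, structs_b, compare):
    """Shortest PSD between any structure of protein A and any structure of protein B.

    structs_a / structs_b: structures (full-length models, domains, alternative templates).
    compare(s, t) -> (psd, sas): hypothetical wrapper around a Ska run.
    Returns (min_psd, similar) where `similar` is True if any structure pair passes both cutoffs.
    """
    best_psd = float("inf")
    similar = False
    for s, t in product(structs_a, structs_b):
        psd, sas = compare(s, t)
        best_psd = min(best_psd, psd)
        similar = similar or structurally_similar(psd, sas)
    return best_psd, similar
```

Taking the minimum over structure pairs means that a single shared domain is enough to link two otherwise dissimilar multi-domain proteins.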
Bayesian network to predict viral-human PPIs
P-HIPSTer adapts the Bayesian network underlying the PrePPI (Garzon et al., 2016; Zhang et al., 2012) algorithm and reports a likelihood ratio (LR) of the interaction between any pathogen and host protein using three separate structure-based evidences Ex: i) the structural modeling evidence (Edom) evaluates the potential direct interaction between two query proteins through their folded domains (Zhang et al., 2012); ii) the protein-peptide evidence (Epep) evaluates the likelihood of a PPI through an unstructured domain and a folded domain (Chen et al., 2015); iii) the partner redundancy evidence (Eredu) infers a likelihood of a PPI based on the number of structural neighbors of one query protein known to interact with the other query protein (de Chassey et al., 2013; Garzon et al., 2016). Edom evaluates the potential interaction between two folded protein domains based on four criteria (Zhang et al., 2012): i) SIM: the average structural similarity of the query domain pair to the interacting subunits in an interaction template (an experimentally solved protein complex involving two proteins); ii) SIZ: the number of interacting residue pairs in the interaction template that are structurally aligned to residue pairs in the query domains after structural superposition; iii) COV: the fraction of interacting residue pairs in the interaction template that are structurally aligned to residue pairs in the query domains after structural superposition; and iv) OS: same as SIZ, with the additional condition that each residue in the interacting pair aligns to a residue predicted to be interfacial in the query domains.
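Three of the four Edom criteria (SIZ, COV and OS) reduce to counting template interface contacts that survive the structural alignment. The sketch below assumes the superposition has already been turned into residue mappings (hypothetical dicts from template residue to query residue) and that interfacial query residues are given as sets; SIM, which needs structural alignment scores, and the trained conversion of these scores into LRs are omitted.

```python
def edom_counts(template_contacts, align_a, align_b, interfacial_a, interfacial_b):
    """Compute (SIZ, COV, OS) for one query domain pair against one interaction template.

    template_contacts: set of (i, j) interacting residue pairs in the template complex
        (i in template subunit A, j in template subunit B).
    align_a / align_b: dict template residue -> aligned query residue, after superposition.
    interfacial_a / interfacial_b: query residues predicted to be interfacial.
    """
    siz = 0       # contacts whose two residues both align to query residues
    os_count = 0  # subset of those where both aligned query residues are predicted interfacial
    for i, j in template_contacts:
        if i in align_a and j in align_b:
            siz += 1
            if align_a[i] in interfacial_a and align_b[j] in interfacial_b:
                os_count += 1
    cov = siz / len(template_contacts) if template_contacts else 0.0
    return siz, cov, os_count
```

In PrePPI-style scoring each of these numbers is binned and mapped to an LR learned from gold-standard data, rather than being used directly.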
Epep (Chen et al., 2015) reports the maximum LR assessing the potential interaction between an unstructured domain and a folded domain in two independent evidences: i) PepX predicts peptide-domain interactions based on experimentally determined complexes (Vanhee et al., 2010) using both structural and sequence similarity; ii) PepELM predicts peptide-domain interactions based on the Eukaryotic Linear Motif (ELM) database (Dinkel et al., 2014), by identifying a PFAM sequence signature in one query protein and a motif sequence signature in the other query protein, both mapping to the same ELM class. Eredu does not necessarily imply a direct physical interaction between query proteins, but it is used to further support the predictions reported by Edom and/or Epep. This evidence combines knowledge on PPIs extracted from PPI databases with structural similarity: a Likelihood Ratio (LR) of the interaction between two proteins A and B is assigned based on the number of structural neighbors of A known to interact with B, or vice versa, and the number of structural neighbors of A and B that are known to interact (Garzon et al., 2016).
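The counts that feed Eredu can be sketched directly from the description above. This is a minimal illustration, assuming structural neighbors are precomputed (e.g., PSD ≤ 0.6 hits) and known PPIs are stored as unordered pairs; how the counts are mapped to an LR is learned during training and is not reproduced here.

```python
def redundancy_counts(a, b, neighbors, known_ppis):
    """Partner-redundancy counts for a query pair (A, B).

    neighbors: dict protein -> set of structural neighbors (precomputed, assumed input).
    known_ppis: set of frozenset({x, y}) pairs taken from PPI databases.
    Returns (neighbors of A known to bind B,
             neighbors of B known to bind A,
             neighbor pairs (x of A, y of B) known to interact).
    """
    def known(x, y):
        return frozenset((x, y)) in known_ppis

    neighbors_a = neighbors.get(a, set())
    neighbors_b = neighbors.get(b, set())
    n_a_to_b = sum(1 for x in neighbors_a if known(x, b))
    n_b_to_a = sum(1 for y in neighbors_b if known(a, y))
    n_pairs = sum(1 for x in neighbors_a for y in neighbors_b if known(x, y))
    return n_a_to_b, n_b_to_a, n_pairs
```

Storing interactions as `frozenset` pairs makes the lookup independent of the order in which a database lists the two partners.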
Training was carried out on the yeast interactome (Garzon et al., 2016). The scores for each evidence Ex are partitioned into n bins b1, b2, …, bn, and a likelihood ratio is assigned to each bin. The LR of bin bi is the percentage of protein pairs in a positive gold standard dataset of yeast PPIs with a score for Ex in bin bi divided by the percentage of protein pairs in the negative gold standard dataset of yeast PPIs with a score for Ex in the same bin:
$$LR_{E_x}(b_i) = \frac{P(E_x \in b_i \mid \text{positive})}{P(E_x \in b_i \mid \text{negative})}$$
Details regarding the Bayesian network training and the calculation of the LR scores associated with each evidence have been described previously (Chen et al., 2015; Garzon et al., 2016; Zhang et al., 2012). P-HIPSTer integrates the LR scores obtained from each structural evidence Ex with a Naïve Bayes approach. Since the Edom and Epep evidences can be considered mutually exclusive, we only consider the maximum LR reported by either of them:
$$LR_{\text{P-HIPSTer}} = \max\left(LR_{dom}, LR_{pep}\right) \times LR_{redu} \qquad (1)$$
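The per-bin likelihood ratios and Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the P-HIPSTer code: the helper names `bin_likelihood_ratios`, `combined_lr` and `is_confident`, the toy gold-standard scores, the single bin edge, and the choice to return infinity for a bin with no negative pairs are all assumptions made for the example.

```python
from bisect import bisect_right

LR_THRESHOLD = 100  # confidence cutoff used throughout the Methods


def bin_likelihood_ratios(pos_scores, neg_scores, edges):
    """Return one LR per bin for a single evidence type.

    Bins are defined by sorted cut points `edges` (len(edges) + 1 bins).
    LR(b_i) = fraction of positive gold-standard pairs falling in b_i
              / fraction of negative gold-standard pairs falling in b_i.
    """
    n_bins = len(edges) + 1
    pos_counts = [0] * n_bins
    neg_counts = [0] * n_bins
    for s in pos_scores:
        pos_counts[bisect_right(edges, s)] += 1
    for s in neg_scores:
        neg_counts[bisect_right(edges, s)] += 1

    lrs = []
    for p, n in zip(pos_counts, neg_counts):
        frac_pos = p / len(pos_scores)
        frac_neg = n / len(neg_scores)
        # Illustrative choice: an empty negative bin yields an infinite LR.
        lrs.append(frac_pos / frac_neg if frac_neg > 0 else float("inf"))
    return lrs


def combined_lr(lr_dom, lr_pep, lr_redu):
    """Equation (1): Edom and Epep are treated as mutually exclusive (keep the
    larger LR); the naive Bayes step multiplies in the Eredu LR."""
    return max(lr_dom, lr_pep) * lr_redu


def is_confident(lr):
    return lr >= LR_THRESHOLD


# Toy gold standards (made-up scores) with a single cut point at 0.5.
lrs = bin_likelihood_ratios([0.9, 0.8, 0.7, 0.2], [0.1, 0.2, 0.3, 0.8], [0.5])
print(lrs)                                        # [0.3333333333333333, 3.0]
print(is_confident(combined_lr(50.0, 4.0, 3.0)))  # True (LR = 150)
```

Taking the maximum rather than the product of the Edom and Epep terms mirrors the statement that these two evidences are treated as mutually exclusive; only the independent Eredu term is multiplied in.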
Overall statistics of the set of predictions (number of predicted PPIs per viral protein, per virus and per human protein) are computed using only instances for which LRP-HIPSTer is equal to or higher than an LR threshold of 100 (7,463 viral proteins, 990 viruses and 5,749 human proteins).
Functional enrichment analysis
We used DAVID (Huang da et al., 2009a, b) to identify the biological pathways and molecular functions enriched within the subset of the human proteome predicted to interact with human viruses with an LRP-HIPSTer ≥ 100. We used two different subsets: i) 5,749 human proteins predicted to interact with at least one human virus with an LRP-HIPSTer ≥ 100; and ii) 173 human proteins predicted to interact with ≥ 100 human viruses with an LRP-HIPSTer ≥ 100. The background of the enrichment analysis was restricted to the 20,113 non-immunoglobulin human proteins considered by P-HIPSTer in this study.
Similar to previous work (Garzon et al., 2016), we used the ranked list of predicted human interactors of a given viral protein to identify enriched biological pathways. Gene set enrichment for viral proteins was computed with the GSEA software from MSigDB (Subramanian et al., 2005). For each viral protein, a ranked list of predicted interacting human proteins is constructed based on the LRP-HIPSTer. To account for background and potential technical bias, we subtracted the corresponding average LR derived from the predicted PrePPI human interactome from the LRP-HIPSTer (Garzon et al., 2016). Enrichment was calculated considering only the rank order (classic mode). Gene sets were derived from PathCards (Belinky et al., 2015) and considered enriched if the reported q-value was < 0.01. To collect the set of enriched pathways for a given virus, we computed the union of the enriched gene sets (q-value < 0.01) of its viral proteins.
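The two bookkeeping steps around GSEA, building the background-corrected ranking for one viral protein and taking the per-virus union of enriched gene sets, can be sketched as follows. GSEA itself is not reimplemented here. The function names, the input dictionaries, and the default of 0.0 for human proteins absent from the PrePPI table are hypothetical choices for illustration, and "corresponding average LR" is interpreted as each human protein's mean LR in the PrePPI human interactome.

```python
def ranked_interactors(lr_by_human, preppi_mean_lr):
    """Rank human proteins for one viral protein by background-corrected LR.

    The correction subtracts each human protein's mean LR in the predicted
    PrePPI human interactome; proteins missing from that table get 0.0
    (a hypothetical default).
    """
    adjusted = {h: lr - preppi_mean_lr.get(h, 0.0) for h, lr in lr_by_human.items()}
    return sorted(adjusted, key=adjusted.get, reverse=True)


def virus_enriched_sets(per_protein_qvalues, q_cutoff=0.01):
    """Union of gene sets with q < q_cutoff across all proteins of one virus.

    `per_protein_qvalues` is a list with one {gene_set: q_value} dict per
    viral protein, e.g. parsed from GSEA reports.
    """
    enriched = set()
    for qvalues in per_protein_qvalues:
        enriched |= {gs for gs, q in qvalues.items() if q < q_cutoff}
    return enriched


ranking = ranked_interactors({"A": 500.0, "B": 300.0, "C": 200.0}, {"A": 450.0, "B": 10.0})
print(ranking)  # ['B', 'C', 'A']
print(sorted(virus_enriched_sets([{"p1": 0.001, "p2": 0.2}, {"p3": 0.005}])))  # ['p1', 'p3']
```

The toy values show why the correction matters: protein A has the highest raw LR but drops to last once its generally high PrePPI background is removed.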
Common and unique pathways within viral groups
The discovery of common and unique pathways predicted to be targeted by DNA, RNA and RT viruses is carried out using a two-step procedure: first, the most relevant pathways are selected for each viral category (DNA, RNA or RT); then, the overlap between the categories (defined by their sets of relevant pathways) is computed. Of the 1,001 human viruses in our dataset, we excluded 148 viruses with low protein structural coverage (>50% of their residues could not be modelled). To reduce potential bias from larger viral families within a category, we consider a pathway relevant for a category whenever it is enriched (q-value < 0.01) in at least 50% of the viruses within a viral family, in at least 50% of the viral families of that category. Venn diagrams describing the overlap between the nucleic acid type groups were computed with BioVenn and manually edited for visualization purposes (Hulsen et al., 2008). The pathways within each region of the Venn diagram were manually classified into 26 categories: apoptosis, cell adhesion, cell cycle, cell differentiation, cell growth, cell organization, cellular trafficking, development, disease, DNA, drug, extracellular matrix, immune, lipid, metabolism, microRNA, mRNA processing, pathogen, protein processing, RNA, signaling, transcription, translation, transport, viral, and other. The significance of each category within the different regions of the Venn diagram was assessed with a permutation test.
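The two-level relevance rule (enriched in at least 50% of a family's viruses, in at least 50% of the category's families) is easy to misread, so a minimal sketch may help. The function name and input layout are assumptions; `virus_pathways` would hold the per-virus unions of enriched pathways (q < 0.01) described in the previous subsection.

```python
from collections import defaultdict


def relevant_pathways(virus_family, virus_pathways, min_frac=0.5):
    """Pathways relevant for one category (e.g. all DNA viruses).

    virus_family:   {virus: family} for the viruses of this category
    virus_pathways: {virus: set of enriched pathways}
    A pathway is kept if it is enriched in >= min_frac of the viruses of a
    family, in >= min_frac of the families of the category.
    """
    by_family = defaultdict(list)
    for virus, family in virus_family.items():
        by_family[family].append(virus)

    families_passing = defaultdict(int)  # pathway -> number of families passing
    for viruses in by_family.values():
        counts = defaultdict(int)
        for v in viruses:
            for p in virus_pathways.get(v, set()):
                counts[p] += 1
        for p, c in counts.items():
            if c / len(viruses) >= min_frac:
                families_passing[p] += 1

    n_families = len(by_family)
    return {p for p, k in families_passing.items() if k / n_families >= min_frac}


family = {"v1": "F1", "v2": "F1", "v3": "F2", "v4": "F3", "v5": "F3"}
pathways = {"v1": {"a", "b"}, "v2": {"a"}, "v3": {"a", "c"}, "v4": {"b"}, "v5": {"d"}}
dna_like = relevant_pathways(family, pathways)
print(sorted(dna_like))  # ['a', 'b']
```

Once each category has its relevant set, the Venn regions follow from ordinary set operations, e.g. `dna & rna - rt` for pathways shared by DNA and RNA viruses only. Counting families rather than individual viruses is what keeps a large family from dominating its category.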
Clustering viruses based on GSEA
To avoid clustering errors due to technical noise (e.g., unrelated viruses with only a few modelled proteins can cluster together, leading to wrong conclusions), we applied a stringent filter and only considered 568 human viruses with: i) annotated mature viral proteins, where ≥ 50% of their proteins and of their residues can be modelled; and ii) a number of viral proteins consistent with other viruses within the same genus or family (removing viruses with polyproteins not annotated as such; e.g., noroviruses with three viral proteins while the majority of noroviruses have eight). For each virus, we calculated the union of enriched biological pathways (q-value < 0.01) using the enriched pathways of its viral proteins. A pairwise distance matrix (568 × 568) based on the Jaccard distance (one minus the intersection over the union) was computed and used as input for clustering in R with the heatmap.2 function of the gplots package (distance metric: euclidean; method: complete) (R Core Team. R Foundation for Statistical Computing, 2016; Warnes et al., 2016). Clusters were delimited by cutting the dendrogram at a specific height. Biological pathways significantly overrepresented within clusters were identified using a hypergeometric test with Bonferroni correction, using as background the relative abundance of each biological pathway within the set of 568 human viruses. The dendrogram was formatted for visualization purposes using iTOL (Letunic and Bork, 2016).
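As a concrete reading of the distance used here, the sketch below computes Jaccard distances between per-virus pathway sets. It only builds the distance matrix; the subsequent heatmap.2 step (Euclidean distance on the matrix rows, complete linkage) is not reproduced. Function names and the convention of 0.0 for two empty sets are assumptions for illustration.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A ∩ B| / |A ∪ B| (0.0 if both empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(virus_pathways):
    """Square matrix of pairwise Jaccard distances; rows/columns follow sorted names."""
    names = sorted(virus_pathways)
    matrix = [[jaccard_distance(virus_pathways[x], virus_pathways[y]) for y in names]
              for x in names]
    return names, matrix


names, m = distance_matrix({"v1": {"a", "b"}, "v2": {"b", "c"}, "v3": {"d"}})
print(names)    # ['v1', 'v2', 'v3']
print(m[0][2])  # 1.0 (no shared pathways)
```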
Analysis of the ZIKV interactome
ZIKV interacting human orthologs:
We assembled a set of 690 mouse genes whose expression significantly changes in the brains of embryonic mice upon ZIKV infection (Li et al., 2016; Wu et al., 2016). From this mouse gene set, we identified 502 human orthologue genes present in our human dataset using OrthoRetriever and Ensembl BioMart version 77 (Kasprzyk, 2011). The significance of the overlap between the 97 human proteins predicted to interact with ZIKV proteins at an LRP-HIPSTer ≥ 100 and the 502 human orthologues was assessed with a hypergeometric test.
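A hypergeometric overlap test needs only exact binomial coefficients, so it can be sketched with the standard library. The population size for this particular test is not stated above; using 20,113 (the number of human proteins considered by P-HIPSTer) would be one plausible choice but is an assumption, and the function name is hypothetical. No observed overlap is shown because none is given in this section.

```python
from math import comb


def hypergeom_overlap_pvalue(overlap, population, set_a, set_b):
    """One-sided P(X >= overlap), where X is the overlap between a random
    set of size set_b and a fixed set of size set_a, both drawn from
    `population` items without replacement."""
    upper = min(set_a, set_b)
    tail = sum(comb(set_a, k) * comb(population - set_a, set_b - k)
               for k in range(overlap, upper + 1))
    # Exact integer arithmetic; Python's int / int returns a correctly rounded float.
    return tail / comb(population, set_b)


# Usage for the ZIKV test (population size is an assumption):
# p = hypergeom_overlap_pvalue(observed_overlap, 20113, 97, 502)
print(hypergeom_overlap_pvalue(5, 10, 5, 5))  # 1/252, about 0.00397
```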
Protein interaction network analysis:
To build the protein interaction network (PIN) between ZIKV and human proteins, we considered only predicted viral-human PPIs with an LR ≥ 100. We retrieved known interactions among the human proteins predicted to interact with ZIKV from the IntAct database (Kerrien et al., 2012). The final network combines both predicted viral-human PPIs and known human PPIs. The network was visualized with Cytoscape (Shannon et al., 2003). We clustered human proteins within the network based solely on their connectivity. First, we applied the MCODE plugin (Bader and Hogue, 2003) to find clusters of densely interconnected human proteins denoting potential functional modules or parts of pathways (include loops: no; degree cutoff: 2; haircut: no; fluff: no; node score: 0.4; kcore: 2; max depth: 100). Second, non-clustered proteins were grouped according to their connectivity with ZIKV proteins. Enrichment analysis for each cluster was carried out with GeneAnalytics (Ben-Ari Fuchs et al., 2016); only the five most enriched GO biological processes and phenotype sets were considered (minimum FDR-corrected P-value ≤ 0.0001). Bar plots were drawn with GraphPad Prism (GraphPad Software).
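The second grouping step can be illustrated under one specific interpretation: proteins that MCODE did not place in a cluster are grouped by the exact set of ZIKV proteins they are predicted to bind. That interpretation, the function name and the toy identifiers are assumptions; the text only says the grouping followed connectivity with ZIKV proteins.

```python
from collections import defaultdict


def group_by_viral_partners(predicted_ppis, clustered):
    """Group non-clustered human proteins by their set of predicted viral partners.

    predicted_ppis: iterable of (viral_protein, human_protein) pairs with LR >= 100
    clustered:      set of human proteins already assigned to an MCODE cluster
    Returns {frozenset(viral partners): sorted list of human proteins}.
    """
    partners = defaultdict(set)
    for viral, human in predicted_ppis:
        if human not in clustered:
            partners[human].add(viral)
    groups = defaultdict(list)
    for human, viral_set in partners.items():
        groups[frozenset(viral_set)].append(human)
    return {key: sorted(members) for key, members in groups.items()}


ppis = [("NS5", "H1"), ("NS5", "H2"), ("NS3", "H2"), ("C", "H3"), ("NS5", "H4")]
for key, members in group_by_viral_partners(ppis, clustered={"H3"}).items():
    print(sorted(key), members)
# ['NS5'] ['H1', 'H4']
# ['NS3', 'NS5'] ['H2']
```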
Cells and virus:
293T cells were maintained in Dulbecco’s modified Eagle’s medium (DMEM; GIBCO, Life Technologies) supplemented with 10% fetal bovine serum (FBS; Hyclone Laboratories) and 1% penicillin and streptomycin (GIBCO, Life Technologies) at 37°C and 5% CO2. Zika virus (ZIKV) strain MR 766, obtained from BEI Resources and kindly provided by Dr. Vincent Racaniello (Columbia University Medical Center), was amplified once in Vero cells. Virus titers were determined by focus-forming assay on Vero cells.
Focus-forming assay:
Vero cells were seeded into 24-well tissue culture plates at a density of 80,000 cells/well. Serial 10-fold dilutions of each sample were prepared and added (in duplicate) to cell monolayers. Following a 1 hr incubation at 37°C, a semi-solid overlay containing 0.8% methylcellulose (Sigma-Aldrich), 3% fetal bovine serum and 1% penicillin-streptomycin in DMEM was added, and plates were incubated at 37°C and 5% CO2 for 48 hr. The semi-solid overlay was then removed, and cells were washed three times with PBS and fixed with an acetone:methanol (1:1) solution for 30 min at -20°C. Cells were then subjected to immunohistochemical staining with the mouse anti-flavivirus antibody D1-4G2-4-15 (EMD Millipore), incubated overnight at room temperature, followed by a mouse IgG HRP-conjugated antibody (R&D Systems) for 1 hr. This was followed by incubation with Vector VIP peroxidase substrate (Vector Laboratories) until color developed. The number of foci was counted and used to calculate virus titers, expressed as FFU/ml.
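The conversion from foci counts to FFU/ml is a short calculation that the protocol leaves implicit. The sketch assumes the standard approach of averaging replicate wells at one countable dilution and dividing by the dilution and the inoculum volume; the inoculum volume per well is not stated above, so the 0.1 ml in the example is purely hypothetical, as are the foci counts.

```python
def titer_ffu_per_ml(foci_counts, dilution, inoculum_ml):
    """Virus titer in FFU/ml from replicate wells at one dilution.

    dilution: fraction of the original sample in the inoculum (e.g. 1e-4
    for the fourth 10-fold dilution); inoculum_ml: volume added per well.
    """
    mean_foci = sum(foci_counts) / len(foci_counts)
    return mean_foci / (dilution * inoculum_ml)


# Hypothetical duplicate wells: 40 and 60 foci at 10^-4, 0.1 ml inoculum.
print(f"{titer_ffu_per_ml([40, 60], 1e-4, 0.1):.2e}")  # 5.00e+06
```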
Gain of function:
293T cells were seeded onto 12-well tissue culture plates at a density of 150,000 cells per well for 24 hr and transfected with 0.5 μg of plasmids encoding the indicated proteins (ORFeome collection 8.1) using TransIT-LT1 Reagent (Mirus) according to the manufacturer’s instructions. IFNβ (PBL Interferon Source) priming was achieved by pretreatment at a concentration of 1 U/mL for 18 hr prior to virus infection. Cells were infected with ZIKV at a multiplicity of infection (MOI) of 1 at 40 hr post-transfection. Following a 1 hr incubation at 37°C, cell monolayers were washed to remove unbound virus and incubated at 37°C and 5% CO2 for 48 hr. The cell supernatants were then harvested and virus was quantified by focus-forming assay.
siRNA Transfection:
Human breast adenocarcinoma cells (MCF-7) were seeded into 48-well tissue culture plates at a density of 25,000 cells per well and transfected (in quadruplicate) with 50 nM (final concentration) siRNA duplexes using Lipofectamine RNAiMAX Transfection Reagent (Invitrogen, Life Technologies) according to the manufacturer’s instructions. The following siRNAs were used: human ESR1 (SMARTpool, Dharmacon); AllStars Negative Control (QIAGEN); and AllStars Hs Cell Death (QIAGEN). At 72 hr post-transfection, cells were incubated with ZIKV (MOI of 1) for 1 hr at 37°C. Cell monolayers were then washed to remove unbound virus and incubated at 37°C and 5% CO2 for 48 hr. Cell supernatants were harvested for virus quantification by focus-forming assay, and cells were lysed with QIAzol reagent (QIAGEN) for real-time quantitative PCR (qPCR) analysis.
Real-time quantitative PCR:
Cell samples were resuspended in QIAzol (QIAGEN) and total RNA was extracted following the manufacturer’s instructions. cDNA was synthesized using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems). Real-time quantitative PCR (qPCR) was performed using iTaq Universal SYBR Green Supermix (Bio-Rad Laboratories) on the CFX96 real-time PCR system (Bio-Rad Laboratories). The primer pair sequences were as follows:
GAPDH: 5’-ACCACAGTCCATGCCATCAC-3’ and 5’-TCCACCACCCTGTTGCTGTA-3’;
ESR1: 5’-GGGAAGTATGGCTATGGAATCTG-3’ and 5’-TGGCTGGACACATATAGTCGTT-3’;
ZIKV: 5’-CCGCTGCCCAACACAAG-3’ and 5’-CCACTAACGTTCTTTTGCAGACAT-3’.
Relative gene expression was calculated from Ct values using the formula $2^{-[C_t(\text{target gene}) - C_t(\text{GAPDH})]}$.
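The expression formula translates directly into code. The `fold_change` helper, which compares a sample with a control condition (for example siESR1 versus the negative control siRNA), is an addition not stated in the Methods; it is the ratio of two relative expression values, which is equivalent to the common 2^-ΔΔCt form. The Ct values in the example are made up.

```python
def relative_expression(ct_target, ct_gapdh):
    """2^-(Ct_target - Ct_GAPDH): target expression relative to GAPDH."""
    return 2.0 ** -(ct_target - ct_gapdh)


def fold_change(ct_target, ct_gapdh, ct_target_ctrl, ct_gapdh_ctrl):
    """Ratio of relative expression in a sample vs. a control condition
    (equivalent to 2^-ΔΔCt; this step is not stated in the Methods)."""
    return (relative_expression(ct_target, ct_gapdh)
            / relative_expression(ct_target_ctrl, ct_gapdh_ctrl))


print(relative_expression(25.0, 18.0))     # 0.0078125
print(fold_change(26.0, 18.0, 25.0, 18.0))  # 0.5
```

Each one-cycle increase in ΔCt halves the estimate, which assumes roughly 100% amplification efficiency for both primer pairs.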
Classification of Human Papillomaviruses (HPVs)
LR-HPVs and HR-HPVs in dataset:
The classification of HPVs into low-risk (LR) and high-risk (HR) types for cervical cancer was obtained from the literature (Cubie, 2013; IARC; Munoz et al., 2003). Using this classification, we identified the following HR-HPVs and LR-HPVs in our dataset. HR-HPVs: HPV-16, HPV-18, HPV-39, HPV-59, HPV-68; LR-HPVs: HPV-6, HPV-43, HPV-54, HPV-72, HPV-81.
HPV-interacting human proteins (Groups I and II):
A human protein is considered to preferentially bind viral proteins of a particular HPV category (LR-HPV or HR-HPV) whenever it is predicted to interact with viral proteins in at least half of the HPVs in one category and in less than half of the HPVs in the other category, provided that the difference between the numbers of interacting viruses in the two categories is at least half of the average category size:
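$$\frac{b_{c1}}{n_{c1}} \geq 0.5 \qquad (2)$$
$$\frac{b_{c2}}{n_{c2}} < 0.5 \qquad (3)$$
$$b_{c1} - b_{c2} \geq \frac{1}{2} \cdot \frac{n_{c1} + n_{c2}}{2} \qquad (4)$$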
where bc1 is the number of viruses in class c1 (e.g., HR-HPV) with viral proteins predicted to bind a given human protein, nc1 is the number of viruses in class c1, bc2 is the number of viruses in class c2 (e.g., LR-HPV) with viral proteins predicted to bind the same human protein, and nc2 is the number of viruses in class c2. A human protein is considered to preferentially bind one HPV category whenever it meets all three criteria described above.
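The three criteria (2)-(4) combine into a single predicate, sketched below under the reading that class c1 is the category being tested for preferential binding. The function name and the example counts are illustrative; with five viruses per class, criterion (4) requires a difference of at least 2.5, i.e. three viruses.

```python
def prefers_class(b_c1, n_c1, b_c2, n_c2):
    """True if a human protein preferentially binds class c1 over class c2.

    b_c1, b_c2: viruses in each class with a predicted interaction (LR >= 100)
    n_c1, n_c2: total viruses in each class
    """
    return (b_c1 / n_c1 >= 0.5                          # (2) at least half of class c1
            and b_c2 / n_c2 < 0.5                       # (3) less than half of class c2
            and b_c1 - b_c2 >= 0.5 * (n_c1 + n_c2) / 2)  # (4) margin >= half the mean class size


print(prefers_class(3, 5, 0, 5))  # True  (difference 3 >= 2.5)
print(prefers_class(3, 5, 1, 5))  # False (difference 2 < 2.5)
```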
HPV-interacting human proteins (Group III):
To identify the set of human proteins predicted to preferentially bind HPV viral proteins, we assessed the significance of their interactions with HR-HPV and LR-HPV viral proteins with a hypergeometric test, using the entire predicted viral-human interactome as background. Only human proteins predicted to interact with viral proteins of both HR-HPVs and LR-HPVs with a P-value < 0.05 are considered HPV-specific interacting proteins. Additionally, we included p53, which is predicted to interact with viral proteins from both LR- and HR-HPVs (3/5 and 4/5, respectively), reflecting the established role of this human protein in the life cycle of both LR- and HR-HPVs (Li and Coffino, 1996; Pietsch and Murphy, 2008).
Bayesian network:
A Bayesian network was trained using Weka (Frank et al., 2016) on the five HR-HPVs and five LR-HPVs. Each virus is described by its predicted interactions with the 10 human proteins previously identified (see above), using a binary code (1 if the human protein is predicted to interact with at least one of the virus’s proteins with an LRP-HIPSTer ≥ 100, 0 otherwise). The network was evaluated by five-fold and leave-one-out cross-validation. Additionally, we trained a Bayesian network classifier in which an automatic, supervised feature selection method was applied prior to training, to ensure that the selected features correlate highly with the class (LR-HPV or HR-HPV) while having low inter-correlation (Hall, 1999). Automatic feature selection retained the interactions with 8 of the human proteins previously identified, with the method described above, as preferentially binding either HR-HPVs or LR-HPVs. The two remaining features (HNRNPM and TERF1), not selected by the automatic method, were redundant and did not further improve the performance of the classifier (which does not mean that the interactions with these human proteins are not biologically meaningful). As expected, five-fold and leave-one-out cross-validation of the Bayesian network trained on the interactions with only these eight human proteins reported the same accuracy (9 out of 10 viruses correctly classified as LR- or HR-HPV).
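To make the binary encoding and the leave-one-out evaluation concrete, the sketch below uses a Bernoulli naive Bayes classifier with Laplace smoothing as a simple stand-in. It is not the Weka BayesNet configuration used in the study, and the toy vectors (three features instead of ten) and class labels are invented for illustration.

```python
from math import log


def train_nb(X, y, alpha=1.0):
    """Bernoulli naive Bayes with Laplace smoothing on binary feature vectors."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # P(feature j = 1 | class c), smoothed so no probability is 0 or 1.
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[c] = (prior, probs)
    return model


def predict_nb(model, x):
    """Return the class with the highest log posterior for binary vector x."""
    def log_posterior(c):
        prior, probs = model[c]
        return log(prior) + sum(log(p) if v else log(1 - p) for v, p in zip(x, probs))
    return max(model, key=log_posterior)


def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: train on n - 1 viruses, test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        model = train_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict_nb(model, X[i]) == y[i]
    return correct / len(X)


# Toy data: 1 = predicted interaction with a given human protein at LR >= 100.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
y = ["HR", "HR", "HR", "LR", "LR", "LR"]
print(loocv_accuracy(X, y))  # 1.0
```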
Clustering of HPVs:
Each alpha-papillomavirus is described as a binary vector with 10 elements, each describing the predicted interaction with one of the 10 human proteins predicted to preferentially interact with viral proteins of HR-HPVs or LR-HPVs. Predicted interactions are converted into binary values using an LR cutoff of 100 (1 if the human protein is predicted to interact with at least one of the virus’s proteins with an LRP-HIPSTer ≥ 100, 0 otherwise). Clustering was carried out in R using the heatmap.2 function of the gplots package (distance metric: Manhattan; clustering method: complete) (R Core Team. R Foundation for Statistical Computing, 2016; Warnes et al., 2016).
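Manhattan distance with complete linkage can be written out naively in the standard library, which shows how the dendrogram heights arise. This is an illustrative O(n³) sketch, not the gplots/hclust implementation; ties are broken by the first pair encountered, and the four toy vectors are made up.

```python
def manhattan(u, v):
    """Manhattan distance; for binary vectors this counts differing positions."""
    return sum(abs(a - b) for a, b in zip(u, v))


def complete_linkage(vectors):
    """Naive agglomerative clustering with complete linkage.

    vectors: {name: binary vector}. The distance between two clusters is the
    largest pairwise distance between their members. Returns the merges as
    (cluster_a, cluster_b, height) in the order they happen.
    """
    clusters = [frozenset([name]) for name in vectors]

    def dist(c1, c2):
        return max(manhattan(vectors[a], vectors[b]) for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        merges.append((a, b, dist(a, b)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a | b]
    return merges


hpvs = {"v1": [1, 1, 0, 0], "v2": [1, 1, 0, 1], "v3": [0, 0, 1, 1], "v4": [0, 0, 1, 0]}
print([height for _, _, height in complete_linkage(hpvs)])  # [1, 1, 4]
```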
Evolutionary analysis on human genes
Dataset:
Primate datasets for 15,332 human coding genes were downloaded from the Ensembl genome browser (https://useast.ensembl.org/index.html) between August and November 2017. The primate species included in the analyses were Gorilla gorilla, Homo sapiens, Pan troglodytes, Pongo abelii, Nomascus leucogenys, Macaca mulatta, Papio anubis, Chlorocebus sabaeus, Callithrix jacchus, Carlito syrichta, Microcebus murinus and Otolemur garnettii. Only one sequence per species was included. All datasets are publicly available on GitHub (https://github.com/RabadanLab/pamler). Each nucleotide dataset was aligned as amino acids using Prank (Loytynoja, 2014), in order to preserve the correct reading frame, and then converted back to nucleotides. The resulting alignment was then refined by trimming regions of poor homology with TrimAl (“gappyout” setting) (Capella-Gutierrez et al., 2009).
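The "align as amino acids, then convert back to nucleotides" step amounts to threading each sequence's codons onto its gapped protein alignment row. The sketch below is a hypothetical helper, not the authors' pipeline: it assumes the coding sequence starts in frame at its first base, ignores trailing codons (such as a stop codon), and does not verify that each codon translates to the aligned residue.

```python
def back_translate(aligned_protein, cds):
    """Thread the codons of an ungapped CDS onto a gapped amino-acid alignment row.

    Each residue column receives the next codon and each gap column becomes
    '---', so the nucleotide alignment stays in frame.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues > len(codons):
        raise ValueError("CDS is shorter than the aligned protein")
    out, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


print(back_translate("M-K", "ATGAAA"))     # ATG---AAA
print(back_translate("MK-", "ATGAAATAA"))  # ATGAAA---
```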
Evolutionary analyses:
A phylogenetic tree depicting the evolutionary relationships of the aforementioned primate species was built using a topology previously reported in the literature (Hallast et al., 2016; Rogers and Gibbs, 2014). Because sequences from all 12 species were not necessarily available for every gene, the original tree was pruned for each gene to retain only the taxa with an available sequence. Then, for each gene, the branch lengths of the primate tree were inferred with PhyML (Guindon and Gascuel, 2003) under the GTR + GAMMA (4 CAT) model, retaining the original tree topology.
The genes analyzed were classified as “interacting” or “not-interacting” according to the inferred LR values (using an LRP-HIPSTer threshold of 100). To assess the selective constraints on human proteins associated with interaction with viral proteins, we performed site model analyses with codeml as implemented in PAML (Yang, 2007). Under the F3×4 codon frequency model (which estimates codon frequencies empirically), two evolutionary models were compared by means of likelihood ratio tests (2 degrees of freedom): M7 (selective pressure across sites distributed according to a beta distribution, without positive selection) and M8 (selective pressure distributed according to a beta distribution, with an additional class of positively selected sites). Significant evidence of selection was considered if P < 0.05 after False Discovery Rate correction. This model also allowed us to infer the mean ΔN/ΔS of each gene. ΔN/ΔS values of interacting (3,373) and not-interacting (11,601) genes were compared by means of two-tailed Wilcoxon tests.
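The M7 versus M8 comparison and the FDR correction can be sketched with the standard library, because the chi-square survival function with 2 degrees of freedom has the closed form exp(-x/2). The sketch assumes the log-likelihoods have already been read from codeml output, and it uses the Benjamini-Hochberg procedure as the FDR method, which the text does not name; function names and numbers are illustrative.

```python
from math import exp


def lrt_pvalue_df2(lnl_m7, lnl_m8):
    """P-value of the M7 vs. M8 likelihood ratio test.

    Statistic: 2 * (lnL_M8 - lnL_M7), compared with a chi-square distribution
    with 2 df, whose survival function is exp(-x / 2).
    """
    stat = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    return exp(-stat / 2.0)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(round(lrt_pvalue_df2(-1000.0, -997.0), 4))  # 0.0498
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.5])])  # [0.04, 0.0533, 0.0533, 0.5]
```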
Functional enrichment analyses:
Enrichment analysis on human proteins with ΔN/ΔS > 1 and ΔN/ΔS < 1 (5th percentile) was carried out with DAVID (Huang da et al., 2009a, b). Enriched pathways were selected based on the nominal p-value (< 0.05) and the False Discovery Rate (FDR < 20%). For each ΔN/ΔS subset, enrichments were carried out separately on: i) proteins predicted to interact with viral proteins with an LRP-HIPSTer ≥ 100; and ii) proteins not predicted to interact with viral proteins with an LRP-HIPSTer ≥ 100.
Co-immunoprecipitation
293T cells were maintained in Dulbecco’s modified Eagle’s medium (DMEM; GIBCO, Life Technologies) supplemented with 10% fetal bovine serum (FBS; Hyclone Laboratories) and 1% penicillin and streptomycin (GIBCO, Life Technologies) at 37°C and 5% CO2. Plasmids expressing the human proteins AMBP, BIRC2, BRCA1, DOCK5, ESR1, ING2, ISG15, RABGGTA, SFN, SPINT2, STAC2, STAT3, STK32A, TP53, VARS, VAV1, VCAM1, SPINLW1 (EPPIN) (ORFeome collection 8.1, Table S8), ABL1, APP, BIRC3, CUL1 (Table S8) and plasmids expressing the viral proteins HCV NS5A, HSV-1 UL13, HIV-1 Nef, ZIKV C, NS3 and NS5 (Addgene, Table S8) were used in this study. For co-immunoprecipitation (Co-IP), cells were seeded into 12-well tissue culture plates at a density of 300,000 cells per well for 24 hr and co-transfected with 0.5 μg of each plasmid expressing the human and viral proteins using TransIT-LT1 Reagent (Mirus) (Figure S2). Single transfections with each plasmid were also performed to assess individual protein expression (Figure S2). Cells from each well were lysed 36 to 40 hr after transfection in 0.2 mL of RIPA buffer (50 mM Tris-HCl [pH 8.0]; 150 mM NaCl; 0.1% SDS; 1% Triton X-100; 0.5% sodium deoxycholate; EDTA-free protease inhibitor) at 4°C for 2 hr with constant agitation. Cell lysates were then sonicated for 2 min, centrifuged at 15,000 × g for 30 min at 4°C and pre-cleared for 1 hr at 4°C with protein G magnetic beads (SureBeads, Bio-Rad). The supernatant was then immunoprecipitated with 0.5 μg of monoclonal anti-HA antibody (Sigma-Aldrich) or a 1:50 dilution of rabbit anti-FLAG antibody (Cell Signaling Technology) and 30 μL of protein G magnetic beads at 4°C overnight (O/N). The immunocomplexes were then washed five times with 0.5 mL of RIPA buffer.
Proteins bound to the beads were resuspended in 20 μL of Laemmli sample buffer (Bio-Rad) containing 5% β-mercaptoethanol (Sigma-Aldrich), boiled for 5 min at 95°C, resolved on NuPAGE 4–12% Bis-Tris gels (Life Technologies) and transferred to 0.45 μm nitrocellulose membranes (Bio-Rad). Membranes were blocked with TBST containing 5% non-fat milk at room temperature (RT) for 1 hr and incubated O/N at 4°C with anti-V5-HRP antibody (Life Technologies) or c-Myc polyclonal antibody (Santa Cruz Biotechnology), followed by HRP-conjugated secondary antibody (Thermo Scientific) for 2 hr at RT. For detection, membranes were incubated with SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Scientific) and imaged using the ChemiDoc Touch Imaging System (Bio-Rad).
The plasmids used are described below:
The plasmid pDONR223-ABL1 was a gift from William Hahn & David Root (Addgene plasmid # 23939) (Johannessen et al., 2010). This plasmid was cloned into the pLX304 destination vector using the LR Recombination Reaction (Gateway Technology, Invitrogen) according to the manufacturer’s protocol.
The APP gene was PCR amplified (attB1-APP_FOR 5’-GGGGACAAGTTTGTACAAAAAAGCAGGCTTCACCATGCTGCCCGGTTTGGCACTGCTCCT-3’ and attB2-APP_REV 5’-GGGGACCACTTTGTACAAGAAAGCTGGGTTCAGCATGAGCCATCGTGCCTGGCC-3’) from the human ORFeome Collection 8.1 plasmid (ORF ID 8905), and the purified PCR product was cloned into the pDONR223 vector using the BP Recombination Reaction (Gateway Technology, Invitrogen) and further cloned into the pLX304 destination vector using the LR Recombination Reaction (Gateway Technology, Invitrogen) according to the manufacturer’s protocol.
The plasmid Flag-cIAP2/pRK5 was a gift from Xiaolu Yang (Addgene plasmid # 27973) (Hu et al., 2006). cIAP2 is a ubiquitin protein ligase for BCL10 and is dysregulated in mucosa-associated lymphoid tissue lymphomas. The tag of this plasmid was replaced with a 3xMyc tag generated by an assembly PCR protocol (https://primerize.stanford.edu). The following oligos were used to generate the 3xMyc tag containing EcoRI and XbaI restriction sites at the 5’ and 3’ ends, respectively: PAC 1 For 5’-TCGATTGAATTCGCCGCCATGGAGCAGAAACTCATCTCTGAAGAAGATCTGGAACAAA-3’; PAC 2 Rev 5’-CCAGATCTTCTTCTGAAATCAACTTTTGTTCCAGATCTTCTTCAGAGATGAGTTTCTGC-3’; PAC 3 For 5’-AGTTGATTTCAGAAGAAGATCTGGAACAGAAGCTCATCTCTGAGGAAGATCTGGG-3’; and PAC 4 Rev 5’-TACTATGTTTCTAGAGGATCCCAGATCTTCCTCAGAGATGAGCTT-3’.
The plasmid pcDNA3-myc3-CUL1 was a gift from Yue Xiong (Addgene plasmid # 19896) (Ohta et al., 1999).
The plasmid pCMV-Tag1-NS5A was a gift from Xin Wang (Addgene plasmid # 17646) (Budhu et al., 2007).
The plasmid HSV-1 UL13 was a gift from Robert Kalejta (Addgene plasmid # 26697) (Kuny et al., 2010).
The plasmid pCI NL4–3 Nef-HA-WT was a gift from Warner Greene (Addgene plasmid # 24162) (Geleziunas et al., 2001).
The plasmid pLV_Zika_Cv_Flag was a gift from Vaithi Arumugaswami (Addgene plasmid # 79628): Unpublished.
The plasmid pLV_Zika_NS3_Flag was a gift from Vaithi Arumugaswami (Addgene plasmid # 79635): Unpublished.
The plasmid pLV_Zika_NS5_Flag was a gift from Vaithi Arumugaswami (Addgene plasmid # 79639): Unpublished.
Webserver development
The P-HIPSTer webserver (http://phipster.org) code base is split into two distinct parts: a back-end, which comprises a REST API written in the Django framework; and a front-end, which provides the user interface to query this API, process the structured data it returns, and display the results. The API layer communicates with a Postgres database. The front-end is written in the JavaScript framework Vue.js as a single-page application (SPA). It uses the third-party module NGL Viewer for molecular visualization (Rose et al., 2018). The webserver is hosted on Amazon Web Services (AWS). In addition to the set of predicted PPIs, the P-HIPSTer webserver provides, for each viral protein, sequence-, structure- and PPI-based functional annotation, the corresponding Pfam domains (Finn et al., 2016), a graphical representation of the predicted protein interaction network, and a molecular viewer that allows users to explore the structural models of viral and human proteins as well as the predicted interaction complexes, whenever possible (LRdom must be ≥ 100).
Sequence-based annotation:
We identify the homologs of a viral protein by running a BLAST search of its sequence against the UniProt database, considering as homologs only hits with a maximum E-value of 10⁻²⁰, a minimum alignment coverage of 70% of the shorter of the two aligned sequences and a minimum sequence identity of 90% (Altschul et al., 1990; Apweiler et al., 2004). The webserver reports the union of GO terms (Ashburner et al., 2000; The Gene Ontology, 2017) extracted from each homolog, ranked by frequency.
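The homolog filter and the frequency-ranked union of GO terms can be sketched as follows, assuming BLAST hits have already been parsed into dictionaries. The field names (`evalue`, `aligned_len`, `qlen`, `slen`, `identity`, `accession`), the use of alignment length as the coverage numerator, and identity expressed as a fraction (BLAST tabular output reports percent) are all hypothetical choices for this example. The same ranking step would apply to the GO and EC terms of structural neighbors described next.

```python
from collections import Counter


def rank_go_terms(hits, go_by_accession, max_evalue=1e-20, min_cov=0.70, min_ident=0.90):
    """Filter BLAST hits to homologs, then rank their GO terms by frequency.

    A hit counts as a homolog if its E-value <= max_evalue, its alignment
    covers >= min_cov of the shorter sequence, and its identity >= min_ident.
    Each homolog contributes each of its GO terms once.
    """
    homologs = [h for h in hits
                if h["evalue"] <= max_evalue
                and h["aligned_len"] / min(h["qlen"], h["slen"]) >= min_cov
                and h["identity"] >= min_ident]
    counts = Counter()
    for h in homologs:
        counts.update(set(go_by_accession.get(h["accession"], ())))
    return [term for term, _ in counts.most_common()]


hits = [
    {"accession": "A", "evalue": 1e-50, "aligned_len": 95, "qlen": 100, "slen": 120, "identity": 0.95},
    {"accession": "B", "evalue": 1e-5, "aligned_len": 100, "qlen": 100, "slen": 100, "identity": 0.99},
    {"accession": "C", "evalue": 1e-30, "aligned_len": 80, "qlen": 100, "slen": 100, "identity": 0.92},
]
go = {"A": ["GO:1", "GO:2"], "B": ["GO:3"], "C": ["GO:1"]}
print(rank_go_terms(hits, go))  # ['GO:1', 'GO:2']
```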
Structure-based annotation:
A structural neighbor search is carried out for each viral protein or domain (either taken directly from the PDB or modelled by homology) against the PDB database with Ska (Petrey et al., 2003; Yang and Honig, 2000b) using a Protein Structural Distance (PSD) ≤ 0.6 and a Structural Alignment Score (SAS) ≤ 3.5 Å. The webserver reports the union of GO (Ashburner et al., 2000; The Gene Ontology, 2017) and EC (Bairoch, 2000) terms extracted from each structural neighbor, ranked based on their frequency.
PPI-based annotation:
See “Functional enrichment analysis” above (Gene Set Enrichment Analysis, GSEA, on the ranked predicted human interactors of each viral protein).
QUANTIFICATION AND STATISTICAL ANALYSIS
Statistical parameters, including the definitions of center, dispersion and associated significance, are reported in the main text, Figures, Figure legends and Tables. We applied hypergeometric, permutation and Wilcoxon tests to calculate significance. Whenever appropriate, p-values were adjusted for multiple comparisons. The section entitled “Method details” describes the statistical analyses performed. Data were judged to be statistically significant when p < 0.05. In Figure 3, asterisks denote statistical significance (*, p < 0.05; **, p < 0.0001).
DATA AND SOFTWARE AVAILABILITY
All data generated as part of this study are available at phipster.org and GitHub (https://github.com/RabadanLab/pamler). In addition, P-HIPSTer code is available upon request.
ADDITIONAL RESOURCES
N/A
Supplementary Material
Figure S1. Related to Figure 1. Summary of viral families and Baltimore categories interrogated by P-HIPSTer and homology modeling coverage across Baltimore categories. a) Fully sequenced viruses infecting humans were downloaded from VirusHostDB (https://www.genome.jp/virushostdb/). For a complete list of viruses and Baltimore category as well as Family membership, see Table S1. b) Fraction of modelled residues per virus within each Baltimore category; c) Fraction of disordered residues per virus within each Baltimore category. Structure modeling was performed as described in methods. Center lines correspond to median values and whiskers range from min to max.
Table S1. Related to Figure 1. Viral and human datasets and predicted viral-human PPIs. Description of viral and human datasets and overview of P-HIPSTer PPI predictions (with a final LR >= 100) for each viral protein, virus and human protein.
Table S2. Related to Figures 1 and 3. Experimental validation of P-HIPSTer predictions. Summary results of 65 co-immunoprecipitation experiments on predicted PPIs.
Table S3. Related to Figure 2. ZIKV-Human interactome. ZIKV-Human PPI network combining predicted ZIKV-Human PPIs and experimentally derived human PPIs from Intact.
Table S4. Related to Figure 4. Predicted cellular factors that discriminate between HR- and LR-HPVs. Annotation of human proteins predicted to preferentially bind HR-HPVs (Group I), LR-HPVs (Group II) or both (Group III).
Table S5. Related to Figure 5. Shared and unique pathways targeted across viruses belonging to different nucleic acid types.
Table S6. Related to Figure 5. Pathway-based clustering of human-infecting viruses. Viruses are described based on the corresponding set of enriched pathways. This is then used to identify shared functional targeting across human-infecting viruses.
Table S7. Related to Figure 6. Functional enrichment analysis on P-HIPSTer resolved human proteins. Human proteins under positive selection or purifying selection are further divided into virus interacting or virus non-interacting proteins. Functional enrichments on each subset were carried out using DAVID.
Table S8. Related to Key resource table. Plasmids and primers used in the accompanying manuscript. Human and viral plasmids employed to express proteins of interest in co-immunoprecipitation assays, and description of the primers used in this work.
Figure S2. Related to Figure 1 and Figure 2. Empirical validation of P-HIPSTer predictions. 65 P-HIPSTer predictions spanning LR values between 1.2 and 1,106 were selected for co-IP. Briefly, 293T cells were transfected with the relevant expression plasmids and lysates were subjected to immunoprecipitation with the indicated antibodies, followed by SDS-PAGE and immunoblotting. Shown are representative gels. All experiments were performed at least 3 times to ensure reproducibility. For further details, please refer to Supplemental Materials and Methods. A complete summary of validation experiments is available in Table S2.
Figure S3. Related to Figures 1 and 5. P-HIPSTer uncovers viral targeting of infection related pathways. Enriched biological pathways (a) and molecular functions (b) within a set of 5,749 human proteins predicted to interact with human viruses. Enriched biological pathways (c) and molecular functions (d) within a set of 173 human proteins predicted to interact with ≥ 100 human viruses.
Figure S4. Related to Figure 5. P-HIPSTer uncovers shared and unique machinery employed across viruses of differing nucleic acid type. a) Venn diagram illustrating overlap of enriched biological pathways within each nucleic acid type (indicated are number of pathways within each region of the Venn diagram). Shown are pathway categories that are over-represented (x-axis denotes number of enriched pathways within each category). Categories that are significantly enriched within each region of the Venn diagram are highlighted in light purple. b) Clustering of immune-related pathways and viruses based on predicted pathway enrichment. Dark cells within the heatmap correspond to pathways enriched with q-value < 0.01. Highlighted are pathway classes and viral families corresponding to co-clustered pathways and viruses.
Figure S5. Related to Figure 5. Comparison of structural and sequence similarity across viral protein pairs. a) Subset of viral proteins with low pairwise sequence identity (<40%) where the proteins in each pair correspond to viruses from different nucleic acid types: DNA/RNA (Red), DNA/RT (Blue), RNA/RT (Green). b) Subset of viral proteins across all nucleic acid types with low pairwise sequence identity (<40%). c) Viral capsid proteins corresponding to viruses from the same viral family (Black) or different Baltimore categories (Blue). Vertical bars indicate the threshold used to consider a pair of viral proteins as global structural neighbors (Structural Alignment Score SAS ≤ 3.5 Å).
HIGHLIGHTS
- P-HIPSTer identifies ~282,000 high confidence pan human-virus PPIs
- Cellular mediators of ZIKV infection and PPI-based classification of HPV oncogenicity
- Identification of shared/unique infection strategies employed by human viruses
- Cell-factor-centric analysis of human evolution
A computational approach facilitates the prediction and validation of protein-protein interactions between viruses and humans
ACKNOWLEDGMENTS
We thank Jose Ignacio Garzon and Donald Petrey for their technical support. This work was funded by NIH grants 5R01GM109018–05 and 5U54CA209997–03 to SS, 5R01GM030518–38 to BH, 5R01GM117591 to RR and equipment grants S10OD012351 and S10OD021764 to the Department of Systems Biology.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
DECLARATION OF INTERESTS
The authors declare no competing interests.
REFERENCES
- Abrescia NG, Bamford DH, Grimes JM, and Stuart DI (2012). Structure unifies the viral universe. Annu Rev Biochem 81, 795–822.
- Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ (1990). Basic local alignment search tool. J Mol Biol 215, 403–410.
- Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al. (2004). UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 32, D115–119.
- Artaza-Irigaray C, Flores-Miramontes MG, Olszewski D, Magana-Torres MT, Lopez-Cardona MG, Leal-Herrera YA, Pina-Sanchez P, Jave-Suarez LF, and Aguilar-Lemarroy A (2017). Genetic variability in E6, E7 and L1 genes of Human Papillomavirus 62 and its prevalence in Mexico. Infect Agent Cancer 12, 15.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25–29.
- Bader GD, and Hogue CW (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, 2.
- Bairoch A (2000). The ENZYME database in 2000. Nucleic Acids Res 28, 304–305.
- Baltimore D (1971). Expression of animal virus genomes. Bacteriol Rev 35, 235–241.
- Barreiro LB, and Quintana-Murci L (2010). From evolutionary genetics to human immunology: how selection shapes host defence genes. Nat Rev Genet 11, 17–30.
- Belinky F, Nativ N, Stelzer G, Zimmerman S, Iny Stein T, Safran M, and Lancet D (2015). PathCards: multi-source consolidation of human biological pathways. Database (Oxford) 2015.
- Ben-Ari Fuchs S, Lieder I, Stelzer G, Mazor Y, Buzhor E, Kaplan S, Bogoch Y, Plaschkes I, Shitrit A, Rappaport N, et al. (2016). GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data. OMICS 20, 139–151.
- Berg M, and Stenlund A (1997). Functional interactions between papillomavirus E1 and E2 proteins. J Virol 71, 3853–3863.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, and Bourne PE (2000). The Protein Data Bank. Nucleic Acids Res 28, 235–242.
- Branca M, Ciotti M, Giorgi C, Santini D, Di Bonito L, Costa S, Benedetto A, Bonifacio D, Di Bonito P, Paba P, et al. (2007). Up-regulation of proliferating cell nuclear antigen (PCNA) is closely associated with high-risk human papillomavirus (HPV) and progression of cervical intraepithelial neoplasia (CIN), but does not predict disease outcome in cervical cancer. Eur J Obstet Gynecol Reprod Biol 130, 223–231.
- Budhu A, Chen Y, Kim JW, Forgues M, Valerie K, Harris CC, and Wang XW (2007). Induction of a unique gene expression profile in primary human hepatocytes by hepatitis C virus core, NS3 and NS5A proteins. Carcinogenesis 28, 1552–1560.
- Budowski-Tal I, Nov Y, and Kolodny R (2010). FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc Natl Acad Sci U S A 107, 3481–3486.
- Buitrago-Perez A, Garaulet G, Vazquez-Carballo A, Paramio JM, and Garcia-Escudero R (2009). Molecular Signature of HPV-Induced Carcinogenesis: pRb, p53 and Gene Expression Profiling. Curr Genomics 10, 26–34.
- Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, et al. (2005). Natural selection on protein-coding genes in the human genome. Nature 437, 1153–1157.
- Butt AQ, Ahmed S, Maratha A, and Miggin SM (2012). 14-3-3epsilon and 14-3-3sigma inhibit Toll-like receptor (TLR)-mediated proinflammatory cytokine induction. J Biol Chem 287, 38665–38679. [Retracted]
- Capella-Gutierrez S, Silla-Martinez JM, and Gabaldon T (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973.
- Chan YK, and Gack MU (2016). A phosphomimetic-based mechanism of dengue virus to antagonize innate immunity. Nat Immunol 17, 523–530.
- Chavali PL, Stojic L, Meredith LW, Joseph N, Nahorski MS, Sanford TJ, Sweeney TR, Krishna BA, Hosmillo M, Firth AE, et al. (2017). Neurodevelopmental protein Musashi-1 interacts with the Zika genome and promotes viral replication. Science 357, 83–88.
- Chen G, He C, Li L, Lin A, Zheng X, He E, and Skog S (2013). Nuclear TK1 expression is an independent prognostic factor for survival in pre-malignant and malignant lesions of the cervix. BMC Cancer 13, 249.
- Chen S, Zhang J, Duan L, Zhang Y, Li C, Liu D, Ouyang C, Lu F, and Liu X (2014). Identification of HnRNP M as a novel biomarker for colorectal carcinoma by quantitative proteomics. Am J Physiol Gastrointest Liver Physiol 306, G394–403.
- Chen TS, Petrey D, Garzon JI, and Honig B (2015). Predicting peptide-mediated interactions on a genome-wide scale. PLoS Comput Biol 11, e1004248.
- Chen YF, and Xia Y (2019). Convergent perturbation of the human domain-resolved interactome by viruses and mutations inducing similar disease phenotypes. PLoS Comput Biol 15, e1006762.
- Cho NH, Kang S, Hong S, An HJ, Choi YH, Jeong GB, and Choi HK (2006). Elevation of cyclin B1, active cdc2, and HuR in cervical neoplasia with human papillomavirus type 18 infection. Cancer Lett 232, 170–178.
- Chojnacki M, and Melendy T (2018). The human papillomavirus DNA helicase E1 binds, stimulates, and confers processivity to cellular DNA polymerase epsilon. Nucleic Acids Res 46, 229–241.
- Collier B, Goobar-Larsson L, Sokolowski M, and Schwartz S (1998). Translational inhibition in vitro of human papillomavirus type 16 L2 mRNA mediated through interaction with heterogenous ribonucleoprotein K and poly(rC)-binding proteins 1 and 2. J Biol Chem 273, 22648–22656.
- NCBI Resource Coordinators (2017). Database Resources of the National Center for Biotechnology Information. Nucleic Acids Res 45, D12–D17.
- Cubie HA (2013). Diseases associated with human papillomavirus infection. Virology 445, 21–34.
- Daugherty MD, and Malik HS (2012). Rules of engagement: molecular insights from host-virus arms races. Annu Rev Genet 46, 677–700.
- de Chassey B, Meyniel-Schicklin L, Aublin-Gex A, Navratil V, Chantier T, Andre P, and Lotteau V (2013). Structure homology and interaction redundancy for discovering virus-host protein interactions. EMBO Rep 14, 938–944.
- Dietrich-Goetz W, Kennedy IM, Levins B, Stanley MA, and Clements JB (1997). A cellular 65-kDa protein recognizes the negative regulatory element of human papillomavirus late mRNA. Proc Natl Acad Sci U S A 94, 163–168.
- Dinkel H, Van Roey K, Michael S, Davey NE, Weatheritt RJ, Born D, Speck T, Kruger D, Grebnev G, Kuban M, et al. (2014). The eukaryotic linear motif resource ELM: 10 years and counting. Nucleic Acids Res 42, D259–266.
- Doorbar J, Quint W, Banks L, Bravo IG, Stoler M, Broker TR, and Stanley MA (2012). The biology and life-cycle of human papillomaviruses. Vaccine 30 Suppl 5, F55–70.
- Dosztanyi Z, Csizmok V, Tompa P, and Simon I (2005). The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol 347, 827–839.
- Eissenberg JC, Ayyagari R, Gomes XV, and Burgers PM (1997). Mutations in yeast proliferating cell nuclear antigen define distinct sites for interaction with DNA polymerase delta and DNA polymerase epsilon. Mol Cell Biol 17, 6367–6378.
- Emmott E, Sorgeloos F, Caddy SL, Vashist S, Sosnovtsev S, Lloyd R, Heesom K, Locker N, and Goodfellow I (2017). Norovirus-Mediated Modification of the Translational Landscape via Virus and Host-Induced Cleavage of Translation Initiation Factors. Mol Cell Proteomics 16, S215–S229.
- Enard D, Cai L, Gwennap C, and Petrov DA (2016). Viruses are a dominant driver of protein adaptation in mammals. Elife 5.
- Federhen S (2012). The NCBI Taxonomy database. Nucleic Acids Res 40, D136–143.
- Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, et al. (2016). The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 44, D279–285.
- Frank E, Hall MA, and Witten IH (2016). The WEKA Workbench Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, Fourth edn (Morgan Kaufmann).
- Funk JO, Waga S, Harry JB, Espling E, Stillman B, and Galloway DA (1997). Inhibition of CDK activity and PCNA-dependent DNA replication by p21 is blocked by interaction with the HPV-16 E7 oncoprotein. Genes Dev 11, 2090–2100.
- Fuss J, and Linn S (2002). Human DNA polymerase epsilon colocalizes with proliferating cell nuclear antigen and DNA replication late, but not early, in S phase. J Biol Chem 277, 8658–8666.
- Garamszegi S, Franzosa EA, and Xia Y (2013). Signatures of pleiotropy, economy and convergent evolution in a domain-resolved map of human-virus protein-protein interaction networks. PLoS Pathog 9, e1003778.
- Garzon JI, Deng L, Murray D, Shapira S, Petrey D, and Honig B (2016). A computational interactome and functional annotation for the human proteome. Elife 5.
- Geleziunas R, Xu W, Takeda K, Ichijo H, and Greene WC (2001). HIV-1 Nef inhibits ASK1-dependent death signalling providing a potential mechanism for protecting the infected host cell. Nature 410, 834–838.
- Geuens T, Bouhy D, and Timmerman V (2016). The hnRNP family: insights into their role in health and disease. Hum Genet 135, 851–867.
- Gonzalez Plaza JJ, Hulak N, Kausova G, Zhumadilov Z, and Akilzhanova A (2016). Role of metabolism during viral infections, and crosstalk with the innate immune system. Intractable Rare Dis Res 5, 90–96.
- Grant A, Ponia SS, Tripathi S, Balasubramaniam V, Miorin L, Sourisseau M, Schwarz MC, Sanchez-Seco MP, Evans MJ, Best SM, et al. (2016). Zika Virus Targets Human STAT2 to Inhibit Type I Interferon Signaling. Cell Host Microbe 19, 882–890.
- Grinstein E, Wernet P, Snijders PJ, Rosl F, Weinert I, Jia W, Kraft R, Schewe C, Schwabe M, Hauptmann S, et al. (2002). Nucleolin as activator of human papillomavirus type 18 oncogene transcription in cervical cancer. J Exp Med 196, 1067–1078.
- Guindon S, and Gascuel O (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52, 696–704.
- Halehalli RR, and Nagarajaram HA (2015). Molecular principles of human virus protein-protein interactions. Bioinformatics 31, 1025–1033.
- Hall MA (1999). Correlation-based Feature Selection for Machine Learning. In Computer Science (Hamilton, New Zealand: The University of Waikato).
- Hallast P, Maisano Delser P, Batini C, Zadik D, Rocchi M, Schempp W, Tyler-Smith C, and Jobling MA (2016). Great ape Y Chromosome and mitochondrial DNA phylogenies reflect subspecies structure and patterns of mating and dispersal. Genome Res 26, 427–439.
- Hamel R, Dejarnac O, Wichit S, Ekchariyawat P, Neyret A, Luplertlop N, Perera-Lecoin M, Surasombatpattana P, Talignani L, Thomas F, et al. (2015). Biology of Zika Virus Infection in Human Skin Cells. J Virol 89, 8880–8896.
- Han X, Han Y, Jiao H, and Jie Y (2015). 14-3-3zeta regulates immune response through Stat3 signaling in oral squamous cell carcinoma. Mol Cells 38, 112–121.
- Hu S, Du MQ, Park SM, Alcivar A, Qu L, Gupta S, Tang J, Baens M, Ye H, Lee TH, et al. (2006). cIAP2 is a ubiquitin protein ligase for BCL10 and is dysregulated in mucosa-associated lymphoid tissue lymphomas. J Clin Invest 116, 174–181.
- Huang da W, Sherman BT, and Lempicki RA (2009a). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37, 1–13.
- Huang da W, Sherman BT, and Lempicki RA (2009b). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44–57.
- Hughes AL, Packer B, Welch R, Bergen AW, Chanock SJ, and Yeager M (2003). Widespread purifying selection at polymorphic sites in human protein-coding loci. Proc Natl Acad Sci U S A 100, 15754–15757.
- Hulo C, de Castro E, Masson P, Bougueleret L, Bairoch A, Xenarios I, and Le Mercier P (2011). ViralZone: a knowledge resource to understand virus diversity. Nucleic Acids Res 39, D576–582.
- Hulsen T, de Vlieg J, and Alkema W (2008). BioVenn - a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC Genomics 9, 488.
- IARC. Agents classified by the IARC monographs, vols. 1–100 (Lyon, France).
- Jager S, Cimermancic P, Gulbahce N, Johnson JR, McGovern KE, Clarke SC, Shales M, Mercenne G, Pache L, Li K, et al. (2011). Global landscape of HIV-human protein complexes. Nature 481, 365–370.
- Jansson L, and Holmdahl R (1998). Estrogen-mediated immunosuppression in autoimmune diseases. Inflamm Res 47, 290–301.
- Johannessen CM, Boehm JS, Kim SY, Thomas SR, Wardwell L, Johnson LA, Emery CM, Stransky N, Cogdill AP, Barretina J, et al. (2010). COT drives resistance to RAF inhibition through MAP kinase pathway reactivation. Nature 468, 968–972.
- Johansen LM, Brannan JM, Delos SE, Shoemaker CJ, Stossel A, Lear C, Hoffstrom BG, Dewald LE, Schornberg KL, Scully C, et al. (2013). FDA-approved selective estrogen receptor modulators inhibit Ebola virus infection. Sci Transl Med 5, 190ra179.
- Kajitani N, and Schwartz S (2015). RNA Binding Proteins that Control Human Papillomavirus Gene Expression. Biomolecules 5, 758–774.
- Karst SM, Wobus CE, Goodfellow IG, Green KY, and Virgin HW (2014). Advances in norovirus biology. Cell Host Microbe 15, 668–680.
- Kasprzyk A (2011). BioMart: driving a paradigm change in biological data management. Database (Oxford) 2011, bar049.
- Katzenellenbogen RA, Vliet-Gregg P, Xu M, and Galloway DA (2010). Cytoplasmic poly(A) binding proteins regulate telomerase activity and cell growth in human papillomavirus type 16 E6-expressing keratinocytes. J Virol 84, 12934–12944.
- Kelman Z (1997). PCNA: structure, functions and interactions. Oncogene 14, 629–640.
- Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, et al. (2012). The IntAct molecular interaction database in 2012. Nucleic Acids Res 40, D841–846.
- Krissinel E, and Henrick K (2007). Inference of macromolecular assemblies from crystalline state. J Mol Biol 372, 774–797.
- Kuny CV, Chinchilla K, Culbertson MR, and Kalejta RF (2010). Cyclin-dependent kinase-like function is shared by the beta- and gamma- subset of the conserved herpesvirus protein kinases. PLoS Pathog 6, e1001092.
- Lee DK, Kim BC, Kim IY, Cho EA, Satterwhite DJ, and Kim SJ (2002). The human papilloma virus E7 oncoprotein inhibits transforming growth factor-beta signaling by blocking binding of the Smad complex to its target sequence. J Biol Chem 277, 38557–38564.
- Letunic I, and Bork P (2016). Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res 44, W242–245.
- Ley K (2003). The role of selectins in inflammation and disease. Trends Mol Med 9, 263–268.
- Li C, Xu D, Ye Q, Hong S, Jiang Y, Liu X, Zhang N, Shi L, Qin CF, and Xu Z (2016). Zika Virus Disrupts Neural Progenitor Development and Leads to Microcephaly in Mice. Cell Stem Cell 19, 120–126.
- Li W, and Godzik A (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659.
- Li X, and Coffino P (1996). High-risk human papillomavirus E6 protein has two distinct binding sites within p53, of which only one determines degradation. J Virol 70, 4509–4516.
- Li X, Johansson C, Glahder J, Mossberg AK, and Schwartz S (2013). Suppression of HPV-16 late L1 5’-splice site SD3632 by binding of hnRNP D proteins and hnRNP A2/B1 to upstream AUAGUA RNA motifs. Nucleic Acids Res 41, 10488–10508.
- Lima NS, Rolland M, Modjarrad K, and Trautmann L (2017). T Cell Immunity and Zika Virus Vaccine Development. Trends Immunol 38, 594–605.
- Liu HM, Loo YM, Horner SM, Zornetzer GA, Katze MG, and Gale M Jr. (2012). The mitochondrial targeting chaperone 14-3-3epsilon regulates a RIG-I translocon that mediates membrane association and innate antiviral immunity. Cell Host Microbe 11, 528–537.
- Loytynoja A (2014). Phylogeny-aware alignment with PRANK. Methods Mol Biol 1079, 155–170.
- Lozier M, Adams L, Febo MF, Torres-Aponte J, Bello-Pagan M, Ryff KR, Munoz-Jordan J, Garcia M, Rivera A, Read JS, et al. (2016). Incidence of Zika Virus Disease by Age and Sex - Puerto Rico, November 1, 2015-October 20, 2016. MMWR Morb Mortal Wkly Rep 65, 1219–1223.
- Luck K, Sheynkman GM, Zhang I, and Vidal M (2017). Proteome-Scale Human Interactomics. Trends Biochem Sci 42, 342–354.
- Ma Y, Walsh MJ, Bernhardt K, Ashbaugh CW, Trudeau SJ, Ashbaugh IY, Jiang S, Jiang C, Zhao B, Root DE, et al. (2017). CRISPR/Cas9 Screens Reveal Epstein-Barr Virus-Transformed B Cell Host Dependency Factors. Cell Host Microbe 21, 580–591 e587.
- Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, et al. (2017). CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res 45, D200–D203.
- Melendy T, Sedman J, and Stenlund A (1995). Cellular factors required for papillomavirus DNA replication. J Virol 69, 7857–7867.
- Mihara T, Nishimura Y, Shimizu Y, Nishiyama H, Yoshikawa G, Uehara H, Hingamp P, Goto S, and Ogata H (2016). Linking Virus Genomes with Host Taxonomy. Viruses 8, 66.
- Mirabello L, Yeager M, Yu K, Clifford GM, Xiao Y, Zhu B, Cullen M, Boland JF, Wentzensen N, Nelson CW, et al. (2017). HPV16 E7 Genetic Conservation Is Critical to Carcinogenesis. Cell 170, 1164–1174 e1166.
- Mohr IJ, Clark R, Sun S, Androphy EJ, MacPherson P, and Botchan MR (1990). Targeting the E1 replication protein to the papillomavirus origin of replication by complex formation with the E2 transactivator. Science 250, 1694–1699.
- Muller N, Avota E, Schneider-Schaulies J, Harms H, Krohne G, and Schneider-Schaulies S (2006). Measles virus contact with T cells impedes cytoskeletal remodeling associated with spreading, polarization, and CD3 clustering. Traffic 7, 849–858.
- Munoz N, Bosch FX, de Sanjose S, Herrero R, Castellsague X, Shah KV, Snijders PJ, Meijer CJ, and International Agency for Research on Cancer Multicenter Cervical Cancer Study Group (2003). Epidemiologic classification of human papillomavirus types associated with cervical cancer. N Engl J Med 348, 518–527.
- Musso D, and Gubler DJ (2016). Zika Virus. Clin Microbiol Rev 29, 487–524.
- Nicod C, Banaei-Esfahani A, and Collins BC (2017). Elucidation of host-pathogen protein-protein interactions to uncover mechanisms of host cell rewiring. Curr Opin Microbiol 39, 7–15.
- Ohta T, Michel JJ, Schottelius AJ, and Xiong Y (1999). ROC1, a homolog of APC11, represents a family of cullin partners with an associated ubiquitin ligase activity. Mol Cell 3, 535–541.
- Olah J, Vincze O, Virok D, Simon D, Bozso Z, Tokesi N, Horvath I, Hlavanda E, Kovacs J, Magyar A, et al. (2011). Interactions of pathological hallmark proteins: tubulin polymerization promoting protein/p25, beta-amyloid, and alpha-synuclein. J Biol Chem 286, 34088–34100.
- Ouyang P, Rakus K, van Beurden SJ, Westphal AH, Davison AJ, Gatherer D, and Vanderplasschen AF (2014). IL-10 encoded by viruses: a remarkable example of independent acquisition of a cellular gene by viruses and its subsequent evolution in the viral genome. J Gen Virol 95, 245–262.
- Pang CL, Toh SY, He P, Teissier S, Ben Khalifa Y, Xue Y, and Thierry F (2014). A functional interaction of E7 with B-Myb-MuvB complex promotes acute cooperative transcriptional activation of both S- and M-phase genes. (129 c). Oncogene 33, 4039–4049.
- Petrey D, Xiang Z, Tang CL, Xie L, Gimpelev M, Mitros T, Soto CS, Goldsmith-Fischman S, Kernytsky A, Schlessinger A, et al. (2003). Using multiple structure alignments, fast model building, and energetic analysis in fold recognition and homology modeling. Proteins 53 Suppl 6, 430–435.
- Pichlmair A, Kandasamy K, Alvisi G, Mulhern O, Sacco R, Habjan M, Binder M, Stefanovic A, Eberle CA, Goncalves A, et al. (2012). Viral immune modulators perturb the human molecular network by common and unique strategies. Nature 487, 486–490.
- Pickett BE, Sadat EL, Zhang Y, Noronha JM, Squires RB, Hunt V, Liu M, Kumar S, Zaremba S, Gu Z, et al. (2012). ViPR: an open bioinformatics database and analysis resource for virology research. Nucleic Acids Res 40, D593–598.
- Pietsch EC, and Murphy ME (2008). Low risk HPV-E6 traps p53 in the cytoplasm and induces p53-dependent apoptosis. Cancer Biol Ther 7, 1916–1918.
- Pyeon D, Newton MA, Lambert PF, den Boon JA, Sengupta S, Marsit CJ, Woodworth CD, Connor JP, Haugen TH, Smith EM, et al. (2007). Fundamental differences in cell cycle deregulation in human papillomavirus-positive and human papillomavirus-negative head/neck and cervical cancers. Cancer Res 67, 4605–4619.
- Quinn K, Brindley MA, Weller ML, Kaludov N, Kondratowicz A, Hunt CL, Sinn PL, McCray PB Jr., Stein CS, Davidson BL, et al. (2009). Rho GTPases modulate entry of Ebola virus and vesicular stomatitis virus pseudotyped vectors. J Virol 83, 10176–10186.
- R Core Team. R Foundation for Statistical Computing (2016). R: A Language and Environment for Statistical Computing.
- Raman R, Tharakaraman K, Sasisekharan V, and Sasisekharan R (2016). Glycan-protein interactions in viral pathogenesis. Curr Opin Struct Biol 40, 153–162.
- Remmert M, Biegert A, Hauser A, and Soding J (2011). HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9, 173–175.
- Rice P, Longden I, and Bleasby A (2000). EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16, 276–277.
- Rogers J, and Gibbs RA (2014). Comparative primate genomics: emerging patterns of genome content and dynamics. Nat Rev Genet 15, 347–359.
- Rose AS, Bradley AR, Valasatava Y, Duarte JM, Prlic A, and Rose PW (2018). NGL Viewer: Web-based molecular graphics for large complexes. Bioinformatics.
- Sawyer SL, Wu LI, Emerman M, and Malik HS (2005). Positive selection of primate TRIM5alpha identifies a critical species-specific retroviral restriction domain. Proc Natl Acad Sci U S A 102, 2832–2837.
- Scaturro P, Stukalov A, Haas DA, Cortese M, Draganova K, Plaszczyca A, Bartenschlager R, Gotz M, and Pichlmair A (2018). An orthogonal proteomic survey uncovers novel Zika virus host factors. Nature 561, 253–257.
- Shah PS, Link N, Jang GM, Sharp PP, Zhu T, Swaney DL, Johnson JR, Von Dollen J, Ramage HR, Satkamp L, et al. (2018). Comparative Flavivirus-Host Protein Interaction Mapping Reveals Mechanisms of Dengue and Zika Virus Pathogenesis. Cell 175, 1931–1945 e1918.
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, and Ideker T (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498–2504.
- GraphPad Software. La Jolla, California, USA, www.graphpad.com.
- Subbiah S, Laurents DV, and Levitt M (1993). Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core. Curr Biol 3, 141–148.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 15545–15550.
- Sun H, Liu T, Zhu D, Dong X, Liu F, Liang X, Chen C, Shao B, Wang M, and Wang Y (2017). HnRNPM and CD44s expression affects tumor aggressiveness and predicts poor prognosis in breast cancer with axillary lymph node metastases. Genes Chromosomes Cancer 56, 598–607.
- Takizawa CG, and Morgan DO (2000). Control of mitosis by changes in the subcellular location of cyclin-B1-Cdk1 and Cdc25C. Curr Opin Cell Biol 12, 658–665.
- Tang H, Hammack C, Ogden SC, Wen Z, Qian X, Li Y, Yao B, Shin J, Zhang F, Lee EM, et al. (2016). Zika Virus Infects Human Cortical Neural Progenitors and Attenuates Their Growth. Cell Stem Cell 18, 587–590.
- Tanner J, Whang Y, Sample J, Sears A, and Kieff E (1988). Soluble gp350/220 and deletion mutant glycoproteins block Epstein-Barr virus adsorption to lymphocytes. J Virol 62, 4452–4464.
- The Gene Ontology Consortium (2017). Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res 45, D331–D338.
- Tjalma WA, Weyler JJ, Bogers JJ, Pollefliet C, Baay M, Goovaerts GC, Vermorken JB, van Dam PA, van Marck EA, and Buytaert PM (2001). The importance of biological factors (bcl-2, bax, p53, PCNA, MI, HPV and angiogenesis) in invasive cervical cancer. Eur J Obstet Gynecol Reprod Biol 97, 223–230.
- Tripathi S, Pohl MO, Zhou Y, Rodriguez-Frandsen A, Wang G, Stein DA, Moulton HM, DeJesus P, Che J, Mulder LC, et al. (2015). Meta- and Orthogonal Integration of Influenza “OMICs” Data Defines a Role for UBR4 in Virus Budding. Cell Host Microbe 18, 723–735.
- Tsuchiya N, Kamei D, Takano A, Matsui T, and Yamada M (1998). Cloning and characterization of a cDNA encoding a novel heterogeneous nuclear ribonucleoprotein-like protein and its expression in myeloid leukemia cells. J Biochem 123, 499–507.
- Van den Broeke C, Jacob T, and Favoreel HW (2014). Rho’ing in and out of cells: viral interactions with Rho GTPase signaling. Small GTPases 5, e28318.
- Vanhee P, Reumers J, Stricher F, Baeten L, Serrano L, Schymkowitz J, and Rousseau F (2010). PepX: a structural database of non-redundant protein-peptide complexes. Nucleic Acids Res 38, D545–D551.
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, and Bork P (2002). Comparative assessment of large-scale data sets of protein-protein interactions. Nature 417, 399–403.
- Warnes GR, Bolker B, Bonebakker L, Gentleman R, Huber W, Liaw A, Lumley T, Maechler M, Magnusson A, Moeller S, et al. (2016). gplots: Various R Programming Tools for Plotting Data.
- Webster JI, Tonelli L, and Sternberg EM (2002). Neuroendocrine regulation of immunity. Annu Rev Immunol 20, 125–163.
- White EA, and Munger K (2017). Crowd Control: E7 Conservation Is the Key to Cancer. Cell 170, 1057–1059.
- Wobus CE, Thackray LB, and Virgin HW 4th (2006). Murine norovirus: a model system to study norovirus biology and pathogenesis. J Virol 80, 5104–5112.
- Wu KY, Zuo GL, Li XF, Ye Q, Deng YQ, Huang XY, Cao WC, Qin CF, and Luo ZG (2016). Vertical transmission of Zika virus targeting the radial glial cells affects cortex development of offspring mice. Cell Res 26, 645–654.
- Xiang Z, and Honig B (2001). Extending the accuracy limits of prediction for side-chain conformations. J Mol Biol 311, 421–430.
- Yang AS, and Honig B (2000a). An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. J Mol Biol 301, 665–678.
- Yang AS, and Honig B (2000b). An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments. J Mol Biol 301, 691–711.
- Yang Z (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24, 1586–1591.
- Zhang QC, Petrey D, Deng L, Qiang L, Shi Y, Thu CA, Bisikirska B, Lefebvre C, Accili D, Hunter T, et al. (2012). Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 490, 556–560.
Supplementary Materials
Figure S1. Related to Figure 1. Summary of viral families and Baltimore categories interrogated by P-HIPSTer, and homology modeling coverage across Baltimore categories. a) Fully sequenced viruses infecting humans were downloaded from VirusHostDB (https://www.genome.jp/virushostdb/). For a complete list of viruses with their Baltimore categories and family memberships, see Table S1. b) Fraction of modeled residues per virus within each Baltimore category. c) Fraction of disordered residues per virus within each Baltimore category. Structure modeling was performed as described in the Methods. Center lines correspond to median values, and whiskers span the minimum to maximum values.
Table S1. Related to Figure 1. Viral and human datasets and predicted viral-human PPIs. Description of the viral and human datasets, and overview of P-HIPSTer PPI predictions (with a final LR ≥ 100) for each viral protein, virus, and human protein.
Table S2. Related to Figures 1 and 3. Experimental validation of P-HIPSTer predictions. Summary results of 65 co-immunoprecipitation experiments on predicted PPIs.
Table S3. Related to Figure 2. ZIKV-Human interactome. ZIKV-Human PPI network combining predicted ZIKV-Human PPIs with experimentally derived human PPIs from IntAct.
Table S4. Related to Figure 4. Predicted cellular factors that discriminate between high-risk (HR) and low-risk (LR) HPVs. Annotation of human proteins predicted to preferentially bind HR-HPVs (Group I), LR-HPVs (Group II), or both (Group III).
Table S5. Related to Figure 5. Shared and unique pathways targeted across viruses belonging to different nucleic acid types.
Table S6. Related to Figure 5. Pathway-based clustering of human-infecting viruses. Each virus is described by its corresponding set of enriched pathways, and these profiles are used to identify shared functional targeting across human-infecting viruses.
Table S7. Related to Figure 6. Functional enrichment analysis of P-HIPSTer-resolved human proteins. Human proteins under positive or purifying selection are further divided into virus-interacting and virus-non-interacting proteins. Functional enrichment analysis of each subset was carried out using DAVID.
Table S8. Related to the Key Resources Table. Plasmids and primers used in this study. Human and viral plasmids used to express proteins of interest in co-immunoprecipitation assays, and description of the primers used in this work.
Figure S2. Related to Figure 1 and Figure 2. Empirical validation of P-HIPSTer predictions. A total of 65 P-HIPSTer predictions spanning LR values between 1.2 and 1,106 were selected for co-IP. Briefly, 293T cells were transfected with the relevant expression plasmids, and lysates were subjected to immunoprecipitation with the indicated antibodies followed by SDS-PAGE and immunoblotting. Representative gels are shown. All experiments were performed at least 3 times to ensure reproducibility. For further details, please refer to the Supplemental Materials and Methods. A complete summary of validation experiments is available in Table S2.
Figure S3. Related to Figures 1 and 5. P-HIPSTer uncovers viral targeting of infection related pathways. Enriched biological pathways (a) and molecular functions (b) within a set of 5,749 human proteins predicted to interact with human viruses. Enriched biological pathways (c) and molecular functions (d) within a set of 173 human proteins predicted to interact with ≥ 100 human viruses.
Figure S4. Related to Figure 5. P-HIPSTer uncovers shared and unique machinery employed across viruses of differing nucleic acid type. a) Venn diagram illustrating the overlap among biological pathways enriched within each nucleic acid type (the number of pathways in each region of the Venn diagram is indicated). Shown are over-represented pathway categories (the x-axis denotes the number of enriched pathways within each category). Categories that are significantly enriched within each region of the Venn diagram are highlighted in light purple. b) Clustering of immune-related pathways and viruses based on predicted pathway enrichment. Dark cells within the heatmap correspond to pathways enriched with q-value < 0.01. Highlighted are the pathway classes and viral families corresponding to co-clustered pathways and viruses.
Figure S5. Related to Figure 5. Comparison of structural and sequence similarity across viral protein pairs. a) Subset of viral protein pairs with low pairwise sequence identity (<40%) in which the two proteins of each pair belong to viruses of different nucleic acid types: DNA/RNA (red), DNA/RT (blue), RNA/RT (green). b) Subset of viral protein pairs across all nucleic acid types with low pairwise sequence identity (<40%). c) Pairs of viral capsid proteins from viruses of the same viral family (black) or of different Baltimore categories (blue). Vertical bars indicate the threshold for considering a pair of viral proteins global structural neighbors (structural alignment score, SAS ≤ 3.5 Å).
Data Availability Statement
All data generated as part of this study are available at phipster.org and on GitHub (https://github.com/RabadanLab/pamler). In addition, P-HIPSTer code is available upon request.