SUMMARY
Cryo-EM has revolutionized spliceosome structural biology and structures representing much of the splicing process have been determined. Comparison of these structures is challenging due to extreme dynamics of the splicing machinery and the thousands of changing interactions during splicing. We have used network theory to analyze splicing factor interactions by constructing structure-based networks from protein-protein, protein-RNA, and RNA-RNA interactions found in eight different spliceosome structures. Our analyses reveal that connectivity dynamics result in step-specific impacts of factors on network topology. The spliceosome’s connectivity is focused on the active site in part due to contributions from non-globular proteins. Many essential factors exhibit large shifts in centralities during splicing. Others show transiently high betweenness centralities at certain stages thereby suggesting mechanisms for regulating splicing by briefly bridging otherwise poorly connected network nodes. These observations provide insights into organizing principles of the spliceosome and provide frameworks for comparative analysis of other macromolecular machines.
eTOC Blurb
Spliceosomes dramatically change composition and conformation during pre-mRNA splicing. Kaur, van der Feltz, et al. use network theory to analyze cryo-EM structures of yeast spliceosomes to identify how the connections between dozens of splicing factors are transformed as splicing progresses.
INTRODUCTION
The spliceosome recognizes and removes non-coding regions (introns) from precursor-mRNAs (pre-mRNAs) while ligating together the flanking exons. The spliceosome is a ribonucleoprotein (RNP) composed of five non-coding small nuclear RNAs (snRNAs) and nearly a hundred protein factors (Wahl et al., 2009). The U1, U2, U4, U5, and U6 snRNAs in the spliceosome have distinct functions and assemble on pre-mRNAs as part of small nuclear ribonucleoprotein (snRNP) subcomplexes (the U1 and U2 snRNPs and U4/U6.U5 tri-snRNP). Spliceosomes also contain a number of non-snRNP associated splicing factors including the NineTeen complex (NTC) and NTC-associated (NTA) proteins. Pre-mRNA splicing is a highly dynamic process involving assembly of splicing factors on pre-mRNAs (to form B complex spliceosomes); an activation phase in which the nascent spliceosome is remodeled to create an RNA active site (Bact, B*1 complexes); organization and re-organization of the machinery to carry out the two sequential transesterification reactions (B*2, C, C* complexes); and finally release of the spliced mRNA from the product (P) complex and disassembly of the intron-lariat spliceosome (ILS) (Wahl et al., 2009). The entire process is driven forward by eight ATPases, each functioning at specific stages in splicing (Wahl et al., 2009).
Spliceosomes are highly dynamic in terms of both their composition and structure. More than a dozen distinct spliceosome complexes have been characterized, and it is certain that several other intermediates exist along the splicing pathway (Plaschka et al., 2019). Cryogenic electron microscopy (cryo-EM) has revolutionized studies of splicing by allowing structures of entire spliceosomes to be determined. More than 35 spliceosome structures from Saccharomyces cerevisiae (hereafter, yeast), Schizosaccharomyces pombe, and Homo sapiens have provided unparalleled insights into how splicing factors interact and function (Kastner et al., 2019; Plaschka et al., 2019; Yan et al., 2019). Nevertheless, the complexity and highly dynamic nature of the spliceosome make comparison of different structures challenging since hundreds or thousands of intermolecular interactions may be changing from one state to the next.
Network theory provides a framework for analysis of large and complex sets of interactions such as those typified by the spliceosome. Analysis of networks provides quantitative information on the individual connectivity of each component as well as the structure of the network as a whole. At the component level, centrality parameters, such as eigenvector and betweenness centralities, can be determined for each network node. These parameters incorporate information about how that component contributes to the entire network organization. For the whole network, descriptors can include the average degree (the average number of connections for components), the algebraic connectivity (a measure of interconnectivity of all network components), and modularity (subdivisions of the network containing high connectivity between components but less to other subdivisions).
Protein-protein interaction (PPI) network analysis has been applied to many systems and is enabled by a substantial number of PPIs catalogued in various databases (MINT, STRING, DIP, PSI-MI, and IntAct). These applications have broadened the scope of spectral graph theory to understanding biological systems, identifying the functions of genes, and drug discovery (Athanasios et al., 2017; Brun et al., 2003; Hasan et al., 2012; Miryala et al., 2018; Yu et al., 2013). In applications to structural biology, network analysis has been used to determine the inter-component interactions (edges) of individual residues, domains or whole molecules (nodes). These methods have been used extensively to understand the structure, folding, and function of proteins including splicing factors by analysis of different centrality metrics (Brinda and Vishveshwara, 2005; David-Eden and Mandel-Gutfreund, 2008; Fanelli et al., 2016; Menichetti et al., 2016; Negre et al., 2018; Shao et al., 2020b; Vendruscolo et al., 2002; Yan et al., 2018; Yan et al., 2014). Structural network analysis has been more limited for macromolecular machines of comparable complexity to the spliceosome, although it has been used effectively to study the peptidyl transferase center of the ribosome and intermolecular interactions observed in trypanosome mitochondrial ribosomes (David-Eden and Mandel-Gutfreund, 2008; Ramrath et al., 2018).
For the splicing machinery, network theory has been applied to both functional data from biological experiments and PPIs from the STRING database (Carbonell et al., 2019; Guimarães et al., 2018; Papasaikas et al., 2015; Pires et al., 2015). However, these studies are difficult to interpret with respect to molecular mechanisms since snRNAs are not part of the STRING database and, without structural data, the intermolecular contacts that cause changes in biological function are unclear. Recently, network theory analysis of a spliceosome structure combined with molecular dynamics simulations was used to identify putative information exchange pathways among splicing machinery components (Clf1, Cwc2, and Prp8) in the yeast C complex spliceosome (Saltalamacchia et al., 2020; Shao et al., 2020a). It is not clear, though, how the C complex structural network is assembled or changes during splicing.
Here, we calculated the structural networks of eight different spliceosome models determined from cryo-EM data representing the B→Bact→B*1→B*2→C→C*→P→ILS complexes (Fica et al., 2017; Galej et al., 2016; Plaschka et al., 2017; Wan et al., 2019; Wan et al., 2017; Wilkinson et al., 2017; Yan et al., 2016). This was enabled by creation of software that permits analysis of both RNA and protein contributions to networks using available PDB models. Comparison of these networks reveals dynamic changes in connectivity, modularity, module composition, and centrality parameters for splicing factors throughout the reaction. They also reveal how connectivity of the spliceosome becomes focused on the active site and the contributions that elongated (non-globular) and nonessential proteins make to this organization. In comparison with networks derived from PPI or functional data, our analysis has implications for understanding how perturbations propagate through different modules to influence multiple steps in splicing, how factors with high betweenness influence specific steps in splicing, and how network topological parameters correlate with lethality phenotypes.
RESULTS
We used network analysis to study eight models of yeast spliceosomes built from cryo-EM maps of complexes captured during activation, catalysis, and disassembly (Fica et al., 2017; Galej et al., 2016; Plaschka et al., 2017; Wan et al., 2019; Wan et al., 2017; Wilkinson et al., 2017; Yan et al., 2016). Each model was treated as its own network and analyzed individually (Fig. 1). First, network nodes were defined as individual single chains in the models, with three exceptions: proteins making up the heteroheptameric Lsm and Sm rings were grouped together as single nodes (e.g., the U2 Sm ring was considered a single node); the Prp19 homotetramer was also grouped as a single node; and the pre-mRNA substrate was divided based on continuously resolved regions in the EM density (e.g., split into 5’ SS and BS nodes when appropriate). The published models were further edited by assigning or removing protein chains of unknown identity and by removing glycines in poly-alanine modeled regions where the maps precluded modeling of amino acid side chains (Table S1A, Data S1). The removal of glycines from these regions was necessary to prevent rejection of the PDB data by PDBePISA, which was used to identify interactions (Krissinel and Henrick, 2007b).
Figure 1. Structural network analysis workflow and network properties.

(A) Example workflow using the yeast B complex spliceosome (shown in tan, PDB ID: 5NRL). PDB coordinate files are first edited so that network nodes and edges can be identified. Edges are calculated as the buried surface areas of molecular interfaces (e.g., the interface between the B complex components Prp8 (dark grey) and Snu114 (light grey) is shown in red and blue). Nodes and edges are then used to construct the network. Bottom Left. The network topology for B complex is shown with snRNAs (2, 4, 5, and 6) and some of the largest proteins noted (Prp8, Brr2, and Hsh155). Bottom Middle. Computationally defined modules in B complex are separately colored (U5 module, blue; U6 module, green; U2 module, yellow). Bottom Right. B complex components are colored by eigenvector centrality (CEV). (B) Fold changes in structural network parameters for spliceosomes transitioning between complexes. Structure images were generated using UCSF Chimera, and the network display was generated with Gephi.
We then defined edges in our networks as the interactions between splicing factors resulting in buried surfaces inaccessible to 1.4 Å diameter probe molecules (Fig. 1A) (Krissinel and Henrick, 2007a; Krissinel and Henrick, 2007b). The nodes and edges were then used to generate a network for each splicing complex. The network was not weighted by the amount of buried surface area (BSA) since the importance of a component’s interactions may not be directly correlated with this value and calculations of BSA in regions for which the cryo-EM maps are poor may not be accurate.
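The node and edge definitions above can be illustrated with a minimal sketch. It assumes the interfaces have already been exported as (chain, chain, BSA) records; the chain identifiers, BSA values, and the names `NODE_OF_CHAIN`, `INTERFACES`, and `build_network` are hypothetical and are not taken from the software used in this study.

```python
# Hypothetical chain-to-node grouping: ring subunits collapse into one node.
NODE_OF_CHAIN = {
    "A": "Prp8",
    "B": "Snu114",
    "C": "U5 snRNA",
    "D": "U5 Sm ring",  # first Sm subunit (illustrative)
    "E": "U5 Sm ring",  # second Sm subunit (illustrative)
}

# Hypothetical interface records: (chain 1, chain 2, buried surface area in Å^2).
INTERFACES = [
    ("A", "B", 4100.0),
    ("A", "C", 2500.0),
    ("C", "D", 600.0),
    ("C", "E", 450.0),
    ("D", "E", 900.0),  # both chains map to the same node, so no edge results
]


def build_network(interfaces, node_of_chain):
    """Return an unweighted, undirected adjacency map {node: set(neighbors)}."""
    adjacency = {node: set() for node in node_of_chain.values()}
    for chain_a, chain_b, bsa in interfaces:
        node_a, node_b = node_of_chain[chain_a], node_of_chain[chain_b]
        # Skip contacts inside a grouped node and records without buried surface.
        if node_a == node_b or bsa <= 0:
            continue
        # The BSA value only decides presence or absence; the edge is unweighted.
        adjacency[node_a].add(node_b)
        adjacency[node_b].add(node_a)
    return adjacency


network = build_network(INTERFACES, NODE_OF_CHAIN)
```

In this toy input the two Sm chains contribute a single edge between the U5 snRNA node and the U5 Sm ring node, mirroring how grouped rings are treated as single nodes.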
We note that two caveats of this approach are that it is restricted to models built from interpretable EM density and it assumes EM densities were interpreted correctly when models were constructed. We were unable to include information in our networks for the large amount of protein mass that is unaccounted for in all available spliceosome models (ranging from 22–38% of total amino acids, Table S1A). We assume these represent protein domains which interact weakly or on the periphery of the spliceosome. We accounted for remaining uncertainties where possible by comparison of independently determined structures of spliceosomes captured in similar states to verify model accuracy (Data S1). While it is possible that our networks are biased towards stable interactions over those that occur transiently as well as towards interactions present in trapped spliceosomes amenable to structural elucidation, we believe they are accurate descriptions of the predominant intermolecular interactions occurring within and between each spliceosome complex.
Topology Analysis Reveals Oscillations in Algebraic Connectivity
Once the networks were computed, we first analyzed their topological information (Fig. 1A, B). The average degree (or number of connections a node makes to other nodes) ranges from 6.28 to 8.50 for each spliceosome network, a variation of ~1.4-fold from B to ILS complex (Fig. 1B, Table S1B). However, the level of intermolecular connectedness of each spliceosome network (the algebraic connectivity) varies by a larger factor both for the overall transition from B to ILS complex (~2.9-fold) and for intermediary steps (Fig. 1B). In random networks, an increase in the number of nodes results in a decrease in algebraic connectivity. Therefore, spliceosome network topologies indicate that they are not organized randomly (e.g., nodes increase from 34 to 40 between the B and Bact networks while the connectivity more than doubles from 0.36 to 0.80, Table S1B). This confirms that our networks capture the specific, non-random intermolecular interactions formed by biomolecular structures.
Networks computed from models of spliceosomes captured just before or after the catalytic steps of splicing (B*2, C, P) had the highest values of algebraic connectivity (Fig. 1B). High values of algebraic connectivity indicate a greater degree of interconnection and more difficulty in separating the network into individual components or sub-networks (Jamakovic and Van Mieghem, 2008). This shows that B*2, C, and P complex networks are the most robust and resistant to change due to removal of a given node or edge.
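For readers less familiar with these descriptors, the following is a minimal, dependency-free sketch of how average degree and algebraic connectivity (the second-smallest eigenvalue of the graph Laplacian L = D − A) could be computed for an unweighted network. The power-iteration approach, the function names, and the toy graphs are illustrative assumptions and not the procedure used to generate Table S1B; the sketch assumes a graph with at least two nodes and one edge.

```python
import random


def average_degree(adjacency):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def algebraic_connectivity(adjacency, iterations=2000, seed=0):
    """Second-smallest Laplacian eigenvalue, estimated by power iteration.

    Iterates on M = shift*I - L while projecting out the constant vector
    (the eigenvector of L with eigenvalue 0), so the iteration converges
    towards the eigenvector of L belonging to its second-smallest eigenvalue.
    """
    nodes = sorted(adjacency)
    n = len(nodes)
    index = {node: i for i, node in enumerate(nodes)}
    # Twice the maximum degree is an upper bound on the eigenvalues of L.
    shift = 2.0 * max(len(adjacency[v]) for v in nodes)

    def project_and_normalize(vec):
        mean = sum(vec) / n
        centered = [v - mean for v in vec]
        norm = sum(v * v for v in centered) ** 0.5
        return [v / norm for v in centered]

    def laplacian_times(vec):
        return [
            len(adjacency[v]) * vec[index[v]] - sum(vec[index[u]] for u in adjacency[v])
            for v in nodes
        ]

    rng = random.Random(seed)
    x = project_and_normalize([rng.random() for _ in nodes])
    for _ in range(iterations):
        lx = laplacian_times(x)
        x = project_and_normalize([shift * xi - li for xi, li in zip(x, lx)])
    # Rayleigh quotient of the converged unit vector.
    return sum(xi * li for xi, li in zip(x, laplacian_times(x)))


# Toy graphs: a triangle, and a triangle with one peripheral node attached to "a".
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
with_peripheral = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
```

In these toy graphs the average degree is 2 in both cases, yet attaching a single peripheral node lowers the algebraic connectivity from 3 to 1, which illustrates why the two descriptors can vary independently.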
In contrast, we observed low values of connectivity associated with the B and ILS complex networks and networks obtained from complexes transitioning to or between catalytic conformations (B*1, C*; Fig. 1B). The transition between low connectivity in C* complex and high connectivity in P complex was surprising given that the structures strongly resemble one another and have nearly identical compositions. The low connectivity of C* complex is largely due to the poor linkage of the Prp22 ATPase and the 3’ exon to the rest of the network. Deletion of the 3’ exon node significantly increases network connectivity to a larger degree than expected from deletion of a random interaction (Fig. S1A–C), and yields a network with almost as high connectivity as the P complex. In P complex itself, Prp22 and the 3’ exon are much more integrated and form more connections with the network as do several other factors including Slu7, Prp18, and the U2 and U6 snRNAs (Fig. S1A, B; Table S1B).
The oscillations between high and low algebraic connectivity are consistent with spliceosome networks being most interconnected when derived from structures representing catalytic stages and less interconnected in networks derived from structures representing transitions between these stages. Large changes in connectivity of the networks can result from binding of factors, like Prp22, to the periphery of the spliceosome, which results in few edges to these factors in the corresponding networks. Thus, in structural networks algebraic connectivity can provide insight into the level of integration across the network as well as identify poorly connected factors that have large influences on network topological parameters.
Spliceosome Modules are Highly Integrated and Dynamic
Spliceosomes are built from discrete complexes (the U snRNPs, NTC, and other protein splicing factors) that associate with one another and the pre-mRNA. We next studied if networks describing assembled spliceosomes maintain these complexes as discrete subnetworks that are more highly coordinated between their members than to other nodes. To do this, we calculated the modularity of each network as values ranging from 0 for a highly interconnected network to 1 for a highly modular one using MODULAR (Marquitti et al., 2014). We also used this software to calculate the number and composition of the modules for each network and categorized each module by its component snRNA (U2, U5, U6) or by the lack of snRNAs (non-snRNA). Importantly, these network modules are defined only by the presence or absence of edges/interactions in the network models of the structures and not based on buried surface areas of the interactions or additional biochemical or genetic data. Our analysis indicates that the spliceosome networks begin as highly modular in B and Bact complexes but then modularity decreases as the original subcomplexes integrate to form the active site (Fig. 2A). Thus, the network modularity analysis recapitulates expectations for snRNP integration and active site folding of the spliceosome derived from other types of experimental data.
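As a point of reference for the modularity values, the sketch below scores a given partition of an unweighted network. It assumes the Newman-Girvan definition of modularity; it is not a reimplementation of MODULAR, which additionally searches for the partition that maximizes the score, and the function name and toy network are hypothetical.

```python
def modularity(adjacency, partition):
    """Newman-Girvan Q for an undirected, unweighted network.

    adjacency: {node: set(neighbors)}; partition: {node: module label}.
    """
    two_m = sum(len(nbrs) for nbrs in adjacency.values())  # twice the edge count
    internal = {}    # twice the number of edges inside each module
    degree_sum = {}  # summed degree of the nodes in each module
    for node, nbrs in adjacency.items():
        label = partition[node]
        degree_sum[label] = degree_sum.get(label, 0) + len(nbrs)
        internal[label] = internal.get(label, 0) + sum(
            1 for u in nbrs if partition[u] == label
        )
    # Fraction of edges inside each module minus the fraction expected at random.
    return sum(
        internal[c] / two_m - (degree_sum[c] / two_m) ** 2 for c in degree_sum
    )


# Toy network: two triangles joined by a single edge (c-d).
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
two_modules = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_module = {node: 1 for node in toy}
```

For this toy network the two-module partition scores 5/14 (about 0.36), whereas placing every node in one module scores 0, so additional edges between the triangles would lower the best achievable modularity.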
Figure 2. Module Integration and Contributions to Network Topology.

(A) Calculated modularity values for spliceosome structural networks. (B) Venn diagram of the total number of protein and snRNA components in the combined B→ILS structural networks associated with a given module or set of modules. Module components are listed in Table S1C–F. (C) The relative contribution of each module to total CEV for each structural network. (D) The relative contribution of each module to total CBW for each structural network.
While the overall modularity of the spliceosome networks remains nearly constant following activation, the compositions of the modules are highly dynamic and can change dramatically from one structural network to the next. These changes in module composition reflect the totality of the many dozens of altered interactions between structures. Nineteen factors in our analysis transiently associate with more than one module and eight of these associate with three different modules over the course of splicing (Fig. 2B, Tables S1C–F). The snRNAs almost never segregated within the same module (the exception being co-segregation of U4 and U6 snRNAs in B complex), and the NTC/NTA proteins dispersed among other modules. One consequence of these dynamics is that it makes the spliceosome networks less robust to perturbations at particular stages. To illustrate this concept, we deleted the node corresponding to the non-essential NTC/NTA splicing factor Ecm2 and calculated the resulting changes in connectivity in each network (Fig. S1D). The impact of Ecm2 deletion ranges from a loss of ~10% of network paths in Bact complex to a loss of over 40% of paths in B*1 complex. Interestingly, Ecm2 was recently proposed to play roles in both catalytic stages of splicing (van der Feltz et al., 2021). Consistent with these roles, it is in some of the spliceosome networks from these stages in which Ecm2 deletion has the greatest impact on network paths (B*1, B*2, C*, and P complexes). These results suggest that the dynamic changes in module composition can reflect the step-specific functions of certain splicing factors. These factors could potentially be identified by determining how their corresponding nodes influence the topology of certain networks along the splicing pathway.
U5 and U6-snRNA Containing Modules are Major Contributors to Network Topology
To further address how individual modules or splicing factors can have step-specific influence over network topology, we analyzed the contribution each module makes to the overall sum of centrality parameters for each network. We computed contributions to eigenvector centrality (CEV, high CEV indicates a major network intersection as it connects to other nodes of high connectivity; Fig. 2C) and betweenness centrality (CBW, high CBW indicates a major bridge in the shortest path between nodes; Fig. 2D). In all networks, the snRNA-containing modules make the largest contributions to CEV and CBW. The U5 and U6 modules are often the highest contributors to CEV. For U5, this echoes the large number of connections the Prp8 protein makes with splicing factors in other modules. For U6, this is consistent with its role in forming the spliceosome active site and its scaffolding by splicing factors. U6 is also the highest contributor to both CEV and CBW in the B complex spliceosome, reflecting the fact that interactions with and through U6 are central for the allosteric cascade of conformational changes driving spliceosome activation (Brow, 2002). It is interesting to note that while the U6 module often contributes the most highly to CEV, this does not always correlate with a high contribution to CBW (Fig. 2C vs. 2D). In other words, while many splicing factors connect to U6 within the network, many of these interactions do not result in linkages between splicing factors that are uniquely bridged by U6. This suggests a large degree of redundancy in the U6 interaction network. In contrast, the U5 module contributes very highly to CBW, again driven by a large number of connections that are uniquely bridged by the Prp8 protein.
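The per-module contributions described above can be sketched as follows. This is an illustrative calculation only: eigenvector centrality is obtained by power iteration and normalized so that values sum to 1 (an assumption about scaling), and the function names, node names, and module labels are hypothetical.

```python
def eigenvector_centrality(adjacency, iterations=1000):
    """Eigenvector centrality by power iteration, scaled to sum to 1."""
    nodes = sorted(adjacency)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Multiply by (A + I); the identity shift avoids oscillation on
        # bipartite graphs without changing the leading eigenvector.
        nxt = {v: x[v] + sum(x[u] for u in adjacency[v]) for v in nodes}
        total = sum(nxt.values())
        x = {v: value / total for v, value in nxt.items()}
    return x


def module_contributions(centrality, partition):
    """Fraction of the summed centrality carried by each module."""
    totals = {}
    for node, value in centrality.items():
        label = partition[node]
        totals[label] = totals.get(label, 0.0) + value
    grand_total = sum(totals.values())
    return {label: value / grand_total for label, value in totals.items()}


# Toy star network: one hub with three leaves, split into two modules.
star = {"hub": {"n1", "n2", "n3"}, "n1": {"hub"}, "n2": {"hub"}, "n3": {"hub"}}
modules = {"hub": "M1", "n1": "M1", "n2": "M2", "n3": "M2"}

cev = eigenvector_centrality(star)
contributions = module_contributions(cev, modules)
```

The same `module_contributions` helper could be applied to betweenness values; in the toy network the module containing the hub carries the larger share of centrality even though both modules have two nodes.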
Dynamic and Step-Specific Bridges Between Splicing Factors
The high contributions by the Prp8 protein to U5 module CEV and CBW are consistent with its central role in spliceosome structure and its large size (Plaschka et al., 2019; Saltalamacchia et al., 2020). Prp8 contributes highly to spliceosome network centrality parameters throughout the entire reaction. However, it is not the only splicing factor to do so. The U2 snRNA and NTC/NTA component Cef1 function alongside Prp8 as the three strongest contributors to CBW in the Bact through ILS networks (Fig. 3A–C).
Figure 3. Dynamic Changes in Betweenness of Splicing Factors.

(A) Depiction of the spliceosome B*1 complex (PDB ID: 6J6H) with components colored by CBW. Factors discussed in the text with high CBW are identified. (B) Calculated CBW values for Prp8 and Prp45 are plotted for each splicing factor structural network in comparison with the average CBW of all splicing factors in the corresponding complex. (C) CBW values for the snRNAs in each structural network in comparison with the average CBW of all splicing factors in the corresponding complex. For (B) and (C), the circle size represents the ranking of the CBW value of that component in relation to others in the complex.
Even for Prp8, structural dynamics of the spliceosome result in large changes in CBW from one network to the next (Fig. 3B). While Prp8 is always among the components with highest CBW, this value changes by more than 4-fold during splicing. This results from the many interactions between nodes that are uniquely bridged by Prp8 fluctuating during splicing: conformational changes result in some of these interactions being lost or gained or no longer being unique. For example, the increase in CBW for Prp8 in C and C* networks can be partially attributed to unique connections between the Prp8 and Prp16 or Prp22 ATPase nodes, respectively. The U6 snRNA is an extreme example of fluctuation in CBW. It is among the highest contributors to CBW in the B complex network before activation results in a loss of CBW due to increased path redundancy between nodes connected through U6 for the remainder of the splicing reaction (Fig. 3C).
Other factors can exhibit high CBW only at certain stages. For example, the NTC/NTA protein Prp45 node has high CBW only in Bact and ILS networks (Fig. 3B). At other stages it does not deviate from the average splicing factor CBW. The high values for CBW in these networks are due to Prp45 serving as the major bridge for linkage of the REtention and Splicing (RES) or the NTC-Related (NTR) complexes to the center of the spliceosome (Fig. S2). Changes in CBW for Prp45 are correlated with a change in the number of connections (degree) Prp45 makes to other nodes (Table S1G). However, changes in degree and CBW are not always correlated for splicing factors, and we identified multiple examples in which these values are also anti-correlated or independent of one another (Table S1G). This shows that connectivity of the spliceosome networks does not increase or decrease in complexity simply due to the addition or subtraction of various nodes. Rather, connectivity is also changing due to re-arrangement of connections between existing nodes.
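The distinction between degree and betweenness can be made concrete with a small sketch. It computes unnormalized betweenness centrality with Brandes' algorithm for unweighted graphs; the choice of algorithm, the lack of normalization, and the toy network are assumptions for illustration and do not reproduce the values in Table S1G.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    betweenness = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search recording shortest-path counts and predecessors.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                betweenness[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: value / 2 for v, value in betweenness.items()}


# Toy network: two triangles linked only through the node "x".
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
cbw = betweenness_centrality(toy)
```

Here "x" has the lowest degree of the three central nodes (2, versus 3 for "c" and "d") but the highest betweenness (9, versus 8), because every shortest path between the two triangles passes through it.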
Network Interactions Focus on the Active Site
We next analyzed how the connectivity of spliceosome protein nodes corresponds with the spatial position of those same proteins relative to the active site. We approximated the spliceosome active site as being centered around the U6 snRNA nucleotide G60 and then summed CEV values for all proteins that are at least partly contained within shells of increasing radius (Fig. 4A). This analysis shows that the proteins closest to the active site (within 10 Å) have a much higher total CEV than those located on the periphery. Interestingly, this analysis also showed that the spliceosome’s region of highest CEV is the active site and not always the center of mass (Fig. S3, Table S1H). Thus, the high centrality values likely reflect an abundance of functional interactions being directed towards the active site rather than being due to the amount of protein present at that location.
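The shell analysis can be sketched as below, assuming that a protein counts towards a shell when at least one of its atoms lies within that shell's radial range, so a single protein may contribute to several shells. The coordinates, centrality values, shell boundaries, and the function name `shell_sums` are hypothetical placeholders.

```python
import math


def shell_sums(atoms_by_protein, centrality, center, shell_edges):
    """Sum centrality over proteins with at least one atom in each radial shell."""
    results = []
    for inner, outer in zip(shell_edges[:-1], shell_edges[1:]):
        total = 0.0
        for protein, atoms in atoms_by_protein.items():
            # "Partly contained": any atom between the inner and outer radius.
            if any(inner <= math.dist(center, atom) < outer for atom in atoms):
                total += centrality[protein]
        results.append(((inner, outer), total))
    return results


# Hypothetical coordinates (Å) relative to a stand-in for the active-site position.
active_site = (0.0, 0.0, 0.0)
atoms = {
    "P1": [(5.0, 0.0, 0.0), (15.0, 0.0, 0.0)],  # spans the first two shells
    "P2": [(25.0, 0.0, 0.0)],
    "P3": [(12.0, 0.0, 0.0)],
}
cev = {"P1": 0.5, "P2": 0.2, "P3": 0.3}  # hypothetical centrality values

sums = shell_sums(atoms, cev, active_site, shell_edges=[0.0, 10.0, 20.0, 40.0])
```

Because membership is by partial containment, an extended protein such as the hypothetical P1 is counted in every shell it passes through.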
Figure 4. Eigenvector Centrality is Centered on the Spliceosome Active Site.

(A) Sum of splicing factor CEV values as a function of node distance from the spliceosome active site (approximated as U6 snRNA nucleotide G60). (B) Bact complex (PDB ID: 5GM6) highlighting the location of key factors in contact with the 20–40 Å shell from the active site. (C) P complex (PDB ID: 6EXN) highlighting the location of 2nd step factors in contact with the 20–40 Å shell from the active site.
The sum of CEV values decreased with increasing distance from the active site (Fig. 4A) indicating that the factors on the periphery of the spliceosome make fewer connections with highly interconnected components than those nearer the active site. This observation held true except for a few instances: proteins in the 20–40 Å shell had higher total CEV than those in the 10–20 Å shell in Bact, C*, and P complexes. This expansion of CEV is not just due to new factors arriving but arises from the location and connection of factors in the network. Specifically, we identified that the higher CEV sums were due to Hsh155 (Bact network) and the 2nd step protein factors (Prp17, Prp22, and Slu7; C* and P networks) being partially located within these shells (Fig. 4B, C). These proteins serve as major interaction hubs and are highly connected to splicing factors distant from the active site.
Elongated and Nonessential Factors are Major Contributors to Connectivity
When we mapped CEV values onto the proteins closest to the active site (Fig. 5A), we noticed that several of the factors with the highest CEV shared common features: they have non-globular structures and they are conditionally non-essential (i.e., not required for yeast proliferation under permissive conditions) components of the yeast NTC/NTA complex. We first analyzed how non-globular proteins contribute to our calculated spliceosome networks. While many of the spliceosome’s largest proteins have well-defined three-dimensional folds (Prp8, Brr2, Hsh155), a number of proteins lack globular structural domains. We calculated surface areas for each spliceosome protein and divided these values by the number of amino acids (aa) resolved in each protein to yield the protein’s elongation factor (EF, Table S1I). The distribution of EF values for splicing factors showed multiple peaks consistent with globular (e.g., Snu114, EF = ~47 Å²/aa), intermediate (e.g., Cwc2, EF = ~60 Å²/aa) and highly elongated proteins (e.g., Isy1, EF >80 Å²/aa) (Fig. S4). Based on this distribution, we defined elongated proteins as those with EF > 75 Å²/aa.
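The elongation factor defined above reduces to a simple ratio and cutoff, sketched here. The 75 Å²/aa threshold comes from the text; the surface areas and residue counts are invented for illustration and do not correspond to any entry in Table S1I.

```python
ELONGATED_THRESHOLD = 75.0  # Å^2 per amino acid, the cutoff stated in the text


def elongation_factor(surface_area, residues_resolved):
    """Surface area (Å^2) divided by the number of resolved amino acids."""
    return surface_area / residues_resolved


def is_elongated(surface_area, residues_resolved, threshold=ELONGATED_THRESHOLD):
    """True when the elongation factor exceeds the threshold."""
    return elongation_factor(surface_area, residues_resolved) > threshold


# Hypothetical inputs: (surface area in Å^2, resolved residues).
examples = {
    "globular-like": (47000.0, 1000),
    "elongated-like": (16000.0, 200),
}
classified = {name: is_elongated(area, n) for name, (area, n) in examples.items()}
```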
Figure 5. Contributions of Elongated and Nonessential Proteins to Spliceosome Network Connectivity.

(A) View of the C complex spliceosome active site (PDB ID: 5LJ5). Elongated proteins with values of CEV higher than the average for the complex (⟨CEV⟩ = 0.032) are shown in shades of red, and active site RNA nucleotides are shown in orange. CEV values for the proteins are shown in parentheses. (B) Histogram of the total number of network nodes in all spliceosome networks with given values of CEV (grey) plotted in comparison with the fraction of those nodes originating from elongated proteins in each histogram bin (red). (C) Sum of CEV values for essential and nonessential NTC/NTA components in each spliceosome structural network. (D) Sum of CBW values for essential and nonessential NTC/NTA components in each spliceosome structural network.
We then examined how EF correlates with CEV in spliceosome networks. We created a histogram in which we binned all of the calculated CEV parameters for protein nodes. We then calculated the fraction of elongated proteins within each bin (Fig. 5B). The histogram reveals several features of the spliceosome networks. First, the distribution of CEV is multi-modal. The largest bins correspond to splicing factors that are not highly interconnected and exhibit low CEV values (0–0.03). A second cluster shows a peak within the 0.04–0.05 CEV bin and tails off towards higher values of CEV. Bins containing proteins with CEV >0.07 were dominated by Prp8. Thus, CEV is not equally distributed across the nodes of spliceosome networks; rather, these networks contain many nodes of both low and high influence.
Elongated proteins make up just over half of the nodes in the four highest CEV bins (0.05<CEV<0.09; 53% elongated proteins; Fig. 5B). Furthermore, the vast majority of elongated proteins with high CEV are members of the NTC/NTA complex. This indicates that elongated NTC/NTA proteins are the greatest protein influencers of spliceosome network architecture, surpassed as a group only by the globular protein Prp8.
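The binning behind this comparison can be sketched as follows, assuming fixed-width CEV bins of 0.01 and a per-node flag marking elongated proteins. The bin width, the function name, and the records are assumptions for illustration and are not the data plotted in Fig. 5B.

```python
def elongated_fraction_by_bin(records, bin_width=0.01):
    """Count nodes per centrality bin and the fraction that are elongated.

    records: iterable of (cev, is_elongated) pairs.
    Returns {bin index: (node count, elongated fraction)}, where bin k covers
    [k * bin_width, (k + 1) * bin_width). Values lying exactly on a bin edge
    may fall on either side because of floating-point division.
    """
    counts = {}
    elongated = {}
    for cev, flag in records:
        k = int(cev / bin_width)
        counts[k] = counts.get(k, 0) + 1
        elongated[k] = elongated.get(k, 0) + (1 if flag else 0)
    return {k: (counts[k], elongated[k] / counts[k]) for k in sorted(counts)}


# Hypothetical (CEV, elongated?) records pooled across networks.
records = [(0.012, False), (0.045, True), (0.048, False), (0.081, True)]
histogram = elongated_fraction_by_bin(records)
```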
Finally, we studied the relative contributions of essential and nonessential components of the NTC/NTA to the networks. Of 26 members of the NTC/NTA, eleven are nonessential for yeast growth under normal conditions while a twelfth (Cwc2) can be deleted if the yeast are grown at low temperatures (Hogg et al., 2014; Hogg et al., 2010). Despite being conditionally dispensable for splicing in yeast, nearly all of these nonessential proteins have human homologs. Surprisingly, the sum totals and averages of CEV for nonessential components of the NTC/NTA were comparable to, and in some cases higher than, those of the essential proteins (Fig. 5C, Table S1J). This shows that essentiality is not required for a factor to influence spliceosome network topology and that non-essential NTC/NTA components are collectively major contributors to network connectivity.
While non-essential components of the NTC/NTA are highly connected, they have low betweenness (CBW, Fig. 5D, Table S1K), meaning that they do not form many unique bridges between other splicing factors. In contrast, essential components of the NTC/NTA have much higher CBW. Thus, while essential and non-essential components both have similar numbers of connections to highly connected splicing factors (CEV), only the essential components are likely to make unique connections between factors. To examine this more closely, we compared CBW for several essential and nonessential factors (Fig. S5). Cef1, as previously mentioned, has higher than average CBW in all structures and is an essential factor. Clf1 and Syf1 show transient peaks in CBW. For Clf1, this is due to its connections with factors that contact U2 snRNP components in the Bact network, and for Syf1 this is due to its connections in the ILS network with both U2 snRNP components and Prp43, which are otherwise poorly integrated. In contrast, the non-essential factors Cwc2, Ecm2, Cwc15, and Bud31 exhibited average to below average CBW in each structure and mostly contact otherwise already highly connected components such as the U6 snRNA and Prp8. We conclude that non-essential factors of the NTC/NTA may be able to modulate splicing activity due to the number of connections they make to central factors and yet may be dispensable due to redundancy in these interactions.
DISCUSSION
Our network analysis has revealed general features of the splicing machinery derived from analysis of structural models representing different stages of the reaction. While spliceosomes are assembled from distinct subunits, the modularity of spliceosomes decreases significantly as the subunits integrate and the active site forms (Fig. 2A). snRNAs remain in distinct modules following activation; however, other members of snRNA-containing modules frequently change (Fig. 2B). In concert with module dynamics, the connectivity of the spliceosome oscillates significantly during the reaction and is most connected during catalytic steps and least connected while transitioning to or from these structures (Fig. 1B). This is in part due to peripheral association of some factors, like Prp22, with the spliceosome preceding formation of more stable and integrated interactions (Strittmatter et al., 2021), and these changes in associations are reflected in the network models. While spliceosome composition changes dramatically during the reaction, changes in network topology are not explained by just the addition or subtraction of factors. Rather, rearrangements in the interactions between splicing factors are major contributors to the centrality parameters. Despite these dynamics, connectivity of the spliceosome is highly focused at the active site (Fig. 4). This is due in part to the role of Prp8 in scaffolding large regions of the spliceosome together but also to the contributions of a number of elongated, non-globular proteins with high connectivity (Fig. 5B).
A recent study calculated network betweenness of spliceosome C complex factors and supported a well-established function of Prp8 as a key communication conduit within the spliceosome (Brow, 2002; Saltalamacchia et al., 2020). Our analysis provides further insights. First, we identify the U2 snRNA and Cef1 as additional key network nodes (Fig. 3A) due to their interactions with otherwise poorly connected peripheral factors. Second, the network parameters vary from one structure to the next even for highly central factors like Prp8 and U2 (Fig. 3B, C). This supports the notion that key communication conduits of the spliceosome form transiently and can influence the network, and splicing, at specific stages. This also contrasts with conclusions made from networks derived from STRING database entries that provided evidence for limited impact of perturbations due to spliceosome modularity (Guimarães et al., 2018). Network perturbations, such as removal of a node corresponding to the Ecm2 protein (Fig. S1D), have consequences dependent on the structure of the spliceosome from which the network was calculated.
Structural Networks Connect Betweenness with Lethality
Analysis of networks based on PPI databases (e.g., STRING) supports a correlation between CEV of network components and the lethality of the corresponding gene deletion (i.e., proteins with a large number of connections to other proteins are more likely to be essential for viability; the “centrality-lethality rule”) (He and Zhang, 2006; Jeong et al., 2001). This correlation was also evident in spliceosome networks derived from the STRING database (Guimarães et al., 2018). However, these types of networks fail to account for the dynamic nature of many interactions and how structural transitions can lead to transiently high CBW for otherwise poorly connected factors. In our analysis of NTC/NTA components, we find essential and nonessential factors have similar CEV but differing CBW (Fig. 5). Thus, our work supports linking betweenness, or the uniqueness of connections rather than their numbers, with lethality. These results complement those obtained from networks derived from non-structural data and provide insights into relationships between network topologies and organism phenotypes.
Transient, Disassortative Interactions Influence the Catalytic Core
Once the spliceosome has assembled and formed its active site, networks tended to show high CEV and low CBW for factors located near the active site (Figs. 3, 4). The low CBW values for nodes near the active site also suggest a mechanism by which spliceosome networks can be reversibly expanded or contracted. Based on network theory, splicing factors transiently bound to the spliceosome periphery can strongly influence network topology by providing more paths to one region or bridging components that were not previously connected (Fig. S1A). This results in asymmetrical or disassortative node interactions between poorly connected components and those of higher connectivity (Baingana and Giannakis, 2016; Newman, 2002). Networks with disassortative features, in turn, are particularly vulnerable to perturbation since incorporation of a single, peripheral factor can significantly impact the rest of the network.
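Disassortative mixing of this kind can be quantified with the degree assortativity coefficient of Newman (2002), which is the Pearson correlation between the degrees of the two nodes joined by each edge. The following is a minimal Python sketch on an invented toy network (a hub labeled `core` contacted by four peripheral nodes). The node names and the function `degree_assortativity` are illustrative assumptions and are not part of the analysis pipeline used in this study.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees found at the two ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # The coefficient is undefined when every node has the same degree.
    return cov / var if var > 0 else float("nan")


# Toy network (invented): one highly connected node and four peripheral nodes.
star = [("core", "p1"), ("core", "p2"), ("core", "p3"), ("core", "p4")]
print(degree_assortativity(star))
```

A coefficient close to -1 indicates that poorly connected nodes attach preferentially to highly connected ones, which is the pattern described here for peripheral factors contacting the spliceosome core. Values near 0 indicate no degree preference.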
While disassortative spliceosome networks have previously been proposed based on biochemical and genetic interaction data (Pires et al., 2015), our work shows that such network features can also have a structural basis. Interactions between peripheral, poorly connected splicing factors and highly connected spliceosome core proteins, at specific steps, result in transient drops in algebraic connectivity (Fig. 1B). Thus, the spliceosome network orients towards step-specific, peripheral factors, likely amplifying their influence on network communication. Notably, all of the spliceosome’s ATPases bind peripherally and exhibit disassortative interactions, consistent with their functions in altering spliceosome structure and influencing network topology (Staley and Guthrie, 1998; Yan et al., 2019). Further, biochemical crosslinking of two of these ATPases, Prp16 and Prp22, supports the notion that they are first recruited to form dynamic, weakly associated complexes with the spliceosome and this is followed by stable integration and association with specific regions of the RNA substrate (Strittmatter et al., 2021). Whether or not regulators responsible for activating or repressing the splicing reaction would similarly impact spliceosome networks is not yet known.
NTC/NTA Proteins are Highly Integrated and May Act as Regulatory Hubs
By comparison with splicing factors stably associated with the U snRNPs, several remarkable properties of the NTC/NTA proteins emerged from this network analysis. In the networks calculated here, the NTC/NTA never emerged as a distinct module. Instead, these factors dispersed throughout and transitioned between other modules (Fig. 2). This is consistent with a previous hypothesis based on functional assays in which the NTC/NTA no longer functions as a discrete complex after it has been integrated into the spliceosome (de Almeida and O’Keefe, 2015). The dispersion of the NTC/NTA is likely facilitated by the large number of NTC/NTA proteins which are elongated, rather than globular. Elongated peptide chains may permit formation of intermolecular contacts not possible with globular domains and this in turn leads to high CEV for these components (Fig. 5). While it is possible that the NTC/NTA at some point can be segregated as a distinct module of spliceosome networks, it must involve a conformation of the spliceosome not captured by the models studied here.
Several components of the NTC/NTA are believed to help structure the spliceosome active site and modulate its activity (Hogg et al., 2014; Hogg et al., 2010; Rasche et al., 2012; Saha et al., 2012; van der Feltz et al., 2021; Villa and Guthrie, 2005). In the case of Cwc2, it has also been proposed that, due to its high degree of connectivity among spliceosomal proteins, Cwc2 is capable of transmitting local changes in structure to the RNA catalytic core to modulate activity (Rasche et al., 2012). This possible function may not be limited to Cwc2 since our data show that many NTC/NTA proteins possess high degrees of connectivity (Fig. 5) and function as network nodes that bridge distal regions of the spliceosome as well as link the spliceosome’s core to its exterior. Unexpectedly, this is not limited to just essential NTC/NTA components as many nonessential factors have similar or even higher CEV. Since many NTC/NTA proteins are also elongated, their higher surface areas may facilitate increased numbers of interactions solely due to the increase in available binding sites. It is possible that this facilitates regulation of spliceosomal activity since it may maximize the impact of mutation, isoform expression, or post-translational modification of a single, highly connected factor. The use of disordered regions to increase surface area available for protein-protein interactions may be an evolutionary solution shared between spliceosomal and ribosomal factors since almost half of ribosomal proteins also contain elongated domains (Peng et al., 2014).
Finally, network modeling and topological data such as CEV and CBW may be useful for inferring which NTC/NTA components are regulating splicing at distinct stages and the pathways by which that regulation may be occurring. For example, we identified the NTC/NTA factor Prp45 as having transiently high values of CBW in Bact and ILS structures suggesting that it may be functioning as a regulatory node (Fig. 3). A regulatory function could result from Prp45 bridging factors that are unconnected in other structures. Prp45 has previously been shown to function in co-transcriptional spliceosome assembly, act in concert with the chromodomain protein Eaf3 to link chromatin state to spliceosome activation, and influence the recruitment of the Prp22 ATPase during the late stages of splicing (Gahura et al., 2009; Leung et al., 2019). Based on our networks, we would propose that Prp45 is regulating splicing during activation through its unique interactions with the RES complex and at later stages through similarly unique interactions with the NTR complex (Fig. S2).
Our structure-based network analysis of spliceosomes highlights not only the utility of network modeling of an individual macromolecular structure but also the insights that can be obtained by comparison of networks representing successive stages along a reaction trajectory. As additional spliceosome structures emerge, it is likely that these methods can be extended for comparative analysis of spliceosomes present in different organisms (Fica, 2020) or at equilibrium with one another along different points of a thermodynamic landscape (Haselbach et al., 2018). By reducing complex structures to network descriptors, this type of analysis may facilitate elucidation of the underlying interaction networks common to all spliceosomes and elaborated on by evolution.
STAR Methods
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Aaron Hoskins (ahoskins@wisc.edu).
Materials availability
This study did not generate new unique reagents.
Data and code availability
This paper analyzes existing, publicly available data. The accession numbers of the datasets are listed in the key resources table.
All original code has been deposited at GitHub and is publicly available as of the date of publication (DOI: 10.5281/zenodo.5363321). The GitHub link is listed in the key resources table.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
All data are generated from datasets provided in the key resources table.
METHOD DETAILS
Preparation of PDB Files for Network Analysis
Modeled structural coordinates for eight splicing complexes were downloaded from the RCSB PDB repository, PDB IDs: 5NRL, 5GM6, 6J6H, 6J6Q, 5LJ5, 5MQ0, 6EXN, and 5Y88. Multimeric Sm and Lsm rings as well as the Prp19 tetramer were each treated as a single node in the network (Table S1A). The pre-mRNA was split based on modeled regions into individual chains and nodes corresponding to the 5’ splice site, 5’ exon, branch site, 3’ exon, intron lariat, or mRNA as appropriate for each structure.
Chains identified as unknown were deleted or re-assigned based on their location, the electron density map, and/or comparison with independently determined structures (Table S1A). In addition, a WD40 domain from a Prp19 monomer in C complex (PDB ID: 5LJ5) was reassigned to Prp17 based on the identical location of the protein domain in B*2 complex, where it is connected to Prp17.
In the Bact complex (PDB ID: 5GM6), nodes and edges were added to the network input parameters to connect the U2 snRNA with the U2 Sm ring, Lea1, and Msl1. In C* complex, edges were included in the network input parameters between Cef1 and Snt309 nodes as well as between the Cef1 and Prp19 nodes. In ILS complex (PDB ID: 5Y88), a portion of RNA could not be definitively modeled as either originating from the U6 snRNA or the intron and was therefore considered as its own unique node. Alanine-modeled regions were edited to remove interspersed glycines to permit analysis by PISA as described below. The total number of glycines removed from each structure is listed in Table S1A.
Spliceosome structural networks
Yeast spliceosome structures were modeled as undirected networks, where RNA and proteins represent nodes, and intermolecular interactions among splicing factors define edges (Bastian et al., 2009). Using the edited PDB files, network edges were determined using PISA to identify any components with van der Waals surfaces closer together than 1.4 Å (i.e., interactions forming buried surfaces inaccessible to a probe of 1.4 Å) (Krissinel and Henrick, 2007b). To automate these processes, we built custom software for extracting network information from PISA (LouiseNET, see Methods S1). LouiseNET can be used with any user-specified PDB file.
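The step from pairwise interfaces to an undirected network can be illustrated with a short Python sketch. It is not LouiseNET and does not parse PISA output. It assumes the interfaces have already been reduced to pairs of chain identifiers, and the function name `build_network`, the chain IDs, and the contacts listed are all invented for illustration. The sketch also shows how several chains (for example, the subunits of an Sm ring) can be collapsed into one node, as described in the preceding section.

```python
def build_network(interfaces, chain_to_node):
    """Return an adjacency map {node: set(neighbors)} for an undirected network.

    interfaces    -- iterable of (chain_a, chain_b) pairs that share an interface
    chain_to_node -- maps chain IDs to node names; chains that map to the same
                     name are merged into a single node
    """
    adjacency = {}
    for chain_a, chain_b in interfaces:
        a = chain_to_node.get(chain_a, chain_a)
        b = chain_to_node.get(chain_b, chain_b)
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        # Contacts between chains of the same merged node are not edges.
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Invented toy input: chain IDs and contacts are placeholders, not real data.
chain_to_node = {
    "A": "Prp8",
    "B": "U5 snRNA",
    "b": "U5 Sm ring", "d": "U5 Sm ring", "e": "U5 Sm ring",
}
interfaces = [("A", "B"), ("B", "b"), ("B", "d"), ("b", "d"), ("d", "e")]

network = build_network(interfaces, chain_to_node)
for node in sorted(network):
    print(node, sorted(network[node]))
```

Storing neighbors in sets means that repeated contacts between the same two nodes, which arise when a merged node touches a partner through several chains, produce a single unweighted edge.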
Network Topology
Topological analysis of the spliceosome networks was performed using Gephi 0.9.2 (https://gephi.org). The global architectural features of the network, such as average degree, average clustering coefficient, and average path length, were calculated using Gephi (Bastian et al., 2009). Algebraic connectivity is the second smallest eigenvalue of the Laplacian matrix and was calculated using custom MATLAB scripts. Networks were visualized and figures of networks were generated using the Yifan Hu proportional layout algorithm with default parameters in Gephi (Bastian et al., 2009).
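As an illustration of the algebraic connectivity calculation, the sketch below builds the Laplacian matrix (node degrees on the diagonal, -1 for each edge) and returns its second smallest eigenvalue. It is a dependency-free Python stand-in and not the custom scripts used in this study. The eigenvalues are obtained with the classical cyclic Jacobi rotation method, chosen here only so that the example runs without numerical libraries. All function names and the two toy networks are assumptions made for the example.

```python
from math import sqrt


def laplacian(adjacency):
    """Laplacian L = D - A of an undirected network given as {node: set(neighbors)}."""
    nodes = sorted(adjacency)
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    lap = [[0.0] * size for _ in range(size)]
    for node, neighbors in adjacency.items():
        i = index[node]
        lap[i][i] = float(len(neighbors))
        for other in neighbors:
            lap[i][index[other]] = -1.0
    return lap


def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-12):
    """Sorted eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + sqrt(theta * theta + 1.0))
                c = 1.0 / sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # apply the rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def algebraic_connectivity(adjacency):
    """Second smallest eigenvalue of the Laplacian matrix."""
    return symmetric_eigenvalues(laplacian(adjacency))[1]


# Toy networks (invented): a four-node chain, then the same chain closed into a ring.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(algebraic_connectivity(chain), algebraic_connectivity(ring))
```

Closing the chain into a ring adds a single edge yet raises the algebraic connectivity from 2 - √2 to 2, which illustrates how one additional contact can change a global connectivity measure.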
Modularity Analysis
Modularity analyses were performed on the adjacency matrix of each structure network, utilizing MODULAR (Marquitti et al., 2014). As previously described, a combination of the fast greedy and simulated annealing algorithms in MODULAR was used to calculate the modularity (Pires et al., 2015). The module distribution output from the fast greedy run was used as input for the simulated annealing algorithm. Protein and RNA assignments to different modules per structure were identified with MODULAR. Venn diagrams were generated with VennDiagram (Chen and Boutros, 2011).
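To make the quantity being optimized concrete, the sketch below scores a given assignment of nodes to modules. It assumes the widely used Newman-Girvan definition of modularity for unweighted, undirected networks, Q = Σ_c [e_c/m − (d_c/2m)²], where m is the number of edges, e_c the number of edges inside module c, and d_c the summed degree of its nodes. It does not reproduce the fast greedy or simulated annealing searches performed by MODULAR. The function name and the toy networks are invented for illustration.

```python
def modularity(edges, module_of):
    """Newman-Girvan modularity Q of a partition of an undirected network.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    module_of -- dict mapping every node to its module label
    """
    m = len(edges)
    within = {}      # number of edges with both ends in the same module
    degree_sum = {}  # summed degree of the nodes in each module
    for u, v in edges:
        cu, cv = module_of[u], module_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(within.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Toy network (invented): two triangles joined by one edge, then by three.
triangles = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f")]
partition = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}

loosely_joined = triangles + [("c", "d")]
tightly_joined = triangles + [("c", "d"), ("a", "d"), ("b", "e")]
print(modularity(loosely_joined, partition))
print(modularity(tightly_joined, partition))
```

For the same two-module partition, Q falls as more edges cross between the modules, which is the sense in which modularity decreases when previously separate subunits become integrated.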
Mathematical Descriptions of Calculated Network Parameters
Further descriptions of the mathematical equations underlying calculation of the parameters and analyses described above can be found in Methods S1.
Centrality and Betweenness Analysis
Betweenness and eigenvector centralities were calculated using MATLAB (MathWorks) following the formalized definitions found in Newman (2010). In order to compare centralities between networks, values for each network were normalized such that the sum of all centrality scores was 1 (Carley and Kim, 2008). Centrality values were mapped onto structures and displayed with UCSF Chimera (Pettersen et al., 2004).
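The two centralities and the sum-to-one normalization can be sketched in Python as follows. This is an illustrative stand-in and not the MATLAB code used in this study. Betweenness is computed with the standard shortest-path counting algorithm of Brandes for unweighted networks, and eigenvector centrality by power iteration. The function names and the toy network (two triangles linked through a single bridging node `d`) are assumptions made for the example, and the network is assumed to be connected.

```python
from collections import deque


def betweenness(adj):
    """Shortest-path betweenness for an unweighted, undirected network."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order, preds = [], {v: [] for v in adj}
        paths = {v: 0 for v in adj}   # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:                  # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        while order:                  # accumulate from the farthest nodes back
            w = order.pop()
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair of end points was visited twice.
    return {v: s / 2 for v, s in score.items()}


def eigenvector_centrality(adj, iterations=500):
    """Leading eigenvector of the adjacency matrix, scaled to sum to 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Iterating with (A + I) keeps the leading eigenvector of A but avoids
        # the oscillation that plain power iteration shows on bipartite graphs.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        total = sum(nxt.values())
        x = {v: value / total for v, value in nxt.items()}
    return x


def normalize(scores):
    """Rescale scores so that they sum to 1 across the network."""
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()} if total else dict(scores)


# Toy network (invented): triangles a-b-c and e-f-g joined through node d.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e"},
       "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"}}

cbw = normalize(betweenness(toy))
cev = eigenvector_centrality(toy)
for node in sorted(toy):
    print(node, round(cbw[node], 3), round(cev[node], 3))
```

In this toy network the bridging node `d` has only two contacts and a lower eigenvector centrality than its neighbors `c` and `e`, yet it has the highest betweenness because every shortest path between the two triangles passes through it. Normalizing each score to sum to 1 allows values from networks of different sizes to be compared on the same scale.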
QUANTIFICATION AND STATISTICAL ANALYSIS
This study does not require statistical analysis or software. Details of the parameters used for network analysis and quantification are included in Table S1.
ADDITIONAL RESOURCES
This study has not generated or contributed to a new website/forum and is not part of a clinical trial.
Supplementary Material
Table S1. Supplemental Data. (S1A) Unresolved regions and glycine to alanine substitutions for each complex. Related to Figure 1A. (S1B) Network topological parameters. Related to Figure 1B. (S1C) Components of the U2 module in each network model. Related to Figure 2. (S1D) Components of the U6 module in each network model. Related to Figure 2. (S1E) Components of the U5 module in each network model. Related to Figure 2. (S1F) Components in the non-snRNA module in each network model. Related to Figure 2. (S1G) Degree and betweenness differences between complexes for select components. Related to Figure 3B. (S1H) Distances between the spliceosome active site and the structural center of mass. Related to Figure S3. (S1I) Elongated proteins identified in each complex. Related to Figure 5A. (S1J) Average CEV for essential and nonessential NTC/NTA factors in each complex. Related to Figure 5C. (S1K) Average CBW for essential and nonessential NTC/NTA factors in each complex. Related to Figure 5D.
Highlights.
Network models were built from spliceosome structures captured at different steps
Spliceosome structural dynamics result in step-specific network fluctuations
Network connectivity is centered around the spliceosome active site
Betweenness parameters correlate with genetic lethality for splicing factors
INCLUSION AND DIVERSITY.
One or more of the authors of this paper self-identifies as a member of the LGBTQ+ community.
ACKNOWLEDGEMENTS
Thank you to Charles Schneider for help generating the Venn diagram in Fig. 2C. We thank Tim Grant and members of the Butcher and Brow laboratories for helpful discussions. We also thank Tim Grant, Charles Query, Samuel Butcher, Sierra Love, Pablo Alcón, and Terence Tang for comments on the manuscript. This work was supported by the National Institutes of Health (R35 GM136261 to AAH).
DECLARATION OF INTERESTS
AAH is a scientific advisor for Remix Therapeutics.
REFERENCES
- Athanasios A, Charalampos V, and Vasileios T (2017). Protein-protein interaction (PPI) network: recent advances in drug discovery. Current Drug Metabolism 18, 5–10.
- Baingana B, and Giannakis GB (2016). Joint Community and Anomaly Tracking in Dynamic Networks. IEEE Transactions on Signal Processing 46, 2013–2025.
- Bastian M, Heymann S, and Jacomy M (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Web and Social Media.
- Brinda K, and Vishveshwara S (2005). A network representation of protein structures: implications for protein stability. Biophysical Journal 89, 4159–4170.
- Brow DA (2002). Allosteric cascade of spliceosome activation. Annu Rev Genet 36, 333–360.
- Brun C, Chevenet F, Martin D, Wojcik J, Guénoche A, and Jacq B (2003). Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network. Genome Biology 5, R6.
- Carbonell C, Ulsamer A, Vivori C, Papasaikas P, Bottcher R, Joaquin M, Minana B, Tejedor JR, de Nadal E, Valcarcel J, and Posas F (2019). Functional Network Analysis Reveals the Relevance of SKIIP in the Regulation of Alternative Splicing by p38 SAPK. Cell Rep 27, 847–859 e846.
- Carley K, and Kim E (2008). Random Graph Standard Network Metrics Distributions in Ora. SSRN.
- Chen H, and Boutros PC (2011). VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R. BMC Bioinformatics 12, 35.
- David-Eden H, and Mandel-Gutfreund Y (2008). Revealing unique properties of the ribosome using a network based analysis. Nucleic Acids Research 36, 4641–4652.
- de Almeida RA, and O’Keefe RT (2015). The NineTeen Complex (NTC) and NTC-associated proteins as targets for spliceosomal ATPase action during pre-mRNA splicing. RNA Biol 12, 109–114.
- Fanelli F, Felline A, Raimondi F, and Seeber M (2016). Structure network analysis to gain insights into GPCR function. Biochem Soc Trans 44, 613–618.
- Fica SM (2020). Cryo-EM snapshots of the human spliceosome reveal structural adaptions for splicing regulation. Curr Opin Struct Biol 65, 139–148.
- Fica SM, Oubridge C, Galej WP, Wilkinson ME, Bai XC, Newman AJ, and Nagai K (2017). Structure of a spliceosome remodelled for exon ligation. Nature 542, 377–380.
- Gahura O, Abrhamova K, Skruzny M, Valentova A, Munzarova V, Folk P, and Puta F (2009). Prp45 affects Prp22 partition in spliceosomal complexes and splicing efficiency of non-consensus substrates. J Cell Biochem 106, 139–151.
- Galej WP, Wilkinson ME, Fica SM, Oubridge C, Newman AJ, and Nagai K (2016). Cryo-EM structure of the spliceosome immediately after branching. Nature 537, 197–201.
- Guimarães PR Jr., Pires MM, Cantor M, and Coltri PP (2018). Interaction paths promote module integration and network-level robustness of spliceosome to cascading effects. Sci Rep 8, 17441.
- Hagberg A, Swart P, and S Chult D (2008). Exploring network structure, dynamics, and function using NetworkX. United States. https://www.osti.gov/servlets/purl/960616.
- Hasan S, Bonde BK, Buchan NS, and Hall MD (2012). Network analysis has diverse roles in drug discovery. Drug Discovery today 17, 869–874.
- Haselbach D, Komarov I, Agafonov DE, Hartmuth K, Graf B, Dybkov O, Urlaub H, Kastner B, Lührmann R, and Stark H (2018). Structure and Conformational Dynamics of the Human Spliceosomal B(act) Complex. Cell 172, 454–464.e411.
- He X, and Zhang J (2006). Why do hubs tend to be essential in protein networks? PLoS Genet 2, e88.
- Hogg R, de Almeida RA, Ruckshanthi JP, and O’Keefe RT (2014). Remodeling of U2-U6 snRNA helix I during pre-mRNA splicing by Prp16 and the NineTeen Complex protein Cwc2. Nucleic Acids Res 42, 8008–8023.
- Hogg R, McGrail JC, and O’Keefe RT (2010). The function of the NineTeen Complex (NTC) in regulating spliceosome conformations and fidelity during pre-mRNA splicing. Biochem Soc Trans 38, 1110–1115.
- Jamakovic A, and Van Mieghem P (2008). On the robustness of complex networks by using the algebraic connectivity. (Springer), pp. 183–194.
- Jeong H, Mason SP, Barabasi AL, and Oltvai ZN (2001). Lethality and centrality in protein networks. Nature 411, 41–42.
- Kastner B, Will CL, Stark H, and Lührmann R (2019). Structural insights into nuclear pre-mRNA splicing in higher eukaryotes. Cold Spring Harbor perspectives in biology 11, a032417.
- Krissinel E, and Henrick K (2007a). Inference of macromolecular assemblies from crystalline state. J Mol Biol 372, 774–797.
- Krissinel E, and Henrick K (2007b). Protein interfaces, surfaces and assemblies service PISA at European Bioinformatics Institute. J Mol Biol 372, 774–797.
- Leung CS, Douglass SM, Morselli M, Obusan MB, Pavlyukov MS, Pellegrini M, and Johnson TL (2019). H3K36 Methylation and the Chromodomain Protein Eaf3 Are Required for Proper Cotranscriptional Spliceosome Assembly. Cell Rep 27, 3760–3769 e3764.
- Marquitti FMD, Guimaraes PR Jr, Pires MM, and Bittencourt LF (2014). MODULAR: software for the autonomous computation of modularity in large network sets. Ecography 37, 221–224.
- Menichetti G, Fariselli P, and Remondini D (2016). Network measures for protein folding state discrimination. Scientific reports 6, 30367.
- Miryala SK, Anbarasu A, and Ramaiah S (2018). Discerning molecular interactions: a comprehensive review on biomolecular interaction databases and network analysis tools. Gene 642, 84–94.
- Negre CF, Morzan UN, Hendrickson HP, Pal R, Lisi GP, Loria JP, Rivalta I, Ho J, and Batista VS (2018). Eigenvector centrality for characterization of protein allosteric pathways. Proceedings of the National Academy of Sciences 115, E12201–E12208.
- Newman ME (2002). Assortative mixing in networks. Physical review letters 89, 208701.
- Newman MEJ (2010). Networks: an introduction (Oxford University Press).
- Papasaikas P, Tejedor JR, Vigevani L, and Valcárcel J (2015). Functional splicing network reveals extensive regulatory potential of the core spliceosomal machinery. Mol Cell 57, 7–22.
- Peng Z, Oldfield CJ, Xue B, Mizianty MJ, Dunker AK, Kurgan L, and Uversky VN (2014). A creature with a hundred waggly tails: intrinsically disordered proteins in the ribosome. Cell Mol Life Sci 71, 1477–1504.
- Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, and Ferrin TE (2004). UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem 25, 1605–1612.
- Pintilie G, Zhang K, Su Z, Li S, Schmid MF, and Chiu W (2020). Measurement of atom resolvability in cryo-EM maps with Q-scores. Nat Methods 17, 328–334.
- Pires MM, Cantor M, Guimarães PR, De Aguiar MA, Dos Reis SF, and Coltri PP (2015). The network organization of protein interactions in the spliceosome is reproduced by the simple rules of food-web models. Scientific Reports 5, 14865.
- Plaschka C, Lin PC, and Nagai K (2017). Structure of a pre-catalytic spliceosome. Nature 546, 617–621.
- Plaschka C, Newman AJ, and Nagai K (2019). Structural basis of nuclear pre-mRNA splicing: Lessons from yeast. Cold Spring Harbor perspectives in biology 11, a032391.
- Ramrath DJF, Niemann M, Leibundgut M, Bieri P, Prange C, Horn EK, Leitner A, Boehringer D, Schneider A, and Ban N (2018). Evolutionary shift toward protein-based architecture in trypanosomal mitochondrial ribosomes. Science 362, eaau7735.
- Rasche N, Dybkov O, Schmitzova J, Akyildiz B, Fabrizio P, and Lührmann R (2012). Cwc2 and its human homologue RBM22 promote an active conformation of the spliceosome catalytic centre. EMBO J 31, 1591–1604.
- Saha D, Khandelia P, O’Keefe RT, and Vijayraghavan U (2012). Saccharomyces cerevisiae NineTeen complex (NTC)-associated factor Bud31/Ycr063w assembles on precatalytic spliceosomes and improves first and second step pre-mRNA splicing efficiency. J Biol Chem 287, 5390–5399.
- Saltalamacchia A, Casalino L, Borišek J, Batista VS, Rivalta I, and Magistrato A (2020). Decrypting the information exchange pathways across the spliceosome machinery. Journal of the American Chemical Society 142, 8403–8411.
- Shao Q, Gong W, and Li C (2020a). A study on allosteric communication in U1A-snRNA binding interactions: Network analysis combined with molecular dynamics data. Biophysical Chemistry 264, 106393.
- Staley JP, and Guthrie C (1998). Mechanical devices of the spliceosome: Motors, clocks, springs, and things. Cell 92, 315–326.
- Strittmatter LM, Capitanchik C, Newman AJ, Hallegger M, Norman CM, Fica SM, Oubridge C, Luscombe NM, Ule J, and Nagai K (2021). psiCLIP reveals dynamic RNA binding by DEAH-box helicases before and after exon ligation. Nat Commun 12, 1488.
- van der Feltz C, Nikolai B, Schneider C, Paulson JC, Fu X, and Hoskins AA (2021). Saccharomyces cerevisiae Ecm2 Modulates the Catalytic Steps of pre-mRNA Splicing. RNA 27, 591–603.
- Vendruscolo M, Dokholyan NV, Paci E, and Karplus M (2002). Small-world view of the amino acids that play a key role in protein folding. Physical Review E 65, 061910.
- Villa T, and Guthrie C (2005). The Isy1p component of the NineTeen complex interacts with the ATPase Prp16p to regulate the fidelity of pre-mRNA splicing. Genes Dev 19, 1894–1904.
- Wahl MC, Will CL, and Lührmann R (2009). The spliceosome: design principles of a dynamic RNP machine. Cell 136, 701–718.
- Wan R, Bai R, Yan C, Lei J, and Shi Y (2019). Structures of the catalytically activated yeast spliceosome reveal the mechanism of branching. Cell 177, 339–351.e313.
- Wan R, Yan C, Bai R, Lei J, and Shi Y (2017). Structure of an intron lariat spliceosome from Saccharomyces cerevisiae. Cell 171, 120–132.e112.
- Wilkinson ME, Fica SM, Galej WP, Norman CM, Newman AJ, and Nagai K (2017). Postcatalytic spliceosome structure reveals mechanism of 3’-splice site selection. Science 358, 1283–1288.
- Williams CJ, Headd JJ, Moriarty NW, Prisant MG, Videau LL, Deis LN, Verma V, Keedy DA, Hintze BJ, Chen VB, et al. (2018). MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci 27, 293–315.
- Yan C, Wan R, Bai R, Huang G, and Shi Y (2016). Structure of a yeast activated spliceosome at 3.5 A resolution. Science 353, 904–911.
- Yan C, Wan R, and Shi Y (2019). Molecular mechanisms of pre-mRNA splicing through structural biology of the spliceosome. Cold Spring Harbor perspectives in biology 11, a032409.
- Yan W, Hu G, Liang Z, Zhou J, Yang Y, Chen J, and Shen B (2018). Node-weighted amino acid network strategy for characterization and identification of protein functional residues. Journal of Chemical Information and Modeling 58, 2024–2032.
- Yan W, Zhou J, Sun M, Chen J, Hu G, and Shen B (2014). The construction of an amino acid network for understanding protein structure and function. Amino Acids 46, 1419–1439.
- Yu D, Kim M, Xiao G, and Hwang TH (2013). Review of biological network data and its applications. Genomics & Informatics 11, 200.
