Skip to main content
Nature Portfolio logoLink to Nature Portfolio
. 2023 May 3;617(7960):409–416. doi: 10.1038/s41586-023-06019-2

Structural atlas of a human gut crassvirus

Oliver W Bayfield 1,, Andrey N Shkoporov 2, Natalya Yutin 3, Ekaterina V Khokhlova 2, Jake L R Smith 1, Dorothy E D P Hawkins 1, Eugene V Koonin 3, Colin Hill 2, Alfred A Antson 1,
PMCID: PMC10172136  PMID: 37138077

Abstract

CrAssphage and related viruses of the order Crassvirales (hereafter referred to as crassviruses) were originally discovered by cross-assembly of metagenomic sequences. They are the most abundant viruses in the human gut, are found in the majority of individual gut viromes, and account for up to 95% of the viral sequences in some individuals14. Crassviruses are likely to have major roles in shaping the composition and functionality of the human microbiome, but the structures and roles of most of the virally encoded proteins are unknown, with only generic predictions resulting from bioinformatic analyses4,5. Here we present a cryo-electron microscopy reconstruction of Bacteroides intestinalis virus ΦcrAss0016, providing the structural basis for the functional assignment of most of its virion proteins. The muzzle protein forms an assembly about 1 MDa in size at the end of the tail and exhibits a previously unknown fold that we designate the ‘crass fold’, that is likely to serve as a gatekeeper that controls the ejection of cargos. In addition to packing the approximately 103 kb of virus DNA, the ΦcrAss001 virion has extensive storage space for virally encoded cargo proteins in the capsid and, unusually, within the tail. One of the cargo proteins is present in both the capsid and the tail, suggesting a general mechanism for protein ejection, which involves partial unfolding of proteins during their extrusion through the tail. These findings provide a structural basis for understanding the mechanisms of assembly and infection of these highly abundant crassviruses.

Subject terms: Cryoelectron microscopy, Phage biology, Microbiome, Classification and taxonomy, Virus structures


A cryo-electron microscopy reconstruction of the virus ΦcrAss001 provides insights into the functions of the viral gene products in capsid assembly and infection.

Main

Viruses infecting bacteria are thought to be the most abundant biological entities on earth. Until recently, all such viruses were identified after growing on cultivatable bacterial hosts, but these are likely to account for less than 5% of the actual global diversity79. Recent analyses of viral ‘dark matter’, made possible by cross-assembly of metagenomic data, have led to the discovery of many new viruses, including crAssphage—the most abundant virus in the human microbiome—and related viruses comprising the newly established order Crassvirales2,4,5,10. These viruses have been closely associated with human populations throughout evolution11. Crassviruses appear to infect exclusively diverse members of the bacterial phylum Bacteroidota3,8,11,12. In healthy adult Western cohorts, crassviruses are detected in 98–100% of individuals, often dominating the faecal virome3. The virus families Flandersviridae, Quimbyviridae and Gubaphage have been identified as close contenders with the Crassvirales in terms of their detection frequency8,12. However, Crassvirales appear to be unique in their showing both a high abundance in individual viromes and across varied cohorts globally3,8,11,12. Some crassviruses appear to establish a form of carrier-state infection, with delayed lysis of the infected bacteria, and appear to follow a piggyback-the-winner virus–host dynamic6,13. How this lifestyle contributes to their prevalence and abundance remains unclear.

Virus ΦcrAss001 (Kehishuvirus primarius) was the first representative of this order to be isolated in pure culture from human faecal samples6. Here, we present the characterization of ΦcrAss001 using cryo-electron microscopy (cryo-EM), enabling de novo structure building and elucidation of functions for around 1,440 protein subunits constituting the virion, leading to the identification of a novel ‘crass fold’ in the distal part of the tail. The structure reveals a mechanism by which a large complement of virally encoded cargo proteins, representing a total molecular mass of approximately 5.5 MDa, can be stored in the capsid and tail. The presence of one such cargo protein inside both the capsid and tail suggests a mechanism for protein ejection that takes place before complete ejection of genomic DNA and involves partial unfolding of cargo proteins. Sequence analysis revealed the extent of conservation of virion proteins throughout the Crassvirales, providing a structure–function atlas for this viral order.

Linking virion and genome architectures

The structure of the ΦcrAss001 virion was determined using single-particle 3D reconstructions from electron microscopy data (Fig. 1a,b). Reconstructions were determined to a resolution that enabled building atomic models for most of the proteins that constitute the virion (Fig. 1c). The virion possesses a short non-contractile tail extending approximately 44 nm from an icosahedral capsid. Inside the capsid, double-stranded DNA (dsDNA) is packed with a left-handed supercoiling (Fig. 1c, left), and at high density leading to hexagonal close-packing (Fig. 1c, right). Moreover, the reconstruction reveals the structural hallmark proteins common to all crassviruses, and sequence analysis enabled mapping of the virion proteins that are most broadly conserved across this viral order (Fig. 1d). In particular, no homologues of the capsid auxiliary protein and the tail muzzle protein were detected among the currently characterized structures of viral proteins.

Fig. 1. Structure of the ΦcrAss001 virion and functional assignments.

Fig. 1

a,b, Electron micrographs of virions recorded using negative staining (a; n = 16 micrographs) and under cryogenic conditions (b; n = 44,006 micrographs). Scale bars, 100 nm. c, Molecular surface of virion reconstruction viewed from outside (left) and z-clipped (right), coloured by gene product. DNA (grey) on the z-clipped image is depicted either as an outer layer lining the capsid wall (top left) or clipped (right), or is removed (bottom left). Unmodelled regions are displayed as map, lowpass filtered to 8 Å (parts of head fibre proteins and DNA) and to 16 Å (tail fibres, semi-transparent grey). AUX, auxiliary capsid protein; C1 and C2, cargo proteins; MUZ, muzzle protein; POR, portal protein; PVA, portal vertex auxiliary capsid protein; R1–R4, ring proteins; THA and THB, tail hub proteins. Numbers in brackets indicate the protein copy number in the virion. d, Top, the ΦcrAss001 genome (accession: NC_049977.1) region 11643–68717 showing open reading frames as arrows coloured corresponding to gene products (gp) in c, scaled according to gene length. Bottom, gene conservation, with the length of each bar representing the fraction of each crassvirus group in the order Crassvirales containing a detectable homologue, coloured corresponding to c. Names on the left refer to the virus families (Intestiviridae, Crevaviridae, Steigviridae and Suoliviridae) and groups (zeta and epsilon) with individual members from non-human gut environments represented by ProJPt-BP1 (termite gut), Fpv3 (fish farm) and Φ14:2, Φ17:2 and Φ13:2 (marine environments).

The capsid proteins

The icosahedral reconstruction of the capsid, which was determined to a resolution of 3.01 Å (Extended Data Table 1), has an outer diameter of approximately 88 nm. The capsid has a triangulation number (T) of nine, with an asymmetric unit comprising nine copies of the major capsid protein (MCP) gp32, nine copies of the capsid auxiliary protein gp36, one copy of the trimeric head fibre protein (HFT) gp21, and two copies of the dimeric head fibre protein (HFD) gp29 (Fig. 2a).

Extended Data Table 1.

Data collection and refinement

graphic file with name 41586_2023_6019_Tab1_ESM.jpg

Data collection parameters are shown for different data sets, with refinement statistics shown for different assemblies.

Fig. 2. ΦcrAss001 capsid proteins.

Fig. 2

a, Molecular surface of the asymmetric subunit of the icosahedral capsid with indicated symmetry axes (grey shapes), with major capsid protein gp32 (yellow), capsid auxiliary protein gp36 (dark blue), head fibre proteins gp21 (HFT; pale blue, trimer) and gp29 (HFD; pink, dimer). The triangle indicates a local quasi-C3 symmetry formed by I and PH domains. b,c, Ribbon diagrams of the major capsid protein subunit (b) and the capsid auxiliary protein gp32 (c). d, Molecular surface of the C3 capsid hexon. e, Ribbon diagram of the HFT trimer. f, Molecular surface of the skewed hexon. g, Ribbon diagram of the HFD dimer. Numbers indicate terminal residues of the model.

The MCP adopts the HK97-like fold—the canonical fold of the capsid proteins of the viruses in the class Caudovirecetes14, with the main body comprising the A-domain and P-domain (Fig. 2b). A β-barrel insertion in the E-loop, composed of residues 79–180 (I-domain), is a widespread feature of the crassviruses5. This insertion domain forms a three-fold symmetric assembly with the auxiliary protein at the capsid surface (Fig. 2a and Extended Data Fig. 1), making up much of the exposed surface of the capsid and contributing substantially to the capsid wall thickness and stability. An insertion at a similar position of the E-loop, albeit of a different fold (comprising one helix and six strands), is present in the major capsid protein of bacteriophage T415. The ΦcrAss001 I-domain is structurally similar to the I-domain of the major capsid protein of phage P22 (DALI Z-score of 6.1 and root mean square deviation (r.m.s.d.) of 2.7 Å with Protein Data Bank (PDB) entry 5UU5 chain E). However, in P22, this domain (residues 226–344) is part of the A-domain, rather than the E-loop.

Extended Data Fig. 1. Capsid wall viewed along the local three-fold symmetry axis.

Extended Data Fig. 1

Molecular surface (left) and diagram of protein layers (right). (a) Inner capsid layer formed by three major capsid protein subunits contributed from three adjacent capsomers, orange; (b) Middle layer containing three additional overlaying major capsid protein subunits, yellow; (c) Outer layer containing three overlaying auxiliary capsid proteins, blue. P, peripheral domain of the major capsid protein. I, insertion domain of the major capsid protein. PH, pleckstrin-homology domain of the auxiliary capsid protein. G, G-loop of the major capsid protein.

The auxiliary protein gp36 has a two-domain architecture (Fig. 2c). The PH domain is composed of residues 1–74 and 238–333 and shows structural similarity to the pleckstrin homology-like domain (DALI Z-score of 5.7 and r.m.s.d. of 3.2 Å with PDB entry 4DX9, belonging to CATH superfamily 2.30.29.30). The FD domain is composed of residues 75–237 and is structurally similar to the Helicobacter pylori flagellar cap protein FliD (DALI Z-score 7.3 and r.m.s.d. 3.4 Å with PDB entry 6IWY chain A) and Pseudomonas aeruginosa flagellin FliC (DALI Z-score 6.7 and r.m.s.d. 3.2 Å with PDB entry 4NX9 chain A). To our knowledge, auxiliary proteins with such a fold and size have not been observed in viral capsids, and they represent a unique feature of most of the crassviruses (Fig. 1d).

The FD domain of the auxiliary protein interacts with the A-domain of the MCP, around the periphery of each hexamer and pentamer. Each hexamer can adopt one of two conformations: winged or planar (Fig. 2d,e). Winged hexons span the edges of the icosahedron, on both sides of the icosahedral two-fold symmetry axes, whereas planar hexons are positioned at the three-fold symmetry axes. At the centre of each hexon is a pocket formed by the MCP and auxiliary protein, which serves as a binding site for fibre proteins. A trimer of the head fibre protein gp21 (residues 1–38) binds in the planar hexon pocket at the three-fold symmetric site (Fig. 2d), whereas a dimer of the capsid fibre protein gp29 (residues 1–58) binds in the winged hexon (Fig. 2e). Thus, the T number of the capsid is responsible for generating two distinct hexon-pocket environments, which are specific to two different oligomeric fibres. These two fibres are likely to afford two different binding specificities to external molecules located either on the surface of the bacterial host cell or on the surface of the gut. Homologues of these fibres are present in Intestiviridae (alpha), Steigviridae (beta) and Suoliviridae (delta) groups of crassviruses in the case of the HFT gp21, and in the Steigviridae (beta) and Crevaviridae (gamma) groups in the case of the HFD gp29 (Fig. 1d). Homologues of the C-terminal domain of gp21 (residues 42–97) with high sequence similarity were detected in viruses outside of the Crassvirales, including gp8.5 of Bacillus phage ϕ29 (40% identity with residues 226–280, and DALI Z-score of 4.3 and r.m.s.d. of 3.2 Å between PDB 3QC7 chain A and gp21 AlphaFold structure prediction) (Extended Data Fig. 2). Gp8.5 has been suggested to bind to cell wall factors of Gram-positive bacteria16. AlphaFold modelling showed that the C-terminal domain of gp29 (residues 58–118) forms a β-sandwich (Extended Data Fig. 2) with structural similarity to the immunoglobulin-like fold (DALI Z-score of 6.8 and r.m.s.d. of 2.1 Å with PDB 2NSM chain A; CATH superfamily 2.60.40.1120). A DALI search for the N-terminal domains of gp21 and gp29 did not detect any significantly similar structures. Although most groups of crassviruses did not appear to contain homologues of gp21 and gp29, as yet uncharacterized proteins might perform analogous roles.

Extended Data Fig. 2. Models of the head fiber proteins.

Extended Data Fig. 2

(a–c), head fiber gp21 trimer. (d–f), head fiber gp29 dimer. (a) Extracted density fitted with the gp21model; shown in two orthogonal views. (b) Ribbon diagram with domain structure deduced from cryo-EM density in blue and domain structure predicted by AlphaFold in grey. (c) same as B but with only one monomer shown. (d–f) same as (a-c) but for gp29, with the domain structure derived from cryo-EM density shown in pink. Scale bars 20 Å.

The portal vertex

One vertex of the capsid contains the dodecameric oligomer of portal protein gp20. The portal protein is considered the nucleation point for capsid assembly and serves as the docking site for the tail1720. The portal is surrounded by the five-fold symmetric capsid vertex, contacting 10 major capsid protein subunits. The majority of the portal protein oligomer exhibits 12-fold symmetry and the localized reconstruction imposing C12 symmetry enabled structure determination at 3.09 Å resolution (Fig. 3). The majority of the portal protein residues, except for some flexible loop segments (33–35, 68–71, 269–324 and 550–563), could be confidently modelled (Fig. 3a,b). The portal protein exhibits a canonical fold21 (Fig. 3a), but with a long C-terminal barrel of the oligomer resembling the corresponding domain of the P22 portal protein22.

Fig. 3. The portal protein and the surrounding capsid vertex.

Fig. 3

a, Two opposing subunits of the portal protein (light blue) and the C1 cargo protein (purple) depicted as ribbons. b, Dodecameric assembly of the proteins shown in a, viewed along the channel axis. c, The portal protein wing region (light blue) interacting with C1 cargo protein subunits (purple). d,e, Molecular surface of the portal-containing capsid vertex, comprising major capsid protein gp32 (yellow), auxiliary capsid protein gp36 (dark blue) and portal protein gp20 (ribbon, light blue), viewed from inside the capsid along the central channel (d) and from outside the capsid (e) with the portal vertex auxiliary protein gp57 in green and the head fibre protein gp21 in pale blue. f, Ribbon diagram of the portal protein, depicting 12 subunits superimposed (left) and the oligomer viewed along the central axis (right) with capsid-adaptable loops shown in light blue and red corresponding to conformations A and B, respectively. g, Unit of the C5 reconstructed portal-containing capsid vertex, with the gp57 (PVA) protein dimer shown as green ribbon. h, The gp57 dimer with inter-subunit disulfide bonds show in yellow. Numbers indicate terminal residues of the model.

At the top of the portal protein, an extended α-helical barrel (C-terminal residues 672–739) extends around 80 Å towards the core of the capsid (Fig. 3a). This extended feature has an internal van der Waals diameter of around 45 Å at the base of the crown and around 25 Å at its exposed end. This barrel is well positioned to direct DNA into the capsid during packaging, so that DNA enters at the central part of the capsid towards the opposite capsid wall.

The main body of the portal protein beneath the Crown domain is made up of the stem domain, composed of α-helices defining its central tunnel, and the wing domain (Fig. 3a). Tunnel loops, which are not defined in maps owing to their apparent flexibility, connect the inner stem helix (residues 519–541) with a 34-residue-long kinked helix that is part of the wing (Fig. 2b). These tunnel loops (residues 550–563) are likely to form a narrow constriction in the portal tunnel, as observed in portal proteins of other viruses18,21,2325.

At the surface of the portal wing, there is clear density corresponding to residues 425–552 of the gp45 cargo protein C1 (Fig. 3b,c). Twelve such segments of gp45 interdigitate with wing loops (SH3-like domains) of the portal, residues 337–380 (Fig. 3b,c).

The 12-fold symmetrical portal oligomer sits within the 5-fold symmetrical capsid vertex (Fig. 3d,e). The localized asymmetric reconstruction of the portal–capsid vertex (obtained without imposing symmetry), determined at 3.60 Å resolution, showed that a loop at the peripheral area of the wing domain, residues 268–319, is flexible and can adopt two preferred conformations (types A and B) in interacting with the C5-symmetric capsid vertex (Fig. 3f). In the type A conformation, residues 277–290 of this ‘capsid-adaptable loop’ are oriented radially away from the portal’s tunnel axis, whereas in the type B conformation residues 292–306 of the loop are arranged tangentially to the portal’s circumference (Fig. 3f). In 2 of the 12 subunits of the portal protein, these loops were not resolved in either conformation (denoted ‘x’), resulting in the A-B-x-A-B-A-B-A-x-B-A-B order of the conformational states, counting clockwise if viewed from outside of the capsid (Fig. 3f). The result is a pseudo-five-fold symmetrical arrangement of capsid-adaptable loops around the 12-subunit oligomer, assisting the formation of optimal interactions with the five-fold symmetric capsid vertex.

The surrounding capsid vertex was also reconstructed with C5-symmetrical averaging to a resolution of 3.30 Å (Fig. 3g). The portal oligomer positioned in the vertex occludes the binding site of the capsid auxiliary protein gp36, which is substituted by two copies of the portal vertex specific auxiliary protein, gp57 (Fig. 3g,h). The gp57 dimer has approximate two-fold symmetry, reinforced by disulfide bonds between residues 47–47 and 25–88 of the two subunits (Fig. 3h). Notably, hexameric capsomers bordering the unique vertex contain trimer capsid fibres of gp21 (Fig. 3e), in contrast to the hexamers bordering other pentameric vertices, which contain dimeric capsid fibres of gp29.

Tail barrel assembly and fibre hubs

The portal protein’s clip domain interacts with the tail barrel, which extends approximately 23 nm towards the muzzle protein complex at the tip of the tail (Fig. 4). The barrel was reconstructed by imposing C12 symmetry to a resolution of 3.20 Å, and with C6 symmetry to a resolution of 3.30 Å. The barrel is composed of five stacked dodecameric ‘rings’, termed ring 1 (R1, at the portal end) to ring 5 (R5, at the muzzle end) (Fig. 4a). These structures are formed by four paralogous ring proteins: R1, R2 and R3 are encoded by genes gp43, gp40 and gp35, respectively, whereas R4 and R5 are both encoded by gp34 (Fig. 4b). Such a multi-ring barrel structure results in ΦcrAss001 having a much longer tail than typically observed in other podoviruses, which have only been observed to possess one such ring formed of one protein. The ring proteins share highly similar folds and consist of two domains, with three structurally conserved α-helices and a β-sandwich domain (Fig. 4b). In R5 (gp34), residues 15–26 adopt 2 conformations—5a and 5b—that facilitate binding to the 6-fold symmetric muzzle complex (Fig. 4b). Despite their conserved fold, the sequence identity among the ring proteins is relatively low, ranging from 14.9% (R2–R3) to 18.6% (R1–R2), although the more closely related paralogues in ΦcrAss001—R3 and R4—are 30.6% identical.

Fig. 4. The tail barrel and fibre docking hubs.

Fig. 4

a, Two opposing chains of ring proteins of the tail barrel, depicted as ribbon diagram, from top to bottom: R1 (gp43) (light cyan), R2 (gp40) (navy), R3 (gp35) (mid green), R4 (gp34) and R5 (gp34) (light green), THA (gp38) (orange), THB (gp39) (pink) and gp22 (dark grey). b, Individual subunits of each ring protein alongside their dodecameric ring assemblies viewed along the central axis. c, The collar formed by 12 tail hub assemblies, viewed along the central axis from the capsid end, depicted in ribbon diagram with chains of one tail hub coloured orange for THA1 and THA2 and pink for THB. TH, tail hub. d, The collar as in c, rotated 90° and enlarged, showing interlocking tail hubs. e, Enlarged view of disulfide bonds (yellow, ball and stick) between tail hub proteins and R1 and R2 proteins. Asterisks designate disulfide bonds.

The α-helical domain of gp34 shares a similar fold with tail-joining factors in tailed bacteriophages with different tail morphologies (HK97, Mu and P22) and in gene-transfer agents26,27. The presence of multiple ring protein genes, however, appears to be a conserved and defining feature of crassviruses (Fig. 1d). Some variation exists in this pattern of conservation: genes 34 (R3) and gp35 (R4 and R5) of ΦcrAss001 are more recent paralogues, and many other crassviruses instead encompass a single homologue of R3, R4 or R5, and therefore together with R1 and R2 possess three ring protein genes. The extended tail barrel in ΦcrAss001 that results from this set of ring proteins may assist the tail muzzle in reaching the outer cell membrane, through the approximately-30 nm compact capsular polysaccharide13. Having three ring protein genes rather than four could result in a shorter tail barrel, and the outgroup in sequence analysis, phage Φ13:2, is distinctive in its possessing only the R3/R4 gene homologue (Fig. 1d).

R1 and R2 form an interface for interaction with the tail fibre docking hubs, which are the point of attachment of the tail fibre appendages (Fig. 4a,c). The tail hub (TH) proteins form a continuous ‘collar’ around the tail barrel (Fig. 4c). Each hub assembly is an A2B trimer that consists of two copies of THA (gp38) and one copy of the smaller protein THB (gp39) (Fig. 4d). The C-terminal segment of THB, residues 67–107, extends into the neighbouring hub, resulting in continuous interlocking ring of 12 docking hubs around the barrel (Fig. 4d). Each docking hub, in turn, interacts with the trimeric tail fibre protein gp22, accommodating its N-terminal residues 1–26 (Fig. 4a). Gp22 interacts with additional tail fibre proteins making up the inner and outer tail fibre assemblies, which are highly flexible and less well resolved in localized reconstructions. The tail barrel and fibre docking hubs are stabilized by disulfide bonds (Extended Data Table 2), notably between fibre docking hub proteins (Fig. 4d), the fibre docking hub and ring proteins (Fig. 4e), and between R1 and R2.

Extended Data Table 2.

Disulfide bonds stabilising the tail assembly

graphic file with name 41586_2023_6019_Tab2_ESM.jpg

Residues forming disulfide interactions stabilising protein-protein interactions are indicated by numbers.

The muzzle assembly

At the end of the tail is a large hexameric assembly of the muzzle protein gp44 (Fig. 5). The muzzle protein is composed of five domains (Fig. 5a,b), of which the ring-joining domain and the seven-bladed β-propeller are similar with those of the phage T7 nozzle protein19, and the remaining three domains are distinct. Below the β-propeller are two immunoglobulin-like domains, and the fifth domain comprising the bulk of gp44 has no counterpart in known virus tail proteins or in any structurally characterized proteins.

Fig. 5. The muzzle assembly.

Fig. 5

a, Top, two opposing subunits of the muzzle protein shown as ribbons (red). Bottom, a hexamer of the muzzle viewed along the central axis with cargo protein in purple. Circled numbers denote the blades of the β-propeller. b, A single subunit of the muzzle protein (right chain in a rotated by 90° around the central tunnel axis), with the ring-joining domain in orange, the β-propeller domain in green, the immunoglobulin-like domain 1 (IG1) in pink, IG2 in cyan and the crass domain polypeptide segments in light blue and yellow. c, Topology diagram of the muzzle protein with domains coloured as in b. The dotted line in the crass domain indicates the β-barrel strands. d, Schematic of the muzzle protein with individual domains coloured as in b,c, with domain boundary residue numbers indicated. β, β-propeller domain. R, ring-joining domain.

Inserted between blades 2 and 3 of the propeller is immunoglobulin-like domain 1 (IG1, residues 163–292) and inserted into blade 3 is an immunoglobulin-like domain 2 (IG2); IG1 and IG2 share a similar topology (Fig. 5b). This immunoglobulin-like fold is structurally similar to the C2-set domains of immunoglobulins28 (CATH superfamily 2.60.40.150), albeit with a different order of strands, with highest structural similarity observed for synaptotagmin-129 (DALI Z-score of 5.1 and r.m.s.d. of 3.9 Å with PDB 5CCH chain F).

Serving as a platform for assembly of the large domain forming the distal part of the muzzle is IG2 (between residues 318–1021). This domain is formed by insertions into loops between IG2 strands E–F (residues 453–904) and G–H (residues 940–1012), with the E–F insertion making up the bulk of this region (Fig. 5c). As this domain adopts a fold with no significant structural similarity to known protein domains, we refer to it here as the ‘crass fold’. A small β-barrel below domain IG2 (indicated by a dotted line on Fig. 5c; 70 residues spanning segments 453–462, 563–571, 577–580, 581–591, 709–716, 725–733, 833–839, 894–901 and 902–905) structurally resembles other 6-stranded β-barrels (for example, r.m.s.d. of 3.2 Å with 50S ribosomal protein L2, PDB code 4U67, chain B).

The muzzle is gated by a loop within blade 4 of the propeller (Fig. 5a,b). One role of the muzzle is likely to be the formation of this constriction, preventing leakage of DNA and proteins contained in the capsid and tail until infection is initiated. Owing to the restricted channel diameter above and below the gate, and interactions at the inter-subunit interfaces, opening it would require conformational rearrangements of the IG and crass domains. The crass domain, owing to its location on the tail tip and the presence of multiple exposed surface loops, is likely to have a role in contacting the cell surface, thus initiating these conformational changes. The muzzle protein is highly conserved among the crassviruses (Fig. 1d), but exhibits length variations resulting from insertions into the crass domain and IG2 domain regions that can affect the assembly’s surface properties.

Cargo protein storage zones

Crassviruses are expected to carry several viral gene products inside the virions, including the large multi-subunit RNA polymerase30, collectively referred to here as cargo proteins. Consistent with this expectation, mass spectrometry of ΦcrAs001 virions (Extended Data Table 3) detected the presence of proteins referred to here as cargo protein 1 (gp45, 90.8 kDa), cargo protein 2 (gp46, 60.2 kDa), and three RNA polymerase-associated proteins: gp47 (262.5 kDa), gp49 (154.5 kDa) and gp50 (276.9 kDa), which are broadly conserved among the crassviruses (Fig. 1d).

Extended Data Table 3.

Virion proteins detected by mass spectrometry

graphic file with name 41586_2023_6019_Tab3_ESM.jpg

Individual proteins are shown with the number of unique peptides detected and the overall sequence coverage.

To investigate how these cargo proteins pack inside the virion, despite the absence of clear corresponding density in reconstructions, rotational averaging of the virion map around the axis of the portal protein tunnel was performed (Fig. 6a). Inspection of this map revealed an area inside the capsid with a substantially lower density where packing of the surrounding layers of DNA was discontinuous. This internal capsid region located close to the portal vertex was estimated to be about 1.10 × 107 Å3 in volume. Also notable is the density inside the tail barrel, which is higher than would be expected from solvent alone, and a strong rod-like feature within the portal (Fig. 6a). The space inside the tail barrel measures approximately 8.15 × 105 Å3.

Fig. 6. The cargo protein zones.

Fig. 6

a, Z-clipped view of the rotationally-averaged virion map. The protein cargo zones are indicated by outlines inside the capsid (yellow) and in the tail barrel (red). Purple arrows indicate locations of structured regions of cargo proteins that are resolved in reconstructions. bd, Structures of the portal-bound domain of C1 (gp45) (b), the tail barrel-bound region of C1 (c) and the muzzle-bound cargo protein fragment (d) are shown with the corresponding density maps (transparent grey). e, Schematic of cargo protein locations inside the virion. Proteins (purple) are stored in the cargo zones located in the tail (T) and capsid (C). IM, inner membrane; OM, outer membrane.

In the cryo-EM reconstructions, fragments of cargo protein 1 (C1) (gp45), were resolved inside both the capsid and the tail. As noted, residues 425–552 fold into a compact domain bound to the portal protein inside the capsid (Fig. 6b). Unexpectedly, well-defined density at the inner surface of R1 and R2 of the tail barrel corresponds to 12 copies of an additional α-helical segment of the same protein, gp45 (residues 214–245) (Fig. 6c). Thus, different portions of gp45 are found inside either the capsid or the tail, separated by the portal protein tunnel.

On the basis of volume measurements, we can estimate the storage capacity of the cargo zones. Assuming proteins were packed at 2.15 Å3 per Da volume per molecular weight, consistent with typical protein crystals with around 43% solvent content31, the capsid and tail cargo zones could store proteins with a total molecular mass of up to around 5.1 MDa inside the capsid and an additional 0.4 MDa inside the tail barrel. These two zones can thus harbour multiple copies of the cargo proteins, as detected by mass spectrometry (for example, 6 copies of each the three RNA polymerase-associated protein and up to 12 copies of the smaller proteins gp45 and gp46). The remaining internal capsid volume available for the 102,679 bp genome is approximately 1.84 × 108 Å3, resulting in a DNA packing density of 0.56 bp nm−3, consistent with packaging densities observed for other viruses32.

A further region of density inside the tail was observed at its distal end, next to the ring-joining domain of the muzzle protein (Fig. 6d). This region corresponds to 6 copies of a 14-residue segment containing an α-helix. Although this segment could not be assigned to a specific protein, it probably corresponds to yet another N-terminal segment of gp45, extending from the 214–245 fragment. Indeed, peptides spanning the regions 79–101 and 181–190 of gp45 were detected by mass spectrometry (Extended Data Fig. 3). Consistent with this, the 8.15 × 105 Å3 volume of the tail cargo zone could accommodate up to about 0.4 MDa of protein material, which is sufficient for 12 copies of residues 1–245 of gp45 (about 306 kDa). This makes the capsid cargo zone the likely locus for the remaining cargo proteins, C2 (gp46) and the RNA polymerase-associated proteins (gp47, gp49 and gp50) (Fig. 6e).

Extended Data Fig. 3. Sequence of the cargo protein gp45.

Extended Data Fig. 3

Regions detected by mass spectrometry are in purple, structured regions identified in cryo-EM density are bold and underlined; the predicted transmembrane helix is in yellow.

A model for protein ejection

The data reported here, along with previous observations on phage P22, and other podoviruses including T7, provide the basis for proposing a hypothetical model for protein ejection (Extended Data Fig. 4). Although podoviruses do not have a tail barrel as extended as that of ΦcrAss001, many do contain ‘ejectosome’ proteins that are packed as cargo inside their capsids. In the bacteriophage P22 capsid, there are 3 such proteins33 (gp7, gp16 and gp20) with a combined mass of around 2.5 MDa. For T7, ejection proteins were shown to form a membrane-spanning channel during the initial phase of infection34. Although not all cargo proteins were resolved in the reconstructions of ΦcrAss001, these proteins have to exit the capsid ahead of DNA because they form the channel for DNA ejection. As in the case of T7, we expect that gp45 and gp46 of ΦcrAss001 are involved in transmembrane channel formation, with the other three cargo proteins annotated as RNA polymerase-related proteins5. Given that one region of gp45 (residues 214–245) is already poised for ejection, positioned inside the tail barrel of ΦcrAss001, this protein is likely to be ejected first and to contribute to channel formation. A transmembrane helix is predicted by TMHMM-2.035 to exist in the N-terminal region of gp45 (residues 136–158) (Extended Data Fig. 3), which could insert into the host cell membrane in a similar manner as described for the T7 protein gp1436.

Extended Data Fig. 4. Model for ejection of cargo proteins.

Extended Data Fig. 4

Left cargo proteins (purple) are stored in the tail barrel and capsid cargo zones in association with DNA (green). Middle: cargo proteins are being extruded through the constrictions of the portal and tail barrel and are passing through the transmembrane channel (purple) into the host’s cytoplasm. Injected proteins can undergo refolding in the cytoplasm. Right: cargo proteins’ ejection is complete and is followed by DNA ejection. Viral RNA polymerases have re-folded and transcription of viral genome begins. CPS, capsular polysaccharide.

The two parts of cargo protein C1, gp45, identified in the tail barrel (residues 214–245) and in the capsid (residues 425–552) are separated by a distance of approximately 420 Å and could be bridged by the connecting 179 residues. Indeed, peptides spanning the regions 275–300 and 307–320 of gp45 were detected by mass spectrometry (Extended Data Fig. 3), indicating that these regions are present inside the virion. We can expect gp45 to be ejected in the N→C direction: with the N-terminal part located in the tail exiting first followed by the C-terminal domain located in the capsid (Extended Data Fig. 4). The size and globular structure of the C-terminal part of gp45 (Fig. 6b) suggests that its ejection involves partial unfolding.

The other four proteins detected by mass spectrometry (gp46, gp47, gp49 and gp50) are predicted by AlphaFold237 to adopt folded α-helical structures (Extended Data Fig. 5) and consistently, the crystal structure of a homologue of gp49 showed it is predominantly α-helical30. Several constrictions inside the tail barrel are less than 50 Å in diameter, requiring that these cargo proteins are partially unfolded to pass through.

Extended Data Fig. 5. Ribbon diagrams of cargo protein structures predicted by AlphaFold.

Extended Data Fig. 5

Top row, coloured according to the pLDDT score: 50, orange; 70, yellow; 100, blue. Mean overall pLDDT is indicated underneath each protein structure. Bottom row, coloured according to secondary structure (α-helix, red; β-strand, blue; coil, grey) with overall percentages of α- and β- residues indicated underneath.

Although understanding precisely how proteins and DNA are ejected requires further studies, it is likely that ejection is facilitated by pressure inside the head, which can reach38 around 100 atmospheres. The order in which crassvirus proteins are extruded through the tail is likely to be controlled both by the way these proteins are packed and by their interactions with one another. Once the cargo proteins forming the transmembrane tunnel have exited the virion, DNA could begin to descend through the tail barrel. DNA ejection could be facilitated by solid-to-fluid–like transitions as suggested for bacteriophage λ39,40.

When considering the roles of the cargo proteins, it should be noted that members of the epsilon group of Crassvirales lack a detectable homologue of gp45, whereas homologues of gp46 are missing in the Suoliviridae (delta) and zeta groups (Fig. 1d). Thus, the sets of cargo proteins and accordingly the ejection mechanisms may vary across the crassviruses.

In conclusion, the reconstruction of the ΦcrAss001 virion identifies the structural hallmarks of crassviruses and—combined with comparative genomic analysis—has enabled us to assign functions to the majority of the previously uncharacterized proteins involved in virion assembly and infection. Among these structural features, the muzzle protein containing the novel crass fold functions as a gatekeeper, which probably contributes to the contact between the virion and the host cell surface. The extended tail barrel formed by widely conserved paralogous ring proteins has a high capacity for storage of cargo proteins alongside the capsid cargo zone, ensuring that the proteins required for the initial stages of crassvirus infection are carried inside virions and delivered to the host cell. The presence of a cargo protein storage zone in the tail is a distinct feature that, to our knowledge, has not been observed in viruses before, and suggests a mechanism of compartmentalization of the virus genome and cargo proteins ensuring their ordered release into the host.

Methods

Virus propagation and purification

The host strain B. intestinalis APC919/174 was propagated anaerobically in Fastidious Anaerobe Broth (Neogen) at 37 °C. Cultures were infected once a value of A600 = 0.2 (~2 × 108 colony-forming units per ml) was reached, at a multiplicity of infection (MOI) of 1, and grown overnight at 37 °C in anaerobic jars. Cultures were centrifuged for 15 min at 5,200g at 4 °C and filtered through 0.45-μm polyethersulfone (PES) filters. Filtered lysate was concentrated by polyethylene glycol precipitation by addition of solid PEG-8000 to 10% w/v and NaCl to 0.5 M and incubation overnight at 4 °C. The solution was centrifuged at 5,000g for 30 min at 4 °C, and the pellet resuspended in a minimum volume of SM buffer (100 mM NaCl, 10 mM MgSO4, 50 mM Tris-HCl pH 7.5). The solution was applied to the top of a caesium chloride step gradient in an Ultra-Clear tube (Beckman), comprising of solutions of densities ~1.31 and ~1.55 g ml−1 in 20 mM Tris-HCl pH 7.5, 10 mM MgSO4. Gradients were centrifuged at 28,000 rpm in an SW28 rotor at 4 °C for 3 h. The band containing mature virions was extracted by needle side-puncture. Samples were dialysed against buffer containing 20 mM Tris-HCl pH 7.5, 150 mM NaCl, 10 mM MgSO4 overnight at 4 °C. Virus particles were further concentrated by centrifugation at 40,000 rpm in an SW41 Ti rotor at 8 °C for 1 h and resuspending in 50 μl of buffer. This sample was used for all subsequent electron microscopy analysis.

Electron microscopy

For negative staining, carbon-formvar coated copper grids (Agar Scientific) were plasma cleaned in a PELCO easiGlow for 60 s at 0.38 mbar (air) and 20 mA. Sample solution (5 µl) was applied to a grid for 1 min then blotted with filter paper, and stained with 5 µl of 2% w/v uranyl acetate. Micrographs were recorded using a Tecnai 12 G2 BioTWIN microscope with tungsten filament operating at 120 kV, with a Ceta 16M camera (Thermo Fisher Scientific). For initial screening of samples, 16 negatively stained micrographs were recorded at magnifications of 9,300–98,000×. For cryo-EM, sample was applied to lacey carbon grids supporting an ultrathin carbon layer on 400-mesh copper (Ted Pella), glow discharged as above, and vitrified using a Vitrobot Mark IV (Thermo Fisher Scientific) at 100% humidity and 4 °C. Data were collected on a 300 kV Titan Krios (Thermo Fisher Scientific) with images recorded on a Falcon3EC detector operating in integrating mode, using EPU software (2.5.0). In total, four datasets were recorded (Extended Data Table 1) and processed together in RELION 3.141. Motion correction was performed using RELION and contrast transfer functions (CTFs) were estimated using CTFFIND442 using dose-weighting. Empty micrographs containing no virus particles were excluded from processing beyond CTF estimation. Particles selected after 2D classification were used for 3D refinements, initially using icosahedral symmetrical averaging. To check if magnified pixel size calibrations were consistent across datasets, reconstructions were compared using determine_relative_pixel_size.py (890eb35)43 and by manually inspection, and pixel sizes scaled to reference optics group 1. A symmetry-expanded set of particles was generated using relion_particle_symmetry_expand. The unique vertex of the capsid was identified by application of a cylindrical mask on one vertex and 3D classification into 10 classes without performing image alignment. Particles with tails inside the masked region after classification were selected for further rounds of 3D refinement using local searches, after extracting sub-particles using relion_localized_reconstruction44. Alternatively, in the case of the muzzle assembly, sub-particles were extracted prior to masked classification. Additional symmetry expansions were performed to resolve the muzzle assembly from tail barrel particles refined with C12 symmetry, and of the asymmetric portal-containing vertex. Refinements were followed by CTF refinement of per-particle defocus values and astigmatism, and magnification anisotropy estimation. For the symmetry-free reconstruction of the whole virion, re-extracted particles were enlarged to include the whole of the virus particle images centred on the virion and reconstructed with Ewald sphere correction. Resolutions of the reconstructions were estimated using the Fourier shell correlation 0.143 threshold and using RELION’s local resolution estimation. Owing to the large particle size, gold diffraction data collected for optics group 1 was used to check pixel size calibration and magnification anisotropy45, and the final pixel size scaled during postprocessing. Atomic models (Extended Data Fig. 6) were generated by manual building in Coot (0.9.8.1)46, followed by real space refinement in Phenix (1.19)47. Images were generated using ChimeraX (1.5)48.

Extended Data Fig. 6. Representative density maps with corresponding fitted models for regions of virion proteins.

Extended Data Fig. 6

Maps are depicted in blue mesh with atomic models shown in stick style coloured by atom type. Each protein is labelled with residue ranges shown and the corresponding PDB codes indicated in parentheses.

Mass spectrometry analysis

Two samples of purified virions were analysed: one sample, R, was reduced and the other, NR, was untreated before running on 4–12% SDS–PAGE (Invitrogen), and the Coomassie-stained band was excised. Sample R was further reduced with 1.5 mg ml−1 dithioerythritol in-gel, while NR was untreated. Both samples were alkylated with 9.5 mg ml−1 iodoacetamide and digested overnight with trypsin. Resulting peptides were analysed by liquid chromatography–tandem mass spectrometry (LC–MS/MS) over 1 h acquisitions with elution from a 50 cm EasyNano PepMap column into an Orbitrap Fusion Tribrid mass spectrometer operated in DDA mode. Duplicate acquisitions were performed for each sample, one with MS2 measured in the liner ion trap and the other with high resolution MS2 acquired in the Orbitrap mass analyser. Spectra were peak picked and searched against a sequence database containing ΦcrAss001 hypothetical proteins and common proteomic contaminants. Oxidation of methionine and carbamidomethylation of cystine were set as variable modifications. Identified peptides were filtered to 1% false discovery rate as assessed against a reversed database search. Relative quantification between samples and acquisitions was performed within PEAKS Studio (10.5)49 from areas of aligned extracted ion chromatograms. The mean of the peak areas across the ion trap and Orbitrap acquisitions were calculated for each protein in R and NR samples respectively.

AlphaFold structure predictions

Structure predictions using AlphaFold237 were performed for ΦcrAss001 protein sequences: gp46, gp47, gp49, gp50, gp21 and gp29. In the case of gp21 and gp29, the C-terminal domains that could not be modelled ab initio using electron microscopy reconstructions (owing to limited resolution and flexibility in these parts) were rigid-body docked into the density maps above the corresponding gp21 and gp29 capsid binding sites. Complete pseudo-atomic models for each fibre protein oligomer were created by fusing the N-terminal (cryo-EM modelled) and C-terminal (AlphaFold modelled) regions linked in coot.

Protein sequence analysis

To calculate the representation of ΦcrAss001 virion protein homologues across the Crassvirales, the protein sequences encoded in the previously described reference genome set5 were searched. The reference set consisted of 673 virus genomes grouped into 211 clusters at 90% similarity at the genome nucleotide sequence level. Each virion protein was queried against the reference set using PSI-BLAST50. To account for over-representation of some crassviruses, a genome cluster was given a score of 1 if a ΦcrAss001 protein homologue was found in more than half of the genomes in that cluster (otherwise 0). The cluster scores were then summed within the previously delineated groups of crassviruses (alpha to zeta) and plotted against the corresponding ΦcrAss001 proteins (Extended Data Table 4, Fig. 1d). To identify homologues outside the Crassvirales, the Genbank and RefSeq virus databases were queried using PSI-BLAST.

Extended Data Table 4.

Representation of ΦcrAss001 protein homologues across crassviruses

graphic file with name 41586_2023_6019_Tab4_ESM.jpg

Scores shown were calculated as described in Methods.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-023-06019-2.

Supplementary information

Reporting Summary (1.5MB, pdf)
Peer Review File (542KB, pdf)

Acknowledgements

The authors thank S. Hart and J. Turkenburg for assistance with cryo-EM; A. Dowle and C. Taylor for assistance with mass spectrometry experiments; R. Thompson, E. Hesketh and D. Maskell; and support staff at eBIC (Diamond Light Source, UK). This project used the Viking Cluster high performance compute facility provided by the University of York; we are grateful for computational support from the University of York High Performance Computing service, Viking and the Research Computing team. This work was supported by the Wellcome Trust (grant 206377, 224665 to A.A.A.), Instruct-ERIC (PID: 10175), and Diamond Light Source for access to eBIC under proposal EM19832 funded by the Wellcome Trust, MRC and BBRSC. The cryo-EM facility at the University of York was supported by the Wellcome Trust, grant 206161. A.N.S. was supported by the Wellcome Trust (grant 220646) and ERC (grant agreement no. 101001684). E. V. Khokhlova and C.H. were supported by Science Foundation Ireland (SFI) under Grant no. SFI/12/RC/2273. N.Y. and E. V. Koonin are supported by the Intramural Research Program of the National Institutes of Health of the USA (National Library of Medicine). Molecular graphics and analyses were performed with UCSF ChimeraX, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from National Institutes of Health R01-GM129325 and the Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases.

Extended data figures and tables

Author contributions

O.W.B., A.N.S., E. V. Koonin, C.H. and A.A.A. conceived the study. A.N.S. and E. V. Khokhlova cultured cells and viruses. O.W.B., A.N.S. and E. V. Khokhlova purified virions. O.W.B. and D.E.D.P.H. collected cryo-EM data. O.W.B. processed cryo-EM data. O.W.B. and J.L.R.S. built and refined atomic models. O.W.B. and A.A.A. analysed structural data. N.Y. and E. V. Koonin performed comparative genomic analysis. O.W.B., A.N.S., E. V. Koonin, C.H. and A.A.A. wrote the manuscript, which was read, edited and approved by all authors.

Peer review

Peer review information

Nature thanks Jeremy Barr, Antoni Luque and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Data availability

Cryo-EM reconstructions and atomic coordinates comprising the virion capsid and tail structures have been deposited to the Electron Microscopy Data Bank under accession codes EMD-14088, EMD-14089, EMD-14090, EMD-14091, EMD-14092, EMD-14093, EMD-14094 and EMD-14100 (maps) and to the Protein Data Bank under accession codes 7QOF, 7QOG, 7QOH, 7QOI, 7QOJ, 7QOK and 7QOL (atomic coordinates), with example regions of maps and corresponding models in Extended Data Fig. 6. Data collection and refinement statistics are presented in Extended Data Table 1. Multiple sequence alignments of ΦcrAss001 virion protein homologues are available at https://ftp.ncbi.nih.gov/pub/yutinn/crAss_structural_prot_2022/.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Oliver W. Bayfield, Email: oliver.bayfield@york.ac.uk

Alfred A. Antson, Email: fred.antson@york.ac.uk

Extended data

is available for this paper at 10.1038/s41586-023-06019-2.

Supplementary information

The online version contains supplementary material available at 10.1038/s41586-023-06019-2.

References

  • 1.Shkoporov AN, Hill C. Bacteriophages of the human gut: the “known unknown” of the microbiome. Cell Host Microbe. 2019;25:195–209. doi: 10.1016/j.chom.2019.01.017. [DOI] [PubMed] [Google Scholar]
  • 2.Dutilh BE, et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat. Commun. 2014;5:4498. doi: 10.1038/ncomms5498. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Guerin E, et al. Biology and taxonomy of crAss-like bacteriophages, the most abundant virus in the human gut. Cell Host Microbe. 2018;24:653–664.e6. doi: 10.1016/j.chom.2018.10.002. [DOI] [PubMed] [Google Scholar]
  • 4.Yutin N, et al. Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat. Microbiol. 2018;3:38–46. doi: 10.1038/s41564-017-0053-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Yutin N, et al. Analysis of metagenome-assembled viral genomes from the human gut reveals diverse putative CrAss-like phages with unique genomic features. Nat. Commun. 2021;12:1044. doi: 10.1038/s41467-021-21350-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Shkoporov AN, et al. ΦCrAss001 represents the most abundant bacteriophage family in the human gut and infects Bacteroides intestinalis. Nat. Commun. 2018;9:4781. doi: 10.1038/s41467-018-07225-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Roux S, et al. Minimum information about an uncultivated virus genome (MIUVIG) Nat. Biotechnol. 2019;37:29–37. doi: 10.1038/nbt.4306. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Camarillo-Guerrero LF, Almeida A, Rangel-Pineros G, Finn RD, Lawley TD. Massive expansion of human gut bacteriophage diversity. Cell. 2021;184:1098–1109.e9. doi: 10.1016/j.cell.2021.01.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Nayfach S, et al. Metagenomic compendium of 189,680 DNA viruses from the human gut microbiome. Nat. Microbiol. 2021;6:960–970. doi: 10.1038/s41564-021-00928-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Koonin EV, Yutin N. The crAss-like phage group: how metagenomics reshaped the human virome. Trends Microbiol. 2020;28:349–359. doi: 10.1016/j.tim.2020.01.010. [DOI] [PubMed] [Google Scholar]
  • 11.Edwards RA, et al. Global phylogeography and ancient evolution of the widespread human gut virus crAssphage. Nat. Microbiol. 2019;4:1727–1736. doi: 10.1038/s41564-019-0494-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Benler S, et al. Thousands of previously unknown phages discovered in whole-community human gut metagenomes. Microbiome. 2021;9:78. doi: 10.1186/s40168-021-01017-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Shkoporov AN, et al. Long-term persistence of crAss-like phage crAss001 is associated with phase variation in Bacteroides intestinalis. BMC Biol. 2021;19:1–16. doi: 10.1186/s12915-021-01084-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Wikoff WR, et al. Topologically linked protein rings in the bacteriophage HK97 capsid. Science. 2000;289:2129–2133. doi: 10.1126/science.289.5487.2129. [DOI] [PubMed] [Google Scholar]
  • 15.Fokine A, et al. Structural and functional similarities between the capsid proteins of bacteriophages T4 and HK97 point to a common ancestry. Proc. Natl Acad. Sci. USA. 2005;102:7163–8. doi: 10.1073/pnas.0502164102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Xiang Y, Rossmann MG. Structure of bacteriophage ϕ29 head fibers has a supercoiled triple repeating helix–turn–helix motif. Proc. Natl Acad. Sci. USA. 2011;108:4806–4810. doi: 10.1073/pnas.1018097108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Motwani T, Teschke CM. Architect of virus assembly: the portal protein nucleates procapsid assembly in bacteriophage P22. J. Virol. 2019;93:00187-19. doi: 10.1128/JVI.00187-19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Sun L, et al. Cryo-EM structure of the bacteriophage T4 portal protein assembly at near-atomic resolution. Nat. Commun. 2015;6:7548. doi: 10.1038/ncomms8548. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Cuervo A, et al. Structures of T7 bacteriophage portal and tail suggest a viral DNA retention and ejection mechanism. Nat. Commun. 2019;10:3746. doi: 10.1038/s41467-019-11705-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Chaban Y, et al. Structural rearrangements in the phage head-to-tail interface during assembly and infection. Proc. Natl Acad. Sci. USA. 2015;112:7009–7014. doi: 10.1073/pnas.1504039112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Lebedev AA, et al. Structural framework for DNA translocation via the viral portal protein. EMBO J. 2007;26:1984–1994. doi: 10.1038/sj.emboj.7601643. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Tang J, et al. Peering down the barrel of a bacteriophage portal—the genome packaging and release valve in P22. Structure. 2011;19:496–502. doi: 10.1016/j.str.2011.02.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Lokareddy RK, et al. Portal protein functions akin to a DNA-sensor that couples genome-packaging to icosahedral capsid maturation. Nat. Commun. 2017;8:14310. doi: 10.1038/ncomms14310. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Liu Y-T, Jih J, Dai X, Bi G-Q, Zhou ZH. Cryo-EM structures of herpes simplex virus type 1 portal vertex and packaged genome. Nature. 2019;570:257–261. doi: 10.1038/s41586-019-1248-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Simpson AA, Petr G, Jardine PJ, Dwight L, Michael G. Structure determination of the head–tail connector of bacteriophage phi29. Acta Crystallogr. D. 2001;D57:1260–1269. doi: 10.1107/S0907444901010435. [DOI] [PubMed] [Google Scholar]
  • 26.Iwasaki T, et al. Three-dimensional structures of bacteriophage neck subunits are shared in Podoviridae, Siphoviridae and Myoviridae. Genes Cells. 2018;23:528–536. doi: 10.1111/gtc.12594. [DOI] [PubMed] [Google Scholar]
  • 27.Bárdy P, et al. Structure and mechanism of DNA delivery of a gene transfer agent. Nat. Commun. 2020;11:3034. doi: 10.1038/s41467-020-16669-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Leonard, T. A. in Encyclopedia of Metalloproteins (eds Kretsinger, R. H., Uversky, V. N. & Permyakov, E. A.) 309–318 (Springer, 2013).
  • 29.Holm L. DALI and the persistence of protein shape. Protein Sci. 2020;29:128–140. doi: 10.1002/pro.3749. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Drobysheva AV, et al. Structure and function of virion RNA polymerase of a crAss-like phage. Nature. 2021;589:306–309. doi: 10.1038/s41586-020-2921-5. [DOI] [PubMed] [Google Scholar]
  • 31.Matthews BW. Solvent content of protein crystals. J. Mol. Biol. 1968;33:491–497. doi: 10.1016/0022-2836(68)90205-2. [DOI] [PubMed] [Google Scholar]
  • 32.Hua J, Huet A, Lopez CA, Toropova K. Capsids and genomes of jumbo-sized bacteriophages reveal the evolutionary reach of the HK97 fold. mBiol. 2017;8:01579-17. doi: 10.1128/mBio.01579-17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Wu W, et al. Localization of the houdinisome (ejection proteins) inside the bacteriophage P22 virion by bubblegram imaging. mBio. 2016;7:01152-16. doi: 10.1128/mBio.01152-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Swanson NA, et al. Cryo-EM structure of the periplasmic tunnel of T7 DNA-ejectosome at 2.7 Å resolution. Mol. Cell. 2021;81:3145–3159.e7. doi: 10.1016/j.molcel.2021.06.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Krogh A, Larsson B, Von Heijne G, Sonnhammer ELL. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 2001;305:567–580. doi: 10.1006/jmbi.2000.4315. [DOI] [PubMed] [Google Scholar]
  • 36.Chen W, et al. Structural changes in bacteriophage T7 upon receptor-induced genome ejection. Proc. Natl Acad. Sci. USA. 2021;118:e2102003118. doi: 10.1073/pnas.2102003118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Smith DE, et al. The bacteriophage φ29 portal motor can package DNA against a large internal force. Nature. 2001;413:748–752. doi: 10.1038/35099581. [DOI] [PubMed] [Google Scholar]
  • 39.Evilevitch A. The mobility of packaged phage genome controls ejection dynamics. eLife. 2018;7:e37345. doi: 10.7554/eLife.37345. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Liu T, et al. Solid-to-fluid-like DNA transition in viruses facilitates infection. Proc. Natl Acad. Sci. USA. 2014;111:14675–14680. doi: 10.1073/pnas.1321637111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Scheres SHW. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 2012;180:519–530. doi: 10.1016/j.jsb.2012.09.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Rohou A, Grigorieff N. CTFFIND4: fast and accurate defocus estimation from electron micrographs. J. Struct. Biol. 2015;192:216–221. doi: 10.1016/j.jsb.2015.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Wilkinson ME, Kumar A, Casan al A. Methods for merging data sets in electron cryo-microscopy. Acta Crystallogr. D. 2019;75:782–791. doi: 10.1107/S2059798319010519. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Ilca SL, et al. Localized reconstruction of subunits from electron cryomicroscopy images of macromolecular complexes. Nat. Commun. 2015;6:8843. doi: 10.1038/ncomms9843. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Grant T, Grigorieff N. Automatic estimation and correction of anisotropic magnification distortion in electron microscopes. J. Struct. Biol. 2015;192:204–208. doi: 10.1016/j.jsb.2015.08.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of Coot. Acta Crystallogr. D. 2010;66:486–501. doi: 10.1107/S0907444910007493. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Adams PD, et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D. 2010;66:213–221. doi: 10.1107/S0907444909052925. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Goddard TD, et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 2018;27:14–25. doi: 10.1002/pro.3235. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Tran NH, et al. Deep learning enables de novo peptide sequencing from data-independent-acquisition mass spectrometry. Nat. Methods. 2019;16:63–66. doi: 10.1038/s41592-018-0260-3. [DOI] [PubMed] [Google Scholar]
  • 50.Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Reporting Summary (1.5MB, pdf)
Peer Review File (542KB, pdf)

Data Availability Statement

Cryo-EM reconstructions and atomic coordinates comprising the virion capsid and tail structures have been deposited to the Electron Microscopy Data Bank under accession codes EMD-14088, EMD-14089, EMD-14090, EMD-14091, EMD-14092, EMD-14093, EMD-14094 and EMD-14100 (maps) and to the Protein Data Bank under accession codes 7QOF, 7QOG, 7QOH, 7QOI, 7QOJ, 7QOK and 7QOL (atomic coordinates), with example regions of maps and corresponding models in Extended Data Fig. 6. Data collection and refinement statistics are presented in Extended Data Table 1. Multiple sequence alignments of ΦcrAss001 virion protein homologues are available at https://ftp.ncbi.nih.gov/pub/yutinn/crAss_structural_prot_2022/.


Articles from Nature are provided here courtesy of Nature Publishing Group

RESOURCES