Abstract
Identification of the molecular networks that facilitated the evolution of multicellular animals from their unicellular ancestors is a fundamental problem in evolutionary cellular biology. Choanoflagellates are recognized as the closest extant nonmetazoan relatives of animals. These unicellular eukaryotes can adopt a multicellular‐like “rosette” state and are therefore compelling models for the study of early multicellularity. Comparative studies revealed that a number of putative human orthologs are present in choanoflagellate genomes, suggesting that a subset of these genes was necessary for the emergence of multicellularity. However, previous work is largely based on sequence alignments alone, which do not confirm structural or functional similarity. Here, we focus on the PDZ domain, a peptide‐binding domain that plays critical roles in myriad cellular signaling networks and that underwent a gene family expansion in metazoan lineages. Using a customized sequence similarity search algorithm, we identified 178 PDZ domains in the Monosiga brevicollis proteome. This includes 11 previously unidentified sequences, which we analyzed using Rosetta and homology modeling. To assess conservation of protein structure, we solved high‐resolution crystal structures of representative M. brevicollis PDZ domains that are homologous to human Dlg1 PDZ2, Dlg1 PDZ3, GIPC, and SHANK1 PDZ domains. To assess functional conservation, we calculated binding affinities for mbGIPC, mbSHANK1, mbSNX27, and mbDLG‐3 PDZ domains from M. brevicollis. We find that peptide selectivity is generally conserved between these two disparate organisms, with one possible exception, mbDLG‐3. Overall, our results provide novel insight into signaling pathways in a choanoflagellate model of primitive multicellularity.
Keywords: binding affinities, choanoflagellates, evolution, motifs, PDZ, peptide‐binding domains, protein–protein interactions, selectivity determinants, X‐ray crystallography
1. INTRODUCTION
The events in molecular evolution that led to the origination of multicellular eukaryotes are preserved in the genomes of choanoflagellates, and they are recognized as the closest living relatives to the common ancestor of metazoans and unicellular eukaryotes. 1 , 2 Of particular interest in evolutionary cellular biology is the ability of choanoflagellates to adopt a primitive multicellular state, known as a rosette. 3 , 4 , 5 Comparative studies have revealed that several human gene families have clear orthologs in the choanoflagellate clade, and many of these orthologous genes are differentially expressed during development of the rosette. 6 Furthermore, over 350 gene families previously thought to be present only in animal lineages exist in choanoflagellate genomes. 6 Therefore, there is interest in understanding the molecular underpinnings of signaling pathway proteins in choanoflagellates, in order to gain insight into how multicellularity emerged. Investigators have identified a multitude of conserved protein structural domains and architectures in the choanoflagellate proteome that are essential to metazoan intracellular signaling systems and development. Some examples include the Notch receptor, kinases (e.g., Src family kinases, CamKII, etc.), ubiquitin ligases (e.g., Cbl), and PDZ domains, among others. 2 , 7 , 8 , 9 , 10 , 11 Notably, although functionally present, mechanisms of regulation can vary dramatically; while the phosphorylation‐dependent regulation of Cbl is conserved in metazoans and choanoflagellates, the allosteric regulation of the SH3‐SH2‐Kinase module of Src family kinases is distinct. 9 , 12 , 13
Of the shared gene families in metazoans and choanoflagellates, the PDZ domain is particularly interesting for a number of reasons. PDZ domains play key functional roles in neuronal signaling, and the intercellular attachments that are formed during rosette development are reminiscent of neuronal synapses. 4 , 14 , 15 , 16 In addition, PDZ domains are overrepresented in the genome of the choanoflagellate Monosiga brevicollis relative to the unicellular eukaryote Saccharomyces pombe. 2 This is also true of other unicellular eukaryotes, for example, the filasterean species Capsaspora owczarzaki, which, according to UniProt annotations, has 27 PDZ domains in 26 proteins, and the ichthyosporean species Sphaeroforma arctica, with 38 PDZ domains in 32 proteins. Finally, PDZ domains are known to have proliferated in the metazoan lineage. 16 , 17 , 18 , 19 Collectively, these data suggest that PDZ domains played an important role in the evolution of multicellularity and that further characterization of PDZ domains in choanoflagellates may yield insights into molecular mechanisms that facilitated primitive multicellular development.
PDZ domains were named after the first PDZ domain‐containing proteins that were identified (PSD‐95, Dlg1, and ZO‐1). 20 , 21 , 22 , 23 , 24 These initially discovered PDZ domains all contain a “GLGF” amino acid sequence. This shared sequence, referred to as the GLGF‐loop, or the carboxylate‐binding loop, comprises a key component of the canonical PDZ domain structure. The well‐conserved PDZ domain structure dictates its function in mediating protein–protein interactions. Specifically, the amide nitrogen atoms in the backbone of the GLGF‐loop directly interact with the carboxylate atoms of the extreme C‐terminus of a protein ligand. 25 , 26 PDZ domains are approximately 80–100 residues in length, and comparative analysis of hundreds of PDZ structures in the Protein Data Bank reveals a conserved structural fold, consisting of a core antiparallel β‐sheet and 1–2 α‐helices (Figure 1). 26 , 27 , 28 , 29 , 30 , 31 , 32
FIGURE 1.

Conserved fold of PDZ domain structures. The human PDZ homologues of the M. brevicollis PDZ domains studied in this paper are shown in cartoon representation, colored by conserved secondary structure elements, as labeled. Bound peptides are in black stick and labeled. In the DLG2‐3 (PDB ID: 2HE2) structure, only the ETSV residues are from the ATP2B4 protein. The P−4 His residue is an artifact of the C‐terminal extension protocol used for crystallization 29
PDZ domains are scaffolding domains that bind target proteins. In some instances, this facilitates localization of target proteins within close proximity of auxiliary enzymatic domains on the same polypeptide. In other instances, PDZ domain scaffolding activity functions to mediate protein trafficking, and impacts cellular signaling pathways. These PDZ domain interactions can be modulated by other protein–protein interaction domains on the same polypeptide, or in trans, by other proteins in larger macromolecular complexes. 26 , 33 , 34 An example of the scaffolding function of PDZ domains is the postsynaptic density of neurons, where multiple receptor signaling networks are brought into close physical proximity due to a number of PDZ domain‐mediated interactions. 35 As mentioned previously, an expansion of the number of PDZ domain‐containing genes coincided with the emergence of animal multicellularity. 17 This suggests that PDZ domains played a role in the evolution of multicellular animals. The human proteome contains 272 PDZ domains in a variety of protein architectures, but all PDZ domains share the same basic biochemical function of scaffolding protein–protein interactions.
Considering the importance of PDZ domains in a number of cellular processes, significant effort has been invested in characterizing individual PDZ peptide‐binding selectivities. These domains bind to short sequences in target proteins, often interacting with only six amino acid residues. In fact, the motifs of classically determined PDZ binding classes depend on only two residues: the extreme C‐terminal residue, termed P0, and the residue two positions upstream of it, termed P−2. 26 For example, Class I PDZ domains recognize the motif X‐S/T‐X‐ϕ at the C‐terminus of target proteins (where X = any amino acid and ϕ = any hydrophobic amino acid). 26 Work in the last 10+ years using high throughput techniques, for example, phage display, peptide array, or the hold‐up assay, has shifted this classical view of PDZ domain binding to appreciate the importance of binding interactions at nonmotif residues in the peptide‐binding cleft. 36 , 37 , 38 In addition, a number of elegant studies using directed evolution, or other protein engineering techniques, have successfully identified structural elements that determine PDZ selectivity, often through only a small number of amino acid substitutions or post‐translational modifications. 39 , 40 , 41 , 42 , 43
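As an illustration of the classical motif definition, the following minimal sketch checks whether the last four residues of a peptide satisfy the Class I pattern X‐S/T‐X‐ϕ. The set of residues counted as hydrophobic and the function name `is_class_i` are assumptions made for this example; the text specifies only "any hydrophobic amino acid". The example peptides are the decameric sequences listed in Table 2.

```python
import re

# Assumption: residues treated as hydrophobic (ϕ) at P0. The text only says
# "any hydrophobic amino acid", so this set is illustrative, not definitive.
HYDROPHOBIC = "AVILMFWC"

# Class I motif X-S/T-X-ϕ anchored at the C-terminus:
# P-3 = any residue, P-2 = Ser/Thr, P-1 = any residue, P0 = hydrophobic.
CLASS_I = re.compile(rf".[ST].[{HYDROPHOBIC}]$")


def is_class_i(peptide: str) -> bool:
    """Return True if the C-terminal four residues match X-S/T-X-ϕ."""
    return CLASS_I.search(peptide.upper()) is not None


if __name__ == "__main__":
    peptides = {
        "GAIP": "QGPSQSSSEA",
        "TYRP1": "KLQNPNQSVV",
        "B1AR": "RPGFASESKV",
        "mbGIPC C-term": "VEPKKSFDEI",
    }
    for name, seq in peptides.items():
        print(f"{name:14s} {seq[-4:]}  Class I: {is_class_i(seq)}")
```

Under these assumptions, the GAIP, TYRP1, and B1AR peptides match the motif, whereas the mbGIPC C‐terminal sequence does not, because its P−2 residue is Asp. A check of this kind considers only P0 and P−2 and therefore ignores the nonmotif positions discussed above.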
The elucidation of PDZ binding selectivity has enabled investigators to trace the evolution of PDZ specificity throughout the tree of life, including in bacteria, yeast, and plants. 18 , 42 , 44 However, what remains to be determined is whether or not the selectivity determinants in PDZ domains related by evolution are also conserved, despite different signaling pathways, for example, in uni‐ versus multicellular organisms, or those with and without a nervous system. Previous work looking at the evolution of PDZ domains found that six amino acid positions determine lineage relationships among 40,000 PDZ domains in 40 proteomes and that four of these positions are in direct contact with nonmotif peptide residues (P−1 and P−3). 16 This result suggests that homologous proteins will share conserved residues in the peptide‐binding cleft, including those amino acids that directly interact with residues beyond the P0 and P−2 motif positions.
In order to investigate these questions on a molecular level, we crystallized and solved seven total structures of four PDZ domains from the choanoflagellate, Monosiga brevicollis, including homologues of PDZ domains from the human proteins Dlg1, GIPC1, and SHANK1 (Figure 1). We also investigated the binding affinities of a homologue of human SNX27 (Figure 1). These proteins are important in postsynaptic signaling and well conserved in M. brevicollis, despite over 200 million years of evolution between the last common ancestor of humans and choanoflagellates—and the emergence of neurons. 35 , 45 , 46 , 47
Structural and binding affinity analyses confirm that the residues in the peptide‐binding clefts are generally conserved in these proteins, with a notable exception. In the third PDZ domain of the M. brevicollis Dlg1 homologue, which we refer to as mbDLG‐3, there is a histidine‐to‐tyrosine substitution in a motif‐determining position, as well as substitutions in binding cleft residues that interact with nonmotif residues. This mutation is not shared in Dlg proteins from a number of other organisms, although it is in the Dlg protein from another choanoflagellate species, Salpingoeca rosetta, suggesting it is unique to choanoflagellates. Previous studies investigated the molecular basis of evolution, expansion, and rewiring in PDZ domain networks; however, here we find that for closely related PDZ domains, selectivity determinants for all residues in the binding cleft are generally conserved in evolution, despite a lack of conservation in shared target proteins. 48 , 49 In addition, we analyze the M. brevicollis proteome by conducting pairwise sequence alignments with all human PDZ domains in order to characterize its PDZome, verifying novel domains using homology modeling and Rosetta.
2. RESULTS
2.1. Structural and biochemical characterization of mbGIPC PDZ
To determine if residues that directly interact with the ligand are conserved, including all of those within the peptide‐binding cleft of Class I PDZ domains, we set out to characterize a number of PDZ domains from Monosiga brevicollis with clear homology to human PDZ proteins. We first chose to investigate the homolog of the human G‐alpha interacting protein (GAIP) interacting protein, C terminus, or GIPC. 50 GIPC was first identified as an interactor of the GAIP, but was quickly shown to also interact directly with G‐protein coupled receptors (GPCRs), as well as dopamine and N‐methyl‐D‐aspartate (NMDA) receptors in excitatory synapses of the central nervous system. 47 , 50 , 51 , 52 Thus, GIPC is important for both GPCR and neuronal signaling in human cells, and additional studies have shown that it broadly regulates vesicular trafficking of many transmembrane receptors via interactions with myosin VI. 30
The presence of a GIPC homolog in M. brevicollis is consistent with the identification of adhesion GPCRs in choanoflagellates. 53 , 54 Overall, full‐length GIPC proteins from human and M. brevicollis (UniProt ID: A9VCZ3_MONBE, termed mbGIPC) share 56% sequence identity over 79% of the protein. The human GIPC PDZ domain is a Class I PDZ domain, as defined above. Recognition of the P−2 Ser/Thr residue is facilitated by hydrogen bond formation with a conserved histidine in the first position of the conserved αB helix, termed αB‐1. 26 The human and choanoflagellate GIPC PDZ domains are 57% identical over 97% of the PDZ sequence (82 residues). The aligned sequences comprise the residues defined by UniProt boundaries plus 3 C‐terminal residues for human GIPC (to be consistent with PDB ID 5V6T) and 15 additional C‐terminal residues for mbGIPC (to include the final β‐strand). Conserved features include shared carboxylate‐binding loop sequences of ALGL and the Class I‐defining histidine in the αB‐1 position (Figure S1A, Table 1).
TABLE 1.
Sequence identity values of choanoflagellate and human PDZ domains
| Sequence identity (# of residues) | hGIPC1 | hSHANK1 | hSNX27 | hDLG1 | hDLG2 | hDLG3 | hDLG4 (PSD‐95) |
|---|---|---|---|---|---|---|---|
| mbGIPC (A9VCZ3) | 57% (82) | ‐ | ‐ | ‐ | ‐ | ‐ | ‐ |
| mbSHANK1 (A9V7E4) | ‐ | 34% (90) | 38% (77) | ‐ | ‐ | ‐ | ‐ |
| mbSNX27 (A9URU5) | ‐ | 36% (89) | 46% (92) | ‐ | ‐ | ‐ | ‐ |
| mbDLG‐1 | ‐ | ‐ | ‐ | 45% (80) | 44% (80) | 48% (80) | 48% (80) |
| mbDLG‐2 | ‐ | ‐ | ‐ | 47% (76) | 48% (91) | 47% (76) | 47% (76) |
| mbDLG‐3 | ‐ | ‐ | ‐ | 43% (83) | 49% (77) | 44% (87) | 44% (87) |
Note: Sequence identity values of mbDLG PDZ domains (with UniProt identifiers in parentheses) with the corresponding PDZ domains from human proteins. BLASTP alignments were conducted using the domain boundaries of all PDZ domains, as defined by UniProt, with the exception of the GIPC PDZ domains, where 3 C‐terminal residues were added to hGIPC and 15 C‐terminal residues to mbGIPC, based on structural analyses. The number of residues in the resulting alignments is shown in parentheses. For the hDLG proteins, the relevant PDZ domain was used in each alignment (e.g., hDLG1 PDZ1 aligned to mbDLG‐1, PDZ2 aligned to mbDLG‐2, and PDZ3 aligned to mbDLG‐3).
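The values in Table 1 pair a percent identity with the number of residues in the alignment. The sketch below shows one way such a pair can be computed from a gapped pairwise alignment. It assumes the BLAST‐style convention in which identical columns are divided by the total number of alignment columns, gaps included. The two aligned strings are hypothetical and are not taken from the alignments used in this work.

```python
def percent_identity(aligned_a: str, aligned_b: str):
    """Return (percent identity, alignment length) for two gapped, aligned sequences.

    Assumption: identity = identical columns / total alignment columns,
    where gap characters ('-') count toward the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column is identical only if both positions hold the same residue (not a gap).
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    length = len(aligned_a)
    return 100.0 * matches / length, length


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    seq_a = "ALGLTIT-DNG"
    seq_b = "ALGLSITKDNG"
    identity, n_columns = percent_identity(seq_a, seq_b)
    print(f"{identity:.1f}% ({n_columns})")
```

Because the denominator is the alignment length rather than the full domain length, two alignments with similar percent identity can cover different numbers of residues, which is why Table 1 reports both values.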
We expressed and purified mbGIPC PDZ using previously described methods, and as described in more detail in the Materials and Methods. 38 , 55 Briefly, we used recombinant expression in Escherichia coli cells, followed by affinity and size exclusion chromatography to produce purified mbGIPC PDZ protein. With protein in hand, we crystallized and solved the structure of mbGIPC PDZ to a high resolution of 1.2 Å, as described in the Materials and Methods and Supplementary Information. Overall, this structure is consistent with the conserved PDZ fold, characterized by the central five‐stranded antiparallel β‐sheet (βA‐E) (Figure 1). As mentioned above, while many PDZ domains contain two α‐helices (αA‐B), it appears that αA is slightly strained and therefore not fully formed in the mbGIPC structure, a characteristic that is also true of the human GIPC PDZ domain (PDB IDs: 5V6B and 5V6T). 30 The peptide ligand forms an additional strand of the central β‐sheet (Figure 2A). Data collection and refinement statistics are in Table S1A.
FIGURE 2.

The crystal structure of the mbGIPC PDZ domain. (a) The interaction of mbGIPC (gray) with the C‐terminal tail of a molecule related by symmetry (green), backbone atoms shown in stick, reveals a canonical PDZ–peptide interaction where the peptide forms an additional strand of an antiparallel β‐sheet. Distances are labeled. (b) Average fluorescence polarization displacement isotherms are shown for mbGIPCtrunc PDZ. Titration curves correspond to the following peptides: GAIP (circles), B1AR (squares), TYRP1 (diamonds), and a decameric peptide matching the C‐terminal residues of the construct, ending in “SFDEI” (triangles). Error bars indicate the SD from the mean for triplicate experiments. (c) Alignment of mbGIPC PDZ domains with three separate C‐terminal tail sequences (gray ribbon, RMSD ≤ 0.21 Å for ~350 main chain atoms), with tail sequences as sticks and colored as labeled. (d) Conservation between mbGIPC (gray cartoon, with cyan side chain residues as sticks; peptide is in cyan ribbon) and human GIPC (PDB ID: 5V6B, with Plexin‐D1 C‐terminal peptide from 5V6T [hot pink ribbon]; gray cartoon with hot pink side chain residues as sticks). Residues in the peptide‐binding cleft are labeled. All stick representations are colored by heteroatom (O = red, N = blue)
Although we had added a peptide matching the GAIP sequence, a high‐affinity human GIPC PDZ target, during crystallization, we were surprised to see that our crystal structure lacked the bound peptide. Instead, mbGIPC was interacting with the C‐terminal tail of a molecule related by symmetry (Figure S1B). This is a common mode of co‐crystallization for PDZ domains and ligands, for example, in the NHERF1 PDZ1 structure that is bound to the C‐terminal sequence of cystic fibrosis transmembrane conductance regulator (CFTR), as well as others. 29 , 56 The NHERF1 PDZ1‐CFTR example is distinct from our structure, however, in that the C‐terminus of mbGIPC is not a Class I PDZ‐satisfying motif (sequence: KSFDEI). In our structure, which we will refer to as mbGIPCSFDEI, we see that the P0 Ile is accommodated by a hydrophobic pocket, as expected in Class I PDZ interactions. However, the conserved αB‐1 H157 residue forms hydrogen bonds with the Asp in the P−2 position (distance: 2.6 Å), as well as the Ser in the P−4 position (2.8 Å) (Figure S1C).
In order to determine if this interaction is a crystal artifact, we created a truncated mutant, mbGIPCtrunc, lacking the final 7 residues of our original construct (or K181Δ), and calculated binding affinities for human GIPC targets using fluorescence polarization. We first measured the binding affinity of mbGIPCtrunc for a decameric fluorescent reporter peptide matching the sequence of GAIP (F*‐QGPSQSSSEA, where F* = FITC or fluorescein isothiocyanate), calculating a K D = 0.29 ± 0.02 μM in a quadruplicate experiment (Figure S1D). Next, we determined the affinities of a number of human GIPC PDZ targets using competition experiments, including decameric peptides of the C‐termini of GAIP (QGPSQSSSEA), tyrosinase‐related protein 1 (TYRP1, sequence: KLQNPNQSVV), and the β‐1 adrenergic receptor (B1AR, sequence: RPGFASESKV) (Table 2, Figure 2B). Notably, the GAIP peptide provides insight into how mbGIPC binds another human GIPC target, Plexin‐D1, which was previously crystallized in a Plexin‐D1/hGIPC/myosin VI complex structure and which ends in the same “SEA” sequence. 30 Experimental protocols were based on previously described methods, and are described in more detail in the Materials and Methods. 38 , 55 , 57 , 58
TABLE 2.
Binding affinities of mbGIPCtrunc PDZ domain
| Peptide | Sequence | K i (μM), mbGIPC |
|---|---|---|
| GAIP | QGPSQSSSEA | 0.23 ± 0.04 |
| TYRP1 | KLQNPNQSVV | 3.3 ± 1.0 |
| B1AR | RPGFASESKV | 8.5 ± 4.6 |
| mbGIPC C‐term | VEPKKSFDEI | >1,000 |
The binding affinities of mbGIPCtrunc PDZ for human GIPC PDZ targets suggest a large degree of conservation in selectivity determinants. Specifically, the affinity of mbGIPC PDZ for GAIP is 0.23 μM, despite a BLASTP search revealing no obvious GAIP homolog in M. brevicollis. 59 The similarity of this value to the K D measured for the fluoresceinated reporter peptide (0.29 μM) also suggests a minimal effect of the fluorescein moiety on binding. The binding affinities for TYRP1 and B1AR are roughly 14‐fold and 37‐fold weaker, respectively (Table 2, Figure 2B). These values still represent relatively high‐to‐average affinities compared to typical PDZ domain interactions, which can range from nanomolar to hundreds of micromolar, but are centered around 1–30 μM. 11 , 48 , 50 Neither TYRP1 nor B1AR has a clear homologue in M. brevicollis, according to BLASTP. 59 Notably, a competition experiment with a decameric peptide matching the C‐terminal sequence of our original construct (“SFDEI” sequence: VEPKKSFDEI) revealed little to no binding, defined here as a K i > 1,000 μM (Table 2, Figure 2B). Thus, we concluded that the binding interaction in our original structure was a crystal artifact.
In order to investigate the stereochemistry of a peptide binding interaction with mbGIPC that is not an artifact of crystallization, we mutated the final five residues of our original construct to those matching B1AR (mbGIPCB1AR; C‐terminal sequence: SESKV), GAIP (mbGIPCGAIP; SSSEA), and TYRP1 (mbGIPCTYRP1; NQSVV) (Table S1A). Previous work from our group and others suggests that the P−5 position is an important selectivity determinant in some PDZ domains. 37 , 38 However, we chose to keep this residue as a lysine in our new constructs because the lysine side chain makes crystal lattice contacts, suggesting that it may be important for crystallization (Figure S1E).
All three complexes successfully crystallized in the same space group as mbGIPCSFDEI and we determined crystal structures of mbGIPCB1AR and mbGIPCGAIP. The overall conformations of these structures to each other, as well as to the mbGIPCSFDEI structure, were very similar, with pairwise structural alignment RMSD values ≤ 0.21 Å for ~350 main chain atoms (Figure 2C). We were unable to fully refine the mbGIPCTYRP1 structure despite a successful molecular replacement solution, due to anisotropic data and relatively low resolution, compared to the others. Partial refinement (R work/R free (%) = 24.3/28.9) shows clear peptide‐specific density, confirming that this sequence interacts with mbGIPC in a manner that is consistent with PDZ domain peptide binding (Figure S1F). However, our structural analyses of the mbGIPC and human GIPC PDZ domains will be limited to the mbGIPCB1AR and mbGIPCGAIP structures.
Our mbGIPC structures share high structural similarity with the human GIPC PDZ domain. Structural alignment of the mbGIPCB1AR and hGIPC PDZ domains (PDB ID: 5V6T) yields an RMSD of 0.607 Å over 299 main chain atoms. This human GIPC PDZ structure was crystallized with the intracellular region of Plexin‐D1 (C‐terminal sequence: CYSEA), and the structures confirm that the peptide‐binding clefts of mbGIPC and human GIPC are very well conserved, with only two conservative substitutions (using human GIPC numbering): T148S and R159K (Figure 2D). We were unable to purify soluble human GIPC PDZ in our lab, despite testing multiple constructs (including using a SUMO‐tag), but our data strongly suggest that the binding affinities would be similar between these domains.
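For readers less familiar with the RMSD values quoted for structural alignments, the sketch below computes the root‐mean‐square deviation between two lists of paired atomic coordinates. It assumes the two structures have already been superposed and that atoms are listed in matching order; the superposition step itself, which alignment software performs, is not shown. The coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input, e.g. Å) between
    two equal-length lists of (x, y, z) tuples.

    Assumption: the coordinate sets are already superposed and atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must contain the same number of atoms")
    if not coords_a:
        raise ValueError("coordinate lists are empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Hypothetical main chain atom positions, displaced by 0.3 Å along z.
    structure_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
    structure_2 = [(0.0, 0.0, 0.3), (1.5, 0.0, 0.3), (1.5, 1.2, 0.3)]
    print(f"RMSD = {rmsd(structure_1, structure_2):.3f} Å")
```

Because the value is an average over all paired atoms, a low overall RMSD can still mask a local difference, such as a flexible loop, so it is read alongside the residue‐level comparisons.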
2.2. Structural characterization of mbSHANK1 PDZ
We previously compared binding affinities for another M. brevicollis PDZ domain, that of mbSHANK1 (UniProt ID: A9V7E4_MONBE), a protein that is homologous to human SHANK1 (Figure S2A). 55 In that work, we also created a homology model of mbSHANK1 PDZ using SwissModel and predicted stereochemical differences in the peptide‐binding pockets between these two proteins, specifically in those residues that interact with the P−3 position. 55 Here, we expand that investigation by presenting the crystal structure of mbSHANK1 PDZ (Figure 3A, Table S1B).
FIGURE 3.

The crystal structure of the mbSHANK1 PDZ domain. (a) The mbSHANK1 structure (blue cartoon, peptide in stick representation) is similar to a previously reported homology model (gray cartoon), RMSD = 0.668 Å over 276 main chain atoms. 55 The black arrows highlight the differences in the flexible βB‐βC loop. All sticks are colored by heteroatom (O = red, N = blue) and the peptide positions are labeled. (b, c) The interactions of the mbSHANK1 PDZ domain (gray cartoon, with side chains as sticks) with the F*‐GIRK3 peptide (blue sticks, interchangeably referred to as “GIRK3” peptide, since the fluorescein moiety is unresolved in the crystal structure) is characteristic of PDZ–peptide interactions. Measurements between interacting residues in the peptide‐binding cleft are labeled. The GIRK3 peptide is labeled and the sequence included in the figures is in (b). There is an additional peptide residue resolved in the crystal structure, the P−6 Pro, but it does not make interactions with mbSHANK1 PDZ. (d) Phylogenetic tree showing the relationship of a number of SHANK and SNX27 PDZ domain sequences from 11 organisms. SNX27 sequences are colored red. The mbSHANK1 and mbSNX27 sequences sit at the branch point of the other SHANK and SNX27 sequences
The protein mbSHANK1 PDZ was expressed and purified as previously described. 55 Crystallization of this protein in complex with a fluoresceinated peptide matching the C‐terminus of GIRK3 (F*‐GIRK3, sequence: F*‐LPPPESESKV) is described in the Materials and Methods and Supplementary Information. We collected data to a high resolution of ~2.2 Å; however, phasing by molecular replacement and structure refinement proved challenging. We employed an iterative Rosetta modeling approach coupled with Phenix in order to determine a molecular replacement solution with high confidence, as described in detail in the Supplementary Information. 60 , 61 , 62 , 63 Our refinement difficulties were due to a large degree of anisotropy in the diffraction data. Specifically, the high‐resolution limit along the a* and b* directions (2.2 Å) was substantially higher than that along the c* direction (3.4 Å). We were ultimately able to refine this model by truncating and scaling the reflections file appropriately, using the UCLA‐DOE Diffraction Anisotropy Server. 64 Crystallization attempts with a fluorescent β‐PIX peptide (sequence: F*‐NDPAWDETNL) were unsuccessful, even though this peptide binds mbSHANK1 PDZ with much higher affinity (K D = 7.3 μM for mbSHANK1 PDZ, and 5.1 μM for SHANK1 PDZ), as previously reported (Table 3). 55 For comparison, previous attempts to determine K D values of mbSHANK1 and human SHANK1 PDZ domains with the F*‐GIRK3 peptide were incomplete, with estimates of affinities >1,000 μM for each (data not shown). We were also unable to grow crystals of mbSHANK1 in the apo form or following incubation with nonfluorescent versions of either the β‐PIX or GIRK3 peptides.
TABLE 3.
Binding affinities of SHANK1, SNX27, mbSHANK1, and mbSNX27 PDZ domains
| Peptide | Sequence | SHANK1, K i (μM) | mbSHANK1, K i (μM) | SNX27, IC50 (μM) | mbSNX27, IC50 (μM) |
|---|---|---|---|---|---|
| BPIX | NDPAWDETNL | 20 ± 4.0 | 13 ± 4.7 | 0.26 ± 0.2 | 6.1 ± 4.0 |
| GIRK3 | LPPPESESKV | 1,070 ± 380 | 960 ± 160 | 1.7 ± 0.6 | 3.5 ± 0.2 |
| mGluR1 | RYKQSSSTL | 90 ± 11 | 1,020 ± 260 | 180 ± 60 | >1,000 |
| A9UP44 | EDTNQSESRL | 34 ± 10 | 44 ± 23 | 0.99 ± 0.7 | 2.9 ± 2.7 |
| A9UXE1 | ANPIQDETAL | 30 ± 5.0 | 72 ± 14 | 0.32 ± 0.2 | 5.9 ± 6.4 |
| A9V7Z4 | GTSLEDETAL | 9.8 ± 3.1 | 39 ± 19 | 0.19 ± 0.1 | 1.5 ± 0.9 |
Note: SHANK1 and mbSHANK1 PDZ domain measurements are previously published, with the exception of the GIRK3 peptide. 55
The crystal structure of mbSHANK1 bound to F*‐GIRK3 is structurally very similar to our previously determined homology model. 55 The overall RMSD of these two structures is 0.668 Å over 276 main chain atoms, with the largest discrepancy occurring in the flexible βB‐βC loop (Figure 3A). In our structure, we see noncovalent interactions between T471 and the side chains of the P−4 Ser and P−5 Glu residues, as well as the P−5 Glu carbonyl (Figure 3B,C). These interactions may have helped to stabilize the βB‐βC loop for crystallization and may explain why this complex crystallized despite a relatively low binding affinity, although six residues of the loop are disordered in our structure. In addition, we see electrostatic interactions between D488 and the P−1 Lys, as well as H517 and R518 with the P−2 Ser (Figure 3C).
In our previous work, and based on our mbSHANK1 homology model, we hypothesized that the modest increase in affinity for β‐PIX by mbSHANK1 PDZ (K i = 13 μM vs. 20 μM for human SHANK1 PDZ) was due to an additional arginine residue that was located near the P−3 position and, we reasoned, was positioned to interact directly with the P−3 Glu. 55 Interestingly, our experimental structure reveals that neither of the arginine residues in the vicinity interacts with the P−3 Glu of GIRK3. However, we do see that the sequence and length of the βB‐βC loop, which directly interacts with the peptide P−4 Ser and P−5 Glu residues in our structure, vary quite dramatically between the two domains: the residues of the 11‐residue loop of mbSHANK1 PDZ are not conserved at all with those of the 18‐residue loop of human SHANK1 PDZ. It is unclear how these loops may differentially interact with the P−4 Asp and P−5 Trp of β‐PIX, but otherwise, the crystal structure confirms that the peptide‐binding clefts are generally conserved. 55
When we ran our initial BLASTP search for SHANK1 PDZ homologues in M. brevicollis, the top two sequence hits were relatively close in sequence identity: A9V7E4_MONBE, with 34% sequence identity over 90 residues, as well as A9URU5_MONBE, with 38% sequence identity over 77 residues (domain boundaries for human SHANK1 PDZ [residues 663–757] as defined by UniProt) (Table 1). Sequence alignments using the full‐length A9V7E4_MONBE protein and the human proteome confirmed its homology to the SHANK protein family, specifically due to the additional presence of ankyrin repeat domains, as well as SH3 and SAM domains. 45 , 65 Sequence alignments using the full‐length A9URU5_MONBE sequence and human proteome suggested that it is a homologue of sorting nexin‐27 (SNX27), with 25% sequence identity over 96% of the full‐length protein, and 46% sequence identity over 92 PDZ domain residues (Table 1). Therefore, we will refer to A9URU5_MONBE as mbSNX27.
We were interested in the relationship between the PDZ domain sequences in these four proteins because the mbSHANK1 and mbSNX27 PDZ domains show comparable sequence identity to human SHANK1 PDZ. Therefore, we conducted a phylogenetic tree analysis of 10 PDZ sequences for SHANK1 or SNX27 homologues in a variety of organisms, as well as the PDZ domain sequences of mbSHANK1 and mbSNX27 (Figure 3D). Because mbSHANK1 and mbSNX27 sit at the branch point between the SNX27 and SHANK1 sequences, we expressed and purified SNX27 and mbSNX27 PDZ domains, as described in the Materials and Methods. We then used fluorescence polarization to compare the binding affinities of all four domains for six decameric peptides matching the C‐termini of β‐PIX, GIRK3, and mGluR1, as well as A9UP44_MONBE, A9UXE1_MONBE, and A9V724_MONBE, which were previously identified as potential M. brevicollis targets of mbSHANK1 (Figure S2B–D, Table 3). 55
Our results reveal that, overall, peptides that bind human SNX27 PDZ with relatively high affinity also bind mbSNX27 PDZ with low‐micromolar affinity (Table 3). As previously reported, the same holds for the human SHANK1 and mbSHANK1 PDZ domains (Table 3). 55 However, in all cases, the exact order of highest to lowest affinity peptides is distinct, perhaps reflective of single substitutions in the peptide‐binding cleft. We described the differences for SHANK1 and mbSHANK1 PDZ domains above and previously. 55 A homology model of mbSNX27 PDZ, using SNX27 PDZ as a template (PDB ID: 6SAK), contains the following substitutions at residues that may interact with the peptide (numbering based on SNX27): R58K, V61T, A83H, and R122I (Figure S2E). In addition, while the mGluR1 peptide binds the human PDZ domains with moderate affinity, it binds both M. brevicollis PDZ domains very weakly, if at all (approximately 1,000 μM or weaker; Table 3). Taken together, the resulting binding affinities are consistent with our central hypothesis that the target selectivity of PDZ lineages was set early in evolution, even in proteins that appear to be closely related to each other based on overall sequence identity, for example, SHANK1 and SNX27 PDZ domains.
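The statement that the rank order of peptides differs between homologous domains can be checked directly against the values in Table 3. The following sketch sorts the six peptides for each domain from tightest to weakest binder. It uses only the central values from Table 3 and ignores the reported uncertainties, several of which overlap, so small differences in rank should not be over‐interpreted. The entry reported as >1,000 μM is encoded as infinity, which is a convention chosen for this example.

```python
import math

# Affinities in μM, transcribed from Table 3 (central values only).
# ">1,000" is encoded as infinity for sorting purposes (an assumption of this sketch).
AFFINITIES = {
    "SHANK1": {"BPIX": 20, "GIRK3": 1070, "mGluR1": 90,
               "A9UP44": 34, "A9UXE1": 30, "A9V7Z4": 9.8},
    "mbSHANK1": {"BPIX": 13, "GIRK3": 960, "mGluR1": 1020,
                 "A9UP44": 44, "A9UXE1": 72, "A9V7Z4": 39},
    "SNX27": {"BPIX": 0.26, "GIRK3": 1.7, "mGluR1": 180,
              "A9UP44": 0.99, "A9UXE1": 0.32, "A9V7Z4": 0.19},
    "mbSNX27": {"BPIX": 6.1, "GIRK3": 3.5, "mGluR1": math.inf,
                "A9UP44": 2.9, "A9UXE1": 5.9, "A9V7Z4": 1.5},
}


def rank_order(affinities):
    """Peptide names sorted from tightest (smallest value) to weakest binder."""
    return sorted(affinities, key=affinities.get)


if __name__ == "__main__":
    for domain, values in AFFINITIES.items():
        print(f"{domain:9s}", " > ".join(rank_order(values)))
```

With these values, A9V7Z4 is the tightest binder for both SNX27 domains, while the intermediate peptides are ordered differently between the human and M. brevicollis proteins.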
2.3. Structural characterization of PDZ domains in mbDLG
We also wanted to investigate multiple PDZ domains from a single M. brevicollis protein. Therefore, we experimentally determined the structures of PDZ2 and PDZ3 from UniProt ID A9UT73_MONBE (Table S1B). This protein is most closely related to the discs large family of human proteins; sequence identity values for the full‐length protein are as follows: human Dlg1 (35.5%), Dlg2 (35.0%), Dlg3 (34.1%), and Dlg4 (30.5%). Therefore, we will refer to these PDZ domains as mbDLG‐2 and mbDLG‐3, respectively. The highest domain‐to‐domain identity for both mbDLG‐2 and mbDLG‐3 is to PDZ2 and PDZ3 of human Dlg2 (Figure S3A, Table 1).
Expression, purification, and crystallization of mbDLG‐2 and mbDLG‐3 were performed as described in the Materials and Methods and Supplementary Information, and following similar protocols as described in this work and elsewhere. 38 , 55 Interestingly, the sequence of mbDLG‐2 contains zero aromatic residues, and thus has an extinction coefficient at λ = 280 nm of 0. Therefore, protein quantification for crystallization was done via SDS‐PAGE analysis using BSA as a standard, as described in the Materials and Methods and Supplementary Information (Figure S3B). Because we were not confident in our precise determination of the mbDLG‐2 concentration, we chose not to calculate any binding affinities for this domain. However, sequence and structural similarities (described below) suggest that this domain will bind similar peptides as human Dlg2 PDZ2. For structure determination by X‐ray crystallography, the mbDLG‐2 protein was incubated with a peptide matching the final 10 residues of the HPV16 E6 oncoprotein (sequence: SSRTRRETQL) prior to crystallization and crystallized in two distinct space groups, differing in the ability to accommodate a peptide in the crystal lattice: P 21 21 21 and I 2 (which is related to C 2). 66 A structure of human Dlg2 PDZ2 (or DLG2‐2, PDB ID: 2BYG) was successfully used as a search model in molecular replacement for both crystal forms.
The mbDLG‐2 structure is very similar in both crystal symmetries. The overall RMSD is 0.153 Å for 300 main chain atoms (Figure 4A). We are unable to resolve an additional 6 C‐terminal residues in the orthorhombic (P 21 21 21) crystal, which form an α‐helix in our monoclinic structure (described below), but there is positive density in the peptide‐binding cleft that likely corresponds to the HPV16 E6 peptide (Figure S3C). Iterative rounds of refinement after placing peptide residues into this density suggest that there may be multiple conformations of the peptide within the pocket and that its occupancy is likely 0.50 or less. Therefore, we were not confident in modeling the peptide residues in the final structure. In the centered monoclinic (I 2) crystal, we are able to resolve the entire βA‐βB loop; however, crystal contacts with molecules related by symmetry are not compatible with peptide binding (Figure S3D). The major difference between these two structures is the location of the carboxylate‐binding loop, a shift that is consistent with carboxylate‐binding loop flexibility in a number of apo and peptide‐bound structures of human Dlg2 PDZ2 (Figures 4A and S3E). Finally, comparison with the peptide binding cleft of human Dlg2 PDZ2 (PDB ID: 4G69) reveals only two relatively conservative differences (using human Dlg2 PDZ2 numbering): N339S and K392R (Figure 4B).
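For readers less familiar with the metric, the RMSD values quoted for structural alignments can be illustrated with a minimal sketch. It assumes two equal‐length lists of paired main chain atom coordinates that have already been superposed; the function name and the toy coordinates are hypothetical, and the superposition step carried out by structural alignment software is omitted.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between paired, pre-superposed coordinates (same units as input)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two copies of the same atom
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy coordinates in angstroms (hypothetical, not taken from any deposited structure)
atoms_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
atoms_b = [(0.0, 0.0, 0.2), (1.5, 0.0, -0.2)]
print(round(rmsd(atoms_a, atoms_b), 3))
```

Because each atom pair contributes equally, a localized shift such as the one in the carboxylate‐binding loop can coexist with a very low overall RMSD when the remaining atoms superpose closely.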
FIGURE 4.

The crystal structure of the mbDLG‐2 PDZ domain. (a) Alignment of the two mbDLG‐2 structures, which crystallized in different space groups: I 2 (purple cartoon) and P 21 21 21 (gray cartoon). Overall RMSD = 0.153 Å for 300 main chain atoms. The biggest difference between the structures is a shift in the carboxylate‐binding loop, indicated by a black arrow. (b) The conservation between the peptide binding clefts of mbDLG‐2 (purple cartoon, with side chain residues as sticks) and human Dlg1 PDZ2 (PDB ID: 4G69, gray cartoon with side chain residues as sticks and peptide as ribbon and labeled). Residues in the peptide‐binding cleft are labeled. All stick representation is colored by heteroatom (O = red, N = blue)
Structure determination of the mbDLG‐3 PDZ domain was less straightforward, as described in the Supporting Information. However, we were ultimately able to determine a solution and refine it to a final R work/R free = 15.8/16.9 (Table S1B, Figure S4A). Sequence alignments of mbDLG‐3 with the human Dlg1‐4 PDZ3 domains suggest that the Class I‐determining αB‐1 histidine residue is a tyrosine in mbDLG‐3 (Figure S3A). Alignment of mbDLG‐3 to a peptide‐bound Dlg2 PDZ3 (PDB ID: 2HE2) structure, with an RMSD = 0.779 Å over 225 main chain atoms, confirms this class‐switching difference (Figure S4B). Critically, the P−2 Thr residue in the bound ETSV peptide sterically clashes with the tyrosine hydroxyl in mbDLG‐3 (Figure 5A). In general, the peptide binding cleft of mbDLG‐3 is the most dissimilar to that of its nearest homologs, Dlg2 and PSD‐95/Dlg4 PDZ3, with substitutions at four residues (numbering is for Dlg2 PDZ3): N434S, V436I, F448R, and H480Y (Figure 5B).
FIGURE 5.

The crystal structure of the mbDLG‐3 PDZ domain. (a) Substitution in the binding class‐determining αB‐1 residue between mbDLG‐3 (gray cartoon, with the side chain of Y305 as orange sticks) and human Dlg2 PDZ3 (Dlg2‐3, PDB ID: 2HE2; gray cartoon with the side chain of H480 as sticks). The P−2 Thr that interacts with H480 (gray sticks) in Dlg2‐3 sterically clashes with Y305 in mbDLG‐3 (red circle). The peptide (sequence: ETSV) is in sticks and colored by heteroatom, as labeled. (b) The conservation between mbDLG‐3 (gray cartoon, with side chain residues as orange sticks) and human Dlg2 PDZ 3 (or DLG2‐3, gray cartoon with side chain residues as sticks and peptide as ribbon). Residues in the peptide‐binding cleft are labeled. All stick representation is colored by heteroatom (O = red, N = blue). (c) Sequence alignment of Dlg1 PDZ3 and Dlg2 PDZ3 domains from multiple organisms reveals that the only PDZ domains with a tyrosine in the αB‐1 position are those from choanoflagellate species, Monosiga brevicollis and Salpingoeca rosetta. This position is highlighted with a black arrow, and sequence alignment coloring is by overall percentage identity (darker blue = higher % identity)
Despite this apparent class‐switching mutation, fluorescence polarization experiments to determine the affinity of mbDLG‐3 PDZ with a decameric peptide matching the C‐terminus of HPV18 E6 (F*‐RLQRRRETQV) or the same sequence with a P−2 Asp mutation (F*‐RLQRRREDQV, mutation in bold) revealed approximately threefold worse binding for the P−2 Asp‐containing peptide (Figure S4C). However, in both of these experiments, the estimated K D was >1,000 μM, suggesting little to no overall binding. It would be interesting to test the target specificity of mbDLG‐3 using a high throughput technique, for example, phage display or peptide arrays, in future experiments and/or to compare affinities with other known human Dlg4/PSD‐95 PDZ3 ligands, for example, CRIPT, Stargazin, or Neuroligin, among other Dlg PDZ3 proteins. 26 , 37 , 57
Finally, we investigated a number of Dlg sequences in various organisms to see if there are others with an H‐to‐Y substitution at the αB‐1 residue. None of the Dlg1 or Dlg2 (where available) sequences from Gallus gallus (chicken), Danio rerio (zebrafish), Callorhinchus milii (shark), Xenopus tropicalis (frog), Caenorhabditis elegans (worm), Anolis carolinensis (lizard), or Nematostella vectensis (sea anemone) contain a tyrosine residue at the αB‐1 position (Figures 5C and S4D). In addition, sequence alignments with human PDZ domains that do contain an αB‐1 Tyr (using UniProt domain boundaries) all reveal significantly less sequence identity with mbDLG‐3 than do the alignments to the human Dlg1‐4 PDZ3 domains (44–49% over 62–78 residues); these include CYTIP (28% over 25 residues), DPTOR (25% over 75 residues), NHRF3‐4 (29% over 75 residues), PDZIP‐3 (29% over 75 residues), RADIL (34% over 77 residues), RHG21 (29% over 112 residues), and RHG23 (28% over 71 residues). Taken together, the αB‐1 Tyr in mbDLG‐3 appears to be a substitution unique to this Dlg protein in choanoflagellates, although the molecular basis for this change is not known. Future experiments to determine cellular targets of mbDLG‐3 in M. brevicollis or S. rosetta would be interesting in order to decipher the functional consequence of this change.
2.4. Identification of 178 PDZ domains in Monosiga brevicollis proteome
Finally, we wanted to determine the number of PDZ domains in the Monosiga brevicollis proteome. Previous work from 2010 identified 113 PDZ domains in 58 genes, using a combination of sequence alignment (i.e., HMMER 2.3.2) and evolutionary approaches (i.e., EvolMap); these results included orthologs of well‐studied human proteins such as Dlg, GIPC, and SHANK, the PDZ domains that are investigated in this work. 16 However, more recent annotations from the UniProt database reveal 169 PDZ domains in 70 unique proteins. 67 , 68 Automatic annotation in UniProt is done using the EMBL InterPro system, as well as UniRule and the Statistical Automatic Annotation System, which incorporate predictive models based on a number of sequence databases and machine learning. 67 , 68 , 69 , 70
For our search, we used a BLASTP‐based approach, comparing all sequences in the M. brevicollis proteome to our 272 previously curated human PDZ domain sequences. 26 We then wrote a Python‐based program to filter the results by alignment length and to identify the top “hit” by sequence identity for each of our putative PDZ domains (Table S2). Additional details of our search and filtering protocol are available in the Supporting Information. Using homology and Rosetta modeling, we were able to further investigate our true and false positives, as described below. Overall, our approach identified a total of 180 PDZ domains in 77 proteins within the M. brevicollis proteome (Table S2). We searched for human homologues for all of these proteins using the full‐length sequences and BLASTP, finding only about 20 of the proteins align to human proteins over ≥ 50% of the protein sequence (Table S2). While this does not confirm that these proteins are true homologues, it suggests that a large majority of the PDZ domains in M. brevicollis are present in unique protein architectures. In addition, we concluded that two PDZ domains annotated by UniProt are missing critical components of the conserved structural fold and are likely not PDZ domains, as described below. Our list therefore includes a total of 178 PDZ domains in M. brevicollis, including 11 previously unidentified PDZ domains.
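The filtering step can be illustrated with a minimal sketch. This is not the program used in this work (its details are in the Supporting Information); it assumes BLASTP results in the standard 12‐column tabular format, uses an illustrative length cutoff of 65 (borrowed from the manual curation threshold), and treats hits that overlap by at least half of the shorter alignment as the same putative domain. All function names and example rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def overlaps(a_start, a_end, b_start, b_end, min_frac=0.5):
    """True if two residue ranges share at least min_frac of the shorter range."""
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    shorter = min(a_end - a_start, b_end - b_start) + 1
    return shared > 0 and shared / shorter >= min_frac

def top_hits(tsv_text, min_length=65):
    """Keep the highest-identity human PDZ query for each putative domain region."""
    hits = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["sstart"], hit["send"] = sorted((int(hit["sstart"]), int(hit["send"])))
        if hit["length"] >= min_length:  # alignment-length filter
            hits.append(hit)
    hits.sort(key=lambda h: h["pident"], reverse=True)
    kept = []
    for hit in hits:
        # Drop a hit if a higher-identity hit already covers the same region
        if not any(k["sseqid"] == hit["sseqid"] and
                   overlaps(k["sstart"], k["send"], hit["sstart"], hit["send"])
                   for k in kept):
            kept.append(hit)
    return kept

# Hypothetical rows for illustration only (not real search results)
EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ["human_PDZ_a", "mb_protein_1", "62.0", "85", "30", "1", "1", "85", "150", "234", "1e-30", "120"],
    ["human_PDZ_b", "mb_protein_1", "58.0", "84", "33", "1", "1", "84", "151", "234", "1e-28", "110"],
    ["human_PDZ_c", "mb_protein_1", "47.0", "78", "40", "1", "1", "78", "290", "367", "1e-20", "90"],
    ["human_PDZ_d", "mb_protein_2", "40.0", "40", "24", "0", "1", "40", "10", "49", "0.5", "30"],
])

for hit in top_hits(EXAMPLE):
    print(hit["sseqid"], hit["sstart"], hit["send"], hit["qseqid"], hit["pident"])
```

Grouping by overlapping subject regions, rather than by protein alone, is a design choice of this sketch so that a protein with several PDZ domains yields one top hit per domain.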
2.5. Rosetta and homology modeling of M. brevicollis PDZ domains
To validate our M. brevicollis PDZ domains, we first manually curated all sequence hits with alignment lengths of 65 or higher and flagged a sequence as “borderline” if it was questionable whether it was a PDZ domain (Table S2). More information on this step is in the Supporting Information. Then, we used homology modeling with various PDZ templates and/or modeling using Rosetta to see if the conserved PDZ structural fold was consistent with each borderline sequence. 71 , 72 , 73 , 74 , 75 , 76 All borderline sequences were verified using the homology modeling application in Rosetta. 77 To improve sidechain packing, the predicted models were subjected to an all‐atom refinement protocol using Rosetta. 78 , 79 , 80 This approach identified a number of new candidate PDZ domains, defined as containing hydrophobic residues on the interior and hydrophilic residues on the exterior of the protein, most of the conserved secondary structure elements, and the “GLGF” sequence positioned in the correct location at the N‐terminal end of the βB strand. 26 We colored the structures by hydrophobicity of each residue, using color_h.py for PyMOL, where a darker red color indicates increased hydrophobic character, and compared them to the other choanoflagellate structures presented in this work (Figures 6 and S5). 81 Interestingly, one of our PDZ hits is a protein with a carboxylate‐binding loop sequence of “HWNL” (A9V1Y4); while an Asn residue in the third position of the “GLGF” loop would be a very rare occurrence, our previous analysis of human PDZ domains annotated in UniProt identified three human proteins, RAPGEF6, CAR11, and CAR14, with PDZ domains that have a Gln residue at this position (sequences: PLQF, TSQL, and LEQI, respectively), suggesting that an Asn can be accommodated. 26
FIGURE 6.

Structural modeling of novel choanoflagellate PDZ domains using Rosetta. For all, (left figure) a cartoon representation of the model, with the αB‐1 residue and carboxylate‐binding loop side chains shown as sticks, as labeled, (middle) a cartoon rainbow‐colored depiction that highlights conserved secondary structural elements, and (right) a cartoon hydrophobicity‐colored (with side chain sticks, residues colored by hydrophobicity) structure. The hydrophobicity plots confirm that the PDZ fold is reasonable for each of these sequences (i.e., hydrophobic residues in the protein core (indicated by darker red color) and hydrophilic/polar residues on the surface (indicated by pink or white color)). Rosetta models are shown for A9UR42 (a, cyan sticks in left figure), A9UUD9 (b, green sticks), A9V4L7 (c, pink sticks), and A9V7G8 (d, yellow sticks)
We also investigated four sequences that are annotated as PDZ domains by UniProt, but were flagged as borderline based on our criteria (Figure S6, Table S2). The A9UPI8 PDZ18 Rosetta model contains two lysine residues near the interior of the protein, but these side chains may be in different orientations in an experimental structure (Figure S6A). In addition, neither our SwissModel nor Rosetta models include the C‐terminal residues of A9V109 PDZ1, after αB, and it is unclear if βE would properly form (Figure S6B). Based on our analyses, A9UPI8 PDZ18 (residues 2,222–2,300) and A9V109 PDZ1 (residues 537–595) are likely PDZ domains, although further experiments would need to confirm these results. For the other two domains, the third PDZ domain of A9V625 is defined as only 40 residues in UniProt (499–539), and our model suggests that while a longer sequence may adopt the PDZ fold, the carboxylate‐binding loop sequence is “NQRC,” which is highly unusual (Figure S6C). Therefore, we conclude that A9V625 PDZ3 is likely not a true PDZ domain. We also conclude that A9V7P9 does not contain a true PDZ domain. Here, our PDZ sequence alignment program only identifies a single alignment for this protein, even without length constraints considered: ZO1‐2 at 31.6% identical over 76 residues (residues 409–482), with an E‐value of 2.9 (Table S2). In addition, multiple attempts to determine homology or Rosetta models using residues ~380–470 or ~400–490 were unsuccessful, with a reasonable model reflecting a non‐PDZ fold (Figure S6D). Overall, our results suggest that a targeted BLASTP approach with a query consisting of all of the sequences from a human domain family, in combination with Rosetta and/or homology modeling, is a good approach for identifying protein domains in the proteomes of distantly related organisms.
3. CONCLUDING REMARKS
Structural comparison of the peptide‐binding clefts and peptide interactions of homologous domains from organisms related by hundreds of millions of years of evolution has the potential to provide insight into signaling networks in those species. Here, we chose to use structural biology and biochemistry to investigate three PDZ domain‐containing proteins that are important in human neuronal signaling in a species of choanoflagellates, our closest nonmetazoan relatives. Many of the human targets of SHANK1, GIPC1, and the Dlg family (including PSD‐95) are either not conserved in choanoflagellates or do not contain PDZ binding sequences. 55 However, we find that the peptide binding‐cleft residues and binding affinities for human and/or choanoflagellate peptides are generally conserved in these related domains. Specifically, binding affinities correlate strongly within the SHANK1/mbSHANK1 and SNX27/mbSNX27 PDZ pairs, despite binding cleft substitutions in both cases. A notable exception is mbDLG PDZ3, where we see a number of amino acid substitutions that may directly affect target specificity, although additional work needs to be done. Presumably, this is due to unique signaling pathways in choanoflagellates, and future work should use computational or experimental methods to identify endogenous targets of mbDLG PDZ3, as well as the other studied PDZ domains. 55 , 82
Our structures of four unique M. brevicollis PDZ domains provide the first structural determination of choanoflagellate PDZ domains to our knowledge. It would be interesting to look at DLG, GIPC, and/or MAGUK family PDZ protein homologues from other ancestral unicellular eukaryotes, for example, C. owczarzaki in future work. 16 , 83 , 84 Specifically, the C. owczarzaki GIPC homologue (UniProt ID A0A0D2U0D0) shares 38% sequence identity over 77 PDZ residues, as compared to the human GIPC PDZ domain. It is 51% identical over 75 residues, as compared to mbGIPC. In addition, the single PDZ domain of the C. owczarzaki DLG protein (D5HP87) is 42% identical over 85 residues to human DLG1 PDZ1 and 35% identical over 75 residues to mbDLG‐1, via BLASTP alignments and using UniProt domain boundaries. Notably, C. owczarzaki does not contain a SHANK protein homologue.
Our comparisons with known human PDZ domain structures, as well as homology and Rosetta modeling, confirm that because the PDZ domain fold is well conserved, it is possible to get an initial idea of a PDZ domain structure without experimental structure determination. For example, we can use homology modeling with the closest‐related human PDZ domains by sequence identity, to propose the structures of mbDLG‐1 (template: INADL‐8 [PDB ID: 2DM8], 49.3% identical over 69 residues) and mbSNX27 (template: SNX27 [PDB ID: 6SAK], 45.7% identical over 92 residues) (Figures 7A,B and S2E, Table S2). We can also propose the structures of PDZ domains that are in proteins with no obvious relation to human proteins, aside from the presence of one or more known domains, for example, A9VDV9_MONBE, which contains 1 PDZ and 1 Kinase domain (according to UniProt). The highest sequence identity of this protein to any human protein, using BLASTP, is to the ROR2 tyrosine kinase receptor, at 26.53% over the kinase domain residues. The human ROR2 receptor does not contain a PDZ domain. As a template, we used the PARD3‐2 PDZ domain structure (PDB ID: 2KOM), which is 38.2% identical to the PDZ domain in A9VDV9_MONBE over 68 residues (Figure 7C). This is of particular interest considering that a large majority of the M. brevicollis PDZ domains are present in proteins with unique domain architecture, as compared to human PDZ proteins (Figure 7D). For example, there are no human PDZ domains in proteins that also contain pTyr‐binding SH2 signaling domains; however, in M. brevicollis, there are five PDZ‐and‐SH2‐containing proteins (Figure 7D, Table S2). 26 In addition, there are no human PDZ domain‐containing proteins with more than one guanylate kinase domain, but in M. brevicollis, there are two, including a protein with four guanylate kinase domains, as annotated by UniProt (Figure 7D, Table S2). 26 We hypothesize that these types of analyses can be applied to PDZ domains from multiple organisms related by evolution.
FIGURE 7.

Homology modeling and domain architectures of select M. brevicollis PDZ domains. Our results suggest that homology modeling is a reasonable tool for predicting the structure of PDZ domains, in humans or other organisms, that have no experimentally determined structures. Here, we show homology models for (a) mbDLG‐1 (template PDB ID: 2DM8), (b) mbSNX27 (template: 6SAK), and (c) A9VDV9_MONBE (template: 2KOM). For all, (left figure) a cartoon representation of the model, with the αB‐1 residue and carboxylate‐binding loop side chains shown as sticks, as labeled, (middle) a cartoon rainbow‐colored depiction that highlights conserved secondary structural elements, and (right) a cartoon hydrophobicity‐colored (with side chain sticks, residues colored by hydrophobicity) structure. The hydrophobicity plots confirm that the PDZ fold is reasonable for each of these sequences (i.e., hydrophobic residues in the protein core (indicated by darker red color) and hydrophilic/polar residues on the surface (indicated by pink or white color)). Coloring for the left‐side figures is: mbDLG‐1 (blue), mbSNX27 (gold), and A9VDV9 (hot pink). All sticks are colored by heteroatom: O = red, N = blue, S = yellow. (d) Protein domain architecture schematics for a number of M. brevicollis PDZ domain‐containing proteins, based on UniProt annotations with one exception (see below). Colors represent different domains, black = PDZ domains (numbered), light blue = SH3, purple = guanylate kinase (labeled Guan_Kinase), yellow = ankyrin repeat region (ANK_region), pink = tyrosine phosphatase (Tyr_Phosphatase), light green = SH2, orange = SAM, and dark red = FERM. Note: A9V1Y4 PDZ1 is a novel PDZ domain (not annotated as such in UniProt) based on Rosetta modeling (Figure S5)
Protein–protein interactions that involve PDZ domains act as critical nodes for signaling and trafficking pathways in a cell. It is clear that this is true in differentiated cells, such as those in complex multicellular organisms, as well as in single‐celled organisms. Deciphering the PDZ‐mediated interactions in choanoflagellates may elucidate important characteristics of the selectivity determinants and the evolution of this important peptide‐binding domain. Furthermore, there are a number of proteins and protein architectures that contain PDZ domains in choanoflagellates that are not conserved in humans. Future work could investigate how these proteins, for example A9VDV9 mentioned above, act in signaling pathways in M. brevicollis and how this provides insight into the transition from uni‐ to multicellular life on Earth. Taken together, we suggest that investigating the structure–function relationship for individual domains in both uni‐ and multicellular organisms is an important component in building a holistic understanding of the signaling networks of an organism and in understanding the origin of multicellularity.
4. MATERIALS AND METHODS
4.1. Protein expression and purification
Expression and purification of all human and M. brevicollis PDZ domains followed a similar protocol as previously reported for mbSHANK1 PDZ. 55 His‐tagged versions of the PDZ domains were inserted into the pET28a (+) vector by gene synthesis (GenScript) and expressed in Escherichia coli BL21 (DE3) cells. Cells were lysed using sonication and immobilized metal‐affinity chromatography (5 mL HisTrap [GE Healthcare]) was used to purify proteins from the clarified supernatant. The wash buffer used was: 25 mM imidazole pH 8.5, 25 mM Tris pH 8.5, 25 mM NaCl, 10% (v/v) glycerol, and 0.25 mM TCEP, and elution buffer was: 400 mM imidazole pH 8.5, 25 mM Tris pH 8.5, 50 mM NaCl, 10% (v/v) glycerol, and 0.5 mM TCEP. With the exception of human SHANK1 and SNX27 and mbSNX27 PDZ domains, the protein was then dialyzed in dialysis buffer (same as gel filtration buffer described below), and incubated with PreScission protease to cleave off the His‐tag. The cleaved protein was then purified using a second nickel column with the wash and elution buffers described above. All proteins were further purified on a Superdex S75 column, using gel filtration buffer (25 mM Tris pH 8.5, 125 mM NaCl, 10% [w/v] glycerol, and 0.5 mM TCEP). Proteins were concentrated using Amicon centrifugal concentrators (3 MWCO). Concentrated proteins used in fluorescence polarization assays were flash frozen in liquid nitrogen for storage at −80°C. Proteins used for crystallization were stored at 4°C.
For all proteins except mbDLG‐2, protein was quantitated using the A280 and the following experimental extinction coefficient values: 1,490 M−1 cm−1 for all mbGIPC PDZ domains (including mbGIPCSFDEI, mbGIPCtrunc, mbGIPCB1AR, mbGIPCGAIP, and mbGIPCTYRP1), 8,480 M−1 cm−1 for SHANK1 PDZ, 11,000 M−1 cm−1 for mbSHANK1, 9,970 M−1 cm−1 for mbSNX27 PDZ, 2,980 M−1 cm−1 for SNX27 PDZ, and 5,960 M−1 cm−1 for mbDLG‐3 PDZ. For mbDLG‐2 PDZ, protein was quantified using an SDS‐PAGE gel with BSA standards (Figure S3B). Because the gel was overloaded, the exact concentration of mbDLG‐2 PDZ used for crystallization was not determined, but we approximated the concentration to be 12.5–25 mg/mL.
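As an illustration only, and not necessarily the procedure used here, the relationship between aromatic content, extinction coefficient, and concentration can be sketched with the commonly used composition‐based estimate at 280 nm (5,500 M−1 cm−1 per Trp, 1,490 per Tyr, and 125 per cystine) together with the Beer–Lambert law. The function names and the toy sequences are hypothetical.

```python
def extinction_coefficient_280(sequence, n_cystines=0):
    """Composition-based estimate of the molar extinction coefficient (M^-1 cm^-1)."""
    seq = sequence.upper()
    # Per-residue contributions at 280 nm: Trp 5500, Tyr 1490, cystine 125
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_cystines

def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: concentration (M) = A280 / (epsilon * path length)."""
    if epsilon == 0:
        raise ValueError("no absorbing residues at 280 nm; use another method")
    return a280 / (epsilon * path_cm)

# Toy sequences (hypothetical, not the constructs used in this work)
one_trp_two_tyr = "GSWYYK"
no_aromatics = "GSAKLE"

eps = extinction_coefficient_280(one_trp_two_tyr)
print(eps)                              # 5500 + 2 * 1490
print(concentration_molar(0.848, eps))  # molar concentration for A280 = 0.848
print(extinction_coefficient_280(no_aromatics))
```

A sequence without Trp, Tyr, or cystine gives an estimate of zero, which is why absorbance at 280 nm cannot be used for a domain such as mbDLG‐2 and a gel‐based comparison to BSA standards was used instead.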
4.1.1. Crystallization, data collection, and structure determination
Prior to crystallization, all PDZ domains were dialyzed into a crystallization buffer (25 mM NaCl, 10 mM Hepes pH 7.4) for 2–4 hr. The protein concentrations used for crystallization were as follows: mbSHANK1 (6 mg/mL), mbGIPCSFDEI (23.4 mg/mL), mbGIPCB1AR (17 mg/mL), mbGIPCGAIP (22.8 mg/mL), mbGIPCTYRP1 (27.3 mg/mL), mbDLG‐2 (see above, 12.5–25 mg/mL), and mbDLG‐3 (10 mg/mL). Peptides were added at a final concentration of 1 mM and incubated with protein for 1‐hr prior to crystallization for the following: F*‐GIRK3:mbSHANK1 PDZ and HPV16 E6:mbDLG‐2 PDZ. All initial crystallization conditions were identified using the PEG/Ion screen (Hampton Research). The crystallization conditions of crystals used for data collection were: mbGIPCSFDEI (100 mM ammonium tartrate dibasic pH 7.0, 12% [w/v] PEG 3350), mbGIPCB1AR (200 mM sodium malonate pH 7.0, 20% [w/v] PEG 3350), mbGIPCGAIP (4% [v/v] Tacsimate pH 4.0, 12% [w/v] PEG 3350), mbGIPCTYRP1 (100 mM DL‐Malic acid pH 7.0, 12% [w/v] PEG 3350), mbDLG‐2 in I 2 space group (100 mM sodium malonate pH 5, 12% [w/v] PEG 3350), mbDLG‐2 in P 21 21 21 space group (200 mM sodium malonate pH 5, 20% [w/v] PEG 3350), and mbSHANK1 (250 mM NaCl, 100 mM Bis‐Tris pH 5.5, 32% [w/v] PEG 3350).
For data collection, crystals were transferred into cryoprotectant buffer. For mbSHANK1, this was well solution plus 20% (w/v) glycerol. For other proteins, 15% (w/v) glycerol was added directly to the respective PEG/Ion screen solution. The crystals were flash‐cooled by plunging into liquid nitrogen. Data were collected at the Advanced Light Source (ALS) at the Lawrence Berkeley National Laboratory (LBNL) on beamline 5.0.1, at λ = 0.977410 Å over 360°, with Δϕ = 0.25° frames and an exposure time of 0.5 s per frame. Data were processed using the XDS package (Table S1). 85 , 86 , 87 Molecular replacement was performed using Phenix with the following search models: mbGIPC (PDB ID: 5V6B, human GIPC), mbSHANK1 (de novo structural model using the Robetta server and Rosetta optimization as described in the Supporting Information), mbDLG‐2 (2BYG, human Dlg2 PDZ2), and mbDLG‐3 (6QJF, human PSD‐95 PDZ3). 60 , 88 , 89 Phenix Autobuild was used to generate a better starting model for refinement for mbDLG‐3 PDZ. 90 Refinement was performed using Phenix, manual refinement was done using Coot, and model geometry was assessed using Molprobity and the PDB validation server. 60 , 88 , 91 , 92 , 93 , 94 , 95 All crystal data and refinement statistics are in Table S1.
Additional details regarding the structure determination of mbSHANK1 and mbDLG‐3 PDZ domains are in the Supporting Information. PDB accession codes for the structures presented here are: 6X1X (mbGIPCSFDEI), 6X20 (mbGIPCB1AR), 6X22 (mbGIPCGAIP), 6X23 (mbSHANK1), 6X1P (mbDLG‐2, space group I 2), 6X1R (mbDLG‐2, space group P 21 21 21), and 6X1N (mbDLG‐3).
4.1.2. Binding assays by fluorescence polarization
Fluorescence polarization assays were performed as previously described. 25 , 38 , 55 , 58 Replicate experiments were performed to determine the K D values of mbGIPCtrunc PDZ (N = 4) for the fluorescent peptide, F*‐GAIP (FITC‐QGPSQSSSEA), and SNX27 PDZ (N = 3) for the fluorescent peptides, F*‐β‐PIX (FITC‐NDPAWDETNL) and F*‐GIRK3 (FITC‐LPPPESESKV) (Figures S1D and S2B). For mbGIPCtrunc PDZ, we determined a K D value of 0.29 ± 0.02 μM for F*‐GAIP. For SNX27, we determined a K D value of 0.022 ± 0.007 μM for F*‐β‐PIX and 0.327 ± 0.135 μM for F*‐GIRK3. Limited yield of purified mbSNX27 prevented us from calculating K D values for that protein; thus, we reported IC50 values for our unlabeled peptides (Figure S2D, Table 3). We also conducted K D experiments of mbDLG‐3 PDZ with the peptide F*‐HPV18 E6 (FITC‐RLQRRRETQV) and with a variant containing a P−2 Asp residue (FITC‐RLQRRREDQV) (Figure S4C). Both calculated values were >1,000 μM, although the P−2 Asp peptide bound with an affinity approximately threefold lower (K D = 1,340 ± 50 μM and 4,300 ± 800 μM, respectively).
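To make the relationship between K D and the measured signal concrete, the following is a minimal sketch of a single‐site binding model, assuming 1:1 binding and that the polarization signal scales with the fraction of reporter peptide bound. It is not the fitting procedure used in this work, which follows the cited protocols; the function name and the protein concentrations in the loop are hypothetical.

```python
import math

def fraction_bound(protein_total, reporter_total, kd):
    """Fraction of reporter peptide bound for 1:1 binding (same units throughout)."""
    b = protein_total + reporter_total + kd
    # Exact solution of the quadratic for the concentration of the complex
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * reporter_total)) / 2.0
    return complex_conc / reporter_total

# Reporter at 0.03 uM with K_D = 0.29 uM, as reported for mbGIPCtrunc PDZ and F*-GAIP
for protein in (0.03, 0.29, 3.0, 30.0):
    print(protein, round(fraction_bound(protein, 0.03, 0.29), 3))

# Fold difference between the two mbDLG-3 K_D estimates reported above
fold = 4300 / 1340
print(round(fold, 1))
```

The exact quadratic form avoids the assumption that free protein equals total protein, which matters when the reporter concentration is not negligible relative to K D, as for the tighter SNX27 interactions.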
Competition experiments: The final protein concentrations for K i experiments were equal to: 0.6 μM for mbGIPC PDZ, 0.05 μM for SNX27 PDZ, and 5 μM for mbSNX27 PDZ. For SHANK1 and mbSHANK1 PDZ K i experiments with the GIRK3 peptide, we used 10 μM protein (based on previously determined K D values of 7.3 μM for mbSHANK1 PDZ, and 5.1 μM for SHANK1 PDZ). 55 Competition experiments were performed in triplicate, using the following reporter peptides at 0.03 μM final concentration: mbGIPC (F*‐GAIP), SNX27 (F*‐β‐PIX) and mbSNX27 (F*‐β‐PIX). Binding affinities for K i experiments were determined using SOLVER and IC50 values using Kaleidagraph, as previously described (Figures 2B and S2C,D). 38 , 57 , 58
4.2. Monosiga brevicollis proteome search and modeling by Rosetta
The M. brevicollis proteome search was performed using BLASTP from the command line, while filtering and identification of the top sequence identity results were done using a script we wrote in Python. 59 , 96 The query sequences were the 272 previously curated PDZ domains in the human proteome, and they were searched against the M. brevicollis proteome. 26 For each identified borderline sequence, the protein sequence and associated template structure were provided as input into the Rosetta comparative modeling (RosettaCM) application. 77 Additional information on all materials and methods is in the Supporting Information.
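A command‐line search of this kind can be sketched as follows. The file names, database name, and E‐value cutoff are hypothetical placeholders rather than the settings used in this work, and the sketch only assembles the BLAST+ commands without running them.

```python
import shlex

def blastp_commands(query_fasta, proteome_fasta, db_name, out_tsv, evalue=10.0):
    """Build BLAST+ commands: format the proteome, then search it with the queries."""
    make_db = ["makeblastdb", "-in", proteome_fasta, "-dbtype", "prot",
               "-out", db_name]
    search = ["blastp", "-query", query_fasta, "-db", db_name,
              "-outfmt", "6",          # tabular output, one hit per line
              "-evalue", str(evalue),  # permissive cutoff (assumed, not from this work)
              "-out", out_tsv]
    return make_db, search

# Hypothetical file names for the human PDZ queries and the M. brevicollis proteome
make_db, search = blastp_commands("human_pdz.fasta", "mbrevicollis.fasta",
                                  "mbrev_db", "hits.tsv")
for command in (make_db, search):
    print(shlex.join(command))
```

Each command list can be passed to `subprocess.run` on a system where BLAST+ is installed; the tabular output is then suitable for filtering by alignment length and sequence identity.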
CONFLICT OF INTEREST
The authors declare no competing interest.
AUTHOR CONTRIBUTIONS
Melody Gao: Data curation; formal analysis; investigation. Iain Mackley: Investigation. Samaneh Mesbahi‐Vasey: Formal analysis; investigation; methodology. Haley Bamonte: Investigation. Sarah Struyvenberg: Investigation. Louisa Landolt: Investigation. Nick Pederson: Investigation. Lucy Williams: Investigation. Christopher Bahl: Conceptualization; data curation; formal analysis; methodology; resources; supervision; visualization; writing‐review and editing. Lionel Brooks: Data curation; formal analysis; investigation; methodology; validation; writing‐review and editing. Jeanine Amacher: Conceptualization; data curation; formal analysis; funding acquisition; investigation; methodology; resources; supervision; validation; visualization; writing‐original draft; writing‐review and editing.
Supporting information
Data S1. Supplemental Materials and Methods
Figure S1 Crystal contacts in mbGIPCSFDEI structure and electron density for peptide in mbGIPCTYRP1. (A) Sequence alignment of human GIPC and mbGIPC PDZ domains. (B) The spatial relationship between mbGIPC (green cartoon) PDZ and a molecule related by symmetry (gray cartoon, mbGIPC') is shown. The last five residues of the mbGIPC' C‐terminal tail (sequence SFDEI) are shown in gray stick. (C) The binding of mbGIPC (green cartoon, H157 side chain in stick representation) to the C‐terminal tail of mbGIPC' (gray sticks and labeled) is unconventional and a crystal artifact. (D) The average fluorescence polarization isotherm is shown for mbGIPCtrunc PDZ and the F*‐GAIP peptide (at 30 nM), including the SD for each data point. This experiment was performed in quadruplicate and the calculated K D = 0.29 ± 0.02 μM. (E) The K182 residue in mbGIPCSFDEI PDZ (green sticks) makes crystal lattice contacts with the main chain carbonyl atoms of R134 and V135 in a molecule related by symmetry, mbGIPC' (gray sticks). For this reason, we chose to keep a lysine in this position in our other mbGIPC constructs (mbGIPCB1AR, mbGIPCGAIP, and mbGIPCTYRP1). (F) Final refinement of mbGIPCTYRP1 proved challenging, so we did not deposit this structure in the Protein Data Bank. However, there is clear electron density for the C‐terminal sequence, NQSVV (2Fo‐Fc map in blue mesh and contoured at 1 σ), and it is consistent with mbGIPC binding to the other C‐terminal sequences (GAIP and B1AR). Here, mbGIPC is in gray cartoon with side chain sticks in gray. The TYRP1 sequence is in green stick representation and labeled. For all, sticks are colored by heteroatom (O = red, N = blue).
Figure S2 Structural and biochemical characterization of mbSNX27. (A) Sequence alignment of human SHANK1 and mbSHANK1 PDZ domains. Secondary structure elements are labeled by arrows (β‐strands) and wavy lines (α‐helices). (B) Example fluorescence polarization isotherms are shown for SNX27 PDZ and the F*‐β‐PIX (black squares) and F*‐GIRK3 peptides (black circles, both reporter peptides at 30 nM). (C, D) Average fluorescence polarization displacement isotherms are shown for SNX27 PDZ (C) and mbSNX27 PDZ (D). Titration curves are shown for the following peptides: β‐PIX (circles), GIRK3 (squares), and mGluR1 (diamonds), or choanoflagellate proteins A9UP44 (triangles), A9UXE1 (upside‐down triangles), and A9V7Z4 (gray circles). Error bars indicate SD from the mean for triplicate experiments. The reporter peptide used in both experiments was F*‐β‐PIX. (E) The conservation between mbSNX27 (gold cartoon, with side chain residues as sticks) and human SNX27 (PDB ID: 6SAK, gray cartoon with side chain residues as sticks) is shown. The peptide (GIRK3 sequence: ESESKV) is from an additional human SNX27 structure (3QE1) and is shown as gray ribbon and labeled. The RMSD value between the human SNX27 structures is 0.377 Å over 302 main chain atoms. The mbSNX27 PDZ structure was made using SwissModel with 6SAK as a template. Residues in the peptide‐binding cleft are labeled. All stick representation is colored by heteroatom (O = red, N = blue).
Figure S3 Sequence alignment of Dlg proteins, mbDLG‐2 structures, and human Dlg2 PDZ2 structures. (A) Sequence alignment of PDZ domains from human DLG1‐4 and mbDLG proteins. (B) Because mbDLG‐2 does not contain any aromatic residues, and thus has an extinction coefficient equal to 0, we quantified the protein for crystallization using SDS‐PAGE. A dilution series of bovine serum albumin (BSA) protein was added as a standard and compared to the mbDLG‐2 protein used for crystallization, as labeled. The gel is overloaded, so we estimated the protein concentration to be 12.5–25 mg/mL, because the density of our protein signal is, on average, closest to the 12.5 and 25 μg BSA lanes. (C) The P2₁2₁2₁ mbDLG‐2 PDZ domain (gray cartoon, electron density is shown in blue mesh, 2Fo‐Fc map contoured at 1 σ) structure revealed strong positive density (green mesh, Fo‐Fc map contoured at 2.5 σ, highlighted by the green circle) in the peptide‐binding cleft, which is likely the HPV16 E6 peptide that was incubated with the PDZ domain prior to crystallization; however, iterative rounds of refinement suggested that the occupancies of peptide atoms were <0.5 and that there were multiple conformations. Ultimately, the peptide could not be confidently modeled. A black arrow points to the carboxylate‐binding loop, which is labeled. (D) The mbDLG‐2 PDZ domain (purple surface) that crystallized in the I2 space group is not bound by peptide, due to molecules related by symmetry (gray surface). The mbDLG‐2 PDZ domain was aligned with the CAL PDZ domain structure bound to an HPV16 E6 peptide (PDB ID: 4JOP, RMSD = 0.859 Å for 239 main chain atoms). Steric clashes between molecules related by symmetry and the HPV16 E6 peptide (yellow sticks and labeled) are highlighted with red circles. (E) Comparison of the locations of carboxylate‐binding loop sequences for a number of human Dlg2 PDZ2 structures, including those in the apo form (cyan cartoon) and peptide‐bound structures (gray cartoon, with peptides as gray ribbons). PDB ID codes included are: 2AWW, 2AWX, 2G2L, 2I0L, 2M3M, 2OQS, 2X7Z, 4G69, 4OAJ.
Figure S4 The comparison of mbDLG‐3 structures to structures used as Molecular Replacement (MR) search models, as well as binding and sequence analyses. (A) The structure of mbDLG‐3 PDZ (orange cartoon) was aligned with (left) structures modeled using the Robetta server (gray cartoon; RMSD values are between 0.902 Å over 234 atoms and 1.373 Å over 259 atoms), and (right) PSD‐95/Dlg4 PDZ3 (6QJF, gray cartoon), which was used as the successful search model for molecular replacement (RMSD = 0.726 Å over 225 main chain atoms). Specifically, the N‐ and C‐terminal regions of the βB‐βC loop are most similar to PSD‐95/Dlg4 PDZ3, as highlighted by black arrows. (B) Substitution in the binding class‐determining αB‐1 residue between mbDLG‐3 (gray cartoon, with the side chain of Y305 as orange sticks) and human Dlg2 PDZ3 (Dlg2‐3, PDB ID: 2HE2; gray cartoon with the side chain of H480 as sticks). (C) Fluorescence polarization isotherms are shown for mbDLG‐3 PDZ and the F*‐HPV18 E6 (black circles and squares, sequence: F*‐RLQRRRETQV) and F*‐HPV18 E6 with a P‐2 Asp (F*‐RLQRRREDQV) peptides (gray diamonds and triangles, both reporter peptides at 30 nM). (D) The full alignment of a number of Dlg1 and Dlg2 PDZ3 sequences from eleven organisms. Only PDZ3 sequences from the choanoflagellate species Monosiga brevicollis and Salpingoeca rosetta contain a Tyr at the αB‐1 residue (position 78 in the alignment). The alignment is colored by sequence identity, with darker blue colors indicating a higher % identity.
Figure S5 Hydrophobicity of choanoflagellate PDZ domains: experimentally determined crystal structures (A) and models determined by Rosetta or SwissModel (B). All PDZ domains, with experimentally determined structures (A) or models of borderline sequences from our M. brevicollis proteome search (B) are labeled and shown in cartoon representation, with side chain sticks. Coloring is by degree of hydrophobicity, using previously determined values and
Figure S6 Rosetta models of borderline structures that are classified as PDZ domains in UniProt. Based on our filtering criteria, a number of sequences that are classified as PDZ domains by UniProt are borderline. We modeled these putative PDZ domains using Rosetta and all are shown in cartoon, with side chains as sticks, and colored based on hydrophobicity. Our results suggest that A9UPI8 PDZ18 (A) and A9V109 PDZ1 (B) are likely true PDZ domains, but that the sequence of A9V625 PDZ3 (C) is questionable because it is not long enough, and that of A9V7P9 PDZ (D) is not compatible with the conserved PDZ domain fold. For all, the UniProt IDs, residue numbers, method of modeling (all Rosetta), and carboxylate‐binding loop sequence are shown, with the exception of the carboxylate‐binding loop sequence for A9V7P9, because we could not identify a potential one from the sequence or structure.
Table S1. Data collection and refinement statistics.
Table S2. Revisions with domains
ACKNOWLEDGMENTS
The authors would like to sincerely thank Drs. Christine Gee (UC Berkeley) and Dean Madden (Geisel School of Medicine at Dartmouth) for important discussions about anisotropic crystal data. In addition, Jacob Olson and Bodi Van Roy were involved in early work on mbDLG PDZ domain purification, and we would also like to thank the other members of the Amacher lab for helpful discussions and assistance. All data collection was done at the ALS at LBNL on beamline 5.0.1. We would like to thank the Berkeley Center for Structural Biology (BCSB) for assistance using the facility, specifically Stacey Ortega for administrative help and Marc Allaire and Dr. Daniil Prigozhin for technical help. The BCSB is supported in part by the Howard Hughes Medical Institute. The ALS is a Department of Energy, Office of Science User Facility under Contract No. DE‐AC02‐05CH11231. The Pilatus detector on beamline 5.0.1 was funded under NIH grant S10OD021832. Other grant information: J. F. A. was funded by NSF CHE‐1904711, M. G. was funded by NSF‐REU grant CHE‐1757629, and C. B. was funded by Rosetta Licensing Fund grant RC8010. Start‐up funds from Western Washington University also contributed to this project.
Gao M, Mackley IGP, Mesbahi‐Vasey S, et al. Structural characterization and computational analysis of PDZ domains in Monosiga brevicollis. Protein Science. 2020;29:2226–2244. 10.1002/pro.3947
Funding information Division of Chemistry, Grant/Award Numbers: 1757629, 1904711; National Institutes of Health, Grant/Award Number: S10OD021832; Rosetta Licensing Fund, Grant/Award Number: RC8010; U.S. Department of Energy, Grant/Award Number: Office of Science User Facility Contract DE‐AC02‐05CH11231; Western Washington University; Howard Hughes Medical Institute
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data S1. Supplemental Materials and Methods
Figure S1 Crystal contacts in mbGIPCSFDEI structure and electron density for peptide in mbGIPCTYRP1. (A) Sequence alignment of human GIPC and mbGIPC PDZ domains. (B) The spatial relationship between mbGIPC (green cartoon) PDZ and a molecule related by symmetry (gray cartoon, mbGIPC') is shown. The last five residues of the mbGIPC'C‐terminal tail (sequence SFDEI) are shown in gray stick. (C) The binding of mbGIPC (green cartoon, H157 side chain in stick representation) to the C‐terminal tail of mbGIPC'(gray sticks and labeled) is unconventional and a crystal artifact. (D) The average fluorescence polarization isotherm is shown for mbGIPCtrunc PDZ and the F*‐GAIP peptide (at 30 nM), including the SD for each data point. This experiment was performed in quadruplicate and the calculated K D = 0.29 ± 0.02 μM. E, The K182 residue in mbGIPCSFDEI PDZ (green sticks) makes crystal lattice contacts with the main chain carbonyl atoms of R134 and V135 in a molecule related by symmetry, mbGIPC'(gray sticks). For this reason, we chose to keep a lysine in this position in our other mbGIPC constructs (mbGIPCB1AR, mbGIPCGAIP, and mbGIPCTYRP1). (F) Final refinement of mbGIPCTYRP1 proved challenging, so we did not deposit this structure in the Protein Data Bank. However, there is clear electron density for the C‐terminal sequence, NQSVV (2Fo‐FCmap in blue mesh and contoured at 1 s), and it is consistent with mbGIPC binding to the other C‐terminal sequences (GAIP and B1AR). Here, mbGIPC is in gray cartoon with side chain sticks in gray. The TYRP1 sequence is in green stick representation and labeled. For all, sticks are colored by heteroatom (O = red, N = blue).
Figure S2 Structural and biochemical characterization of mbSNX27. (A) Sequence alignment of human SHANK1 and mbSHANK1PDZ domains. Secondary structure elements are labeled by arrows (b‐strands) and wavy lines (a‐helices). (B) Example fluorescence polarization isotherms are shown for SNX27PDZ and the F*‐b‐PIX (black squares) and F*‐GIRK3 peptides (black circles, both reporter peptides at 30 nM). (C, D) Average fluorescence polarization displacement isotherms are shown for SNX27 PDZ (C) and mbSNX27PDZ (D). Titration curves are shown for the following peptides: b‐PIX (circles), GIRK3 (squares),and mGluR1 (diamonds), or choanoflagellate proteins A9UP44 (triangles), A9UXE1 (upside‐down triangles), and A9V7Z4 (gray circles). Error bars indicate SD from the mean for triplicate experiments. The reporter peptide used in both experiments was F*‐b‐PIX.(E) The conservation between mbSNX27 (gold cartoon, with side chain residues as sticks) and human SNX27 (PDB ID: 6SAK, gray cartoon with side chain residues as sticks)is shown. The peptide(GIRK3 sequence: ESESKV) is from an additional human SNX27 structure (3QE1) and is shown as gray ribbon and labeled. The RMSD value between the human SNX27 structures is 0.377 Å over 302 main chain atoms. The mbSNX27PDZ structure was made using SwissModel with 6SAK as a template. Residues in the peptide‐binding cleft are labeled. All stick representation is colored by heteroatom (O = red, N = blue).
Figure S3 Sequence alignment of Dlg proteins, mbDLG‐2 structures, and human Dlg2 PDZ2 structures. (A) Sequence alignment of PDZ domains from human DLG1‐4 and mbDLG proteins. (B) Because mbDLG‐2 does not contain any aromatic residues, and thus has an extinction coefficient equal to 0, we quantified the protein for crystallization using SDS‐PAGE. A dilution series of bovine serum albumin (BSA) protein was added as a standard and compared to the mbDLG‐2 protein used for crystallization, as labeled. The gel is overloaded, so we estimated the protein concentration to be 12.5–25 mg/mL, because the density of our protein signal is, on average, closest to the 12.5 and 25 μg BSA lanes. (C) The P2₁2₁2₁ mbDLG‐2 PDZ domain structure (gray cartoon; electron density is shown in blue mesh, 2Fo‐Fc map contoured at 1 σ) revealed strong positive density (green mesh, Fo‐Fc map contoured at 2.5 σ, highlighted by the green circle) in the peptide‐binding cleft, which is likely the HPV16 E6 peptide that was incubated with the PDZ domain prior to crystallization; however, iterative rounds of refinement suggested that the occupancies of peptide atoms were <0.5 and that there were multiple conformations. Ultimately, the peptide could not be confidently modeled. A black arrow points to the carboxylate‐binding loop, which is labeled. (D) The mbDLG‐2 PDZ domain (purple surface) that crystallized in the I2 space group is not bound by peptide, due to molecules related by symmetry (gray surface). The mbDLG‐2 PDZ domain was aligned with the CAL PDZ domain structure bound to an HPV16 E6 peptide (PDB ID: 4JOP, RMSD = 0.859 Å for 239 main chain atoms). Steric clashes between molecules related by symmetry and the HPV16 E6 peptide (yellow sticks and labeled) are highlighted with red circles. (E) Comparison of the locations of carboxylate‐binding loop sequences for a number of human Dlg2 PDZ2 structures, including those in the apo form (cyan cartoon) and peptide‐bound structures (gray cartoon, with peptides as gray ribbons). PDB ID codes included are: 2AWW, 2AWX, 2G2L, 2I0L, 2M3M, 2OQS, 2X7Z, 4G69, 4OAJ.
Figure S4 The comparison of mbDLG‐3 structures to structures used as Molecular Replacement (MR) search models, as well as binding and sequence analyses. (A) The structure of mbDLG‐3 PDZ (orange cartoon) was aligned with (left) structures modeled using the Robetta server (gray cartoon; RMSD values are between 0.902 Å over 234 atoms and 1.373 Å over 259 atoms), and (right) PSD‐95/Dlg4 PDZ3 (6QJF, gray cartoon), which was used as the successful search model for molecular replacement (RMSD = 0.726 Å over 225 main chain atoms). Specifically, the N‐ and C‐terminal regions of the βB‐βC loop are most similar to PSD‐95/Dlg4 PDZ3, as highlighted by black arrows. (B) Substitution in the binding class‐determining αB‐1 residue between mbDLG‐3 (gray cartoon, with the side chain of Y305 as orange sticks) and human Dlg2 PDZ3 (Dlg2‐3, PDB ID: 2HE2; gray cartoon with the side chain of H480 as sticks). (C) Fluorescence polarization isotherms are shown for mbDLG‐3 PDZ and the F*‐HPV18 E6 (black circles and squares, sequence: F*‐RLQRRRETQV) and F*‐HPV18 E6 with a P‐2 Asp (F*‐RLQRRREDQV) peptides (gray diamonds and triangles, both reporter peptides at 30 nM). (D) The full alignment of a number of Dlg1 and Dlg2 PDZ3 sequences from eleven organisms. Only PDZ3 sequences from the choanoflagellate species Monosiga brevicollis and Salpingoeca rosetta contain a Tyr at the αB‐1 residue (position 78 in the alignment). The alignment is colored by sequence identity, with darker blue colors indicating a higher % identity.
Figure S5 Hydrophobicity of choanoflagellate PDZ domains: experimentally determined crystal structures (A) and models determined by Rosetta or SwissModel (B). All PDZ domains, with experimentally determined structures (A) or models of borderline sequences from our M. brevicollis proteome search (B) are labeled and shown in cartoon representation, with side chain sticks. Coloring is by degree of hydrophobicity, using previously determined values and
Figure S6 Rosetta models of borderline structures that are classified as PDZ domains in UniProt. Based on our filtering criteria, a number of sequences that are classified as PDZ domains by UniProt are borderline. We modeled these putative PDZ domains using Rosetta and all are shown in cartoon, with side chains as sticks, and colored based on hydrophobicity. Our results suggest that A9UPI8 PDZ18 (A) and A9V109 PDZ1 (B) are likely true PDZ domains, but that the sequence of A9V625 PDZ3 (C) is questionable because it is not long enough, and that of A9V7P9 PDZ (D) is not compatible with the conserved PDZ domain fold. For all, the UniProt IDs, residue numbers, method of modeling (all Rosetta), and carboxylate‐binding loop sequence are shown, with the exception of the carboxylate‐binding loop sequence for A9V7P9, because we could not identify a potential one from the sequence or structure.
Table S1. Data collection and refinement statistics.
Table S2. Revisions with domains
