Abstract
Identification of ambiguous encoding in protein secondary structure is paramount to understanding the key protein segments underlying amyloid diseases. We investigate two types of structurally ambivalent peptides that have been hypothesized in the literature to be indicators of amyloidogenic proteins: discordant α-helices and chameleon sequences. Chameleon sequences are peptides discovered experimentally in different secondary-structure types. Discordant α-helices are α-helical stretches with strong β-strand propensity or prediction. To assess the distribution of these features in known protein structures, and their potential role in amyloidogenesis, we analyzed the occurrence of discordant α-helices and chameleon sequences in nonredundant sets of protein domains (n = 4263) and amyloidogenic proteins extracted from the literature (n = 77). Discordant α-helices were identified if discordance was observed between known secondary structures and secondary-structure predictions from the GOR-IV and PSIPRED algorithms. Chameleon sequences were extracted by searching for identical sequence words in α-helices and β-strands. We defined frustrated chameleons and very frustrated chameleons based on varying degrees of total β propensity ≥α propensity. To our knowledge, this is the first study to discern statistical relationships between discordance, chameleons, and amyloidogenicity. We observed varying enrichment levels for some categories of discordant and chameleon sequences in amyloidogenic sequences. Chameleon sequences are also significantly enriched in proteins that have discordant helices, indicating a clear link between both phenomena. We identified the first set of discordant-chameleonic protein segments that we predict may be involved in amyloidosis. We present a detailed analysis of discordant and chameleon segments in the family of one of the amyloidogenic proteins, the Prion Protein.
Keywords: discordance, chameleon sequences, amyloid fibril, amyloidosis, prion, conformational variability, bioinformatics, β-sheet, structural plasticity, structurally ambivalent peptides
Introduction
Identification of ambiguous encoding in protein secondary structure is paramount to understanding the key protein segments underlying many conformational diseases. Amyloid diseases, such as prion disease, are characterized by major conformational changes whereby native protein stretches can adopt β-sheet conformations and stabilize into multimeric assemblies that are highly pathogenic.1–3 This behavior is linked to many human diseases, including transmissible spongiform encephalopathies, and has been experimentally observed in dozens of other proteins.2,3 The resultant amyloids are fibrillar assemblies of β-sheets with a characteristic “cross-β” configuration, that is, with β-strand axes orthogonal to the major long axis of the fibrils.4–6 Accordingly, a key task is to understand how the protein sequence can encrypt these alternative conformations and configurations of protein chains, in addition to their normal, cellular forms. Two sequence feature types that evidence ambiguous encoding of secondary structures, and which are hypothesized in the literature as indicators of amyloidogenic sequences, are “discordant” and “chameleon” sequences.
The phenomenon of discordance7,8 refers to sequences of a native α-helix that have a high intrinsic β propensity; this may arise because of specific long-range side-chain interactions causing the conservation of amino acids that would otherwise prefer to be in β-strands. Kallberg et al. detected 37 incidences of “discordant” α-helices of ≥7 residues in length in >1300 protein structures. These discordant α-helices were predicted to form β-strands by multiple orthogonal secondary-structure algorithms, even though they had been determined to be α-helices in known experimental three-dimensional structures.7,8 These α–β discordant stretches were hypothesized to be associated with amyloid fibril formation,8 although no statistical relationship between discordance and amyloidogenicity was discerned.
Another aspect of ambiguity in encoding secondary structure is the existence of chameleon sequences. Chameleon sequences are short peptide stretches experimentally shown to adopt multiple secondary structure conformations in different proteins.9,10 Chameleons are thus structurally ambivalent peptides, because they assume different secondary structures in different contexts.11–13 Short 5-mer chameleon sequences in the PDB were first reported in a study of the use of sequence homology for protein structure prediction,14 and later studies reported hexapeptide chameleon sequences in a larger PDB database.9 The longest “chameleon-HE” [i.e., Helix (H) vs. Sheet (E)] sequence reported to date is seven residues long.11,12,15 Chameleon sequences that adopt both α-helical and β-sheet conformation are of particular interest in the analysis of the sequence determinants of amyloidogenicity.
The discordant α-helix and chameleon sequence phenomena are possibly strong indicators of conformational plasticity, and prime candidates for causation of amyloidogenicity.
In this work, we decided to rigorously test the hypotheses that discordant α-helices and chameleon sequences have a causative link to amyloid formation. To this end, we have performed a meta-scale statistical analysis of the distribution of discordant and chameleon sequences in a database of protein domains (SCOP), as well as in amyloidogenic proteins and their determinants. From our analysis of discordant stretches in protein domains, we suggest protein functions where structurally ambivalent peptides may be of importance. We also discuss the enrichment we have observed for various definitions of discordant α-helices and chameleon sequences, in amyloidogenic determinants. We introduce the first set of identified discordant-chameleonic segments that may be involved in amyloidosis. Finally, we explore in detail the specific important case of chameleon and discordant segments in the PrP protein family.
Results
Distribution of discordant α-helices
The distribution of discordant α-helices of length ≥5 residues was analyzed in a nonredundant data set of 4263 SCOP protein domains (see Methods for details). Using the consensus of the GOR-IV and PSIPRED secondary-structure prediction algorithms, we identified 119 discordant α-helices (Table I). The complete list is in Supporting Information Table I. α–β discordances or “mispredictions” by the individual secondary-structure prediction algorithms are also tabulated. The discordant α-helices occur in 111 protein domains, with eight domains having two discordant stretches each (Fig. 1). These 111 domains are dubbed “discordant protein domains”. Most discordant stretches were five residues long, with stretches longer than seven residues being exceedingly rare (Fig. 1). The list of discordant protein domains includes known amyloidogenic proteins, such as pilin, alpha-lactalbumin (PDB 1b9o), chicken lysozyme (PDB 3lzt), triacylglycerol lipase (PDB 1tca), cytochrome c (PDB 1gu2), and human PrP (PDB 1i4m).8,16–18 The list also includes the PrP paralog Doppel (which has not been shown experimentally to make amyloid), and a yeast prion candidate, MCM1, which ranked amongst the top 10 yeast prion candidates in an experimental screen.19 Interestingly, almost 10% of the discordant protein domains are viral proteins; conformational changes of proteins in viral envelopes facilitate host membrane fusion and entry of viruses. The discordant protein domains do not have a general tendency for increased β propensity, compared with protein domains in general [Fig. 2(A)]; instead, the elevated β propensity is local to the discordant stretches, which clearly have greater β than α propensity compared with random samplings of α-helical stretches of the same length from these protein domains [Fig. 2(B); P-values for two-tailed t-tests ≤10⁻³ for all stretch lengths].
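The consensus criterion described above can be illustrated with a minimal sketch. It assumes that the observed and predicted secondary structures are available as equal-length three-state strings (H = helix, E = strand, C = other); the function name `discordant_helices` and the toy strings are hypothetical and are not taken from the study's pipeline.

```python
from typing import List, Tuple


def discordant_helices(observed: str, predictions: List[str],
                       min_len: int = 5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of discordant stretches.

    A residue counts as discordant when it is observed as helix ('H')
    while every predictor in `predictions` calls it strand ('E').
    Only uninterrupted runs of at least `min_len` residues are reported.
    """
    if any(len(p) != len(observed) for p in predictions):
        raise ValueError("observed and predicted strings must have equal length")

    flags = [obs == "H" and all(p[i] == "E" for p in predictions)
             for i, obs in enumerate(observed)]

    stretches, start = [], None
    # The trailing False is a sentinel that closes a run ending at the last residue.
    for i, flag in enumerate(flags + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                stretches.append((start, i))
            start = None
    return stretches


# Toy example (made-up strings): two predictors agree on strand for residues 4-8.
observed = "CCHHHHHHHHHHCC"
pred_one = "CCHHEEEEEEHHCC"
pred_two = "CCEEEEEEEHHHCC"
print(discordant_helices(observed, [pred_one, pred_two]))
```

Passing a single prediction string in the list corresponds to the single-algorithm counts reported alongside the consensus in Table I.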
Table I.
| | Amyloidogenic protein chains (n = 77) | SCOP domains (n = 4263) | Significanceᵃ | Comments |
---|---|---|---|---|
Total number of residues in the dataset | 11939 (3810 residues in α-helices ≥5 residues long) | 800218 (273330 residues in α-helices ≥5 residues long) | – | – |
Discordant proteins | 5 (1300 total residues) | 111 (28348 total residues) | – | Consensusᶜ |
Discordant helicesᵇ | 5 (30) | 119 (667) | 0.027 | Consensusᶜ |
Discordant helicesᵇ | 40 (266) | 907 (6352) | <0.000001 | GORᶜ |
Discordant helicesᵇ | 9 (59) | 280 (1689) | 0.018 | PSIPREDᶜ |
ᵃ Binomial probability (one-tailed test, using a Poisson approximation) of obtaining the observed number (or a larger number) of initial residue positions in discordant alpha-helical stretches in the alpha-helices ≥5 residues in length in the amyloidogenic protein set, compared to the same statistics for the SCOP domain set. Subtracting the maximum possible disallowed positions (i.e., from other positions within the discordant alpha-helical stretches, and the residue positions immediately adjacent to the start and end of them) has a negligible effect on the calculations.
ᵇ Uninterrupted stretches of discordant residues with a length of 5 or greater constitute a discordant helix. The number of discordant residues is indicated in brackets.
ᶜ For comparison with our consensus approach, the number of discordant helices obtained using a single secondary-structure prediction tool is also shown.
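The significance calculation described in the table notes can be sketched as a one-tailed Poisson test. The sketch assumes that the number of candidate start positions equals the number of residues in α-helices ≥5 residues long (3810 in the amyloidogenic set, 273330 in the SCOP set) and that the background rate is the SCOP count of discordant helices per such residue; the function names are hypothetical.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - cdf


def enrichment_p(observed: int, positions: int,
                 background_count: int, background_positions: int) -> float:
    """One-tailed P-value for seeing `observed` or more events in `positions`
    trials, given the per-position rate estimated from the background set."""
    rate = background_count / background_positions
    return poisson_upper_tail(observed, rate * positions)


# Consensus row of Table I: 5 discordant helices in 3810 helical residues,
# against 119 discordant helices in 273330 helical residues in SCOP.
p_consensus = enrichment_p(5, 3810, 119, 273330)
print(f"{p_consensus:.3f}")  # 0.027
```

Under these assumptions the expected count in the amyloidogenic set is about 1.66, and the upper-tail probability agrees with the tabulated consensus value of 0.027.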
Distribution of chameleon sequences
Chameleons were defined as peptides of five or six residues in length that occur in both α-helix and β-strand secondary assignments made by the DSSP algorithm10–12 (see Methods for details). In the ASTRALSCOP protein domain data set analyzed, we observed that a sizeable fraction (∼14%) of all 5-mer α-helical peptides are chameleons (Table II); however, a much smaller fraction of 6-mer α-helical peptides are also observed in β-strands (0.6%). Very frustrated chameleons were defined as the subset of these chameleon sequences that have β propensity ≥1.5× their α propensity [Fig. 3(C)]. These sequences are thus predicted to be more “frustrated” when in an α-helical state, that is, the specific local side-chain environment of the α-helix is “frustrating” the propensity for the sequence to adopt a β conformation. These sequences are very unusual, occurring at the rate of only 1 in ∼890 α-helical 5-mers, and almost nonexistent for 6-mers (just two cases).
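The chameleon definition above amounts to intersecting the set of k-mers found entirely within helices with the set found entirely within strands. The following sketch assumes each chain is given as a pair of equal-length strings (sequence, DSSP-style states with H for α-helix and E for β-strand); the function names and toy chains are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def kmers_in_state(seq: str, ss: str, k: int, state: str) -> Iterator[str]:
    """Yield every k-mer whose residues are all assigned to `state`."""
    for i in range(len(seq) - k + 1):
        if ss[i:i + k] == state * k:
            yield seq[i:i + k]


def find_chameleons(chains: Iterable[Tuple[str, str]], k: int = 5) -> Set[str]:
    """Return k-mers seen both fully in helix ('H') and fully in strand ('E')."""
    helix, strand = set(), set()
    for seq, ss in chains:
        if len(seq) != len(ss):
            raise ValueError("sequence and secondary-structure strings differ in length")
        helix.update(kmers_in_state(seq, ss, k, "H"))
        strand.update(kmers_in_state(seq, ss, k, "E"))
    return helix & strand


# Toy chains (made up for illustration): the same 5-mer is helical in one
# chain and forms a strand in the other.
chains = [
    ("GSVNITIKQ", "CCHHHHHHC"),
    ("AVNITIG", "CEEEEEC"),
]
print(find_chameleons(chains, k=5))
print(find_chameleons(chains, k=6))
```

Because a 6-mer match requires one more identical residue in both states, far fewer hexameric chameleons are expected than pentameric ones, as Table II reports.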
Table II.
| | SCOP domains with <40% identity (n = 4263 chains): 5-mer | SCOP domains with <40% identity (n = 4263 chains): 6-mer | Discordant proteins (n = 119 chains): 5-mer | Discordant proteins (n = 119 chains): 6-mer | Amyloidogenic proteins with <40% identity (n = 77 chains)ᵃ: 5-mer | Amyloidogenic proteins with <40% identity (n = 77 chains)ᵃ: 6-mer | Amyloidogenic determinants (n = 45)ᵇ: 5-mer | Amyloidogenic determinants (n = 45)ᵇ: 6-mer |
---|---|---|---|---|---|---|---|---|
Total helical fragments | 213,854 | 160,843 | 9035 | 7341 | 2786 | 2436 | 120 | 99 |
Chameleons in helices | 29645 (13.86%) | 986 (0.613%) | 1329 (14.70%) | 49 (0.66%) | 364 (13.06%) | 15 (0.615%) | 20 (16.7%) | 2 (3.36%) |
Frustrated chameleons in helicesᶜ | 16283 (7.614%) | 498 (0.310%) | 846 (9.363%) | 29 (0.395%) | 197 (7.071%) | 6 (0.246%) | 16 (13.3%) | 2 (3.36%) |
Very frustrated chameleons in helicesᶜ | 240 (0.1122%) | 2 (0.0012%) | 14 (0.15%) | 0 | 6 (0.215%) | 0 | 2 (7.09%) | 0 (1.68%) |
% Very frustrated chameleons in chameleon fragments | 0.8095 | 0.2028 | 1.053 | 0 | 1.648 | 0 | 18.18 | 50 |
P(chameleon)ᵈ | – | – | 0.01 | 0.29 | 0.89 | 0.53 | 0.22 | 0.12 |
P(frustrated chameleon)ᵈ | – | – | 2.9 × 10⁻¹⁰ | 0.11 | 0.15 | 0.37 | 0.02 | 0.04 |
P(very frustrated chameleon)ᵈ | – | – | 0.15 | 0.92 | 0.10 | 0.97 | 0.008 | – |
Significance of identified chameleons (see P(chameleon), P(frustrated chameleon), and P(very frustrated chameleon)) is calculated using a hypergeometric test against the number of chameleonic fragments identified in SCOP (for all cohorts other than SCOP).
ᵃ Single-chain domains of 77 amyloidogenic protein structures are selected for the analysis.
ᵇ From the selected determinants (n = 45), determinants with experimentally verified secondary structure were selected for analysis. Of these, 17 determinants had helical structures ≥5 residues.
ᶜ Frustrated chameleons and very frustrated chameleons are calculated as described in the Methods section. We defined frustrated chameleons as chameleons with higher β propensity than α propensity, and very frustrated chameleons as chameleon sequences with very high β propensity values (operationally, with total β propensity ≥1.5× α propensity).
ᵈ This is the length-specific binomial probability of finding chameleon or frustrated chameleon sequences in each of the cohorts in comparison to the numbers of chameleon and frustrated chameleons observed in all helices in the SCOP database. A Poisson approximation is used for expected values <0.1. Significant P-values (P < 0.05) are in bold. All counts in categories with P < 0.05 are also significant for tests of significance against nonredundant sets of whole protein chains, derived from the DSSP database with a 40% sequence identity threshold, using BLASTCLUST.52
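The operational definitions of frustrated and very frustrated chameleons given in the table notes can be sketched as a comparison of summed per-residue propensities. The propensity values below are illustrative placeholders only; the scale actually used in the study is described in its Methods and is not reproduced here, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Placeholder (alpha, beta) propensities for a handful of residues.
# These numbers are assumptions for illustration, NOT the study's scale.
PROPENSITY: Dict[str, Tuple[float, float]] = {
    "A": (1.42, 0.83),
    "V": (1.06, 1.70),
    "I": (1.08, 1.60),
    "N": (0.67, 0.89),
    "T": (0.83, 1.19),
}


def classify_chameleon(peptide: str,
                       propensity: Dict[str, Tuple[float, float]] = PROPENSITY,
                       very_ratio: float = 1.5) -> str:
    """Return the most specific label for a chameleon peptide.

    'very frustrated': total beta propensity >= very_ratio x total alpha propensity
    'frustrated':      total beta propensity >= total alpha propensity
    'chameleon':       neither condition holds
    Very frustrated peptides also satisfy the frustrated criterion.
    """
    alpha = sum(propensity[res][0] for res in peptide)
    beta = sum(propensity[res][1] for res in peptide)
    if beta >= very_ratio * alpha:
        return "very frustrated"
    if beta >= alpha:
        return "frustrated"
    return "chameleon"


for pep in ("AAAAA", "ATVTA", "VVVVV"):
    print(pep, classify_chameleon(pep))
```

The labels produced here depend entirely on the placeholder scale; substituting the propensity scale from the Methods would be required to reproduce the counts in Table II.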
One might expect that chameleon sequences would have a tendency to low sequence complexity, that is, sequences that fit into both α and β secondary structure might have an enrichment of amino-acid runs in them. To test this hypothesis, we analyzed the distribution of amino-acid runs in the 5-mer and 6-mer chameleon sequences from the ASTRALSCOP domain data set (Table III). We examined runs of size 4, 5, and 6 for all 20 amino-acid residue types. We found that a small number of amino-acid run types are significantly over-represented in chameleon sequences (most frequently runs of alanine, histidine, valine, and leucine; Table III).
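The search for amino-acid runs can be sketched with a back-referencing regular expression. The sketch counts maximal same-residue runs of four or more residues within each peptide; whether the study counted maximal or nested runs is not stated, so this choice is an assumption, as are the function name and the toy peptides.

```python
import re
from collections import Counter
from typing import Iterable

# One residue followed by at least three copies of itself (run length >= 4).
RUN = re.compile(r"(.)\1{3,}")


def count_runs(peptides: Iterable[str]) -> Counter:
    """Count maximal same-residue runs (length >= 4) across peptides."""
    counts: Counter = Counter()
    for pep in peptides:
        for match in RUN.finditer(pep):
            counts[match.group(0)] += 1
    return counts


# Toy peptides (made up): only some of them contain a run.
print(count_runs(["AAAAL", "AAAAA", "LLLLK", "VNITI", "HHHHHH"]))
```

Comparing such counts in chameleons against their frequency in all α-helical fragments of the same length is what the significance test for Table III requires.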
Table III.
Length of run (4X, 5X, 6X) | Sequence observed | Count observed in pentameric chameleons of nonredundant SCOP data set (n = 29,645) | Count observed in hexameric chameleons of nonredundant SCOP data set (n = 986) |
---|---|---|---|
4 | AAAA | 47ᵃ | 7ᵃ |
4 | HHHH | 4ᵃ | 2ᵃ |
4 | LLLL | 45ᵃ | 0 |
4 | TTTT | 1 | 0 |
4 | VVVV | 15ᵃ | 0 |
5 | AAAAA | 8ᵃ | 0 |
5 | HHHHH | 3 | 1 |
Identified same-residue runs of lengths 4–6 are shown, and their corresponding counts in the chameleon sets.
ᵃ P < 0.01 for a Poisson distribution given observed frequency of each run in all α-helices in the SCOP database. There are no significantly underrepresented runs in the chameleons.
Are discordant, chameleon, and frustrated chameleon sequences over-represented in amyloidogenic sequences?
To assess whether these ambiguous encoding segments are enriched in amyloidogenic determinants and their proteins, a set of amyloidogenic proteins from the current literature16–18,20 was reduced for sequence redundancy (with a 40% sequence identity threshold) using the PISCES tool and manual curation.21 We identified five cases of discordant α-helices (identified as the consensus of mispredictions by both the GOR-IV and PSIPRED algorithms) in amyloidogenic proteins (Tables I and IV). This is a moderately significant enrichment (Table I). Only one of these discordant stretches lies in an amyloidogenic determinant (Table IV), namely that of the Prion Protein (PrP). Interestingly, comparison of discordance using only one prediction tool indicates a highly significant enrichment of discordant α-helices that were identified through GOR-IV β mispredictions alone (Table I, P < 0.000001).
Table IV.
Protein | PDB | Discordant region | Discordant segment | Chameleon sequence | Amyloidogenic determinant? |
---|---|---|---|---|---|
Coagulation factor XIII (H. sapiens) | 1ex0:A | 239–244 | IKVSRV | None | No |
Lysozyme (H. sapiens) | 1jsf:A | 28–33 | WMCLAK | None | No |
Cytochrome c (B. taurus) | 1ppj:T | 52–58 | VAFYLVY | None | No |
Prion protein (H. sapiens) | 3hak:A | 56–60 | VNITI | 1e4k:C, 1e4j:A, 1fnl:A, 1hf1, 1op8:F, 1op8:C, 1op8:D, 2vov:A, 2vow:A, 2vox:A | Yes |
Triacylglycerol lipase (C. antarctica) | 3icv:A | 255–260 | FSYVVG | None | No |
The discordant segments and corresponding chameleon proteins are identified.
We investigated whether chameleon and frustrated chameleon sequences are enriched in amyloidogenic proteins, and more specifically, in their experimentally defined amyloidogenic determinants. We compared these chameleon occurrences to those observed for the nonredundant SCOP protein domain sets (Table II). As one would expect, in all cohorts analyzed, pentameric chameleon sequences are far more abundant than hexameric ones. Counting chameleon sequences without any correction, we find marginally significant enrichments of frustrated and very frustrated chameleons in amyloidogenic determinants (Table II, binomial P-values ≤ 0.04). However, arguably, one should remove overlapping cases of chameleon 5-mers and 6-mers in protein sequences. After doing this, we only observe a significant enrichment of 6-mer frustrated chameleons in amyloidogenic determinants (Table V). This indicates a marginal link of α−β chameleons in known α-helices to amyloidogenicity.
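One simple way to remove overlapping chameleon hits is a greedy left-to-right selection, sketched below. The exact procedure used to build Table V is not specified in this section, so this is an assumed illustration rather than the study's method; the function name and the toy hit list are hypothetical.

```python
from typing import List, Tuple


def remove_overlaps(hits: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep a non-overlapping subset of (start, length) hits from one chain.

    Hits are scanned from left to right; a hit is kept only if it starts at
    or after the end of the previously kept hit.
    """
    kept: List[Tuple[int, int]] = []
    next_free = 0
    for start, length in sorted(hits):
        if start >= next_free:
            kept.append((start, length))
            next_free = start + length
    return kept


# Toy hit list (made up): three overlapping 5-mers, an adjacent 5-mer, and a 6-mer.
hits = [(0, 5), (1, 5), (2, 5), (5, 5), (12, 6)]
print(remove_overlaps(hits))
```

Overlapping windows from a single helix are not independent observations, which is why counts with and without this correction can give different significance levels.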
Table V.
| | SCOP domains with <40% identity (n = 4263 chains): 5-mer | SCOP domains with <40% identity (n = 4263 chains): 6-mer | Discordant proteins (n = 119 chains): 5-mer | Discordant proteins (n = 119 chains): 6-mer | Amyloidogenic proteins with <40% identity (n = 77 chains)ᵃ: 5-mer | Amyloidogenic proteins with <40% identity (n = 77 chains)ᵃ: 6-mer | Amyloidogenic determinants (n = 45)ᵇ,ᶜ: 5-mer | Amyloidogenic determinants (n = 45)ᵇ,ᶜ: 6-mer |
---|---|---|---|---|---|---|---|---|
Total helical fragments | 46,199 | 36,255 | 2045 | 1627 | 632 | 506 | 33 | 25 |
Chameleons in helices | 15,577 (33.7%) | 888 (2.5%) | 661 (32.3%) | 42 (2.6%) | 208 (32.9%) | 12 (2.4%) | 11 (33.3%) | 2 (8%) |
Frustrated chameleons in helicesᵈ | 8056 (17.4%) | 443 (1.2%) | 393 (19.2%) | 24 (1.5%) | 106 (16.77%) | 5 (0.988%) | 8 (24.2%) | 2 (8%) |
P(chameleon)ᵉ | – | – | 0.09 | 0.38 | 0.35 | 0.53 | 0.56 | 0.12 |
P(frustrated chameleon)ᵉ | – | – | 0.02 | 0.20 | 0.35 | 0.41 | 0.21 | 0.04 |
ᵃ Single-chain domains of 77 amyloidogenic protein structures are selected for the analysis.
ᵇ From the selected determinants (n = 45), determinants with experimentally verified secondary structure were selected for analysis. Of these, 17 determinants had helical structures ≥5 residues.
ᶜ Adding the amyloidogenic determinant of the small transmembrane protein 1SFP does not change the significances, except that 6-mer chameleons in general become enriched (P = 0.03).
ᵈ Frustrated chameleons and very frustrated chameleons are calculated as described in the Methods section. We defined frustrated chameleons as chameleons with higher β propensity than α propensity, and very frustrated chameleons as chameleon sequences with very high β propensity values (operationally, with total β propensity ≥1.5× α propensity).
ᵉ This is the length-specific binomial probability of finding chameleon or frustrated chameleon sequences in each of the cohorts in comparison to the numbers of chameleon and frustrated chameleons observed in all helices in the SCOP database. A Poisson approximation is used for expected values <0.1. Significant P-values (P < 0.05) are in bold. All counts in categories with P < 0.05 are also significant for tests of significance against nonredundant sets of whole protein chains, derived from the DSSP database with a 40% sequence identity threshold, using BLASTCLUST.52
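The length-specific binomial probability described in the table notes can be sketched as an exact upper-tail sum. The sketch assumes the background probability is the SCOP fraction of fragments in the relevant class and that each helical fragment in a cohort is one trial; the function name is hypothetical, and log-space terms are used so that large cohorts do not overflow.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return total


# 6-mer frustrated chameleons in amyloidogenic determinants (Table V):
# 2 of 25 fragments, against a SCOP background of 443 of 36,255 fragments.
p_value = binomial_upper_tail(2, 25, 443 / 36255)
print(f"{p_value:.2f}")  # 0.04
```

With only two observed fragments driving this result, the P-value is sensitive to the inclusion or exclusion of a single determinant, which is consistent with the enrichment being described as marginal.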
Segments that are both chameleon and discordant
Although discordant and chameleon segments are only moderately and marginally linked to amyloidogenicity, respectively, we discovered that 5-mer chameleons are significantly enriched in the discordant protein domains we had identified (Table II). This indicates a clear link between the two phenomena, and a given sequence segment can exhibit both. We wished to discern whether combining both phenomena may improve predictions of amyloidogenic segments. We identified 28 discordant α-helical segments that also contain chameleon sequences that occur in at least one further protein (Supporting Information Table II), as described in the Methods section. The subset of these that are also predicted by the algorithm Pafig22 to be amyloidogenic are listed in Table VI. Notable examples of discordant α-helical segments exhibiting chameleon conformational behavior include the discordant stretch of Human PrP helix 2 (which to date is the only example of a known amyloidogenic protein with a combined discordant and chameleonic sequence segment), and the only identified chameleon sequence of the PrP paralog Doppel, in its most N-terminal α-helix. The list includes other interesting candidates, such as HSV Glycoprotein D; HSV is proposed to be linked with amyloidogenicity in Alzheimer's disease.
Table VI.
PDB | Discordant Segment | Protein (Organism) | Pafig RS | Chameleon Sequences |
---|---|---|---|---|
1ccw:B | 306–310 GVIVT | Glutamate mutase, large subunit (Clostridium cochlearium) | 9 | 1hzp:A, 1hzp:B, 1m1m:A, 1m1m:B, 1okk:D, 1rj9:A, 1u6e:A, 1u6e:B, 1u6s:A, 1u6s:B, 2ahb:A, 2ahb:B, 2aj9:A, 2aj9:B, 2cnw:F, 2cnw:E, 2cnw:D, 2iyl:D, 2j7p:E, 2j7p:D, 2q9a:A, 2q9a:B, 2q9b:A, 2q9b:B, 2q9c:A, 2q9c:B, 2qnx:A, 2qnx:B, 2qny:A, 2qny:B, 2qnz:A, 2qnz:B, 2qO0:A, 2qO0:B, 2qO1:A, 2qO1:B, 2qx1:A, 2qx1:B, 3dii:A, 3dii:B, 3dij:A, 3dij:B |
1gxy:A | 70–74 TALVA | Eukaryotic mono-ADP-ribosyltransferase ART2.2 (Rattus norvegicus) | 9 | 1jxh:A, 1jxi:A, 1jxi:B, 2eay:A, 2eay:B, 2uzh:C, 2uzh:A, 2uzh:B, 3ddy:A, 1llj:A |
1i4m:A | 62–66 VNITI | Prion protein domain (Homo sapiens) | –ᵃ | 1e4k:C, 1e4j:A, 1fnl:A, 1hf1, 1op8:F, 1op8:C, 1op8:D, 2vov:A, 2vow:A, 2vox:A |
1jma:A | 233–237 VYSLK | HSV glycoprotein D (Herpes simplex virus type 1) | 9 | 1a22:B, 1axi:B, 1hwg:B, 1hwg:C, 1hwh:B, 3hhr:B, 3hhr:C, 1kf9:F, 1kf9:E, 1kf9:C, 1kf9:B, 2aew:A, 2aew:B |
1nth:A | 273–277 TTIVD | Monomethylamine methyltransferase MtmB (Archaeon Methanosarcina barkeri) | 8 | 1nfg:C, 1nfg:A, 1nfg:B, 1nfg:D, 1nu5:A |
1tca:A | 232–236 FSYVV | Triacylglycerol lipase (Candida antarctica, form b) | 7 | 1iic:A, 1iic:B, 1iid:A, 2nmt:A, 2p6e:F, 2p6e:E, 2p6e:C, 2p6e:A, 2p6e:B, 2p6e:D, 2p6f:F, 2p6f:E, 2p6f:C, 2p6f:A, 2p6f:B, 2p6f:D, 2p6g:F, 2p6g:E, 2p6g:C, 2p6g:A, 2p6g:B, 2p6g:D |
1v74:A | 99–103 RIYLE | Colicin D nuclease domain (Escherichia coli) | 5 | 1s3o:A, 1s3o:B, 2dud:A, 2dud:B, 3ull:A, 3ull:B |
1xg7:A | 149–153 IVFTV | Hypothetical protein PF0904 (Pyrococcus furiosus) | 5 | 2hew:F, 2hey:F, 2hey:G |
1muk:A | 505–509 SVAIL | Reovirus polymerase lambda3 (Reovirus, TaxId: 10891) | 9 | 1knx:C, 1knx:B, 1knx:D |
Proteins are sorted in descending order by the reliability score (RS) of the Pafig fibril-forming hexapeptide segment.
ᵃ The reliability score is not shown for proteins that were part of the training set of the Pafig support vector model.
Chameleons and discordance in the PrP family
In our data set here, PrP was the only known example of an amyloidogenic protein with a sequence segment that is both chameleonic and discordant (Table VII). Using new sequences for echinoderms, reptiles and birds,23 we re-examined the phylogenetic distribution of the α-helical discordance in PrP helix 2 and found that it not only occurs in mammals but also in birds and is absent from amphibian and reptilian PrP family members (Table VII and Supporting Information Table III). However, globular PrP domain structures from amphibians and reptiles do contain chameleon sequences (Supporting Information Table III). Analysis of the PrP discordant segments using Consurf24 indicates deep conservation in mammals for residues that have high beta propensity, such as valine [Supporting Information Fig. 1(A)]. This conservation trend is also correlated with an increased predicted relative importance for these residues as determined by the Evolutionary Trace algorithm;25,26 80% of the discordant stretch is found within the top 68% of important residues of the protein [Supporting Information Fig. 1(B)].
Table VII.
Discordant Chain | Protein | Discordant Segment | Discordant Sequence | Chameleon Sequences |
---|---|---|---|---|
1xyx:A | PrP Mouse (Mus musculus) | 60–64 | VNITI | 1e4j:A, 1e4k:A, 1fnl:C, 1hf1:A |
1b10:A | PrP Golden hamster (Mesocricetus auratus) | 56–60 | VNITI | 1e4j:A, 1e4k:A, 1fnl:C, 1hf1:A |
1i4m:A | PrP Human (Homo sapiens) | 62–66 | VNITI | 1e4j:A, 1e4k:A, 1fnl:C, 1hf1:A |
1y2s:A | PrP Sheep (Ovis aries) | 62–66 | VNITV | 1iz6:A |
1xyw:B | PrP American Elk (Cervus elaphus nelsoni) | 60–64 | VNITV | 1iz6:A |
1u3m:A | PrP Chicken (Gallus gallus) | 64–68 | ITVTE | |
1xyj:A | PrP Cat (Felis silvestris catus) | 60–64 | VNITV | 1iz6:A |
1xyk:A | PrP Dog (Canis familiaris) | 60–64 | VNITV | 1iz6:A |
1xyq:A | PrP Pig (Sus scrofa) | 60–64 | VNITV | 1iz6:A |
2fj3:A | PrP Rabbit (Oryctolagus cuniculus) | 56–60 | VNITV | 1iz6:A |
1dx0:A | PrP Bovine (Bos taurus) | 57–61 | VNITV | 1iz6:A |
2k56:A | PrP Bank Vole (Clethrionomys glareolus) | 62–66 | VNITI | 1e4j:A, 1e4k:A, 1fnl:C, 1hf1:A |
1lg4:A | Human Doppel (Homo sapiens) | 25–29 | RYYEA | 1a6c:A |
1i17:A | Mouse Doppel (Mus musculus) | 27–31 | RYYAA | 1crf:A |
The discordant stretch contains an N-glycosylation site (N-x-[T or S], where x is any residue); we checked whether this was a general phenomenon for discordance, but observed no significant association, with PrP being the only such case. However, one notable tendency is that the orthogonal bundle is the most observed protein architecture amongst the discordant proteins (20% of the protein list, Supporting Information Fig. 2), and is the same as that of the PrP fold.7,27
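The N-glycosylation check mentioned above can be sketched as a pattern search for N-x-[T or S], following the definition given in the text in which x is any residue (many definitions of the sequon additionally exclude proline at x; that refinement is not applied here). The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping motifs are all reported.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start index, matched tripeptide) for every N-x-[T/S] motif."""
    return [(m.start(), m.group(1)) for m in SEQUON.finditer(seq)]


print(find_sequons("VNITI"))  # [(1, 'NIT')]
print(find_sequons("ITVTE"))  # []
```

Applied to the discordant sequences in Table VII, the mammalian VNITI and VNITV stretches contain the motif, whereas the chicken ITVTE stretch does not.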
Interestingly, the combined chameleonic and α-helix discordant region was highly conserved throughout mammals (as VNITI or VNITV; Supporting Information Table III). Discordant stretches that are also chameleonic are additionally observed in the first helix of Doppel (Table VII), thus providing a prediction for an amyloidogenic determinant in this protein.
Discussion
We have performed an exhaustive study of sequences capable of being found in different secondary structure types, either explicitly, such as chameleons, or potentially, as is the case with discordant stretches. Conformational plasticity of these sequences makes them prime candidates for amyloidogenic segments, which are largely characterized by a conformational change from an α-helix to β-sheet conformation. To test this hypothesis it was imperative to first develop an understanding of the distribution of these segments in protein domains in general, to facilitate statistical comparisons with the subset of amyloid-forming proteins.
Our meta-analysis of discordant proteins in a nonredundant dataset of protein domains suggests possible roles of discordance, which have been overlooked in previous publications addressing this topic.8 We have observed that discordant protein domains are enriched for specific protein-fold types and functional categories. For example, using Gene Ontology (GO) terms, we have observed that the most frequent molecular functions of discordant proteins were “metal–ion binding” (GO:0046872) and “hydrolase” activity (GO:0016787), with more than a quarter of the discordant proteins exhibiting either activity (Supporting Information Table IV). Discordant proteins were found frequently in the “extracellular” region (GO:0005576, 16 proteins), and “membrane” (GO:0016020, 16 proteins) of cells, whereas the most frequent biological process of these proteins was “transport” (GO:0006810, 9 proteins). These results complement our findings that almost 10% of discordant proteins are viral proteins, where such functions are imperative for host interaction and viral replication and survival. An analysis of 3D folds using CATH also indicated significant enrichment of the orthogonal bundle (P ≤ 10⁻³¹, using hypergeometric probability) and three-layer αβα sandwich (P ≤ 10⁻¹¹) architectures (Supporting Information Fig. 2), suggesting that amino acid orientations in these folds may promote discordance. The effect of these folds on the discordant stretches, and their implications for the overall function of their respective discordant proteins, would be an interesting point for future research. Taken collectively, our findings shed light on other protein functions, besides amyloidogenicity, where discordance may be of importance, including protein–ligand interactions and viral replication.
Testing for enrichment of discordant and chameleon segments in amyloid proteins has revealed, contrary to our expectations, that these characteristics are poor predictors of amyloidogenic segments. When “discordance” was first proposed by Kallberg et al.,8 the authors proposed that discordant segments are associated with amyloid fibril formation, but no significant statistical relationship between the two was discerned. Interestingly, many more publications have since emerged involving sequence analysis of discordant segments and experimental analyses of “discordant” proteins from the Kallberg study, such as the Alzheimer Aβ peptide and lung surfactant proteins.28–30 However, none of the subsequent publications had rigorously tested for a statistical relationship between amyloid proteins and discordance, despite the increasing availability of protein domains and continuous identification of new amyloid sequences. Since the pioneering study of Kallberg,8 this is the first study to discern a statistical relationship between discordance and amyloidogenicity, to determine whether such segments are truly associated with amyloids. Our analysis raised a couple of important points, mainly that prediction of discordance is heavily dependent on the prediction algorithm. A decade after the initial discordance study, and with current protein databases more than triple the size of those available to it, our study indicates that discordant segments are only “moderately enriched” in amyloid proteins. Although our results, using the consensus predictions from both GOR and PSIPRED, are significant (P = 0.027), they are only “moderate” in comparison with the “high” enrichment observed using GOR alone (P < 0.000001).
The GOR-IV algorithm works by considering all possible residue pair frequencies in a sliding window of 17 residues in length; it thus considers only the local sequence environment in a basic way. In contrast, the neural-network based PSIPRED method is a “black-box” machine-learning technique.31,32 We opted for a consensus approach to increase stringency of our selection criteria and prevent bias by the use of only one prediction algorithm. Comparing discordance predictions using GOR with other state-of-the-art algorithms is an interesting point for future research.
With respect to chameleon segments, to our knowledge, this is the first study that attempts to derive a statistical relationship between chameleons and amyloidogenicity. An initial analysis of chameleons in amyloid proteins and their determinants indicated a significant enrichment of frustrated and very frustrated chameleons in amyloid determinants, but a more rigorous analysis (removing overlapping chameleon segments) severely limits the classes of chameleons that still share a significant association (Table V). Hexapeptide frustrated chameleons are the only class of chameleons that remains enriched in these segments, but that significance is marginal (P = 0.04). Taken collectively, the paucity of chameleons and frustrated chameleons in observed amyloid proteins and their determinants, and their poor or marginal enrichment in these proteins, suggests that chameleons are not reliable predictors of amyloidogenic segments.
It was interesting to discover a significant enrichment of chameleons in discordant proteins (Table II), even after the removal of overlapping chameleons (Table V). This suggested a clear link between the phenomena and raised the question of how sequences sharing both characteristics may play a role in amyloidosis. Although our observations indicate that discordant and chameleon sequences, taken separately, are not reliable predictors of amyloidogenic segments, we found that 32% of the sequences with both phenomena may be prone to amyloidogenicity. These discordant-chameleonic segments and their proteins (identified in Table VI) include already known amyloidogenic proteins such as PrP. PrPs are responsible for a variety of neurodegenerative prion diseases, including human Creutzfeldt–Jakob disease, sheep scrapie, and bovine spongiform encephalopathy in cattle.1,33 Notably, our analysis of a conserved discordant segment in PrP helix 2 within mammals is further supported by genome-wide analysis34 and MD simulations34,35 of the PrP helices, which indicated conformational instability in the second half of helix 2, and a drastic decrease of α-helical content accompanied by an increase of β-strands during transition of PrP from its cellular form (PrPC) to its pathogenic, aggregated form (PrPSC). Interestingly, our analysis of ambiguous encoding in the prion family also identified a discordant-chameleonic segment in its paralog, Doppel, which may suggest an evolutionary importance for discordant-chameleonic segments. Notably, this segment in Doppel was the only identified chameleon segment for that protein, but its relationship to discordance had not been previously elucidated. The discordant-chameleonic segments we have identified in Table VI also include other proteins, such as HSV Glycoprotein D (1jma:A), whose homologs are already involved in amyloidogenicity. 
HSV1 is a member of the herpes virus family and has been proposed as a strong risk factor for Alzheimer's disease, which is primarily characterized by amyloid beta (Aβ) plaque formation in the brain. Infection of neuronal and glial cells with HSV1 led to increased production and intracellular levels of Aβ, and Aβ plaques have been observed in mouse brains after HSV1 infection.36 Indeed, homology has been observed between the carboxyl-terminal region of the Aβ peptide and an internal sequence of HSV1 Glycoprotein B (gB), which was subsequently shown to form β-pleated sheets, self-assemble into fibrils, and accelerate Aβ fibril formation in vitro.37 In HSV1, gB is responsible for attaching the HSV protein to the cell surface, whereas glycoprotein D (gD) facilitates binding of the virus to cell surface receptors. Our discovery of discordance in HSV Glycoprotein D suggests that, as with Glycoprotein B, HSV may contain several discordant proteins that facilitate viral entry and ultimately contribute to amyloidosis. Experimental analyses of HSV Glycoprotein D and other interesting candidates from Table VI would be required to verify their role in amyloidosis, and to shed light on how the combined discordant-chameleonic effect may play a role in amyloid formation.
Although this study has focused on discordant and chameleonic segments, and their potential for amyloidogenicity based on secondary structure predictions and sequence analysis, it is worth noting that energy barriers also influence their potential for amyloidogenicity. As has been demonstrated for the VGSN peptide in Aβ,38 the large energy barriers of peptide interactions must first be overcome before aggregate structures can form. One aspect for future research, which is beyond the scope of this study, would be to analyze and compare the peptide interactions and energy barriers of discordant segments, chameleon segments, and discordant-chameleonic segments.
We have performed a meta-scale analysis of chameleon and discordant stretches in protein domains and amyloidogenic proteins and their determinants, to understand the extent to which these segments contribute to conformational flexibility in proteins, as well as their relationship to amyloid formation. From our analysis of discordant stretches in protein domains, we propose several protein functions where conformationally variable segments may play a strong role. Our analysis of discordant stretches in amyloidogenic proteins and their determinants indicates an enrichment of discordant stretches in amyloid determinants, but this enrichment is dependent on the prediction algorithm. To our knowledge, this is the first study to also address the statistical relationship between chameleons and amyloids, and after alleviating sources of potential bias, we conclude that chameleons are not reliable predictors of amyloidogenic segments. We have however uncovered interesting exceptions where a combination of discordant-chameleonic segments may be heavily involved in amyloidosis, but further experimental analysis would be required to develop an understanding of how they contribute to amyloid formation.
Materials and Methods
Protein data sets
We explored the distribution of discordant α-helices and chameleon sequences in three separate cohorts of protein sequence data: (i) single-chain protein domains from the SCOP database, (ii) known amyloid-forming proteins, and (iii) the PrP protein family.
SCOP protein domains
We downloaded a nonredundant set of “genetic” single-chain domain protein sequences (n = 4263) from ASTRAL SCOP,39 based on PDB SEQRES records (release 1.73, astral-scopdom-seqres-gd-all-1.73). This set was constructed so that all sequences in it share pairwise sequence identity ≤40%.
We also derived a data set of single protein chains from the entire DSSP database,40 again with a 40% sequence identity threshold (n = 10,940).
Known amyloid-forming proteins
We identified pathogenic and nonpathogenic amyloid-forming proteins by cross-referencing the current literature with the UniProt database.16–18,20,41 In UniProt, 59 identifiers were mapped to 1346 PDB structures. Of these, we selected a nonredundant set of PDB sequences with less than 40% identity, using the PISCES procedure (n = 77; Supporting Information Table V).21
Amyloidogenic determinants are defined as the union of overlapping subsequences within amyloidogenic protein chains that have been experimentally determined to be amyloidogenic. We determined a nonredundant list of 45 amyloidogenic determinants from the current literature (Supporting Information Table VI)16,20; of these, 17 determinants had an α-helical sequence of ≥5 residues.
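The definition of a determinant as the union of overlapping subsequences can be illustrated with a short interval-merging sketch. The function name `merge_determinants` and the representation of each reported segment as an inclusive (start, end) residue range are assumptions made for illustration; the source does not describe how the union was computed.

```python
from typing import List, Tuple


def merge_determinants(segments: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping inclusive (start, end) residue ranges into their union."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # The segment shares at least one residue with the previous
            # determinant, so extend that determinant instead of adding one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical experimentally reported segments on a single chain.
print(merge_determinants([(10, 20), (15, 25), (40, 45)]))  # [(10, 25), (40, 45)]
```

In this sketch, ranges that are merely adjacent (for example 1–5 and 6–9) are kept separate, because the text speaks of overlapping subsequences only.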
The PrP protein family
We selected representative prion and Doppel three-dimensional structures from the PDB and SUPERFAMILY databases,42 ignoring mutant or engineered models. For species with multiple structures, we selected one structure per species as designated by SUPERFAMILY. A total of 16 PDB structures were selected, comprising two Doppels (Human: 1lg4; Mouse: 1i17) and 14 PrPs (Mouse: 1xyx; Hamster: 1b10; Human: 1i4m; Sheep: 1y2s; Elk: 1xyw; Chicken: 1u3m; Turtle: 1u5l; Frog: 1xu0; Cat: 1xyj; Dog: 1xyk; Pig: 1xyq; Bovine: 1dx0; Bank Vole: 2k56; Rabbit: 2fj3). We selected from GenBank additional prion sequences for which a three-dimensional structure was not available. A total of 24 such sequences were selected (Fruit Bat: gi|27733840; Eurasian bat: gi|27733844; Silky Anteater: gi|27733872; Nine-banded Armadillo: gi|202071082; Asiatic elephant: gi|27733858; African Bush elephant: gi|182636942; Sunda flying lemur: gi|27733816; Large tree shrew: gi|27733818; Aardvark: gi|27733864; Elephant shrew: gi|27733866; Caribbean manatee: gi|27733860; Platypus: gi|171473244; Hottentot Golden mole: gi|27733870; Short-tailed opossum: gi|91680539; Tammar Wallaby: gi|49618779; Sperm whale: gi|27733856; Zebrafish: gi|45387601; Zebra finch: gi|123303169; Bottle-nosed dolphin: gi|61743503; Hippopotamus: gi|27733854; Roe Deer: gi|50442322; Domestic goat: gi|119489906; Chimpanzee: gi|56122310; Orangutan: gi|474369).
Experimentally determined and predicted secondary structures
Secondary-structure assignments of three-dimensional protein structures were extracted from the Dictionary of Secondary Structure of Proteins (DSSP).40 DSSP defines eight classes of secondary structure based on hydrogen-bond patterns: α-helix (H), 3₁₀-helix (G), π-helix (I), extended strand (E), isolated β-bridge (B), hydrogen-bonded turn (T), bend (S), and coil (_). To facilitate comparison of DSSP with secondary-structure prediction tools, these classes were reduced to three states (helix, strand, and loop), as in previous analyses.8 Secondary-structure predictions on selected proteins were performed using the GOR-IV43 and PSIPRED44 algorithms, with default parameters. Both programs employ a three-state classification of secondary structures [“helix,” “strand,” and “loop” (or “coil”)].
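The eight-to-three state reduction can be sketched as a simple character mapping. The grouping used below (H, G, I as helix; E, B as strand; everything else as loop) is a commonly used convention and is an assumption here, since the exact grouping from the earlier analysis8 is not restated in this section. The names `reduce_dssp`, `HELIX_CODES`, and `STRAND_CODES` are illustrative.

```python
# Assumed grouping (a common convention, not confirmed by the source):
# helix-like DSSP codes collapse to "H", strand-like codes to "E",
# and every other code (turn, bend, coil, blank) to "L" for loop.
HELIX_CODES = {"H", "G", "I"}
STRAND_CODES = {"E", "B"}


def reduce_dssp(dssp: str) -> str:
    """Collapse an eight-state DSSP string into three states (H, E, L)."""
    reduced = []
    for code in dssp:
        if code in HELIX_CODES:
            reduced.append("H")
        elif code in STRAND_CODES:
            reduced.append("E")
        else:
            reduced.append("L")
    return "".join(reduced)


print(reduce_dssp("_HHHHGGGTTEEEEB_S"))  # LHHHHHHHLLEEEEELL
```

A different grouping (for instance, treating G or B as loop) would change which residues count as helical, and therefore which stretches can be called discordant.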
Identification of discordant stretches
A schematic of the definition of discordant α-helices is shown in Figure 3(A). Discordant α-helices were identified if discordance was observed between DSSP secondary-structure assignments and: (i) the GOR-IV secondary-structure prediction algorithm alone; (ii) the PSIPRED secondary-structure prediction algorithm alone; (iii) the consensus of the GOR-IV and PSIPRED secondary-structure prediction algorithms.8 Discordant stretches ≥5 residues were selected for further statistical analysis.
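A minimal sketch of this selection is given below, assuming that the observed and predicted secondary structures are available as equal-length three-state strings (H, E, L) and that a discordant residue is one observed as helix but predicted as strand. The function names, the 0-based end-exclusive spans, and the use of "?" for positions where the two predictors disagree are illustrative choices, not the authors' implementation.

```python
import re
from typing import List, Tuple


def consensus_prediction(pred_a: str, pred_b: str) -> str:
    """Keep a state only where both predictors agree; mark disagreement as '?'."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must have equal length")
    return "".join(a if a == b else "?" for a, b in zip(pred_a, pred_b))


def discordant_stretches(observed: str, predicted: str,
                         min_len: int = 5) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive spans observed as helix but predicted as strand."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    # Flag each residue: "D" when observed helix coincides with predicted strand.
    flags = "".join("D" if o == "H" and p == "E" else "."
                    for o, p in zip(observed, predicted))
    return [m.span() for m in re.finditer("D{%d,}" % min_len, flags)]


# Toy strings, not real protein data.
observed = "LLHHHHHHHHHHLLEEEELL"
gor      = "LLHHEEEEEEHHLLEEEELL"
psipred  = "LLHHHEEEEEEHLLEEEELL"

print(discordant_stretches(observed, gor))       # [(4, 10)]
print(discordant_stretches(observed, psipred))   # [(5, 11)]
print(discordant_stretches(observed, consensus_prediction(gor, psipred)))  # [(5, 10)]
```

The consensus case is the most conservative of the three, since a residue is flagged only when both predictors call it strand.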
Structural and functional analysis of discordant proteins
Classification of protein architectures was determined using CATH.45 Over-representation of CATH architectures was calculated using hypergeometric probability. Molecular functions and biological processes were determined using the GO46 database, and functional relationships between GO terms and protein sequence sets were mapped using GOLEM47 and DAVID.48
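For readers unfamiliar with the hypergeometric test, the over-representation P-value can be computed as an upper-tail probability. The sketch below uses only the Python standard library (`math.comb`, available from Python 3.8); the function name, the argument names, and the choice of a one-sided upper tail are assumptions, as the source does not give these details.

```python
from math import comb


def hypergeom_enrichment_p(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of domains in the background set
    K: background domains with the architecture of interest
    n: number of discordant domains (the sample)
    k: discordant domains with the architecture of interest
    """
    upper = min(K, n)
    # Exact integer arithmetic for the tail sum, converted to float at the end.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


# Toy numbers, not taken from the study.
print(hypergeom_enrichment_p(k=5, K=5, n=5, N=10))
```

When many architectures are tested at once, a multiple-testing correction would normally be applied to the resulting P-values; whether and how this was done is not stated in this section.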
Identification of chameleon sequences
Using an in-house script, we identified α−β chameleon sequences by searching for the same 5-mer and 6-mer protein sequence words in both α-helices and β-strands from DSSP secondary structure assignments,40 in SCOP protein domains39 [Fig. 3(B)]. NetCSSP49 and ChamSequence Finder50 were also used to identify additional cases.
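The in-house script is not available, but the word search it describes can be approximated as follows. This sketch assumes that each domain is supplied as a (sequence, three-state secondary structure) pair of equal length and that a word counts as helical or strand only when every residue in it has that state; both are assumptions, as are the function names.

```python
from typing import Iterable, Set, Tuple


def words_in_state(seq: str, ss: str, state: str, k: int) -> Set[str]:
    """Collect k-mers whose residues are all assigned the given state."""
    target = state * k
    return {seq[i:i + k] for i in range(len(seq) - k + 1) if ss[i:i + k] == target}


def find_chameleons(domains: Iterable[Tuple[str, str]], k: int) -> Set[str]:
    """Return k-mers observed both entirely in helix (H) and entirely in strand (E)."""
    helix_words: Set[str] = set()
    strand_words: Set[str] = set()
    for seq, ss in domains:
        helix_words |= words_in_state(seq, ss, "H", k)
        strand_words |= words_in_state(seq, ss, "E", k)
    return helix_words & strand_words


# Toy domains, not real structures.
domains = [
    ("MKVLAAGIVE", "LHHHHHHHLL"),
    ("TTVLAAGSKY", "LEEEEEEELL"),
]
print(find_chameleons(domains, 5))  # {'VLAAG'}
print(find_chameleons(domains, 6))  # set()
```

Because the two word sets are pooled across all domains, a word found in helix and strand within the same domain would also be reported; restricting matches to different domains would require tracking the domain of origin.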
Calculation of secondary structure propensities
Conformational preferences for α-helix or β-strand for any protein subsequence were determined by averaging the respective amino acid propensities over the whole subsequence using the physicochemical scales of Chou and Fasman.51 For chameleon sequences, we defined “frustrated chameleons” as chameleon sequences with higher β than α propensity, and “very frustrated chameleons” as chameleon sequences with very high β propensity values [operationally, with a ≥1.5-fold excess of β propensity over α propensity; Fig. 3(C)].
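The classification can be sketched as below. The propensity scales are passed in as dictionaries so that the published Chou and Fasman values can be supplied by the reader; the two-residue scales in the example are placeholder numbers chosen only to make the arithmetic easy to follow and are not Chou–Fasman parameters. Interpreting the ≥1.5-fold criterion as a ratio of mean β to mean α propensity is also an assumption.

```python
from typing import Mapping


def mean_propensity(segment: str, scale: Mapping[str, float]) -> float:
    """Average a per-residue propensity scale over a peptide segment."""
    return sum(scale[aa] for aa in segment) / len(segment)


def classify_chameleon(segment: str,
                       p_alpha: Mapping[str, float],
                       p_beta: Mapping[str, float],
                       very_ratio: float = 1.5) -> str:
    """Label a chameleon by comparing its mean beta and alpha propensities."""
    alpha = mean_propensity(segment, p_alpha)
    beta = mean_propensity(segment, p_beta)
    if beta >= very_ratio * alpha:
        return "very frustrated"
    if beta > alpha:
        return "frustrated"
    return "not frustrated"


# Placeholder scales for illustration only (NOT Chou-Fasman values).
toy_alpha = {"A": 2.0, "V": 1.0}
toy_beta = {"A": 1.0, "V": 2.0}

print(classify_chameleon("VVVVV", toy_alpha, toy_beta))  # very frustrated
print(classify_chameleon("AAVVV", toy_alpha, toy_beta))  # frustrated
print(classify_chameleon("AAAAA", toy_alpha, toy_beta))  # not frustrated
```

With this ordering of the checks, every very frustrated chameleon also satisfies the frustrated criterion, so the two classes are nested rather than disjoint.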
Prediction of amyloid fibrillogenicity
We used the Pafig algorithm22 to identify sequence segments predicted to be fibril-forming hexapeptides. Pafig uses a support vector machine (SVM) to identify fibril-forming hexapeptides based on a classifier of 41 physicochemical properties of amino acids, with an overall prediction accuracy (Q2) of 81% and a Matthews correlation coefficient of 0.63. The predictions include a reliability index (RI) based on the output of the SVM. Fibril-forming segments are defined as having RIs ≥ 5 (out of 10).
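The RI cutoff amounts to a simple filter over per-hexapeptide predictions. The record layout below (`HexPrediction` with `start`, `peptide`, `fibril`, and `ri` fields) is hypothetical and does not reflect the actual Pafig output format, which is not described in this section; the example values are made up.

```python
from typing import Iterable, List, NamedTuple


class HexPrediction(NamedTuple):
    """Hypothetical record for one hexapeptide prediction."""
    start: int      # 0-based start of the hexapeptide in the chain
    peptide: str    # six-residue window
    fibril: bool    # predictor's call: fibril-forming or not
    ri: int         # reliability index on a 0-10 scale


def reliable_fibril_segments(predictions: Iterable[HexPrediction],
                             min_ri: int = 5) -> List[HexPrediction]:
    """Keep hexapeptides called fibril-forming with reliability index >= min_ri."""
    return [p for p in predictions if p.fibril and p.ri >= min_ri]


# Made-up predictions for illustration.
example = [
    HexPrediction(0, "AAAAAA", True, 7),
    HexPrediction(1, "AAAAAV", True, 3),
    HexPrediction(2, "AAAAVV", False, 9),
]
for hit in reliable_fibril_segments(example):
    print(hit.start, hit.peptide, hit.ri)  # 0 AAAAAA 7
```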
Evolutionary conservation
Evolutionary conservation was analyzed using the tool ConSurf.24 For each amino-acid position in each segment, we noted the residue variety, as well as the most and least frequently occurring amino acids at that position in a multiple sequence alignment. Evolutionary importance of conformationally variable segments was also determined using the Evolutionary Trace Report Maker25 and Evolutionary Trace Viewer.26
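The per-position tallies described above (residue variety, most and least frequent residue) can be reproduced from any multiple sequence alignment with a few lines of code. This sketch covers only those tallies and does not reproduce ConSurf's conservation scoring; the function name, the 0-based end-exclusive column range, and the handling of gaps ("-") are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def column_summary(alignment: Sequence[str], start: int, end: int) -> List[Dict[str, Optional[object]]]:
    """Summarize alignment columns [start, end): variety and extreme-frequency residues."""
    rows = []
    for col in range(start, end):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        ranked = counts.most_common()  # ties keep first-seen order
        rows.append({
            "column": col,
            "variety": len(counts),
            "most": ranked[0][0] if ranked else None,
            "least": ranked[-1][0] if ranked else None,
        })
    return rows


# Toy alignment, not real prion sequences.
toy_alignment = ["ACDE", "ACDF", "AGDF"]
for row in column_summary(toy_alignment, 0, 4):
    print(row)
```

When several residues tie for most or least frequent, this sketch reports whichever was encountered first in the alignment, so the order of sequences can affect the reported residue.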
Acknowledgments
The authors thank Dr. Manish Kumar for technical assistance with the PSIPRED program.
References
- 1. Aguzzi A, Sigurdson C, Heikenwaelder M. Molecular mechanisms of prion pathogenesis. Ann Rev Pathol. 2008;3:11–40. doi: 10.1146/annurev.pathmechdis.3.121806.154326.
- 2. Caughey B, Lansbury PT. Protofibrils, pores, fibrils, and neurodegeneration: separating the responsible protein aggregates from the innocent bystanders. Ann Rev Neurosci. 2003;26:267–298. doi: 10.1146/annurev.neuro.26.010302.081142.
- 3. Chiti F, Dobson CM. Protein misfolding, functional amyloid, and human disease. Ann Rev Biochem. 2006;75:333–366. doi: 10.1146/annurev.biochem.75.101304.123901.
- 4. Kajava AV, Steven AC, Andrey Kajava JMS, David ADP. β-rolls, β-helices, and Other β-Solenoid Proteins. Advances in Protein Chemistry. San Diego, California, USA: Elsevier Academic Press; 2006. pp. 55–96.
- 5. Kajava AV, Squire JM, Parry DAD, Andrey Kajava JMS, David ADP. β-Structures in Fibrous Proteins. Advances in Protein Chemistry. San Diego, California, USA: Elsevier Academic Press; 2006. pp. 1–15.
- 6. Nelson R, Sawaya MR, Balbirnie M, Madsen AO, Riekel C, Grothe R, Eisenberg D. Structure of the cross-β spine of amyloid-like fibrils. Nature. 2005;435:773–778. doi: 10.1038/nature03680.
- 7. Dima RI, Thirumalai D. Exploring the propensities of helices in PrPC to form β-sheet using NMR structures and sequence alignments. Biophys J. 2002;83:1268–1280. doi: 10.1016/S0006-3495(02)73899-X.
- 8. Kallberg Y, Gustafsson M, Persson B, Thyberg J, Johansson J. Prediction of amyloid fibril-forming proteins. J Biol Chem. 2000;276 (16):12945–12950. doi: 10.1074/jbc.M010402200.
- 9. Bruce IC, Scott RP, Fred EC. Origins of structural diversity within sequentially identical hexapeptides. Protein Sci. 1993;2:2134–2145. doi: 10.1002/pro.5560021213.
- 10. Igor BK, Shalom R. Comparative computational analysis of prion proteins reveals two fragments with unusual structural properties and a pattern of increase in hydrophobicity associated with disease-promoting mutations. Protein Sci. 2004;13:3230–3244. doi: 10.1110/ps.04833404.
- 11. Jun-Tao G, Jerzy WJ, Ying X. Analysis of chameleon sequences and their implications in biological processes. Proteins. 2007;67:548–558. doi: 10.1002/prot.21285.
- 12. Mezei M. Chameleon sequences in the PDB. Protein Eng. 1998;11:411–414. doi: 10.1093/protein/11.6.411.
- 13. Minor DL, Kim PS. Context-dependent secondary structure formation of a designed protein sequence. Nature. 1996;380:730–734. doi: 10.1038/380730a0.
- 14. Kabsch W, Sander C. On the use of sequence homologies to predict protein-structure—identical pentapeptides can have completely different conformations. Proc Natl Acad Sci USA. 1984;81:1075–1078. doi: 10.1073/pnas.81.4.1075.
- 15. Xianghong Z, Frank A, Gerd F, Gaston HG, Gareth C. An analysis of the helix-to-strand transition between peptides with identical sequence. Proteins. 2000;41:248–256. doi: 10.1002/1097-0134(20001101)41:2<248::aid-prot90>3.0.co;2-j.
- 16. Harrison RS, Sharpe PC, Singh Y, Fairlie DP. Reviews of physiology, biochemistry and pharmacology. Vol. 159. Berlin: Springer-Verlag Berlin; 2007. pp. 1–77. Amyloid peptides and proteins in review.
- 17. Uversky VN. Amyloidogenesis of natively unfolded proteins. Curr Alzheimer Res. 2008;5:260–287. doi: 10.2174/156720508784533312.
- 18. Uversky VN, Fink AL. Conformational constraints for amyloid fibrillation: the importance of being unfolded. BBA-Proteins Proteomics. 2004;1698:131–153. doi: 10.1016/j.bbapap.2003.12.008.
- 19. Alberti S, Halfmann R, King O, Kapila A, Lindquist S. A systematic survey identifies prions and illuminates sequence features of prionogenic proteins. Cell. 2009;137:146–158. doi: 10.1016/j.cell.2009.02.044.
- 20. Susan T, Andrew JD. Amyloidogenic sequences in native protein structures. Protein Sci. 2010;19:327–348. doi: 10.1002/pro.314.
- 21. Wang G, Dunbrack RL Jr. PISCES: a protein sequence culling server. Bioinformatics. 2003;19:1589–1591. doi: 10.1093/bioinformatics/btg224.
- 22. Tian J, Wu N, Guo J, Fan Y. Prediction of amyloid fibril-forming segments based on a support vector machine. BMC Bioinformatics. 2009;10:S45–S52. doi: 10.1186/1471-2105-10-S1-S45.
- 23. Harrison PM, Khachane A, Kumar M. Genomic assessment of the evolution of the prion protein gene family in vertebrates. Genomics. 2010;95:268–277. doi: 10.1016/j.ygeno.2010.02.008.
- 24. Landau M, Mayrose I, Rosenberg Y, Glaser F, Martz E, Pupko T, Ben-Tal N. ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures. Nucl Acids Res. 2005;33:W299–W302. doi: 10.1093/nar/gki370.
- 25. Mihalek I, Res I, Lichtarge O. Evolutionary trace report_maker: a new type of service for comparative analysis of proteins. Bioinformatics. 2006;22:1656–1657. doi: 10.1093/bioinformatics/btl157.
- 26. Morgan DH, Kristensen DM, Mittelman D, Lichtarge O. ET viewer: an application for predicting and visualizing functional sites in protein structures. Bioinformatics. 2006;22:2049–2050. doi: 10.1093/bioinformatics/btl285.
- 27. Riek R, Hornemann S, Wider G, Billeter M, Glockshuber R, Wuthrich K. NMR structure of the mouse prion protein domain PrP(121–231). Nature. 1996;382:180–182. doi: 10.1038/382180a0.
- 28. Johansson J. Membrane properties and amyloid fibril formation of lung surfactant protein. Biochem Soc Trans. 2001;29:601–606. doi: 10.1042/bst0290601.
- 29. Li J, Hosia W, Hamvas A, Thyberg J, Jornvall H, Weaver TE, Johansson J. The N-terminal propeptide of lung surfactant protein C is necessary for biosynthesis and prevents unfolding of a metastable alpha-helix. J Mol Biol. 2004;338:857–862. doi: 10.1016/j.jmb.2004.03.051.
- 30. Paivio A, Nordling E, Kallberg Y, Thyberg J, Johansson J. Stabilization of discordant helices in amyloid fibril-forming proteins. Protein Sci. 2004;13:1251–1259. doi: 10.1110/ps.03442404.
- 31. Garnier J, Gibrat J-F, Robson B, Russell FD. GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence. Methods in Enzymology. San Diego, California, USA: Elsevier Academic Press; 1996. pp. 540–553.
- 32. King RD, Sternberg MJ. Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci. 1996;5:2298–2310. doi: 10.1002/pro.5560051116.
- 33. Prusiner SB. Prions. Proc Natl Acad Sci USA. 1998;95:13363–13383. doi: 10.1073/pnas.95.23.13363.
- 34. Dima RI, Thirumalai D. Probing the instabilities in the dynamics of helical fragments from mouse PrPC. Proc Natl Acad Sci USA. 2004;101:15335–15340. doi: 10.1073/pnas.0404235101.
- 35. Bae S-H, Legname G, Serban A, Prusiner SB, Wright PE, Dyson HJ. Prion proteins with pathogenic and protective mutations show similar structure and dynamics. Biochemistry. 2009;48:8120–8128. doi: 10.1021/bi900923b.
- 36. Wozniak MA, Itzhaki RF, Shipley SJ, Dobson CB. Herpes simplex virus infection causes cellular beta-amyloid accumulation and secretase upregulation. Neurosci Lett. 2007;429:95–100. doi: 10.1016/j.neulet.2007.09.077.
- 37. Cribbs DH, Azizeh BY, Cotman CW, LaFerla FM. Fibril formation and neurotoxicity by a herpes simplex virus glycoprotein B fragment with homology to the Alzheimer's A beta peptide. Biochemistry. 2000;39:5988–5994. doi: 10.1021/bi000029f.
- 38. Tarus B, Straub JE, Thirumalai D. Dynamics of Asp23→Lys28 salt-bridge formation in Aβ10–35 monomers. J Am Chem Soc. 2006;128:16159–16168. doi: 10.1021/ja064872y.
- 39. Chandonia J-M, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. The ASTRAL compendium in 2004. Nucl Acids Res. 2004;32:D189–D192. doi: 10.1093/nar/gkh034.
- 40. Kabsch W, Sander C. Dictionary of protein secondary structure—pattern-recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211.
- 41. The UniProt Consortium. The Universal Protein Resource (UniProt) in 2010. Nucl Acids Res. 2010;38:D142–D148. doi: 10.1093/nar/gkp846.
- 42. Gough J, Chothia C. SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucl Acids Res. 2002;30:268–272. doi: 10.1093/nar/30.1.268.
- 43. Sen TZ, Jernigan RL, Garnier J, Kloczkowski A. GOR V server for protein secondary structure prediction. Bioinformatics. 2005;21:2787–2788. doi: 10.1093/bioinformatics/bti408.
- 44. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999;292:195–202. doi: 10.1006/jmbi.1999.3091.
- 45. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM. CATH—a hierarchic classification of protein domain structures. Structure. 1997;5:1093–1108. doi: 10.1016/s0969-2126(97)00260-8.
- 46. Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucl Acids Res. 2004;32:D258–D261. doi: 10.1093/nar/gkh036.
- 47. Sealfon R, Hibbs M, Huttenhower C, Myers C, Troyanskaya O. GOLEM: an interactive graph-based gene-ontology navigation and analysis tool. BMC Bioinformatics. 2006;7:443–451. doi: 10.1186/1471-2105-7-443.
- 48. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2008;4:44–57. doi: 10.1038/nprot.2008.211.
- 49. Kim C, Choi J, Lee SJ, Welsh WJ, Yoon S. NetCSSP: web application for predicting chameleon sequences and amyloid fibril formation. Nucl Acids Res. 2009;37:W469–W473. doi: 10.1093/nar/gkp351.
- 50. Yoon S, Jung H. Analysis of chameleon sequences by energy decomposition on a pairwise per-residue basis. Protein J. 2006;25:361–368. doi: 10.1007/s10930-006-9023-6.
- 51. Chou PY, Fasman GD. Conformational parameters for amino-acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry. 1974;13:211–222. doi: 10.1021/bi00699a001.
- 52. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.