PLOS ONE. 2018 Jan 29;13(1):e0191222. doi: 10.1371/journal.pone.0191222

ProtDataTherm: A database for thermostability analysis and engineering of proteins

Hassan Pezeshgi Modarres 1,2,3, Mohammad R Mofrad 2,4, Amir Sanati-Nezhad 1,5,*
Editor: Eugene A Permyakov
PMCID: PMC5788348  PMID: 29377907

Abstract

Protein thermostability engineering is a powerful tool for improving the resistance of proteins to high temperatures and thereby broadening their applications. Efficient thermostability engineering requires data sources, including sequences and 3D structures, that are classified by thermostability for different protein families. However, no existing data source provides such data in an easily accessible form. Here we present the first release of ProtDataTherm, a database for the analysis and engineering of protein thermostability that contains more than 14 million protein sequences categorized by thermal stability and protein family. The database contains the data needed for better understanding protein thermostability and for stability engineering. Because it provides protein sequences and structures categorized as psychrophilic, mesophilic and thermophilic, the database is also useful for developing new tools for protein stability prediction. The database is available at http://profiles.bs.ipm.ir/softwares/protdatatherm. As a proof of concept, thermostability-improving mutations were suggested for a sample protein from a protein family with more than 20 mesophilic and thermophilic sequences, for which experimentally measured ΔT values of mutations are available in the ProTherm database.

Introduction

Thermophilic and hyperthermophilic microorganisms have attracted particular scientific interest since the discovery of microorganisms living at temperatures above 75°C [1]. Enzymes extracted from such heat-tolerant microorganisms have been studied to understand the factors that modulate their enhanced thermostability and to use this knowledge as guidance for improving the thermostability of less stable proteins for biotechnological applications [1]. Knowing the preferred growth temperature of a microorganism helps to approximate the thermostability of the proteins it expresses, since there is a direct relationship between the growth temperature of microorganisms and the melting point of their proteins [2]. Currently available data on homologous proteins are valuable for engineering proteins toward higher stability, for example by introducing additional salt bridges or strengthening the hydrophobic cores within the protein structure [3]. Although structure-based protein engineering, known as rational engineering or rational design, is the most popular methodology for thermostability engineering, the limited number of available protein structures still hinders its widespread use [4]. On the other hand, owing to advances in DNA sequencing technologies, the number of sequenced proteins belonging to different families is growing rapidly [3, 5]. Sequence-based protein engineering methods could therefore complement the routine structure-based rational methods. The consensus concept (CC) is the most popular sequence-based approach for extracting thermostabilizing mutations from homologous sequences [6–17]. In the CC approach, a multiple sequence alignment (MSA) is first constructed, and non-consensus residues in the target are then substituted by the most frequently occurring amino acid at each position [5].
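The consensus step described above can be sketched in a few lines of Python, the language used to build the database. This is a minimal illustration, not the authors' code: it assumes the MSA is already available as equal-length aligned strings with '-' marking gaps, that the target is one row of that alignment, and it omits the frequency thresholds and structural filters that practical CC workflows usually add. The function name `consensus_mutations` is hypothetical.

```python
from collections import Counter


def consensus_mutations(alignment, target_index=0):
    """Suggest consensus mutations for one target sequence in an MSA.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    target_index: row of the alignment that holds the target sequence.
    Returns mutation strings such as 'K2R', numbered by position in the
    ungapped target sequence (1-based).
    """
    target = alignment[target_index]
    if any(len(row) != len(target) for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    mutations = []
    residue_number = 0  # position in the ungapped target sequence
    for column, wild_type in enumerate(target):
        if wild_type == "-":
            continue  # the target has a gap here, so there is nothing to mutate
        residue_number += 1
        # Count residues in this alignment column, ignoring gaps.
        counts = Counter(row[column] for row in alignment if row[column] != "-")
        # On ties, most_common keeps first-encountered order (alignment row order).
        consensus, _ = counts.most_common(1)[0]
        if consensus != wild_type:
            mutations.append(f"{wild_type}{residue_number}{consensus}")
    return mutations
```

For example, `consensus_mutations(["MKVLA", "MRVLG", "MRILG", "MRVLG"])` proposes `["K2R", "A5G"]`: position 3 is left alone because V is already the majority residue there.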
However, there is no guarantee that every mutation suggested by the CC approach increases thermostability [9, 14, 16, 18]. To identify thermostabilizing mutations with higher probability, one can compare the target sequence with homologues that tolerate higher temperatures [3]. Applying this strategy to different protein families requires access to other proteins from the same family with higher thermal stability. The main challenge, however, is finding homologues labeled with their thermostability category. To overcome this challenge, we developed a comprehensive database of protein sequences from different microorganisms, clustered by Pfam ID. The user can look up the Pfam ID of a protein of interest and retrieve its homologues, categorized as psychrophilic, mesophilic and thermophilic. In addition to sequences, PDB IDs are provided whenever 3D structures are available for members of the Pfam family of interest.

Materials and methods

First, a database of microorganisms was built in which each organism is categorized by its growth temperature (GT), using the BacDive [19] and NCBI [20] databases. For every microorganism, all available protein sequences and their annotations, including Pfam ID [21] and PDB ID [22] where available, were retrieved from the UniProt database [23]. The entire process was implemented in the Python programming language [24] using the Biopython module [25] (Fig 1).
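The retrieval step relied on Biopython. For a dependency-free illustration, the sketch below extracts the primary accession, NCBI taxonomy ID, Pfam IDs and PDB IDs from UniProtKB flat-file text using the standard AC, OX and DR line codes. It is a simplified stand-in, not the pipeline used to build ProtDataTherm; Biopython's `Bio.SwissProt` parser exposes the same cross-references through `record.cross_references`. The function name `parse_uniprot_entries` is hypothetical.

```python
import re


def parse_uniprot_entries(text):
    """Yield one dict per UniProtKB flat-file entry (entries end with '//').

    Each dict holds the primary accession, the NCBI taxonomy ID, and the Pfam
    and PDB identifiers listed on DR (database cross-reference) lines.
    An unterminated final entry is not yielded.
    """
    entry = None
    for line in text.splitlines():
        # Line code in columns 1-2, data from column 6 onward.
        code, data = line[:2], line[5:].strip()
        if entry is None:
            entry = {"accession": None, "taxid": None, "pfam": [], "pdb": []}
        if code == "AC" and entry["accession"] is None:
            # The first accession on the first AC line is the primary one.
            entry["accession"] = data.split(";")[0].strip()
        elif code == "OX":
            match = re.search(r"NCBI_TaxID=(\d+)", data)
            if match:
                entry["taxid"] = int(match.group(1))
        elif code == "DR":
            # e.g. "Pfam; PF00075; RNase_H; 1." -> ["Pfam", "PF00075", "RNase_H", "1"]
            fields = [f.strip() for f in data.rstrip(".").split(";")]
            if fields[0] == "Pfam":
                entry["pfam"].append(fields[1])
            elif fields[0] == "PDB":
                entry["pdb"].append(fields[1])
        elif code == "//":
            yield entry
            entry = None
```

An entry containing the line `DR   Pfam; PF00075; RNase_H; 1.` contributes `PF00075` to its `pfam` list; the taxonomy ID can then be joined to a growth-temperature table to assign the thermostability class.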

Fig 1. Flowchart of database construction.


In our database, every protein sequence carries two labels: its Pfam ID and its thermostability category. To facilitate thermostability analysis and engineering, sequences are clustered by Pfam ID, so that each Pfam cluster contains proteins from the same family labeled with their thermostability category. For a target protein sequence, the user can therefore look up the corresponding Pfam ID in the Pfam database [21] and use it as the primary query to search the database. Within each Pfam family, sequences (identified by their UniProt IDs) are categorized according to the growth temperature of their source organism as psychrophilic (GT < 20°C), mesophilic (20°C < GT < 40°C), or thermophilic (GT > 40°C). For each protein family, the available PDB structures are listed and categorized in the same way as the sequences. Sequence IDs, protein family IDs, and structure IDs are UniProt, Pfam, and RCSB PDB IDs, respectively.
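A minimal sketch of this organization follows, assuming each sequence is already annotated with a taxonomy ID and its Pfam IDs, and that a growth temperature is known per organism. The text does not say which class an organism growing at exactly 20°C or 40°C falls into; this sketch assigns both boundaries to the mesophilic class, which is an assumption. The names `thermo_category` and `build_pfam_index` are hypothetical.

```python
from collections import defaultdict


def thermo_category(growth_temp_c):
    """Map an organism's growth temperature (°C) to a thermostability class.

    Assumption: exactly 20 °C and exactly 40 °C are treated as mesophilic.
    """
    if growth_temp_c < 20:
        return "psychrophilic"
    if growth_temp_c <= 40:
        return "mesophilic"
    return "thermophilic"


def build_pfam_index(proteins, growth_temps):
    """Group UniProt accessions by Pfam ID, then by thermostability class.

    proteins: iterable of (uniprot_id, taxid, pfam_ids) tuples.
    growth_temps: dict mapping taxid -> growth temperature in °C.
    Returns index[pfam_id][category] -> list of UniProt IDs.
    """
    index = defaultdict(lambda: defaultdict(list))
    for uniprot_id, taxid, pfam_ids in proteins:
        if taxid not in growth_temps:
            continue  # organism without a known growth temperature is skipped
        category = thermo_category(growth_temps[taxid])
        # A multi-domain protein is listed under every Pfam family it contains.
        for pfam_id in pfam_ids:
            index[pfam_id][category].append(uniprot_id)
    return index
```

A query for a Pfam ID then reduces to a dictionary lookup, `index["PF00075"]`, whose three lists correspond to the psychrophilic, mesophilic and thermophilic columns of the results page.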

For the case study, Pfam families containing more than 20 mesophilic and thermophilic sequences were first identified. For pattern analysis, AXB patterns were considered in each sequence, where A and B can be any of the 20 standard amino acids and X is the number of residues separating them, with 0 ≤ X ≤ 10. Thus, A0B denotes two adjacent residues (e.g., VE), and A1B denotes two residues separated by exactly one residue; for example, A1V covers every occurrence of Ala followed by Val with any one of the 20 standard amino acids between them. For each mesophilic and thermophilic sequence, the number of occurrences of every AXB pattern was counted. This yields, for a given AXB pattern (e.g., V4H), one sample of counts for the mesophilic category and one for the thermophilic category, each with its average. The rank-sum test with a critical p-value of 0.05 was used to identify AXB patterns that distinguish mesophilic from thermophilic sequences.
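The pattern counting and statistical screen can be sketched as follows. The rank-sum test is written from scratch with the normal approximation, average ranks for ties, and no tie correction to the variance, which is the same approximation used by SciPy's `scipy.stats.ranksums`. The text does not state which variant or software was used, so this choice is an assumption; since integer pattern counts tie often, a tie-corrected test such as `scipy.stats.mannwhitneyu` may be preferable in practice. Residues outside the 20 standard amino acids are skipped, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_GAP = 10  # X ranges over 0..10, as in the Methods


def count_axb(sequence, max_gap=MAX_GAP):
    """Count AXB patterns: residue A at position i and residue B at i + X + 1."""
    counts = Counter()
    for gap in range(max_gap + 1):
        for i in range(len(sequence) - gap - 1):
            a, b = sequence[i], sequence[i + gap + 1]
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[f"{a}{gap}{b}"] += 1
    return counts


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p); a positive z means values in x tend to be larger than in y.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def discriminating_patterns(mesophilic, thermophilic, alpha=0.05):
    """Map each AXB pattern with p < alpha to (p, mean_thermophilic, mean_mesophilic)."""
    meso_counts = [count_axb(s) for s in mesophilic]
    thermo_counts = [count_axb(s) for s in thermophilic]
    hits = {}
    for a, gap, b in product(AMINO_ACIDS, range(MAX_GAP + 1), AMINO_ACIDS):
        pattern = f"{a}{gap}{b}"
        thermo = [c[pattern] for c in thermo_counts]  # Counter yields 0 if absent
        meso = [c[pattern] for c in meso_counts]
        _, p = rank_sum_test(thermo, meso)
        if p < alpha:
            hits[pattern] = (p, sum(thermo) / len(thermo), sum(meso) / len(meso))
    return hits
```

With 20 × 11 × 20 = 4400 candidate patterns, the screen runs one test per pattern; the text does not mention a multiple-testing correction, and none is applied here.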

Results and discussion

A PHP web page serves as the user interface to the database. The user finds the Pfam ID of a protein of interest (e.g., using the Pfam database) and submits it on the first page of the website (Fig 2, panel A). The results page then presents all sequences and structures in the database for the submitted Pfam ID (Fig 2, panel B). The database contains more than 14 million protein sequences, together with PDB structures, for 9962 protein families, categorized by thermal stability as psychrophilic, mesophilic and thermophilic (Table 1). In total, 14155392 protein sequences and 30950 PDB structures are available in the database. For 957 protein families, at least one PDB structure of a thermophilic protein is available, which can be used for structural comparison between mesophilic and thermophilic proteins (Table 1). In addition, 3355 protein families have at least 20 thermophilic sequences, and 3046 protein families have at least 20 psychrophilic sequences. For such families, amino acid composition can be compared between psychrophilic and mesophilic, or between mesophilic and thermophilic, proteins to gain family-specific knowledge of the factors that modulate thermostability.
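The amino acid composition comparison mentioned above could look like the following sketch, which averages per-sequence residue fractions within each group and reports the per-residue difference. It is an illustration under stated assumptions (non-standard residues ignored, every sequence weighted equally), not a procedure specified in the paper; the function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mean_composition(sequences):
    """Average fractional composition of the 20 standard amino acids.

    Every sequence is weighted equally; non-standard residues are ignored.
    """
    totals = dict.fromkeys(AMINO_ACIDS, 0.0)
    used = 0
    for seq in sequences:
        counts = Counter(res for res in seq.upper() if res in totals)
        length = sum(counts.values())
        if length == 0:
            continue  # sequence has no standard residues to contribute
        used += 1
        for aa in AMINO_ACIDS:
            totals[aa] += counts[aa] / length
    if used == 0:
        raise ValueError("no sequences with standard amino acids")
    return {aa: total / used for aa, total in totals.items()}


def composition_difference(group_a, group_b):
    """Per-residue difference in mean composition (group_b minus group_a),
    e.g. thermophilic minus mesophilic sequences of one Pfam family."""
    comp_a = mean_composition(group_a)
    comp_b = mean_composition(group_b)
    return {aa: comp_b[aa] - comp_a[aa] for aa in AMINO_ACIDS}
```

Sorting the result, for example `sorted(diff.items(), key=lambda kv: kv[1], reverse=True)[:5]`, lists the residues most enriched in the second group for that family.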

Fig 2. The web interface of the database.


(A) Users enter a Pfam ID as input on the first page. (B) All available sequences and structures are presented for each thermostability class on the results page.

Table 1. The distribution of protein sequences and structures over the three classes of thermostability.

Quantity | Count
Mesophilic sequences | 13111756
Thermophilic sequences | 661072
Psychrophilic sequences | 382564
Mesophilic structures | 23069
Thermophilic structures | 7741
Psychrophilic structures | 140
Pfams with at least one mesophilic structure | 2306
Pfams with at least one thermophilic structure | 957
Pfams with at least one psychrophilic structure | 82
Pfams with at least 20 thermophilic sequences | 3355
Pfams with at least 20 psychrophilic sequences | 3046
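As a consistency check, the per-class counts in Table 1 add up to the totals reported in the text:

```latex
\begin{aligned}
13\,111\,756 + 661\,072 + 382\,564 &= 14\,155\,392 \quad \text{protein sequences}\\
23\,069 + 7\,741 + 140 &= 30\,950 \quad \text{PDB structures}
\end{aligned}
```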

Other databases

Two databases, PGTdb [26] and ProTherm [27], have provided data concerning protein thermostability. To the authors' knowledge, PGTdb is no longer available, although it was the only resource providing information on the thermostability classification of protein sequences (psychrophilic, mesophilic and thermophilic) based on the GT of their source organisms. ProTherm, on the other hand, provides thermodynamic data for mutants, but only for a limited number of proteins. Our database contains a much larger number of microorganisms, protein sequences and PDB structures. It categorizes all sequences in each Pfam family by thermostability and provides easier access to the data needed for the analysis and engineering of protein families.

Case study: Pattern recognition for protein engineering

An important goal of thermostability analysis is to use knowledge of the differences between the two categories to engineer mesophilic proteins with a minimal number of mutations, enhancing their thermostability toward that of thermophilic sequences. As a case study, we selected a protein from one of the protein families with more than 20 mesophilic and thermophilic sequences, for which experimentally measured ΔT values of mutations are available in the ProTherm database. From ProTherm, ribonuclease H from Escherichia coli (strain K12) (PDB ID 2RN2, solved by X-ray diffraction at 1.48 Å resolution) was selected. Ribonuclease H belongs to Pfam family PF00075, has reported ΔT values upon mutation from thermal experiments, and is among the proteins with the largest number of reported thermodynamic measurements of the effect of mutations on stability.

Algorithm 1 was designed to suggest thermostability-improving mutations. Among all AXB patterns with a significant difference between mesophilic and thermophilic sequences in the family (Pfam ID PF00075; see Materials and methods for the significance criterion), we selected those with a higher average number of occurrences in the thermophilic category than in the mesophilic category. We then searched the target sequence (E. coli ribonuclease H) for AXY patterns in which Y differs from B and, for each, suggested a Y→B mutation. The same approach was applied to ZXB patterns (Z differs from A) to suggest Z→A mutations. If a suggested mutation was available in the ProTherm database, its ΔT value was checked: a mutation with ΔT > 0 was counted as a successful thermostability-improving suggestion, and one with ΔT < 0 as a failed suggestion. The results are shown in Table 2, where 72% of the suggested mutations improve thermostability. This result suggests that the proposed method can serve as a sequence-based thermostability engineering method, provided that sequences of the target protein's family are categorized as thermophilic and mesophilic. The accuracy of the suggested mutations is expected to improve when more sophisticated methods, such as machine learning, are applied to such a database. However, further studies incorporating more proteins from a diverse range of protein families are needed to better evaluate the accuracy of this method.
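The success criterion above can be expressed as a small scoring function. This is a sketch: the ΔT lookup is a plain dictionary standing in for ProTherm (not its actual interface), mutations without a measurement or with ΔT = 0 are left unscored because the text does not define that case, and every entry in the suggestion list is scored separately, which may differ from how the reported percentage was computed when the same mutation is proposed by several patterns. The name `evaluate_suggestions` is hypothetical.

```python
def evaluate_suggestions(suggested, measured_dt):
    """Score suggested mutations against experimentally measured ΔT values.

    suggested: iterable of mutation strings such as "H62R".
    measured_dt: dict mapping mutation string -> measured ΔT
        (a stand-in for a ProTherm lookup).
    Returns (successes, failures, success_rate). Mutations with no
    measurement, or with ΔT == 0, are not scored.
    """
    successes, failures = [], []
    for mutation in suggested:
        dt = measured_dt.get(mutation)
        if dt is None or dt == 0:
            continue
        if dt > 0:
            successes.append(mutation)  # stabilizing: counted as a success
        else:
            failures.append(mutation)   # destabilizing: counted as a failure
    scored = len(successes) + len(failures)
    rate = len(successes) / scored if scored else float("nan")
    return successes, failures, rate
```

Using the Table 2 values for H62R (1.3), A52S (-5.8) and D10S (9.2), plus one hypothetical unmeasured suggestion, the function scores two successes and one failure, a rate of 2/3.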

Table 2. Mutations suggested for E. coli ribonuclease H (Pfam PF00075) and their experimentally measured ΔT values from ProTherm. Patterns with no separating residue are written without the 0 (e.g., ER denotes E0R). Target residues: the residues of the target sequence matched by the pattern, with their positions. Ave_The: average number of occurrences of the pattern in thermophilic sequences; Ave_Mes: average number of occurrences of the pattern in mesophilic sequences.

Pattern | Target residues | Mutation | ΔT | P-value | Ave_The | Ave_Mes
ER | E61, H62 | H62R | 1.3 | 0.0032 | 1.782 | 1.474
LR | V74, R75 | V74L | 3.7 | 0.0106 | 1.773 | 1.492
LE | D134, E135 | D134L | 5.5 | 0.0007 | 1.918 | 1.437
NK | K95, K96 | K95N | 3.2 | 0.00841 | 1.951 | 1.43
SI | A52, I53 | A52S | -5.8 | 0.04146 | 1.579 | 1.21
SG | D10, G11 | D10S | 9.2 | 0.0083 | 1.836 | 1.63
KI | A52, I53 | A52K | 19.5 | 0.0398 | 2.385 | 1.69
EG | D10, G11 | D10E | 3.8 | 0.012 | 1.74 | 1.376
F1E | F8, D10 | D10E | 3.8 | 0.0199 | 1.463 | 1.242
F1S | F8, D10 | D10S | 9.2 | 0.0186 | 1.767 | 1.284
C2N | R41, N44 | R41C | 1.6 | 0.0002 | 1.282 | 1.052
A2Y | D70, Y73 | D70A | 3.8 | 0.004 | 1.647 | 1.304
E2Y | D70, Y73 | D70E | 1.8 | 0.0331 | 1.583 | 1.12
E2C | D10, C13 | D10E | 3.8 | 0.008 | 1.409 | 1
L2N | L49, A52 | A52N | -5.9 | 0.0323 | 1.617 | 1.301
L2N | L67, D70 | D70N | 5.5 | 0.0323 | 1.617 | 1.301
V2K | E119, K122 | E119V | 2.7 | 0.0379 | 1.635 | 1.264
N3I | N130, D134 | D134I | 4.6 | 0.0281 | 1.667 | 1.246
N3N | N130, D134 | D134N | 6.4 | 0.0004 | 1.658 | 1.265
N3E | N130, D134 | D134E | 3.1 | 0.0353 | 1.757 | 1.557
N3V | N130, D134 | D134V | 4.1 | 0.0031 | 1.541 | 1.299
N3V | D70, V74 | D70N | 5.5 | 0.0031 | 1.541 | 1.299
R3V | K91, K95 | K91R | 0.5 | 0.0005 | 1.554 | 1.26
V3Y | A24, Y28 | A24V | 3.2 | 0.0419 | 1.638 | 1.136
E3V | E48, A52 | A52V | 7.8 | 0.0133 | 2.023 | 1.852
E3V | E64, S68 | S68V | 1.9 | 0.0133 | 2.023 | 1.852
E3V | D70, V74 | D70E | 1.8 | 0.0133 | 2.023 | 1.852
E3V | D94, V98 | D94E | -1.2 | 0.0133 | 2.023 | 1.852
Y3L | A52, L56 | A52Y | -7.6 | 0.0146 | 1.636 | 1.082
C4E | A52, E57 | A52C | 2.5 | 0.0175 | 1.4 | 1
V4Y | S68, Y73 | S68V | 1.9 | 0.0162 | 1.528 | 1.079
N4R | D70, R75 | D70N | 5.5 | 7.34E-09 | 1.587 | 1.155
N4K | N130, E135 | E135K | -0.8 | 6.92E-05 | 2.329 | 1.678
N4E | A52, E57 | A52N | -5.9 | 0.0127 | 1.664 | 1.317
Q5N | Q4, D10 | D10N | 6.8 | 0.0361 | 1.696 | 1.16
E5N | E64, D70 | D70N | 5.5 | 0.00257 | 1.615 | 1.36
E5V | D10, N16 | D10E | 3.8 | 0.0025 | 1.615 | 1.3
E5N | D94, N100 | D94E | -1.2 | 0.0025 | 1.615 | 1.36
R5P | R46, A52 | A52P | -5.4 | 0.0499 | 1.37 | 1.217
R5P | K91, P97 | K91R | 0.5 | 0.0499 | 1.37 | 1.217
R5I | R46, A52 | A52I | 6.2 | 0.0299 | 1.429 | 1.206
R5Y | R46, A52 | A52Y | -7.6 | 0.0176 | 1.483 | 1.116
L5P | L56, H62 | H62P | 4.1 | 0.0009 | 1.59 | 1.316
L5P | L107, Q113 | Q113P | -0.6 | 0.0009 | 1.59 | 1.316
L5L | Q80, K86 | Q80L | 1 | 0.0001 | 2.102 | 1.618
K6E | K3, D10 | D10E | 3.8 | 0.027 | 2.111 | 1.712
K6E | K87, D94 | D94E | -1.2 | 0.027 | 2.111 | 1.712
N6I | N45, A52 | A52I | 6.2 | 0.007 | 1.554 | 1.2
L6L | L67, V74 | V74L | 3.7 | 0.0002 | 2.008 | 1.684
L6L | A52, L59 | A52L | 4.3 | 0.0002 | 2.008 | 1.684
L6K | Q80, K87 | Q80L | 1 | 0.0017 | 1.884 | 1.47
N6T | N45, A52 | A52T | -2.7 | 0.01419 | 1.491 | 1.261
I7I | I66, V74 | V74I | 2.4 | 0.0043 | 1.618 | 1.241
I7I | V74, I82 | V74I | 2.4 | 0.0043 | 1.618 | 1.241
L7K | A52, K60 | A52L | 4.3 | 0.0395 | 1.67 | 1.355
L7I | V74, I82 | V74L | 3.7 | 0.0018 | 1.704 | 1.346
K7E | K86, D94 | D94E | -1.2 | 0.0045 | 1.971 | 1.573
K7K | A52, K60 | A52K | -19.5 | 0.0003 | 2.059 | 1.632
G7N | G126, D134 | D134N | 6.4 | 3.92E-07 | 1.934 | 1.473
Y7K | A52, K60 | A52Y | -7.6 | 0.0202 | 1.705 | 1.262
N7T | N44, A52 | A52T | -2.7 | 0.0001 | 1.767 | 1.215
N7V | N16, A24 | A24V | 3.2 | 0.0005 | 1.632 | 1.311
N7V | N44, A52 | A52V | 7.8 | 0.0005 | 1.632 | 1.311
F7K | A52, K60 | A52F | -1.5 | 0.0211 | 1.636 | 1.237
R7K | K91, K99 | K91R | 0.5 | 0.0111 | 1.446 | 1.2

Algorithm 1. Thermostability improving mutation suggestion algorithm.

Input. Target protein sequence, its Pfam ID, and the AXB patterns that distinguish thermophilic from mesophilic sequences for that Pfam ID
Output. Mutation list

for all AXB patterns for the Pfam ID do
    if average(AXB, thermophilic) > average(AXB, mesophilic) then
        find AXY patterns in the target sequence where Y ≠ B and add Y → B to the mutation list
        find ZXB patterns in the target sequence where Z ≠ A and add Z → A to the mutation list
    end if
end for
return mutation list
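A runnable Python sketch of Algorithm 1 follows. It assumes the thermophile-enriched AXB patterns have already been selected (significant in the rank-sum screen and more frequent in thermophilic sequences) and are supplied as (A, X, B) tuples; positions are 1-based to match Table 2. Returning one (pattern, mutation) pair per match, rather than a deduplicated list, is a design choice meant to mirror the layout of Table 2. The name `suggest_mutations` is hypothetical.

```python
def suggest_mutations(target, enriched_patterns):
    """Propose mutations that create thermophile-enriched AXB patterns.

    target: target protein sequence in one-letter code.
    enriched_patterns: iterable of (A, X, B) tuples, already filtered to patterns
        whose thermophilic average exceeds the mesophilic average.
    Returns (pattern, mutation) pairs such as ("E0R", "H62R"), 1-based positions.
    """
    suggestions = []
    for a, gap, b in enriched_patterns:
        pattern = f"{a}{gap}{b}"
        offset = gap + 1  # index distance between the A and B positions
        for i in range(len(target) - offset):
            first, second = target[i], target[i + offset]
            if first == a and second != b:
                # AXY with Y != B: mutate Y -> B at the second position
                suggestions.append((pattern, f"{second}{i + offset + 1}{b}"))
            elif second == b and first != a:
                # ZXB with Z != A: mutate Z -> A at the first position
                suggestions.append((pattern, f"{first}{i + 1}{a}"))
    return suggestions
```

For instance, with the toy target `QEHLK` and the single pattern (E, 0, R), the function returns `[("E0R", "H3R")]`; positions where the full pattern is already present, or where neither end matches, produce no suggestion.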

Applications

The database developed in this work can be used to build protein thermostability mutation libraries with different approaches, such as CC, and by comparing the target sequence with its more thermostable homologues [17, 28, 29]. It can also be used for the systematic analysis of the factors that modulate thermostability [30–32] in different families, which matters because these factors can vary from family to family [3]. Furthermore, because thermophilic sequences come from microorganisms that are tolerant to harsh conditions in general, not only to high temperature, these data can also be used to optimize a target sequence for applications under other harsh conditions, such as extreme pH and high salt concentrations. Altogether, this database provides the key data for sequence-based protein engineering and analysis, enabling researchers to develop new analysis and engineering tools in the field of thermal stability. The database is useful not only for general industrial and research purposes but also for drug design [17, 33, 34].

Conclusions

Here we present the first release of the ProtDataTherm database, which contains more than 14 million protein sequences and structures from microorganisms with different preferred growth temperatures. All sequences and structures are labeled as psychrophilic, mesophilic or thermophilic. For ease of use, the sequences are grouped by Pfam ID, so users can find homologous sequences for their protein of interest from its Pfam ID. This database can be applied not only to probe stability-modulating factors within protein families but also to knowledge-based protein stability engineering.

Availability

This database is available at http://profiles.bs.ipm.ir/softwares/protdatatherm. It is accessible free of charge to academic users on request.

Acknowledgments

This work is supported in part by a grant from the Institute for Research in Fundamental Sciences (IPM), Tehran, Iran (grant number: BS-1394-01-14), and the Natural Sciences and Engineering Research Council of Canada (NSERC).

Data Availability

All relevant data are within the paper.

Funding Statement

This work is supported in part by a grant from the Institute for Research in Fundamental Sciences (IPM), Tehran, Iran (grant number: BS-1394-01-14), and the Natural Sciences and Engineering Research Council of Canada (NSERC) to Amir Sanati Nezhad. http://www.ipm.ac.ir/; http://www.nserc-crsng.gc.ca/index_eng.asp.

References

  • 1. Elleuche S, Schäfers C, Blank S, Schröder C, Antranikian G. Exploration of extremophiles for high temperature biotechnological processes. Current Opinion in Microbiology. 2015;25:113–9. doi: 10.1016/j.mib.2015.05.011
  • 2. Gromiha MM, Oobatake M, Sarai A. Important amino acid properties for enhanced thermostability from mesophilic to thermophilic proteins. Biophysical Chemistry. 1999;82(1):51–67.
  • 3. Pezeshgi Modarres H, Dorokhov BD, Popov VO, Ravin NV, Skryabin KG, Dal Peraro M. Understanding and engineering thermostability in DNA ligase from Thermococcus sp. 1519. Biochemistry. 2015;54(19):3076–85. doi: 10.1021/bi501227b
  • 4. Panigrahi P, Sule M, Ghanate A, Ramasamy S, Suresh C. Engineering proteins for thermostability with iRDP web server. PLoS One. 2015;10(10):e0139486. doi: 10.1371/journal.pone.0139486
  • 5. Chaparro-Riggers JF, Polizzi KM, Bommarius AS. Better library design: data-driven protein engineering. Biotechnology Journal. 2007;2(2):180–91. doi: 10.1002/biot.200600170
  • 6. Pantoliano MW, Whitlow M, Wood JF, Dodd SW, Hardman KD, Rollence ML, et al. Large increases in general stability for subtilisin BPN' through incremental changes in the free-energy of unfolding. Biochemistry. 1989;28(18):7205–13. doi: 10.1021/Bi00444a012
  • 7. Polizzi KM, Chaparro-Riggers JF, Vazquez-Figueroa E, Bommarius AS. Structure-guided consensus approach to create a more thermostable penicillin G acylase. Biotechnology Journal. 2006;1(5):531–6. doi: 10.1002/biot.200600029
  • 8. Vazquez-Figueroa E, Chaparro-Riggers J, Bommarius AS. Development of a thermostable glucose dehydrogenase by a structure-guided consensus concept. Chembiochem: a European Journal of Chemical Biology. 2007;8(18):2295–301. doi: 10.1002/cbic.200700500
  • 9. Lehmann M, Pasamontes L, Lassen SF, Wyss M. The consensus concept for thermostability engineering of proteins. Biochimica et Biophysica Acta. 2000;1543(2):408–15.
  • 10. Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF, Pasamontes L, et al. The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Engineering. 2002;15(5):403–11.
  • 11. Anbar M, Gul O, Lamed R, Sezerman UO, Bayer EA. Improved thermostability of Clostridium thermocellum endoglucanase Cel8A by using consensus-guided mutagenesis. Appl Environ Microb. 2012;78(9):3458–64. doi: 10.1128/Aem.07985-11
  • 12. Blum JK, Ricketts MD, Bommarius AS. Improved thermostability of AEH by combining B-FIT analysis and structure-guided consensus method. J Biotechnol. 2012;160(3–4):214–21. doi: 10.1016/j.jbiotec.2012.02.014
  • 13. Vazquez-Figueroa E, Yeh V, Broering JM, Chaparro-Riggers JF, Bommarius AS. Thermostable variants constructed via the structure-guided consensus method also show increased stability in salts solutions and homogeneous aqueous-organic media. Protein Eng Des Sel. 2008;21(11):673–80. doi: 10.1093/protein/gzn048
  • 14. Lehmann M, Wyss M. Engineering proteins for thermostability: the use of sequence alignments versus rational design and directed evolution. Curr Opin Biotech. 2001;12(4):371–5. doi: 10.1016/S0958-1669(00)00229-9
  • 15. Lehmann M, Pasamontes L, Lassen SF, Wyss M. The consensus concept for thermostability engineering of proteins. Biochimica et Biophysica Acta. 2000;1543(2):408–15. doi: 10.1016/S0167-4838(00)00238-7
  • 16. Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF, Pasamontes L, et al. The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Engineering. 2002;15(5):403–11. doi: 10.1093/protein/15.5.403
  • 17. Modarres HP, Mofrad M, Sanati-Nezhad A. Protein thermostability engineering. RSC Advances. 2016;6(116):115252–70.
  • 18. Ohage EC, Graml W, Walter MM, Steinbacher S, Steipe B. beta-Turn propensities as paradigms for the analysis of structural motifs to engineer protein stability. Protein Sci. 1997;6(1):233–41. doi: 10.1002/pro.5560060125
  • 19. Söhngen C, Podstawka A, Bunk B, Gleim D, Vetcininova A, Reimer LC, et al. BacDive: the Bacterial Diversity Metadatabase in 2016. Nucleic Acids Research. 2016;44(D1):D581–D5. doi: 10.1093/nar/gkv983
  • 20. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research. 2011;39(suppl 1):D38–D51.
  • 21. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, et al. The Pfam protein families database. Nucleic Acids Research. 2004;32(suppl 1):D138–D41.
  • 22. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Brice MD, Rodgers JR, et al. The Protein Data Bank. European Journal of Biochemistry. 1977;80(2):319–24.
  • 23. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Research. 2014:gku989.
  • 24. Van Rossum G, Drake FL. Python language reference manual. Network Theory; 2003.
  • 25. Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–3. doi: 10.1093/bioinformatics/btp163
  • 26. Huang S-L, Wu L-C, Liang H-K, Pan K-T, Horng J-T, Ko M-T. PGTdb: a database providing growth temperatures of prokaryotes. Bioinformatics. 2004;20(2):276–8.
  • 27. Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A. ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucleic Acids Research. 2004;32(suppl 1):D120–D1.
  • 28. Xiao ZH, Bergeron H, Grosse S, Beauchemin M, Garron ML, Shaya D, et al. Improvement of the thermostability and activity of a pectate lyase by single amino acid substitutions, using a strategy based on melting-temperature-guided sequence alignment. Appl Environ Microb. 2008;74(4):1183–9. doi: 10.1128/Aem.02220-07
  • 29. Jochens H, Aerts D, Bornscheuer UT. Thermostabilization of an esterase by alignment-guided focussed directed evolution. Protein Engineering Design and Selection. 2010;23(12):903–9.
  • 30. Kumwenda B, Litthauer D, Bishop ÖT, Reva O. Analysis of protein thermostability enhancing factors in industrially important Thermus bacteria species. Evolutionary Bioinformatics. 2013;9:327.
  • 31. Ding Y, Cai Y, Han Y, Zhao B. Comparison of the structural basis for thermal stability between archaeal and bacterial proteins. Extremophiles. 2012;16(1):67–78. doi: 10.1007/s00792-011-0406-z
  • 32. Paiardini A, Sali R, Bossa F, Pascarella S. "Hot cores" in proteins: comparative analysis of the apolar contact area in structures from hyper/thermophilic and mesophilic organisms. BMC Structural Biology. 2008;8(1):1.
  • 33. Hwang I, Park S. Computational design of protein therapeutics. Drug Discovery Today: Technologies. 2008;5(2):e43–e8.
  • 34. Kontermann RE. Strategies for extended serum half-life of protein therapeutics. Curr Opin Biotech. 2011;22(6):868–76. doi: 10.1016/j.copbio.2011.06.012
