ProtDataTherm: A database for thermostability analysis and engineering of proteins

Hassan Pezeshgi Modarres; Mohammad R Mofrad; Amir Sanati-Nezhad

doi:10.1371/journal.pone.0191222

2018 Jan 29;13(1):e0191222. doi: 10.1371/journal.pone.0191222


Editor: Eugene A Permyakov

PMCID: PMC5788348 PMID: 29377907

Abstract

Protein thermostability engineering is a powerful tool for improving the resistance of proteins to high temperatures and thereby broadening their applications. Efficient thermostability engineering requires thermostability-classified data, including sequences and 3D structures, for different protein families. However, no existing data source provides such data in an easily accessible form. This is the first release of the ProtDataTherm database for the analysis and engineering of protein thermostability, which contains more than 14 million protein sequences categorized by thermal stability and protein family. The database contains the data needed to better understand protein thermostability and to engineer stability. Because it categorizes protein sequences and structures as psychrophilic, mesophilic or thermophilic, the database is useful for developing new tools for protein stability prediction. The database is available at http://profiles.bs.ipm.ir/softwares/protdatatherm. As a proof of concept, thermostability-improving mutations were suggested for one sample protein belonging to a protein family with more than 20 mesophilic and thermophilic sequences and with experimentally measured ΔT values of mutations available in the ProTherm database.

Introduction

Thermophilic and hyperthermophilic microorganisms have attracted scientific attention, particularly since microorganisms living at temperatures above 75°C were reported [1]. Enzymes extracted from such high-temperature-tolerant microorganisms have been studied to understand the factors that modulate their improved thermostability, and this knowledge has then been used as guidance for improving the thermostability of less stable proteins for biotechnological applications [1]. Knowing the preferred living temperature of a microorganism helps to approximate the thermostability of its expressed proteins, because the growth temperature of a microorganism is directly related to the melting point of its proteins [2]. Currently available data on homologous proteins are valuable for engineering proteins towards higher stability, for example by introducing more salt bridges or strengthening the hydrophobic cores within the protein structure [3]. Although structure-based protein engineering, known as rational engineering or rational design, is the most popular methodology for thermostability engineering of proteins, the limited number of available protein structures still hinders its widespread use [4]. On the other hand, because of modern advances in DNA sequencing technologies, the number of sequenced proteins belonging to different families is growing rapidly [3, 5]. Advances in the use of protein sequences for protein engineering could therefore complement the existing structure-based rational methods. The consensus concept (CC) is the most popular sequence-based protein engineering approach for extracting thermo-stabilizing mutations from homologous sequences [6–17]. In the CC approach, a multiple sequence alignment (MSA) is first built and non-consensus residues are then substituted by the most frequently occurring amino acids [5].
However, there is no guarantee that all mutations suggested by the CC approach increase thermostability [9, 14, 16, 18]. To detect thermo-stabilizing mutations with higher probability, one can compare the target sequence with homologues that tolerate higher temperatures [3]. To make this feasible for different protein families, one needs access to other proteins from the same family with higher thermal stability. The main challenge of this method, however, is the difficulty of finding homologues labeled with their thermostability category. To overcome this challenge, we developed a comprehensive database of protein sequences that belong to different microorganisms and are clustered by Pfam ID. The user can look up the Pfam ID of a protein of interest and find its homologues, categorized as psychrophilic, mesophilic and thermophilic. In addition to sequences, PDB IDs are provided if a 3D structure is available for the Pfam ID of interest.
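The consensus concept described above can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length with "-" as the gap character; the function names and the toy alignment are hypothetical and are not part of ProtDataTherm.

```python
from collections import Counter


def consensus_sequence(alignment):
    """Return the most frequent residue in each column of an alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # gaps do not vote
        consensus.append(Counter(residues).most_common(1)[0][0] if residues else "-")
    return "".join(consensus)


def consensus_mutations(aligned_target, alignment):
    """List (alignment column, target residue, consensus residue) wherever
    the aligned target differs from the consensus. Columns are 1-based."""
    cons = consensus_sequence(alignment)
    return [
        (i + 1, t, c)
        for i, (t, c) in enumerate(zip(aligned_target, cons))
        if t != "-" and c != "-" and t != c
    ]


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = ["MKVL", "MKIL", "MRIL", "MKIL"]
print(consensus_sequence(toy_alignment))           # MKIL
print(consensus_mutations("MKVL", toy_alignment))  # [(3, 'V', 'I')]
```

As noted in the text, a substitution suggested this way is only a candidate: the consensus residue is not guaranteed to be stabilizing.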

Materials and methods

First, a database of microorganisms was built in which each microorganism is categorized by its growth temperature (GT) using the BacDive [19] and NCBI [20] databases. For every microorganism, all available sequences and their corresponding sequence information, including Pfam ID [21] and, if available, PDB ID [22], were obtained from the UniProt database [23]. The whole process was carried out in the Python programming language [24] using the Biopython module [25] (Fig 1).

In our database, every protein sequence has two labels: its Pfam ID and its thermostability category. To facilitate the use of the database for thermostability analysis and engineering, sequences are clustered by Pfam ID. Within each Pfam ID cluster, proteins from the same family are labeled with their thermostability category. For a target protein sequence, the user can therefore find the corresponding Pfam ID in the Pfam database [21] and use it as the primary input to search the database. For each Pfam family, we categorized the sequences, identified by their UniProt IDs, as psychrophilic (GT < 20°C), mesophilic (20°C < GT < 40°C) or thermophilic (GT > 40°C). For each protein family, the available PDB structures are listed and categorized in the same way as the sequences. All sequence IDs, protein family IDs and PDB IDs are UniProt, Pfam and RCSB IDs, respectively.
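The two labels described above can be sketched as a small classification and clustering step. This is only an illustration under stated assumptions: the source does not say which class a growth temperature of exactly 20°C or 40°C belongs to, so the sketch places both boundaries in the mesophilic class, and the record layout and the identifiers U1 to U4 are hypothetical placeholders.

```python
from collections import defaultdict


def thermostability_category(growth_temp_c):
    """Map a growth temperature (°C) to a thermostability category."""
    if growth_temp_c < 20:
        return "psychrophilic"
    if growth_temp_c > 40:
        return "thermophilic"
    # Assumption: exactly 20°C and 40°C are treated as mesophilic.
    return "mesophilic"


def cluster_by_pfam(records):
    """Group sequence IDs by Pfam ID, then by thermostability category.

    records: iterable of (sequence_id, pfam_id, growth_temp_c) tuples.
    """
    clusters = defaultdict(lambda: defaultdict(list))
    for sequence_id, pfam_id, growth_temp_c in records:
        category = thermostability_category(growth_temp_c)
        clusters[pfam_id][category].append(sequence_id)
    return clusters


# Placeholder records; U1..U4 are not real UniProt accessions.
records = [
    ("U1", "PF00075", 37.0),
    ("U2", "PF00075", 70.0),
    ("U3", "PF00075", 10.0),
    ("U4", "PF00001", 30.0),
]
clusters = cluster_by_pfam(records)
print(dict(clusters["PF00075"]))
```

Looking up a Pfam ID in the resulting mapping mirrors the search described in the text: one family in, three labeled groups of homologues out.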

For the case study, Pfam families containing more than 20 mesophilic and thermophilic sequences were identified first. Then, for pattern analysis, AXB patterns were considered in each sequence, where A and B can be any of the 20 standard amino acids and X is the number of residues separating them, with 0 ≤ X ≤ 10. Thus, A0B covers all pairs of adjacent residues, such as VE, and A1B covers all pairs separated by exactly one residue. For example, every occurrence of Ala followed, after exactly one intervening residue of any of the 20 standard types, by Val is counted as the pattern A1V. For every mesophilic and thermophilic sequence, the number of occurrences of each AXB pattern was counted and stored. As a result, for a given AXB pattern (e.g. V4H) there is one set of counts for the mesophilic category and one for the thermophilic category, each with its corresponding average. The Rank Sum test with a critical p-value of 0.05 was used to detect the AXB patterns that distinguish mesophilic sequences from thermophilic sequences.
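The pattern counting and the group comparison described above can be sketched as follows. The pattern key format ("V4H"), the function names and the toy counts are assumptions made for illustration. The rank-sum p-value uses a tie-corrected normal approximation without continuity correction, which may differ slightly from whatever implementation the authors used.

```python
import math
from collections import Counter


def count_axb_patterns(sequence, max_gap=10):
    """Count AXB patterns: residue A, then X residues in between, then B."""
    counts = Counter()
    for i, a in enumerate(sequence):
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j >= len(sequence):
                break
            counts[f"{a}{gap}{sequence[j]}"] += 1
    return counts


def rank_sum_p_value(x, y):
    """Two-sided rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for tied values
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # all values identical: no evidence of a difference
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


# Toy per-sequence counts of one pattern in each category (hypothetical).
mesophilic_counts = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
thermophilic_counts = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3]
p = rank_sum_p_value(mesophilic_counts, thermophilic_counts)
print(p < 0.05)  # True
```

In a full analysis, `count_axb_patterns` would be applied to every sequence of a family, and the per-sequence counts of each pattern in the two categories would be passed to the test.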

Results and discussion

A PHP webpage was designed as the user interface to the database. The user can find the Pfam ID of a protein of interest (e.g. using the Pfam database) and search for it on the first page of the website (Fig 2, panel A). The results are then presented on the next page, including all sequences and structures available in the database for the submitted Pfam ID (Fig 2, panel B). The database contains more than 14 million protein sequences and PDB structures for 9962 protein families, categorized by thermal stability as psychrophilic, mesophilic and thermophilic (Table 1). In total, there are 14155392 protein sequences and 30950 PDB structures in the database. For 957 protein families there is at least one PDB structure of a thermophilic protein, which can be used for structural comparison between mesophilic and thermophilic proteins (Table 1). In addition, 3355 protein families have at least 20 sequences belonging to thermophilic proteins, and 3046 protein families have at least 20 sequences belonging to psychrophilic proteins. For such protein families, the amino acid content of psychrophilic/mesophilic and mesophilic/thermophilic proteins can be compared to gain family-specific knowledge of the factors that modulate thermostability.

Fig 2. (A) Users enter the Pfam ID as input on the first page. (B) All available sequences and structures are presented for the different classes on the results page.

Table 1. The distribution of protein sequences and structures over the three classes of thermostability.

Mesophilic sequences	13111756
Thermophilic sequences	661072
Psychrophilic sequences	382564
Mesophilic structures	23069
Thermophilic structures	7741
Psychrophilic structures	140
Pfams with at least one Mesophilic structure	2306
Pfams with at least one Thermophilic structure	957
Pfams with at least one Psychrophilic structure	82
Pfams with at least 20 Thermophilic sequences	3355
Pfams with at least 20 Psychrophilic sequences	3046
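As a quick consistency check, the per-class counts in Table 1 can be summed and compared with the totals quoted in the text. The snippet below is only a worked arithmetic example; the variable names are illustrative.

```python
# Counts copied from Table 1.
sequences = {"mesophilic": 13111756, "thermophilic": 661072, "psychrophilic": 382564}
structures = {"mesophilic": 23069, "thermophilic": 7741, "psychrophilic": 140}

total_sequences = sum(sequences.values())
total_structures = sum(structures.values())
thermophilic_fraction = sequences["thermophilic"] / total_sequences

print(total_sequences)                 # 14155392
print(total_structures)                # 30950
print(f"{thermophilic_fraction:.1%}")  # 4.7%
```

The totals match those quoted in the text, and thermophilic sequences make up under 5% of the collection.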


Other databases

Two databases, namely PGTdb [26] and ProTherm [27], have been developed to provide data concerning protein thermostability. To the authors' knowledge, PGTdb is no longer accessible, although it was the only resource that provided experimental information for classifying protein sequences by the GT of their source organisms (psychrophilic, mesophilic and thermophilic). ProTherm, on the other hand, provides thermodynamic data for mutagenesis, but only for a limited number of proteins. Our database contains a much larger number of microorganisms, protein sequences and PDB structures. It categorizes all sequences of the different Pfam families by thermostability and provides easier access to the data needed for the analysis and engineering of protein families.

Case study: Pattern recognition for protein engineering

An important goal of any thermostability analysis is to understand how knowledge of the differences between two categories can be used to engineer mesophilic proteins, with a minimum number of mutations, towards the thermostability of thermophilic sequences. As a case study, we selected a protein that belongs to a protein family with more than 20 mesophilic and thermophilic sequences and for which the ΔT of mutations is experimentally available in the ProTherm database. From ProTherm, ribonuclease H from Escherichia coli (strain K12) (PDB ID 2RN2, solved by X-ray diffraction at 1.48 Å resolution) was selected. Ribonuclease H belongs to the Pfam family PF00075, has ΔT values upon mutation reported from thermal experiments, and is among the proteins with the highest number of reported thermodynamic measurements of the effect of mutations on stability.

An algorithm (Algorithm 1) was designed to suggest thermostability-improving mutations. Among all AXB patterns with a meaningful population difference between the mesophilic and thermophilic sequences of the family (Pfam ID PF00075; see Materials and methods for the definition of a meaningful population difference), we chose those whose average number of occurrences is higher in the thermophilic category than in the mesophilic category. We then searched the target sequence (ribonuclease H from Escherichia coli) for AXY patterns in which Y is not equal to B. For these patterns, we suggest the Y→B mutation. The same approach was applied to ZXB patterns to suggest Z→A mutations. If a suggested mutation was available in the ProTherm database, its ΔT value was checked. If ΔT > 0, the suggestion was counted as a successful thermostability-improving suggestion, and if ΔT < 0, as a failed suggestion. The results are shown in Table 2, where 72% of the suggested mutations improve thermostability. This result indicates that the proposed approach can serve as a sequence-based thermostability engineering method, provided that sequences categorized as thermophilic and mesophilic are available for the protein family of the target protein. The accuracy of the suggested mutations is expected to improve when more sophisticated methods, such as machine learning techniques, are applied to such a database. However, further studies incorporating more proteins from a diverse range of protein families should be conducted to better evaluate the accuracy of this method.
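The success criterion described above (ΔT > 0 counts as a success, ΔT < 0 as a failure) can be written as a short scoring function. This is a sketch, not the authors' evaluation script: the function name is hypothetical, suggestions with ΔT = 0 are skipped because the text does not classify them, and the example uses only four rows of Table 2, so it does not reproduce the reported 72%.

```python
def success_rate(scored_suggestions):
    """Fraction of suggested mutations with a positive measured ΔT.

    scored_suggestions: list of (mutation, delta_t) pairs.
    Assumption: entries with ΔT == 0 are neither success nor failure.
    """
    scored = [(m, dt) for m, dt in scored_suggestions if dt != 0]
    if not scored:
        raise ValueError("no suggestions with a non-zero ΔT to score")
    successes = sum(1 for _, dt in scored if dt > 0)
    return successes / len(scored)


# Four rows taken from Table 2 (mutation, ΔT); a subset, for illustration only.
subset = [("H62R", 1.3), ("V74L", 3.7), ("D134L", 5.5), ("A52S", -5.8)]
print(success_rate(subset))  # 0.75
```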

Table 2. Ave_The: Average of the number of patterns for thermophilic sequences, Ave_Mes: Average of the number of patterns for mesophilic sequences.

Pattern	Positions on Sequence	Mutation	ΔT	P_value	Ave_The	Ave_Mes
ER	61 E, H 62	H 62 R	1.3	0.0032	1.782	1.474
LR	74 V, R 75	V 74 L	3.7	0.0106	1.773	1.492
LE	134 D, E 135	D 134 L	5.5	0.0007	1.918	1.437
NK	95 K, K 96	K 95 N	3.2	0.00841	1.951	1.43
SI	52 A, I 53	A 52 S	-5.8	0.04146	1.579	1.21
SG	10 D, G 11	D 10 S	9.2	0.0083	1.836	1.63
KI	52 A, I 53	A 52 K	-19.5	0.0398	2.385	1.69
EG	10 D, G 11	D 10 E	3.8	0.012	1.74	1.376
F1E	8 F, D 10	D 10 E	3.8	0.0199	1.463	1.242
F1S	8 F, D 10	D 10 S	9.2	0.0186	1.767	1.284
C2N	41 R, N 44	R 41 C	1.6	0.0002	1.282	1.052
A2Y	70 D, Y 73	D 70 A	3.8	0.004	1.647	1.304
E2Y	70 D, Y 73	D 70 E	1.8	0.0331	1.583	1.12
E2C	10 D, C 13	D 10 E	3.8	0.008	1.409	1
L2N	49 L, A 52	A 52 N	-5.9	0.0323	1.617	1.301
L2N	67 L, D 70	D 70 N	5.5	0.0323	1.617	1.301
V2K	119 E, K 122	E 119 V	2.7	0.0379	1.635	1.264
N3I	130 N, D 134	D 134 I	4.6	0.0281	1.667	1.246
N3N	130 N, D 134	D 134 N	6.4	0.0004	1.658	1.265
N3E	130 N, D 134	D 134 E	3.1	0.0353	1.757	1.557
N3V	130 N, D 134	D 134 V	4.1	0.0031	1.541	1.299
N3V	70 D, V 74	D 70 N	5.5	0.0031	1.541	1.299
R3V	91 K, K 95	K 91 R	0.5	0.0005	1.554	1.26
V3Y	24 A, Y 28	A 24 V	3.2	0.0419	1.638	1.136
E3V	48 E, A 52	A 52 V	7.8	0.0133	2.023	1.852
E3V	64 E, S 68	S 68 V	1.9	0.0133	2.023	1.852
E3V	70 D, V 74	D 70 E	1.8	0.0133	2.023	1.852
E3V	94 D, V 98	D 94 E	-1.2	0.0133	2.023	1.852
Y3L	52 A, L 56	A 52 Y	-7.6	0.0146	1.636	1.082
C4E	52 A, E 57	A 52 C	2.5	0.0175	1.4	1
V4Y	68 S, Y 73	S 68 V	1.9	0.0162	1.528	1.079
N4R	70 D, R 75	D 70 N	5.5	7.34E-09	1.587	1.155
N4K	130 N, E 135	E 135 K	-0.8	6.92E-05	2.329	1.678
N4E	52 A, E 57	A 52 N	-5.9	0.0127	1.664	1.317
Q5N	4 Q, D 10	D 10 N	6.8	0.0361	1.696	1.16
E5N	64 E, D 70	D 70 N	5.5	0.00257	1.615	1.36
E5N	10 D, N 16	D 10 E	3.8	0.0025	1.615	1.36
E5N	94 D, N 100	D 94 E	-1.2	0.0025	1.615	1.36
R5P	46 R, A 52	A 52 P	-5.4	0.0499	1.37	1.217
R5P	91 K, P 97	K 91 R	0.5	0.0499	1.37	1.217
R5I	46 R, A 52	A 52 I	6.2	0.0299	1.429	1.206
R5Y	46 R, A 52	A 52 Y	-7.6	0.0176	1.483	1.116
L5P	56 L, H 62	H 62 P	4.1	0.0009	1.59	1.316
L5P	107 L, Q 113	Q 113 P	-0.6	0.0009	1.59	1.316
L5L	80 Q, K 86	Q 80 L	1	0.0001	2.102	1.618
K6E	3 K, D 10	D 10 E	3.8	0.027	2.111	1.712
K6E	87 K, D 94	D 94 E	-1.2	0.027	2.111	1.712
N6I	45 N, A 52	A 52 I	6.2	0.007	1.554	1.2
L6L	67 L, V 74	V 74 L	3.7	0.0002	2.008	1.684
L6L	52 A, L 59	A 52 L	4.3	0.0002	2.008	1.684
L6K	80 Q, K 87	Q 80 L	1	0.0017	1.884	1.47
N6T	45 N, A 52	A 52 T	-2.7	0.01419	1.491	1.261
I7I	66 I, V 74	V 74 I	2.4	0.0043	1.618	1.241
I7I	74 V, I 82	V 74 I	2.4	0.0043	1.618	1.241
L7K	52 A, K 60	A 52 L	4.3	0.0395	1.67	1.355
L7I	74 V, I 82	V 74 L	3.7	0.0018	1.704	1.346
K7E	86 K, D 94	D 94 E	-1.2	0.0045	1.971	1.573
K7K	52 A, K 60	A 52 K	-19.5	0.0003	2.059	1.632
G7N	126 G, D 134	D 134 N	6.4	3.92E-07	1.934	1.473
Y7K	52 A, K 60	A 52 Y	-7.6	0.0202	1.705	1.262
N7T	44 N, A 52	A 52 T	-2.7	0.0001	1.767	1.215
N7V	16 N, A 24	A 24 V	3.2	0.0005	1.632	1.311
N7V	44 N, A 52	A 52 V	7.8	0.0005	1.632	1.311
F7K	52 A, K 60	A 52 F	-1.5	0.0211	1.636	1.237
R7K	91 K, K 99	K 91 R	0.5	0.0111	1.446	1.2


Algorithm 1. Thermostability improving mutation suggestion algorithm.

Input. Protein sequence, Pfam ID, and the AXB patterns that distinguish thermophilic from mesophilic sequences for the Pfam ID

Output. Mutation list

for all AXB patterns for the Pfam ID do:

    if $AXB_{\mathrm{average}}^{\mathrm{thermophilic}} > AXB_{\mathrm{average}}^{\mathrm{mesophilic}}$ then:

        find AXY or ZXB patterns in the target sequence where Y is not B or Z is not A

        add Y → B or Z → A to mutation list

end

return mutation list
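Algorithm 1 can be turned into a short runnable sketch. The code below is one possible reading of the pseudocode rather than the authors' implementation: the function names, the pattern key format (explicit gap, so the adjacent pair written "ER" in Table 2 becomes "E0R"), the mutation string format and the toy sequence are all assumptions. It takes as input only patterns that have already passed the significance test.

```python
def parse_pattern(pattern):
    """Split a pattern key such as 'V4H' into ('V', 4, 'H')."""
    return pattern[0], int(pattern[1:-1]), pattern[-1]


def suggest_mutations(sequence, pattern_averages):
    """Suggest Y->B and Z->A substitutions following Algorithm 1.

    pattern_averages: {pattern: (average_thermophilic, average_mesophilic)}
    Returns a list of (pattern, mutation) pairs with 1-based positions.
    """
    suggestions = []
    for pattern, (avg_thermo, avg_meso) in pattern_averages.items():
        if avg_thermo <= avg_meso:
            continue  # keep only patterns enriched in thermophilic sequences
        a, gap, b = parse_pattern(pattern)
        for i in range(len(sequence) - gap - 1):
            j = i + gap + 1
            first, second = sequence[i], sequence[j]
            if first == a and second != b:
                # AXY found in the target: suggest Y -> B
                suggestions.append((pattern, f"{second}{j + 1}{b}"))
            elif second == b and first != a:
                # ZXB found in the target: suggest Z -> A
                suggestions.append((pattern, f"{first}{i + 1}{a}"))
    return suggestions


# Toy target sequence (hypothetical); averages for E0R taken from Table 2.
toy_sequence = "AEHA"
toy_patterns = {"E0R": (1.782, 1.474), "L0E": (1.0, 2.0)}
print(suggest_mutations(toy_sequence, toy_patterns))  # [('E0R', 'H3R')]
```

The second toy pattern is skipped because its thermophilic average is lower than its mesophilic average, which is the filtering condition in the if statement of Algorithm 1.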

Applications

The database developed in this work can be used to build protein thermostability mutation libraries with different approaches, such as CC or comparison of the target sequence with more thermostable homologues [17, 28, 29]. In addition, it can be used for systematic analysis of the factors that modulate thermostability [30–32] in different families, since these factors can vary from family to family [3]. Furthermore, because thermophilic sequences belong to microorganisms that tolerate harsh conditions in general and not only high temperature, these data can also be used to optimize a target sequence for applications under other harsh conditions, such as extreme pH and high salt concentrations. Altogether, this database provides the most important data needed for sequence-based protein engineering and analysis, so that researchers can develop new analysis and engineering tools in the field of thermal stability. The database is useful not only for general industrial and research purposes but also for drug design [17, 33, 34].

Conclusions

Here we present the first release of the ProtDataTherm database, which contains more than 14 million protein sequences and structures belonging to microorganisms with different preferred living temperatures. All sequences and structures are labeled as psychrophilic, mesophilic or thermophilic. For ease of use, the sequences are classified by their Pfam IDs, so users can find homologous sequences for a protein of interest from its Pfam ID. This database can be applied not only to probe stability-modulating factors within protein families but also to knowledge-based protein stability engineering.

Availability

This database is available at http://profiles.bs.ipm.ir/softwares/protdatatherm. It is accessible free of charge to academic users on request.

Acknowledgments

This work is supported in part by a grant from the Institute for Research in Fundamental Sciences (IPM), Tehran, Iran (grant number: BS-1394-01-14), and The Natural Sciences and Engineering Research Council of Canada (NSERC).

Data Availability

All relevant data are within the paper.

References

1. Elleuche S, Schäfers C, Blank S, Schröder C, Antranikian G. Exploration of extremophiles for high temperature biotechnological processes. Current Opinion in Microbiology. 2015;25:113–9. doi: 10.1016/j.mib.2015.05.011
2. Gromiha MM, Oobatake M, Sarai A. Important amino acid properties for enhanced thermostability from mesophilic to thermophilic proteins. Biophysical Chemistry. 1999;82(1):51–67.
3. Pezeshgi Modarres H, Dorokhov BD, Popov VO, Ravin NV, Skryabin KG, Dal Peraro M. Understanding and engineering thermostability in DNA ligase from Thermococcus sp. 1519. Biochemistry. 2015;54(19):3076–85. doi: 10.1021/bi501227b
4. Panigrahi P, Sule M, Ghanate A, Ramasamy S, Suresh C. Engineering proteins for thermostability with iRDP web server. PloS One. 2015;10(10):e0139486. doi: 10.1371/journal.pone.0139486
5. Chaparro-Riggers JF, Polizzi KM, Bommarius AS. Better library design: data-driven protein engineering. Biotechnology Journal. 2007;2(2):180–91. doi: 10.1002/biot.200600170
6. Pantoliano MW, Whitlow M, Wood JF, Dodd SW, Hardman KD, Rollence ML, et al. Large increases in general stability for subtilisin bpn' through incremental changes in the free-energy of unfolding. Biochemistry. 1989;28(18):7205–13. doi: 10.1021/Bi00444a012 WOS:A1989AQ25000012.
7. Polizzi KM, Chaparro-Riggers JF, Vazquez-Figueroa E, Bommarius AS. Structure-guided consensus approach to create a more thermostable penicillin G acylase. Biotechnology Journal. 2006;1(5):531–6. doi: 10.1002/biot.200600029
8. Vazquez-Figueroa E, Chaparro-Riggers J, Bommarius AS. Development of a thermostable glucose dehydrogenase by a structure-guided consensus concept. Chembiochem: a European Journal of Chemical Biology. 2007;8(18):2295–301. doi: 10.1002/cbic.200700500
9. Lehmann M, Pasamontes L, Lassen SF, Wyss M. The consensus concept for thermostability engineering of proteins. Biochimica et Biophysica Acta. 2000;1543(2):408–15.
10. Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF, Pasamontes L, et al. The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Engineering. 2002;15(5):403–11.
11. Anbar M, Gul O, Lamed R, Sezerman UO, Bayer EA. Improved thermostability of Clostridium thermocellum Endoglucanase Cel8A by using consensus-guided mutagenesis. Appl Environ Microb. 2012;78(9):3458–64. doi: 10.1128/Aem.07985-11 WOS:000302807500048.
12. Blum JK, Ricketts MD, Bommarius AS. Improved thermostability of AEH by combining B-FIT analysis and structure-guided consensus method. J Biotechnol. 2012;160(3–4):214–21. doi: 10.1016/j.jbiotec.2012.02.014 WOS:000306652300016.
13. Vazquez-Figueroa E, Yeh V, Broering JM, Chaparro-Riggers JF, Bommarius AS. Thermostable variants constructed via the structure-guided consensus method also show increased stability in salts solutions and homogeneous aqueous-organic media. Protein Eng Des Sel. 2008;21(11):673–80. doi: 10.1093/protein/gzn048 WOS:000260144000006.
14. Lehmann M, Wyss M. Engineering proteins for thermostability: the use of sequence alignments versus rational design and directed evolution. Curr Opin Biotech. 2001;12(4):371–5. doi: 10.1016/S0958-1669(00)00229-9 WOS:000170296300007.
15. Lehmann M, Pasamontes L, Lassen SF, Wyss M. The consensus concept for thermostability engineering of proteins. Bba-Protein Struct M. 2000;1543(2):408–15. doi: 10.1016/S0167-4838(00)00238-7 WOS:000166500900011.
16. Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF, Pasamontes L, et al. The consensus concept for thermostability engineering of proteins: further proof of concept. Protein engineering. 2002;15(5):403–11. doi: 10.1093/protein/15.5.403 WOS:000175911000008.
17. Modarres HP, Mofrad M, Sanati-Nezhad A. Protein thermostability engineering. RSC Advances. 2016;6(116):115252–70.
18. Ohage EC, Graml W, Walter MM, Steinbacher S, Steipe B. beta-Turn propensities as paradigms for the analysis of structural motifs to engineer protein stability. Protein Sci. 1997;6(1):233–41. doi: 10.1002/pro.5560060125 WOS:A1997WD20100025.
19. Söhngen C, Podstawka A, Bunk B, Gleim D, Vetcininova A, Reimer LC, et al. BacDive–The Bacterial Diversity Metadatabase in 2016. Nucleic Acids Research. 2016;44(D1):D581–D5. doi: 10.1093/nar/gkv983
20. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, et al. Database resources of the national center for biotechnology information. Nucleic Acids Research. 2011;39(suppl 1):D38–D51.
21. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths‐Jones S, et al. The Pfam protein families database. Nucleic Acids Research. 2004;32(suppl 1):D138–D41.
22. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF, Brice MD, Rodgers JR, et al. The protein data bank. European Journal of Biochemistry. 1977;80(2):319–24.
23. Consortium U. UniProt: a hub for protein information. Nucleic acids research. 2014:gku989.
24. Van Rossum G, Drake FL. Python language reference manual: Network Theory; 2003.
25. Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–3. doi: 10.1093/bioinformatics/btp163
26. Huang S-L, Wu L-C, Liang H-K, Pan K-T, Horng J-T, Ko M-T. PGTdb: a database providing growth temperatures of prokaryotes. Bioinformatics. 2004;20(2):276–8.
27. Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A. ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucleic Acids Research. 2004;32(suppl 1):D120–D1.
28. Xiao ZH, Bergeron H, Grosse S, Beauchemin M, Garron ML, Shaya D, et al. Improvement of the thermostability and activity of a pectate lyase by single amino acid substitutions, using a strategy based on melting-temperature-guided sequence alignment. Appl Environ Microb. 2008;74(4):1183–9. doi: 10.1128/Aem.02220-07 WOS:000253221500029.
29. Jochens H, Aerts D, Bornscheuer UT. Thermostabilization of an esterase by alignment-guided focussed directed evolution. Protein Engineering Design and Selection. 2010;23(12):903–9.
30. Kumwenda B, Litthauer D, Bishop ÖT, Reva O. Analysis of protein thermostability enhancing factors in industrially important thermus bacteria species. Evolutionary Bioinformatics. 2013;9:327.
31. Ding Y, Cai Y, Han Y, Zhao B. Comparison of the structural basis for thermal stability between archaeal and bacterial proteins. Extremophiles. 2012;16(1):67–78. doi: 10.1007/s00792-011-0406-z
32. Paiardini A, Sali R, Bossa F, Pascarella S. "Hot cores" in proteins: Comparative analysis of the apolar contact area in structures from hyper/thermophilic and mesophilic organisms. BMC Structural Biology. 2008;8(1):1.
33. Hwang I, Park S. Computational design of protein therapeutics. Drug Discovery Today: Technologies. 2008;5(2):e43–e8.
34. Kontermann RE. Strategies for extended serum half-life of protein therapeutics. Curr Opin Biotech. 2011;22(6):868–76. doi: 10.1016/j.copbio.2011.06.012


[pone.0191222.ref027] 27.Bava KA, Gromiha MM, Uedaira H, Kitajima K, Sarai A. ProTherm, version 4.0: thermodynamic database for proteins and mutants. Nucleic Acids Research. 2004;32(suppl 1):D120–D1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0191222.ref028] 28.Xiao ZH, Bergeron H, Grosse S, Beauchemin M, Garron ML, Shaya D, et al. Improvement of the thermostability and activity of a pectate lyase by single amino acid substitutions, using a strategy based on melting-temperature-guided sequence alignment. Appl Environ Microb. 2008;74(4):1183–9. doi: 10.1128/Aem.02220-07 WOS:000253221500029. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0191222.ref029] 29.Jochens H, Aerts D, Bornscheuer UT. Thermostabilization of an esterase by alignment-guided focussed directed evolution. Protein Engineering Design and Selection. 2010;23(12):903–9. [DOI] [PubMed] [Google Scholar]

[pone.0191222.ref030] 30.Kumwenda B, Litthauer D, Bishop ÖT, Reva O. Analysis of protein thermostability enhancing factors in industrially important thermus bacteria species. Evolutionary Bioinformatics. 2013;9:327. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0191222.ref031] 31.Ding Y, Cai Y, Han Y, Zhao B. Comparison of the structural basis for thermal stability between archaeal and bacterial proteins. Extremophiles. 2012;16(1):67–78. doi: 10.1007/s00792-011-0406-z [DOI] [PubMed] [Google Scholar]

[pone.0191222.ref032] 32.Paiardini A, Sali R, Bossa F, Pascarella S. " Hot cores" in proteins: Comparative analysis of the apolar contact area in structures from hyper/thermophilic and mesophilic organisms. BMC Structural Biology. 2008;8(1):1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0191222.ref033] 33.Hwang I, Park S. Computational design of protein therapeutics. Drug Discovery Today: Technologies. 2008;5(2):e43–e8. [DOI] [PubMed] [Google Scholar]

[pone.0191222.ref034] 34.Kontermann RE. Strategies for extended serum half-life of protein therapeutics. Curr Opin Biotech. 2011;22(6):868–76. doi: 10.1016/j.copbio.2011.06.012 [DOI] [PubMed] [Google Scholar]
