Abstract
SOG1 is a crucial plant-specific NAC domain family transcription factor that functions as the central regulator of the DNA damage response, acting downstream of the ATM and ATR kinases. In this study, various in-silico approaches were employed to characterize the SOG1 transcription factor in comparison with its orthologues from various plant species. Amino acid sequences of more than a hundred SOG1 or SOG1-like proteins were retrieved and their relationships were determined through phylogenetic and motif analyses. Various physicochemical properties and secondary structural components of SOG1 orthologues were determined in selected plant species including Arabidopsis thaliana, Oryza sativa, Amborella trichopoda, and Physcomitrella patens. Furthermore, fold recognition (threading) and homology-based three-dimensional models of SOG1 were constructed, followed by evaluation of the quality and accuracy of the generated protein models. Finally, extensive DNA-protein and protein-protein interaction studies were performed using the HADDOCK server to give insight into how SOG1 binds the promoter region of its target genes or interacts with other proteins to regulate DNA damage responses in plants. Our docking analysis data have shown the molecular mechanism of SOG1′s binding to the 5′-CTT(N)7AAG-3′ and 5′-(N)4GTCAA(N)4-3′ consensus sequences present in the promoter region of its target genes. Moreover, SOG1 physically interacts and forms a thermodynamically stable complex with NAC103 and BRCA1 proteins, which possibly serve as coactivators or mediators in the transcription regulatory network of SOG1. Overall, our in-silico study provides meaningful information regarding the structural and functional characterization of the SOG1 transcription factor.
Keywords: SOG1, NAC domain family transcription factor, Phylogenetic and motif analyses, Homology-based three-dimensional model construction, HADDOCK server, DNA-Protein and Protein-Protein interaction study
1. Introduction
Because of their sessile lifestyle, plants are inescapably exposed to various kinds of environmental stresses throughout their lifetime. These environmental stress factors, along with some endogenous factors, cause DNA damage in plants. To maintain genome integrity, plants have developed a sophisticated and coordinated cellular process known as the DNA damage response or DDR50, 56, 27. DDR is a signal transduction process that senses damage in the genomic DNA and subsequently activates various DNA damage responses39. One of the major downstream effectors of the DDR signaling cascade in plants is a NAC domain transcription factor, SUPPRESSOR OF GAMMA RESPONSE 1 or SOG1.56, 27, 39
In the model plant system Arabidopsis thaliana, the SOG1 protein is composed of 449 amino acids and is encoded by a 1350 bp open reading frame (ORF)26, 27. The full-length AtSOG1 protein is divisible into three domains: the N-terminal extension, the conserved NAC domain, and the C-terminal transactivation domain57, 26, 27. The conserved NAC domain is involved in DNA binding and the C-terminal transactivation domain is associated with transcription regulation of the target genes.57, 26, 27 However, the function of the N-terminal extension remains unclear. In addition, five serine-glutamine (SQ) motifs are present in the C-terminal transactivation domain. Hyperphosphorylation of these SQ motifs plays a crucial role in the activation of the SOG1 transcription factor59, 58. SOG1 orthologues and SOG1-like proteins are widely distributed among plant groups, from the evolutionarily oldest mosses to highly evolved monocot plants.59, 57 In contrast, SOG1 orthologues and SOG1-like proteins are completely absent from the various algal groups. Although SOG1 is a crucial transcription factor that regulates various facets of plants’ DNA damage responses under different abiotic and genotoxic stress conditions, the structure–function properties of this important transcription factor have not been studied extensively.
A recent study by our group used various biophysical approaches to describe the importance of the N-terminal conserved NAC domain in maintaining the secondary structural stability of bacterially expressed 6X His-tagged recombinant AtSOG1 protein under salinity stress conditions.26 However, information regarding the structural aspects of SOG1 protein, including various physicochemical properties, details of primary structure, and possible secondary structural components along with their variation across different plant orthologues, is rather limited.
The tertiary structure of a protein represents the overall three-dimensional organization of its polypeptide chain in space11. Three-dimensional structures of proteins provide meaningful insights into their functional relevance at cellular and molecular levels and have been shown to have a wide array of implications in biological science research.49 Like other proteins, the three-dimensional structure of a transcription factor is very important for studying its function in transcription regulation2. The mode of action of a transcription factor is to recognize and bind to a segment of DNA in the promoter and/or enhancer region of its target genes. A change in the three-dimensional conformation or tertiary structure of a transcription factor may impact the DNA binding.32 SOG1 is an important transcription factor regulating various aspects of plants’ DNA damage response. However, no insightful information is available regarding the three-dimensional structure of the SOG1 transcription factor.
In plants, SOG1 plays an analogous role to mammalian p53 protein and acts as the central regulator of DNA damage responses by mediating the functions of ATM and ATR kinases.55, 31, 5, 27, 28 SOG1 has been shown to regulate the expression of more than a hundred genes including KRP6, SMR4, SMR5, SMR7, WEE1, BRCA1, and RAD51, which are shown to be involved in cell cycle checkpoint regulation and DSB repair pathways, under DNA damaging conditions.31, 5 Among these, SOG1 directly binds with the promoter of AtBRCA1 and AtRAD51 genes using specific consensus DNA binding sequences following treatment with aluminum and zeocin respectively.44, 31, 5 However, information on the predicted ligand binding sites of SOG1 and the possible mechanism of binding of this important transcription factor to its target genes’ promoter is rather limited.
Along with the direct binding of transcription factors to their target genes, complex and multidimensional transcription regulation also involves an intricate interaction between transcription factors and transcription factor binding proteins. These proteins include a mediator complex and several cofactors, which enhance the transcription activity6, 24, 14, 36, 19. In mammalian cells, DNA damage leads to various molecular events that eventually phosphorylate p53 and thereby activate it8. p53 is the transcription factor mostly responsible for the DNA damage response in mammalian cells. The plant counterpart of mammalian p53 is SOG1, which is also activated via ATM-mediated phosphorylation and then regulates several downstream DNA damage responses after sensing the DNA damage56, 59, 28. In the mammalian system, BRCA1 physically interacts with p53 through the C-terminal transactivation domain and enhances its transcriptional activity.60 No direct evidence is available regarding the involvement of SOG1 in protein–protein interactions that enhance transcriptional activity. Moreover, information regarding the protein–protein interaction network of SOG1 is very limited. It would therefore be of interest to explore the interaction network of SOG1, its possible physical interaction with other proteins, and the mechanism underlying this interaction.
A multidisciplinary approach is required for the complete structural and functional characterization of transcription factors. For functional analysis of the SOG1 transcription factor, its structural aspects at the primary, secondary, and tertiary levels should be properly resolved. Here, computational biology may play an appreciable role in understanding the overall properties of the SOG1 transcription factor in comparison with its orthologues. In our current study, we have employed various in-silico approaches to understand the possible structural organization of SOG1 orthologues. In addition, we have also tried to give insight into how SOG1 binds the promoter region of its target genes or interacts with other proteins.
2. Material and methods
2.1. Retrieval of the protein sequences
Amino acid sequences of a total of 103 SOG1 or SOG1-like proteins from various plant species were retrieved from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov) and UniProt (https://www.uniprot.org). Entries annotated as “hypothetical protein”, “probable SOG1”, “SOG1-partial”, or “unnamed protein” were filtered out to rule out any ambiguity in selecting appropriate SOG1 protein sequences. All the amino acid sequences were saved in FASTA format (Supplementary Information) and used for further computational analysis.
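The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the FASTA records, the header wording, and the helper names `parse_fasta` and `keep_unambiguous` are hypothetical, and the keyword list simply mirrors the annotation terms named in the text rather than the exact procedure used in the study.

```python
from typing import Iterator, List, Tuple

# Annotation keywords treated as ambiguous (mirroring the terms named in the text).
AMBIGUOUS_TERMS = ("hypothetical protein", "probable sog1", "partial", "unnamed protein")

# Toy records for illustration only; these are not real SOG1 sequences.
EXAMPLE_FASTA = """>toy1 SOG1 [Example species A]
MASQKLL
SQEEP
>toy2 hypothetical protein [Example species B]
MKKLA
>toy3 SOG1-like protein, partial [Example species C]
MSQ
>toy4 SOG1-like protein [Example species D]
MLSQD
"""


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_unambiguous(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop records whose header contains any ambiguous annotation keyword."""
    return [
        (header, seq)
        for header, seq in records
        if not any(term in header.lower() for term in AMBIGUOUS_TERMS)
    ]


records = list(parse_fasta(EXAMPLE_FASTA))
kept = keep_unambiguous(records)
for header, seq in kept:
    print(header, len(seq))
```

Keyword matching on headers is a coarse filter; manual inspection of borderline annotations would still be needed in practice.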
2.2. Determination of various physicochemical properties
Amino acid sequences of various SOG1 or SOG1-like proteins were further analyzed by using the ExPASy–ProtParam tool (https://web.expasy.org/protparam) for the determination of various physicochemical characteristic features.15 The ExPASy–ProtParam tool allows the computation of various physical and chemical parameters for a user-entered amino acid sequence. The computed parameters include amino acid composition, atomic composition, molecular weight (MW), extinction coefficient, estimated half-life, instability index, aliphatic index (AI), grand average of hydropathicity (GRAVY), and the total number of negative as well as positively charged residues (TNR and TPR respectively).
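Three of the parameters listed above have simple closed-form definitions and can be reproduced by hand. The sketch below assumes the standard published definitions (the Kyte-Doolittle hydropathy scale for GRAVY, the aliphatic index as the weighted mole percentages of Ala, Val, Ile, and Leu, and charged residue counts as Asp + Glu and Arg + Lys); the function names are hypothetical and the code is not taken from ProtParam itself.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy value over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def charged_residue_counts(seq: str) -> tuple:
    """Return (negatively charged, positively charged) counts: (Asp + Glu, Arg + Lys)."""
    seq = seq.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


toy = "MASQKLLSQEEP"  # hypothetical peptide, not a real SOG1 fragment
print(round(gravy(toy), 3), round(aliphatic_index(toy), 2), charged_residue_counts(toy))
```

A negative GRAVY value indicates a hydrophilic protein on average, and a higher aliphatic index reflects a larger relative volume occupied by aliphatic side chains.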
2.3. Analysis of primary structure for motif search
The core amino acids that form the primary structures of the proteins were extracted and listed using the ExPASy–ProtParam tool. The motif-based sequence analysis tool MEME Suite 5.4.1 (https://meme-suite.org) was employed to detect the presence of various conserved motifs in SOG1 or SOG1-like proteins selected from various plant species. The MEME Suite provides motif discovery algorithms and supports motif-based analysis of DNA, RNA, and protein sequences.3 MEME-ChIP was performed using the classic mode of motif discovery. The motif width in the MEME server was set to a minimum of 6 and a maximum of 200.
2.4. Prediction of secondary structural organization
Prediction of the secondary structural organization of the selected SOG1 protein orthologues was performed using the Self-optimized Prediction Method (“SOPMA”) online tool (https://npsa-prabi.ibcp.fr). This improved SOPM method (SOPMA) has been shown to correctly predict approximately 69.5 % of amino acids for a three-state description of the secondary structure, which includes alpha-helix, beta-sheet, and random coil.16 The PDBsum tool (https://www.ebi.ac.uk) provides pictorial overviews of macromolecular structures, showing the possible arrangements of long α-helices and large β-sheets. Ramachandran plots were generated by the PROCHECK tool (https://servicesn.mbi.ucla.edu/PROCHECK).
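The three-state description mentioned above reduces to a per-residue label string, from which the percentage of each state can be tallied. The following sketch assumes a one-letter encoding (`h` for alpha-helix, `e` for beta-strand, `c` for random coil); both the encoding and the toy prediction string are assumptions made for illustration and are not actual SOPMA output.

```python
from collections import Counter


def state_percentages(states: str) -> dict:
    """Return the percentage of residues assigned to each secondary structure state."""
    counts = Counter(states.lower())
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in sorted(counts.items())}


# Hypothetical 20-residue prediction: h = helix, e = strand, c = coil.
toy_prediction = "cccchhhhhhcceeeecccc"
print(state_percentages(toy_prediction))
```

For the toy string, half of the residues are labelled as coil, which is how a coil-dominated profile would appear in this kind of summary.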
2.5. Analysis and validation of the three-dimensional structure of SOG1-orthologues
The retrieved amino acid sequences of various SOG1 or SOG1-like proteins from NCBI (https://www.ncbi.nlm.nih.gov) and UniProt (https://www.uniprot.org) were initially checked for tertiary structures already available in the Protein Model Portal (https://www.proteinmodelportal.org). As the proteins showed minimal sequence identity with the proteins already present in the Protein Model Portal, the sequences were submitted to the protein modeling server I-TASSER (https://zhanglab.ccmb.med.umich.edu/ITASSER). The I-TASSER server is based on the threading principle and builds possible tertiary structure models of a protein from the given primary amino acid sequence. The best structure was selected from the five models generated, based on the C-score value. The overall quality of the constructed three-dimensional models was verified with the ERRAT online platform (https://www.doe-mbi.ucla.edu/errat).9 ProSA-web was used to assess the Z-score and energy plots (https://prosa.services.came.sbg.ac.at/prosa.php).51
In addition to I-TASSER, three-dimensional structures of SOG1 or SOG1-like proteins were also generated by a homology-based protein modeling server, SWISS-MODEL Workspace (https://swissmodel.expasy.org).41 This online server utilizes experimentally determined structures of related family members as templates to predict the three-dimensional structure of a protein. Appropriate templates for each SOG1 orthologue protein from different plant species were selected from the 50 templates obtained per search based on various parameters including sequence similarity, query coverage, global model quality estimation (GMQE), and quaternary structure quality estimation (QSQE).49 One specific template for each protein was then chosen based on target-template alignment to build the final protein models. The generated protein models were validated using various parameters including QMEAN and QMEAN4 in the SWISS-MODEL web server. QMEAN comprises three different assessment scoring functions (QMEAN local quality, DisCo and Brane) to assess the geometrical features of the protein model (https://swissmodel.expasy.org/qmean) (Benkert et al., 2009). Finally, QMEAN4 was used to fit the cumulative QMEAN value on a global scale of 0 to 1 (https://swissmodel.expasy.org/qmean/). The predicted structures were visualized in the open-source PyMOL 1.3 software and the web-based structure viewer iCn3D (https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html).40, 48
2.6. Prediction of DNA and protein binding sites of SOG1
The functional relevance of the amino acid sequence of a protein is determined by the presence of a specific and conserved stretch of amino acids which are involved in interaction with consensus DNA sequence and with the various domains of other proteins. Possible DNA binding sites of SOG1 were identified by a freely available web-based server FTSite (https://ftsite.bu.edu/).30 Prediction of specific regions of SOG1 protein and its possible interacting partners involved in Protein-protein interactions (PPIs) was done by the “iLoops” web server (https://sbi.imim.es/iLoops.php).33
2.7. Study of Protein-DNA and Protein-Protein interaction
For investigating the molecular mechanism of interaction of the SOG1 transcription factor with the promoter region of its target genes and with other proteins, extensive DNA-protein and Protein-protein docking was performed using HADDOCK web server version 2.2 (https://haddock.science.uu.nl/). The HADDOCK docking server is based on ambiguity-driven docking that specifically measures the docking interfaces for DNA-protein and protein–protein complexes based on experimental knowledge in the form of ambiguous interaction restraints. Ambiguous interaction restraints (AIR) have been generated on the basis of active and passive amino acid residues present in the interacting partners. For DNA-protein interactions, the PDB structure of SOG1 obtained from SWISS-MODEL web server was used as receptor and possible SOG1-binding consensus sequences including Cis1 (GACTTGTTGAAGAAGCC)31 and Cis2 (CTCCGGTCAATCAC) served as the ligands. For examining the binding specificity, mutated versions of Cis1 (GATCCGTTGAAGGGACC) and Cis2 (CTCCGACTGGTCAC) were used. The HADDOCK docking server-generated docking complexes were subjected to Molecular Dynamics (MD) simulation analyses by GROMACS 5.0 software package. The final stable complex that was obtained was then analyzed using NUPROPLOT software (standalone version) to study the actual interacting sites of the protein and cognate DNA molecule.
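Before docking, it can be useful to confirm that the probe sequences listed above do or do not contain the two consensus elements. The sketch below translates 5′-CTT(N)7AAG-3′ and 5′-(N)4GTCAA(N)4-3′ into regular expressions and scans both strands of each probe. Assigning Cis1 to the first consensus and Cis2 to the second is an assumption made here for illustration, as are the helper names.

```python
import re

# Consensus elements from the text, with N expanded to any of the four bases.
CONSENSUS = {
    "CTT(N)7AAG": re.compile(r"CTT[ACGT]{7}AAG"),
    "(N)4GTCAA(N)4": re.compile(r"[ACGT]{4}GTCAA[ACGT]{4}"),
}

# Probe sequences exactly as given in the text.
PROBES = {
    "Cis1": "GACTTGTTGAAGAAGCC",
    "Cis1_mut": "GATCCGTTGAAGGGACC",
    "Cis2": "CTCCGGTCAATCAC",
    "Cis2_mut": "CTCCGACTGGTCAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_consensus(seq: str, pattern: re.Pattern) -> bool:
    """True if the pattern occurs on the given strand or on its reverse complement."""
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))


for name, seq in PROBES.items():
    # Assumed pairing: Cis1 probes against the first consensus, Cis2 probes against the second.
    key = "CTT(N)7AAG" if name.startswith("Cis1") else "(N)4GTCAA(N)4"
    print(name, key, contains_consensus(seq, CONSENSUS[key]))
```

Under this pairing, the wild-type Cis1 and Cis2 probes contain their respective consensus elements, whereas the mutated versions do not, on either strand.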
2.8. Computer system
The computer systems used for various in-silico analyses include Microsoft Windows 7 and 11 operating systems with 4 and 8 GB RAM and Intel Core i3 and i5 processors.
3. Results
3.1. SOG1 protein is widely distributed in the plant kingdom
Earlier studies have indicated the presence of 117 NAC genes in Arabidopsis thaliana which are included in 21 subfamilies59. Among the various members of the NAC domain transcription factors, SOG1, also known as NAC008 is of special interest, as it is one of the major players in DNA damage response pathways in plants, controlling several aspects of DNA damage responses including cell cycle checkpoint regulation, induction of endoreduplication, activation of DNA damage repair pathways and programmed cell death.27 To find out when SOG1 protein was acquired in plants in the course of evolution and its distribution throughout the plant kingdom, we first searched for the orthologues of the SOG1 protein in various plant groups using NCBI (https://www.ncbi.nlm.nih.gov), PhyloGenes (https://www.phylogenes.org), UniProt (https://www.uniprot.org/) and EnsemblPlants (https://plants.ensembl.org). More than a hundred orthologues of SOG1 and SOG1-like proteins were found among various plant groups starting from lycopods (Selaginella moellendorffii), through moss (Physcomitrella patens), ancient angiosperms (Amborella trichopoda) to the modern flowering plants including eudicots (Arabidopsis thaliana) and monocots (Oryza sativa). Interestingly, no SOG1 and SOG1-like proteins were detected in unicellular or multicellular algal members. These observations are similar to those of some earlier studies.59, 57 A very recent study has demonstrated the presence of two copies of SOG1 protein including SOG1a and SOG1b in Physcomitrella patens.39 We have selected P. patens SOG1a (previously known as SOL1) for our study, as it is more similar to the SOG1 protein present in higher plants.18, 39 Phylogenetic analyses by Mega XI software using the primary amino acid sequence of 103 SOG1 orthologues retrieved from NCBI and UniProt databases have revealed that SOG1 proteins from various plant species are distributed in three groups (Fig. 1A). 
Among the three groups, Cluster 1 contains the ancient angiosperm Amborella trichopoda and Cluster 2 contains the lower group of plants including Selaginella moellendorffii and Physcomitrella patens along with other flowering plants. Cluster 3, on the other hand, possesses only members of eudicots and monocots (Fig. 1A).
Fig. 1.
(A) Sequence homology based Phylogenetic tree of selected 103 SOG1 protein orthologues from various plant species. The phylogenetic tree was constructed in Mega XI using the maximum likelihood algorithm with a bootstrap test with 1000 replications. (B) Organization of various domains in Arabidopsis thaliana SOG1 protein.
We next performed motif analyses of selected members from the above-mentioned three clusters of the phylogenetic tree using the online motif discovery server MEME Suite 5.4.1 (https://meme-suite.org) to examine whether there is any difference in the diversity of the motifs present in the SOG1 orthologues. MEME Suite analyses revealed that the NAM (NAC) domain is present in all SOG1-orthologues (Fig. 2A-C). Furthermore, each of these SOG1-orthologues was found to possess an N-terminal extension of approximately 40–45 amino acid residues, which is another feature of SOG1 that distinguishes it from other NAC-domain containing proteins (Fig. 2A-C). It is interesting to note that the moss Physcomitrella patens does not contain any N-terminal extension domain (Fig. 2A). The MEME Suite tool found that five significant, conserved C-terminal SQ motifs (E value ≤ 0.05) are present in the SOG1-orthologues of all eudicot and monocot members (Fig. 2A-C). Along with the five conserved SQ motifs, we also found two additional SQ motifs in the C-terminal transactivation domain of the SOG1-orthologues of monocots, which may be phosphorylated in response to DNA damage (Fig. 2A). On the other hand, the ancient flowering plant Amborella lacks the third SQ motif in the C-terminal domain, but it possesses four additional SQ motifs in its C-terminal transcription regulatory region (Fig. 2A). In contrast, only two SQ motifs are found in the C-terminal region in P. patens, and their positions differ from those in the SOG1 proteins of angiosperms. Besides the conserved motifs, SOG1 orthologues also possess other consensus functional motifs (E value ≤ 0.05), which may play different functional roles (Fig. 2A-C). Together, the results of these alignments lead us to propose that SOG1 was acquired in the early land plants (mosses). 
In addition, SOG1-orthologues present in various angiosperm species having a wide array of growth conditions, show significant variations in motif diversity, which suggests the involvement of the SOG1 transcription factor in different cellular functions along with DNA damage responses and is possibly important for the survival of higher plants under varying environmental and growth conditions.
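As a quick cross-check on the SQ motif counts discussed above, the positions of serine-glutamine dipeptides can be listed directly from a sequence. The sketch below is a plain dipeptide scan, not the statistical motif discovery performed by MEME; the function name, the optional `start` argument for restricting the scan to a C-terminal region, and the toy peptide are all hypothetical.

```python
from typing import List


def sq_positions(seq: str, start: int = 1) -> List[int]:
    """Return 1-based positions of the serine in every SQ dipeptide at or after `start`."""
    seq = seq.upper()
    hits = []
    index = seq.find("SQ", start - 1)
    while index != -1:
        hits.append(index + 1)  # convert 0-based index to 1-based residue number
        index = seq.find("SQ", index + 1)
    return hits


toy = "MASQKLLSQEEPSQ"  # hypothetical peptide, not a real SOG1 sequence
print(sq_positions(toy))           # all SQ sites
print(sq_positions(toy, start=5))  # only sites from residue 5 onwards
```

Passing the first residue of the C-terminal transactivation domain as `start` would limit the count to that domain, provided the domain boundary is known for the orthologue in question.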
Fig. 2.
Motif analysis of SOG1-orthologues (A) Phylogenetic Maximum Likelihood tree of the selective SOG1 protein orthologues along with the motif distributions of corresponding protein. The phylogenetic tree (left panel) was constructed based on selective SOG1-orthologues from each cluster of Fig. 1 using the maximum likelihood methods with 1000 bootstrap replications. (B) Different color boxes represent different types of motifs as illustrated, and the scale represents amino acid length. (C) Various conserved motifs including the N-terminal extension, NAM domain and SQ motifs of the SOG1 protein orthologues were obtained using the MEME software.
3.2. Determination of primary structure of SOG1-orthologues
Based on the presence of SOG1 orthologues in various plant groups with variations in motif distribution, we next examined the structural diversity of SOG1-orthologues at the primary, secondary, and tertiary levels. We selected SOG1-orthologues from Arabidopsis thaliana (eudicot), Oryza sativa (monocot), Amborella trichopoda (ancestor of flowering plants), and Physcomitrella patens (moss) for in-silico structural analyses. The ExPASy–ProtParam tool (https://web.expasy.org/protparam) was employed to compare various physicochemical properties of SOG1-orthologues from the above-mentioned four plant species, which are summarized in Table 1. The primary-level structural stability of the SOG1 protein from the four plant species was assessed by analyzing various indices including the instability index, the aliphatic index, and the grand average of hydropathicity (GRAVY) index. From our results it is evident that the instability index lies above 40 for all four SOG1 proteins, indicating their relatively unstable nature (Table 1). The aliphatic index values lie between 65 and 75 in all SOG1-orthologues, and in contrast to the instability index, the maximum and minimum aliphatic indices were detected in Oryza sativa (75.25) and Physcomitrella patens (65.25), respectively (Table 1). In addition, the isoelectric point (pI) is the pH at which an amino acid, peptide, or protein carries no net charge and therefore does not migrate in an electric field. The ExPASy–ProtParam analyses further revealed that the theoretical pI values for all the SOG1 orthologues lie below 7, suggesting that the proteins are acidic and predicted to be negatively charged at neutral pH (Table 1). The calculated hydropathicity (GRAVY) index was found to be negative in all SOG1 orthologues, indicating an overall hydrophilic character (Table 1).
Table 1.
Comparison of various physicochemical properties of SOG1-orthologues from different plant species.
| Species | No. of amino acid residues | MW (Da) | pI | Instability index | Aliphatic index | GRAVY |
|---|---|---|---|---|---|---|
| Arabidopsis thaliana_SOG1 | 449 | 50288.84 | 4.91 | 45.13 | 70.09 | −0.712 |
| Oryza sativa_SOG1 | 418 | 46875.30 | 5.12 | 42.15 | 73.12 | −0.702 |
| Amborella trichopoda_SOG1 | 433 | 48675.19 | 5.06 | 49.00 | 64.83 | −0.738 |
| Physcomitrium patens_SOG1 | 483 | 53475.78 | 5.11 | 53.58 | 63.04 | −0.581 |
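The interpretation of Table 1 rests on three simple thresholds: an instability index above 40 (predicted unstable), a theoretical pI below 7 (acidic), and a negative GRAVY value (hydrophilic). A minimal sketch applying these thresholds to the tabulated values is shown below; the dictionary layout and the function name `summarize` are hypothetical, while the numbers are copied from Table 1.

```python
# Values copied from Table 1 (pI, instability index, GRAVY) for each orthologue.
TABLE1 = {
    "Arabidopsis thaliana": {"pI": 4.91, "instability": 45.13, "gravy": -0.712},
    "Oryza sativa": {"pI": 5.12, "instability": 42.15, "gravy": -0.702},
    "Amborella trichopoda": {"pI": 5.06, "instability": 49.00, "gravy": -0.738},
    "Physcomitrium patens": {"pI": 5.11, "instability": 53.58, "gravy": -0.581},
}


def summarize(props: dict) -> dict:
    """Apply the three thresholds used in the text to one row of Table 1."""
    return {
        "predicted_unstable": props["instability"] > 40,  # instability index above 40
        "acidic": props["pI"] < 7,                        # theoretical pI below 7
        "hydrophilic": props["gravy"] < 0,                # negative GRAVY
    }


for species, props in TABLE1.items():
    print(species, summarize(props))
```

All four orthologues satisfy the three conditions, in line with the description given for Table 1.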
Proteins differ from one another in their structures, primarily in their amino acid sequences. The abundance of certain amino acids influences the structural organization of proteins. We used the ExPASy–ProtParam tool to determine the frequency of the various amino acids present in the SOG1 orthologues from various plant species. The results indicated that eleven amino acids are predominant in the primary structure of SOG1-orthologues (Fig. 3A). Their average percentages are as follows: Alanine (5.3 %), Arginine (4.275 %), Aspartate (7.95 %), Glutamic acid (8.65 %), Glycine (7.225 %), Isoleucine (4.1 %), Leucine (5.35 %), Lysine (6.7 %), Proline (6.1 %), Serine (7.825 %), and Threonine (5.125 %) (Fig. 3A). All four proteins contain a prominent percentage of Serine and Threonine residues, which contribute to the SQ motifs present in the C-terminal domain of the proteins. Interestingly, all four proteins contain a very low percentage of Cysteine residues (approximately 2 %), which typically contribute to disulfide bonding (Fig. 3A).
Fig. 3.
Prediction of primary and secondary structural components (A) Comparison of the percentage of different amino acids in SOG1 protein orthologues of various plant species. (B) Comparison of predicted secondary structural components in SOG1-orthologues from the above-mentioned species.
3.3. Determination of secondary structural components of SOG1 orthologues
Analyses of the amino acid sequences using the SOPMA tool revealed that all SOG1 orthologues from various plant species are composed of alpha-helices, beta-strands, and random coils (Fig. 3B; Fig. 4A-D). Fig. 3B represents the comparative percentage of the various secondary structural components of SOG1 orthologues obtained from the secondary structural annotation analysis by SOPMA (https://npsa-prabi.ibcp.fr/cgi-bin/secpred_gor4.pl), from which it is clear that random coil (approximately 52.8–64.67 %) is predominant in all the plant species, followed by alpha-helix (approximately 12–33 %) and beta-strands (14–24 %). Interestingly, the maximum random coil content is present in the N-terminal conserved NAC domain of the SOG1 protein in all the plant species (Fig. 3B). In addition, the SOG1 orthologue present in Oryza sativa was found to possess the most alpha-helix and the least beta-strand content (Fig. 3B). According to the analysis by GOR4, other secondary structures including 3₁₀ helices, pi helices, beta bridges, bend regions, ambiguous states, and other states are absent in all SOG1 or SOG1-like proteins. Furthermore, PDBsum analyses (https://PDBsum/thornton-srv/databases/cgi-bin/pdbsum) revealed that four beta-hairpin structures are present in the SOG1 orthologues of all three angiosperm plant species, including Arabidopsis thaliana (between Arg 130-Lys 141, Lys 158-Val140, Gly168-Gly177, Val 181-Leu192), Oryza sativa (between Thr 143-Gly 144, Arg 158-Val 161, Leu 167- Thr 179, Glu 186-His 196) and Amborella trichopoda (between Lys 139-His 141, Val151-Ile 163, Lys 169-Arg 179, Asp 188-His 198) (Fig. 3B; Fig. 4A-C). On the other hand, the SOG1 protein of the moss Physcomitrella patens was found to possess five beta-hairpin structures (between Gln38-Gly39, Phe122-Gln123, Ser167-Phe170, Gln191-Ser192, Glu201-Ile207) (Fig. 3B; Fig. 4D).
Fig. 4.
Analysis of Secondary Structural components (A-D) Schematic wiring diagram of SOG1-orthologues showing key secondary structural components (E-H) Ramachandran plots obtained from PROCHECK showing the distribution of amino acids in the allowed and disallowed regions with their phi/psi angles.
The PROCHECK tool was utilized to generate the Ramachandran plot (Phi/Psi) of all SOG1 orthologue proteins. Stereochemical analysis of the Phi and Psi dihedral angles by PROCHECK showed that 76.98 % (Arabidopsis thaliana), 75.98 % (Oryza sativa), 82.78 % (Amborella trichopoda), and 80.38 % (Physcomitrella patens) of the amino acid residues of the different SOG1-orthologues lie within the most favored region, represented by the red region in the Ramachandran plot, indicating native and stable conformation of the proteins (Fig. 4E-H). The brown-colored region, covering 17.9 %, 19.7 %, 16.2 %, and 16.48 % of the amino acid residues of the SOG1-orthologues from Arabidopsis thaliana, Oryza sativa, Amborella trichopoda and Physcomitrella patens, respectively, corresponds to the additionally allowed region (Fig. 4E-H). The dark yellow-colored regions contain approximately 1–3 % of the constituent amino acid residues and correspond to the generously allowed regions (Fig. 4E-H). In addition, among the four SOG1 orthologues, only in the Physcomitrella and Amborella SOG1 proteins do approximately 3.7 and 2.2 % of the total amino acid residues fall in the disallowed region of the Ramachandran plot (Fig. 4G and H). This is possibly due to steric hindrance or clashes between the atoms of the amino acid residues.
3.4. Analysis of the generated three-dimensional tertiary structure of SOG1 orthologues
For predicting the three-dimensional structure of SOG1 orthologue proteins from their primary amino acid sequences, we first utilized the I-TASSER server (https://zhanglab.ccmb.med.umich.edu/I-TASSER), which is an important online server for automated protein structure prediction. The I-TASSER server generated five models for each SOG1 orthologue from the full-length primary amino acid sequences provided (Fig. 5A-D). Among the five models, the model with the highest C-score was selected for each plant species (−1.46 for Arabidopsis thaliana, −1.36 for Oryza sativa, −1.44 for Amborella trichopoda, and −1.42 for Physcomitrella patens) for further analyses. Next, we employed the homology-based protein-modeling server SWISS-MODEL Workspace (https://swissmodel.expasy.org) to visualize the three-dimensional structure of the SOG1 protein orthologues by utilizing experimentally determined structures of related family members as templates (Bordoli et al., 2009). Analyses by SWISS-MODEL Workspace revealed the presence of 16, 17, 20, and 26 template models for the SOG1 protein present in Arabidopsis thaliana, Oryza sativa, Amborella trichopoda, and Physcomitrella patens, respectively, and all the SOG1-orthologues were found to form a homodimer in their oligomeric state (consisting of Chains A and B) (Fig. 6A-D). The best fit models generated by model-template alignment in Fig. 7A-D were based on the top five templates. The templates for each protein were selected for model building based on the highest sequence identity, the best E value, and the maximum query coverage. The resulting models were further assessed by examining the cumulative QMEAN4 (QMEAN local quality, DisCo, Brane) score (https://swissmodel.expasy.org/qmean). For all SOG1 orthologues from the various plants, the maximum homology was observed with two NAC domain family transcription factors. 
In the context of QMEAN4 global scores, the Z-score indicates overall model quality and measures the deviation of the total energy of the predicted structure from an energy distribution derived from random conformations. Earlier studies have revealed that larger QMEAN scores indicate better models whereas negative scores refer to unstable models.4 Our results indicated that the QMEAN Z-score values of the SOG1 orthologue proteins were 0.52 ± 0.05 (Arabidopsis thaliana), 0.50 ± 0.05 (Oryza sativa), 0.50 ± 0.05 (Amborella trichopoda) and 0.48 ± 0.11 (Physcomitrella patens), respectively (Fig. 6E-H), suggesting that all structures are clearly within the expected quality range, as they deviate by less than 1 standard deviation from the mean score of similar-sized high-quality proteins in the reference dataset. Furthermore, QMEAN4 is a composite score based on a linear combination of four statistical potential terms: local geometry, distance-dependent interaction, agreement of the predicted secondary structure and solvent accessibility, and solvation potential. The global model quality estimation (GMQE) for QMEAN4 is reported on a 0 to 1 scale.7 When we analyzed the four selected proteins for QMEAN4, the GMQE value was found to be 0.22 (Arabidopsis thaliana), 0.24 (Oryza sativa), 0.20 (Amborella trichopoda), and 0.19 (Physcomitrella patens), respectively, all of which fall within the GMQE scale.
Fig. 5.
Determination of three-dimensional structures by fold recognition or threading using I-TASSER (A-D, left panel) Predicted three-dimensional structures of SOG1-orthologues from Arabidopsis thaliana, Oryza sativa, Amborella trichopoda, and Physcomitrella patens, respectively, obtained from I-TASSER server using the full-length amino acid sequences. The right panels show the enlarged view of the conserved NAM domain of the respective proteins.
Fig. 6.
Homology-modeling for obtaining template-based three-dimensional models of SOG1-orthologues (A-D) Homology-modeling-based predicted three-dimensional models of SOG1-orthologues using SWISS-MODEL Workspace. All the proteins were shown to form homodimers in their oligo state. (E-H) Quality comparison of the obtained models, the QMEAN Z-score values for the above-mentioned proteins as obtained from SWISS-MODEL Workspace.
Fig. 7.
Evaluation of the structure quality of the three-dimensional models of SOG1-orthologues obtained from the I-TASSER server using the ProSA-web service. (A-D) ProSA-web z-scores of SOG1-orthologues from Arabidopsis thaliana, Oryza sativa, Amborella trichopoda, and Physcomitrella patens, respectively, plotted against the z-scores of protein chains determined by X-ray crystallography (light blue) or NMR spectroscopy (dark blue) with respect to their length. The z-scores of SOG1-orthologues are indicated by large black dots. (E-H) Energy plots of SOG1-orthologues. Residue energies averaged over a sliding window are plotted as a function of the central residue in the window. A window size of 80 is used due to the large size of the protein chain (default: 40). (I-L) Jmol Cα trace of SOG1-orthologues. Residues are colored from blue to red in the order of increasing residue energy. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
The models obtained from the SWISS-MODEL Workspace server were then verified using the online ERRAT platform (Fig. 7A-L) (https://www.doe-mbi.ucla.edu/errat). According to the ERRAT server, error values below 95 % are considered to be within the rejection limit.9, 10 ERRAT analysis showed an overall quality factor > 96 for all proteins, a result expected for crystallographic models with resolutions > 2.5 Å. The values obtained from the ERRAT server were further normalized with the protein size and quantitatively assessed by high-resolution X-ray crystallography. The online server ProSA (https://prosa.services.came.sbg.ac.at/prosa.php) was used to obtain the Z-score for all four proteins. The Z-score indicates the overall model quality of a protein and also measures the total energy deviation of the respective protein structure. The Z-score of each protein is represented by a black point in the ProSA plot (Fig. 7A-D). The Z-score values obtained from the ProSA server are −5.53 (Arabidopsis thaliana), −5.9 (Oryza sativa), −4.99 (Amborella trichopoda), and −5.59 (Physcomitrella patens); all the values are well within the permissible range, suggesting highly stable structures for all SOG1 orthologues. When the normalized Z-scores were plotted on the Y-axis against the number of constituent amino acid residues on the X-axis, the values fell well within the allowed region of the graph for all four proteins. The Z-scores of the model proteins are close to those of the templates, suggesting that the predicted models are reliable and very close to experimentally determined structures.
Taken together, after complete assessment and validation of the predicted models for all SOG1 protein orthologues using various three-dimensional structure prediction tools, it was found that the quality of all the predicted three-dimensional models was good and reliable for all the proteins.
3.5. Molecular docking analysis of SOG1 transcription factor with the consensus SOG1-binding motifs
SOG1 is an important transcription factor that acts as a positive regulator of the DNA damage response in plants.55, 31, 5, 28 SOG1 has been shown to regulate the expression of hundreds of genes, including KRP6, SMR4, SMR5, SMR7, WEE1, BRCA1, and RAD51, which are involved in cell cycle checkpoint regulation and DSB repair pathways, under DNA-damaging conditions.31, 5 Moreover, these investigations have revealed that SOG1 binds to the -CTT(N)7AAG- consensus motif present within the 1-kb promoter region of its target genes.31, 5 The study by Ogita et al.31 specifically showed that SOG1 directly binds the promoter of AtRAD51, an important gene of the homologous recombination (HR) repair pathway, through the consensus SOG1-binding CTTGTTGAAGAAG motif located approximately −80 to −51 bp upstream of the transcription start site, following treatment with the DSB-inducing agent zeocin.31 In addition, SOG1 is a NAC-domain family transcription factor, and the conserved N-terminal NAC domain is involved in binding the promoter region of its target genes.59, 27 Therefore, besides the -CTT(N)7AAG- consensus motif, we also searched for other putative NAC domain-binding motifs in the promoter regions of the AtSOG1 target genes, including cell cycle regulatory and DSB repair pathway genes. In-silico analysis of the ∼1-kb promoter fragments of various AtSOG1 target genes using the PlantPAN 3.0 online software (https://PlantPAN.itps.ncku.edu.tw) revealed the presence of the NAC domain-binding 5′-(N)4GTCAA(N)4-3′ sequence within the 1-kb promoter region upstream of the transcriptional start site of various SOG1 target genes, including SMR5, SMR7, WEE1, BRCA1, RAD54, and RAD51. PlantPAN 3.0 also indicated that SOG1 is a potent transcription factor that can bind to this sequence. However, information regarding the molecular mechanism by which the AtSOG1 protein binds the promoter region of its target genes is relatively limited.
Based on this background, we were interested in studying the mechanism by which AtSOG1 binds its target gene promoters using in-silico DNA-protein docking approaches. We therefore used the Arabidopsis thaliana SOG1 protein model obtained from SWISS-MODEL Workspace along with two oligonucleotide sequences, 5′-GACTTGTTGAAGAAGCC-3′ and 5′-CTCCGGTCAATCAC-3′, designated Cis1 and Cis2, respectively, to carry out the in-silico DNA-protein docking analyses.
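As an illustration of how the two consensus motifs map onto the Cis1 and Cis2 oligonucleotides, the following Python sketch scans each oligo with regular expressions. It is a minimal example, not the PlantPAN 3.0 procedure: the pattern names and the `find_motifs` helper are introduced here for illustration, N is taken to mean any of A, C, G, or T, and only the strand as written is scanned.

```python
import re

# Consensus motifs from the text as regular expressions.
# The lookahead (?=...) lets overlapping occurrences be reported as well.
MOTIFS = {
    "CTT(N)7AAG": re.compile(r"(?=(CTT[ACGT]{7}AAG))"),
    "(N)4GTCAA(N)4": re.compile(r"(?=([ACGT]{4}GTCAA[ACGT]{4}))"),
}

# Oligonucleotide sequences used for docking (5' to 3').
OLIGOS = {
    "Cis1": "GACTTGTTGAAGAAGCC",
    "Cis2": "CTCCGGTCAATCAC",
}


def find_motifs(sequence: str) -> dict:
    """Return {motif name: [(0-based start, matched text), ...]}."""
    sequence = sequence.upper()
    return {
        name: [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }


hits = {oligo: find_motifs(seq) for oligo, seq in OLIGOS.items()}

for oligo, per_motif in hits.items():
    for motif, matches in per_motif.items():
        print(oligo, motif, matches)
```

In this scan Cis1 carries the CTT(N)7AAG motif and Cis2 carries the GTCAA core flanked by four nucleotides on each side; scanning a real promoter would also require the reverse-complement strand.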
The online ligand-binding site prediction server FT Site (https://ftsite.bu.edu) predicted three binding sites in the Arabidopsis thaliana SOG1 protein, each constituted by specific amino acid residues (Fig. 8A and B). Site 1 is composed of Lys182 (Chain A), Arg156 (Chain B), Lys182 (Chain A), Phe209 (Chain A), Gln212 (Chain B), Thr154 (Chain A), Lys170 (Chain A), Tyr190 (Chain B), Ile172 (Chain B), and Pro66 (Chain A). Site 2 is constituted by Asp65 (Chain A), His188 (Chain A), Lys207 (Chain B), Met173 (Chain B), Lys171 (Chain B), Tyr190 (Chain A), Lys170 (Chain B), Ser67 (Chain A), Asp68 (Chain A), Trp55 (Chain A), and Asp65 (Chain B). Site 3 is composed of Lys63 (Chain A), Phe64 (Chain A), Pro213 (Chain B), Thr119 (Chain B), and Val186 (Chain A). Interestingly, all the binding sites are located around the conserved N-terminal NAC domain. The three binding sites are shown in 'mesh' representation using PyMol in Fig. 8A and B. Next, we performed extensive DNA-protein docking of the Cis1 and Cis2 consensus motifs with the AtSOG1 protein using the online HADDOCK webserver ver. 2.2 (https://hdock.phys.hust.edu.cn), and the top five DNA-protein complexes were generated. The stability of the resulting docked DNA-protein complexes indicates the strength of the interaction, and a lower (more negative) HADDOCK score is indicative of greater stability. The HADDOCK score is expressed as the sum of various parameters including van der Waals energy, electrostatic energy, desolvation energy, and restraint violation energies.47 The Z-score, on the other hand, indicates the dependability of a particular docked cluster relative to the other clusters.
Therefore, the global energy values of the docking results serve as parameters for judging the ease of interaction between the oligo sequences and the SOG1 protein: the lower the energy value, the more stable the complex and the stronger the interaction. Among the predicted Cis1-SOG1 and Cis2-SOG1 complexes from the HADDOCK server, Cluster 2 and Cluster 1, respectively, were the best-docked complexes, with the lowest HADDOCK scores of −82.82 +/- 1.78 and −98.4 +/- 3.2, indicating the formation of a slightly more stable complex with Cis2 (Fig. 8C and D; Supplementary Table 1 and Supplementary Table 2). The calculated Z-scores were −1.8 and −1.6, respectively, for these clusters (Supplementary Table 1 and Supplementary Table 2).
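The idea of a docking score built as a weighted sum of energy terms can be sketched in a few lines of Python. This is an illustration only: the weights below (1.0, 0.2, 1.0, 0.1) are an assumption based on the commonly documented defaults for the final refinement stage of HADDOCK 2.x, not values taken from this study, and the energy terms in `example` are hypothetical numbers chosen for easy arithmetic. Only the two reported cluster scores come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyTerms:
    vdw: float         # van der Waals energy
    elec: float        # electrostatic energy
    desolv: float      # desolvation energy
    restraints: float  # restraint violation energy


# ASSUMPTION: illustrative weights, not taken from the study.
WEIGHTS = EnergyTerms(vdw=1.0, elec=0.2, desolv=1.0, restraints=0.1)


def weighted_score(terms: EnergyTerms, weights: EnergyTerms = WEIGHTS) -> float:
    """Weighted sum of the energy terms; more negative means more favorable."""
    return (
        weights.vdw * terms.vdw
        + weights.elec * terms.elec
        + weights.desolv * terms.desolv
        + weights.restraints * terms.restraints
    )


# HYPOTHETICAL energy terms, for illustration only.
example = EnergyTerms(vdw=-50.0, elec=-300.0, desolv=10.0, restraints=100.0)
example_score = weighted_score(example)  # -50 + 0.2*(-300) + 10 + 0.1*100

# Best-cluster HADDOCK scores reported in the text.
reported_scores = {"Cis1-SOG1": -82.82, "Cis2-SOG1": -98.4}
most_stable = min(reported_scores, key=reported_scores.get)
print(example_score, most_stable)
```

Because a lower score is read as a more stable complex, `min` picks the complex the text describes as slightly more stable.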
Fig. 8.
Molecular docking analysis of SOG1 with consensus SOG1-binding sequences. (A and B) Prediction of ligand (DNA) binding sites in the Arabidopsis thaliana SOG1 protein for its functional characterization. Pink, green, and blue colored meshes represent the first, second, and third binding sites, respectively, of the SOG1 protein predicted by the FT Site server. (C and D) Three-dimensional representations of the SOG1-Cis1 and SOG1-Cis2 docked complexes obtained using the HADDOCK online docking server. (E and F) Three-dimensional representations of the docked complexes of mutated Cis1 and Cis2 oligo sequences. (G) Three-dimensional representation of the docked complex of the Cis2 oligo sequence with the mutated version of the SOG1 protein. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
To obtain more specific and reliable information regarding the mechanism of binding of SOG1 with the Cis1 and Cis2 oligonucleotide sequences, the resulting DNA-protein complexes were subjected to simulations of 100 nanoseconds (ns). Both the Cis1-SOG1 and Cis2-SOG1 complexes remained stable throughout the simulation. Comparative binding free energy estimation by the MM-PBSA method revealed average binding free energies of −374.24 kJ/mol and −417.66 kJ/mol for the Cis1 and Cis2 consensus SOG1-binding motifs, respectively, suggesting that AtSOG1 forms a more thermodynamically stable DNA-protein complex, with greater binding affinity, with Cis2 than with Cis1. This also supports the HADDOCK score data. Simulation analyses further revealed that both the Cis1-SOG1 and Cis2-SOG1 complexes were stabilized by H-bonds and hydrophobic interactions between each oligonucleotide sequence and both the A and B chains of the SOG1 homodimer (Table 2). The Cis1-SOG1 complex was stabilized by hydrogen bonds with the amino acid residues Lys158 (Chain A), Thr154 (Chain A), Phe209 (Chain A), Gly155 (Chain B), and Arg156 (Chain B), along with four hydrophobic interactions with Tyr210 (Chain A), Pro66 (Chain A), Lys207 (Chain B), and Gln211 (Chain A) (Table 2). On the other hand, the AtSOG1 residues most commonly involved in H-bond formation with the Cis2 oligo were Thr154 (Chain A), Gly155 (Chain A), Arg156 (Chain A), Lys182 (Chain A), and Gln212 (Chain B), along with four hydrophobic interactions with Tyr210A, Lys158A, Pro213B, and Phe209A (Table 2). Interestingly, both the Cis1 and Cis2 oligos interact with the SOG1 protein at binding site 1, and the glycine residue at position 155 of either Chain A or Chain B is found to be crucial for binding (Table 2).
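The comparison of the two MM-PBSA estimates amounts to simple arithmetic, worked through in the Python sketch below. It uses only the two average binding free energies reported above; the conversion factor 1 kcal = 4.184 kJ is the standard thermochemical value, and the variable names are illustrative.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie

# Average MM-PBSA binding free energies reported in the text (kJ/mol).
binding_free_energy_kj = {"Cis1": -374.24, "Cis2": -417.66}


def to_kcal(energy_kj: float) -> float:
    """Convert an energy from kJ/mol to kcal/mol."""
    return energy_kj / KJ_PER_KCAL


# Difference in binding free energy, Cis2 relative to Cis1.
# A negative value means Cis2 is bound more favorably than Cis1.
ddg_kj = binding_free_energy_kj["Cis2"] - binding_free_energy_kj["Cis1"]
ddg_kcal = to_kcal(ddg_kj)

for motif, energy in binding_free_energy_kj.items():
    print(f"{motif}: {energy:.2f} kJ/mol = {to_kcal(energy):.2f} kcal/mol")
print(f"Cis2 - Cis1: {ddg_kj:.2f} kJ/mol = {ddg_kcal:.2f} kcal/mol")
```

The difference of about −43 kJ/mol is the quantity behind the statement that the Cis2 complex is the more stable of the two.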
Table 2.
Amino acid residues participating in hydrogen bonds and hydrophobic interactions in SOG1-DNA complexes.
| Complex | Pre-simulated: hydrogen bonds | Pre-simulated: hydrophobic interactions | Post-simulated: hydrogen bonds | Post-simulated: hydrophobic interactions |
|---|---|---|---|---|
| Cis1-SOG1 | Ile172A-DG3 (2.771 Å); Lys182B-DT (2.176 Å); Lys158A-DG3 (2.263 Å); Lys170A-DA (2.883 Å); Gly155B-DG3 (2.314 Å); Lys153B-DG (2.304 Å); Arg156B-DA (3.071 Å); Gln211A-DC (1.893 Å); Thr154A-DA (2.569 Å); Phe209A-DC5 (2.360 Å) | Tyr210A; Tyr210B; Lys207B; Pro66A; Pro213A; Gln212B; Gln211B | Lys158A-DG3 (2.263 Å); Lys182B-DT (2.176 Å); Gly155B-DG3 (2.314 Å); Thr154A-DA (2.569 Å); Arg156B-DA (3.071 Å); Phe209A-DC5 (2.360 Å) | Tyr210A; Pro66A; Lys207B; Gln211B |
| Cis2-SOG1 | Thr154A-DC (2.535 Å); Gly155B-DC3 (3.485 Å); Lys182A-DT (2.827 Å); Lys182A-DG (3.190 Å); Arg156A-DT (2.571 Å); Ile172B-DG3 (3.588 Å); Gln212B-DG (1.922 Å); Gln212B-DC (1.638 Å) | Tyr210A; Lys158A; Pro213A; Pro213B; Phe209A | Thr154A-DC (2.535 Å); Arg156A-DT (2.571 Å); Lys182A-DT (2.827 Å); Gln212B-DG (1.922 Å); Gln212B-DC (1.638 Å); Gly155B-DC3 (3.485 Å) | Tyr210A; Lys158A; Pro213B; Phe209A |
| Cis1_Mutated SOG1 | Gln211A-DC (2.665 Å); Phe209A-DC5 (2.846 Å); Lys182B-DT (3.224 Å) | Tyr210A; Pro213A | Phe209A-DC5 (2.846 Å); Lys182B-DT (3.224 Å) | Tyr210A; Pro213A |
| Cis2_Mutated SOG1 | Lys182A-DT (3.264 Å); Gln212B-DG (1.864 Å); Gln212B-DC (2.452 Å) | Lys158A; Pro213A | Lys182A-DT (3.264 Å); Gln212B-DG (1.864 Å) | Pro213A |
| Mutated Cis1_SOG1 | Arg138B-7A (4.859 Å); Lys153B-7A (4.071 Å); Gly155B-8A (3.775 Å) | Lys182B | Arg138B-7A (4.859 Å); Lys153B-7A (4.071 Å); Gly155B-8A (3.775 Å) | Lys182B |
| Mutated Cis2_SOG1 | His152A-3A (2.684 Å); Gly177A-2A (4.316 Å) | Lys137A | His152A-3A (2.684 Å); Gly177A-2A (4.316 Å) | Lys137A |
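Entries in Table 2 follow a regular "residue + chain, nucleotide, distance" pattern, so the residues shared between complexes can be extracted programmatically. The Python sketch below parses the post-simulated hydrogen-bond entries of the Cis1-SOG1 and Cis2-SOG1 rows and intersects the residue sets. The regular expression and the helper name `residues` are illustrative choices made here, not part of the original analysis.

```python
import re

# One entry looks like "Lys158A-DG3 (2.263 Å)":
# three-letter residue, position, chain letter, nucleotide label, distance.
ENTRY = re.compile(r"([A-Z][a-z]{2}\d+[AB])-(\w+)\s*\(([\d.]+)\s*Å\)")

# Post-simulated hydrogen bonds copied from Table 2.
post_sim_hbonds = {
    "Cis1-SOG1": (
        "Lys158A-DG3 (2.263 Å) Lys182B-DT (2.176 Å) Gly155B-DG3 (2.314 Å) "
        "Thr154A-DA (2.569 Å) Arg156B-DA (3.071 Å) Phe209A-DC5 (2.360 Å)"
    ),
    "Cis2-SOG1": (
        "Thr154A-DC (2.535 Å) Arg156A-DT (2.571 Å) Lys182A-DT (2.827 Å) "
        "Gln212B-DG (1.922 Å) Gln212B-DC (1.638 Å) Gly155B-DC3 (3.485 Å)"
    ),
}


def residues(cell: str, keep_chain: bool = True) -> set:
    """Return the set of residues named in one table cell."""
    found = set()
    for residue, _nucleotide, _distance in ENTRY.findall(cell):
        found.add(residue if keep_chain else residue[:-1])
    return found


cis1 = post_sim_hbonds["Cis1-SOG1"]
cis2 = post_sim_hbonds["Cis2-SOG1"]

# Residues contacting both oligos, with and without regard to the chain.
shared_same_chain = residues(cis1) & residues(cis2)
shared_any_chain = residues(cis1, keep_chain=False) & residues(cis2, keep_chain=False)
print(sorted(shared_same_chain), sorted(shared_any_chain))
```

Gly155 appears in both intersections, which is in line with the role assigned to this residue in the text.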
To further substantiate the importance of the amino acid residues present in binding site 1 of SOG1 for promoter binding, we substituted Gly-155 (GGA) with Arg (AGA) and then repeated the docking analysis using the same Cis1 and Cis2 oligo sequences. The substitution caused a significant increase in binding free energies (−268.214 kJ/mol for Cis1 and −366.82 kJ/mol for Cis2), along with a reduction in the number of H-bonds and hydrophobic interactions, resulting in weaker DNA-protein complexes, as evidenced by the increase in HADDOCK score (−52.45 +/- 4.6 for Cis1-mutated SOG1 and −56.64 +/- 4.8 for Cis2-mutated SOG1) (Supplementary Table 3). In addition, we examined the sequence specificity of the consensus Cis1 and Cis2 motifs for binding with the SOG1 transcription factor. To this end, we performed DNA-protein docking analyses using mutated versions of Cis1 and Cis2 (GATCCGTTGAAGGGACC and CTCCGACTGGTCAC, respectively) (Fig. 8E and F). A significant increase in the HADDOCK score (−60.45 +/- 3.8 for Cis1 and −62.36 +/- 4.0 for Cis2) and in the average binding free energies (−106.285 kJ/mol for the Cis1 and −96.044 kJ/mol for the Cis2 consensus motif) was detected after alteration of the nucleotides in the consensus motifs. This was accompanied by a significant reduction in the number of H-bonds and hydrophobic interactions, ultimately resulting in weak and unstable DNA-protein complexes.
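The substitutions described above can be made explicit with a short comparison. The Python sketch below lists the positions at which the mutated oligos differ from the wild-type Cis1 and Cis2 sequences and confirms that the GGA to AGA change is a single-nucleotide substitution converting Gly to Arg. The helper `point_differences` is introduced here for illustration, and the codon table holds only the two standard-genetic-code codons needed.

```python
# Wild-type and mutated oligonucleotides from the text (5' to 3').
OLIGOS = {
    "Cis1": ("GACTTGTTGAAGAAGCC", "GATCCGTTGAAGGGACC"),
    "Cis2": ("CTCCGGTCAATCAC", "CTCCGACTGGTCAC"),
}

# Only the two codons mentioned in the text (standard genetic code).
CODON_TABLE = {"GGA": "Gly", "AGA": "Arg"}


def point_differences(wild_type: str, mutant: str) -> list:
    """Return (1-based position, wild-type base, mutant base) for each mismatch."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, wt_base, mut_base)
        for index, (wt_base, mut_base) in enumerate(zip(wild_type, mutant))
        if wt_base != mut_base
    ]


oligo_changes = {name: point_differences(wt, mut) for name, (wt, mut) in OLIGOS.items()}
codon_change = point_differences("GGA", "AGA")

for name, changes in oligo_changes.items():
    print(name, changes)
print(CODON_TABLE["GGA"], "->", CODON_TABLE["AGA"], codon_change)
```

For Cis1 the changes fall on the CTT and AAG triplets of the consensus, and for Cis2 they replace the GTCAA core, so both mutated oligos lose the motif they were designed to carry.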
3.6. Analysis of the possible physical interaction network of SOG1 transcription factor by molecular docking
Proteins and their functional interactions form the backbone of the cellular machinery. As SOG1 is the central regulator of DNA damage response in plants, its connectivity network needs to be considered for the understanding of its biological function. To predict the possible interacting partners of SOG1, STRING analysis was performed. STRING analyses revealed a complex interacting network of SOG1 with possible physical interaction with various important players of the DNA damage signaling network including NAC103 and BRCA1 (Fig. 9A and B). However, no experimental data is available for verifying these interactions.
Fig. 9.
Protein-protein interaction network of SOG1. (A and B) STRING analyses showing possible interacting partners of SOG1. (C-E) Various three-dimensional representations of SOG1-NAC103 docked complex were obtained from the HADDOCK online docking server. (F-H) Various three-dimensional representations of SOG1-BRCA1 docked complex were obtained from the HADDOCK online docking server.
After the selection of the putative interacting partners, we next analyzed the possible interaction sites present in the above-mentioned proteins using the online web server iLoops (https://sbi.imim.es/iLoops.php). The iLoops web server uses local structural features to predict a possible physical interaction between two proteins.33 The iLoops data revealed that the SOG1 transcription factor contains three positive interaction signatures (68–80, 108–114, and 149–177), all of which are situated in the conserved NAC domain. The BRCA1 protein contains two positive interaction signatures (26–36 and 35–46), both of which are present in the N-terminal zinc-finger domain, also known as the RING domain. Earlier studies have also demonstrated that the RING domain of BRCA1 is involved in binding with BARD1.35 Similar to SOG1, the NAC103 transcription factor also possesses three positive interaction signature sequences (48–54, 82–100, and 87–115) in the conserved NAC domain. Based on this analysis, we examined the possible physical interaction of SOG1 with the above-mentioned proteins and sought to identify the amino acid residues of SOG1 responsible for the interaction through extensive protein–protein docking analysis using the online HADDOCK web server (https://hdock.phys.hust.edu.cn), as mentioned in the “Materials and Methods” section. The balanced outputs were preferred from the docking results, as this mode takes into account all possible modes of interaction. After a successful docking run, four possible clusters each for SOG1-NAC103 and SOG1-BRCA1 were ranked by the HADDOCK server. The clusters were ranked based on the average HADDOCK score of the top 4 members of each cluster, which is a weighted sum of electrostatics, van der Waals, restraints energy, buried surface area, and empirical desolvation energy terms.13 The best clusters for the SOG1-NAC103 and SOG1-BRCA1 interactions were Cluster 1 (Fig. 9C-E) and Cluster 2 (Fig. 9F-H), respectively (Supplementary Table 4 and Supplementary Table 5). The Cluster 1 docked complex for SOG1-NAC103 showed a HADDOCK score of −109.3 +/- 4.6 with an RMSD value of 1.66 +/- 0.2, and Cluster 2 of SOG1-BRCA1 exhibited a HADDOCK score of −102.4 +/- 4.4 with an RMSD value of 1.12 +/- 0.8, suggesting that both NAC103 and the BRCA1 RING domain can form stable protein–protein complexes with the SOG1 transcription factor (Fig. 9C–H; Supplementary Table 4 and Supplementary Table 5). This is further supported by the low binding energies for both interactions (−422.3 kcal/mol for NAC103 and −448.7 kcal/mol for the RING domain of the BRCA1 protein).
Simulation runs of the above-mentioned protein–protein complexes further revealed that the SOG1-NAC103 and SOG1-BRCA1 complexes are stabilized by various non-bonded interactions, including hydrogen bonding, hydrophobic interactions, and some ionic interactions. Moreover, specific amino acid residues are involved in these interactions. The SOG1-NAC103 interaction involves Gly68, Pro69, Asn135, Ser136, Pro141, Ile142, Lys151, and Arg173 of SOG1 and Arg156, Gln212, Pro213, Lys109, Thr183, and Gln212 of the NAC103 protein. The interaction between SOG1 and the RING domain of the BRCA1 protein involves Gly68, Pro69, Asn135, Ser136, Pro141, Ile142, Lys151, and Arg173 of SOG1 and Arg156, Gln212, Pro213, Lys109, Thr183, and Gln212 of the BRCA1 protein. Taken together, these results suggest that the SOG1 transcription factor physically interacts with the NAC103 transcription factor and with the RING domain of the BRCA1 protein.
4. Discussion
The evolution of the plant kingdom has resulted in a wide range of structural and functional complexity, from the primitive unicellular algal mats, through multicellular green and red algae, bryophytes, lycopods, and ferns with terrestrial habitats, to the more complex gymnosperms and flowering plants (angiosperms).45 Analyses of available protein databases have revealed SOG1-orthologues in more than a hundred plant species belonging to various plant groups, including mosses, lycopods, ancient angiosperms, eudicots, and monocots, suggesting a wide distribution of this transcription factor (Fig. 1A). ATM- or ATR-mediated hyperphosphorylation of the SQ motifs present in the trans-regulatory domain in response to DNA damage plays a crucial role in the activation of the SOG1 transcription factor.56, 27 In our study, motif analyses of the SOG1 protein orthologues indicated that all orthologues possess the conserved N-terminal NAC domain, but variation was detected in the C-terminal domain, which contains the SQ motifs and is responsible for the transcriptional activity of SOG1 (Fig. 1B). In the majority of plant species, five SQ motifs are present in the C-terminal domain. A higher number of SQ motifs is present in the monocots, and the number of SQ motifs is reduced in the lower groups of plants (Fig. 1B). Moreover, variation in the SQ motifs is also detected in plant species growing in different habitats and environmental conditions. Together, these results suggest a degree of conservation in the structure of the SOG1 protein throughout the plant kingdom, pointing to conservation of the basic DNA damage response mechanism. Furthermore, the variation detected in the C-terminal transactivation domain was possibly acquired in the course of evolution to cope with various environmental conditions.
The physicochemical properties of a protein are determined by the analogous properties of its constituent amino acids. Important functional characteristics of proteins include their molecular weight, isoelectric point, aliphatic index, and grand average of hydropathicity (GRAVY). Analysis of the physicochemical properties of the SOG1 protein orthologues revealed an instability index value > 40 in all four plants, with maximum and minimum values detected in the SOG1-orthologues of Physcomitrella (53.58) and Oryza (40.05), respectively (Table 1), suggesting that the stability of the SOG1 protein has increased over time and that evolutionarily advanced plants possess a more stable SOG1 protein. Besides the instability index, the aliphatic index stands for the relative volume occupied by the aliphatic side chains of the constituent amino acids of a given protein. An increased aliphatic index is indicative of increased thermostability of globular proteins.21 Our results showed relatively high aliphatic indices for SOG1-orthologues, with maximum and minimum values observed for Oryza sativa (monocot) and Physcomitrella patens (moss), respectively (Table 1). The high aliphatic index of the SOG1 protein in Oryza indicates that this protein is more thermostable in monocots than other SOG1 orthologues and that the thermostability of the SOG1 protein has increased during evolution. The grand average of hydropathicity (GRAVY) index is used to represent the hydrophobicity value of a peptide.22 Positive GRAVY values indicate hydrophobic proteins; negative values indicate hydrophilic ones.15 Our results showed that the SOG1-orthologues have negative GRAVY values, indicating their hydrophilic nature. The amino acid composition of the SOG1 proteins also revealed a substantial share of non-polar amino acids, including Alanine (5.3 %), Glycine (7.225 %), Isoleucine (4.1 %), Leucine (5.35 %), and Proline (6.1 %) (Fig. 3A). Furthermore, the pI value of all SOG1-orthologues is below 7 (Table 1), indicating that the SOG1 protein in all four species is acidic and negatively charged at neutral pH. This is further supported by the presence of a significant percentage of negatively charged amino acids, including Aspartate (7.95 %) and Glutamate (8.65 %) (Fig. 3A), which is also consistent with the negative GRAVY values.
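For readers who wish to reproduce indices of this kind, the Python sketch below computes GRAVY and the aliphatic index directly from a sequence. It is an illustration rather than the tool used in this study: GRAVY is taken as the mean Kyte-Doolittle hydropathy value per residue, and the aliphatic index follows the usual definition X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)] with X in mole percent. The sequence `TOY_PEPTIDE` is hypothetical and is not a SOG1 fragment.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * sequence.count(residue) / len(sequence)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# HYPOTHETICAL peptide, for illustration only.
TOY_PEPTIDE = "MKDEELLSGA"
print(f"GRAVY = {gravy(TOY_PEPTIDE):.3f}")
print(f"Aliphatic index = {aliphatic_index(TOY_PEPTIDE):.1f}")
```

A negative GRAVY from this function corresponds to a hydrophilic sequence and a positive one to a hydrophobic sequence, matching the sign convention described in the text.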
Analysis of the secondary structural components by the GOR4 online server indicated that the SOG1 orthologue proteins of all the plant species examined are dominated by random coil structures, indicating significant evolutionary conservation (Fig. 4). Interestingly, the largest share of random coil structure is found in the conserved N-terminal NAC domain of the SOG1 or SOG1-like proteins in all the plant species. Previous studies have indicated that different secondary structural elements of proteins, including α-helices and β-strands, differ in their ability to withstand mutations because of the different numbers of non-covalent interactions within these secondary structural units, which affects the stability of the protein.1 Owing to their higher numbers of inter-residue contacts, α-helical structures can accumulate more mutations than β-strands without undergoing structural change.1 Our results revealed that the SOG1 orthologue of Oryza sativa (monocot) possesses the most α-helices and the fewest β-strand structures compared with the other plant species, namely Arabidopsis thaliana (eudicot), Amborella trichopoda (ancestor of flowering plants), and Physcomitrella patens (moss) (Fig. 3B, Fig. 4A-D). Furthermore, the Ramachandran plot generated in PROCHECK revealed that in Oryza sativa the maximum percentage of SOG1-constituent amino acids lies within the “favorable” regions of the plot compared with the other three plants (Fig. 4E-H). Together, these analyses suggest increasing stability of the SOG1 protein in the course of evolution and that the protein has evolved in such a way as to reduce the effect of mutations, as a congruent byproduct of adaptive robustness to various adverse environmental conditions.
The tertiary structure of a protein refers to the overall three-dimensional arrangement of its polypeptide chain in space. Protein molecules undergo folding with the help of molecular chaperones in such a way as to gain maximum stability and the lowest energy state, resulting in a defined overall three-dimensional shape. Three-dimensional structures of proteins provide valuable insights into their function at the molecular level and inform a broad spectrum of applications in life science research. For tertiary structure analysis, we used the I-TASSER protocol, which is based on iterative fragment assembly simulations and is used for automated protein modeling.37, 54 I-TASSER generates a full-length structural model of a protein from the primary amino acid sequence provided, and the best models are selected based on the C-score (in the range −5 to 2) and the TM-score (>0.5).54 In our study, the three-dimensional models of SOG1 orthologues from Arabidopsis thaliana, Oryza sativa, Amborella trichopoda, and Physcomitrella patens generated by I-TASSER showed relatively high C-score and TM-score values of more than −1.5 and 0.5, respectively. This suggests that all the models are expected to be of good quality.
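The selection logic for the I-TASSER models can be expressed as a short check. The Python sketch below uses the C-scores reported in the Results for the selected models, the C-score range of −5 to 2 quoted above, and the −1.5 value mentioned in the text as a working cutoff; the constant and variable names are illustrative, and TM-scores are not included because their individual values are not listed in this section.

```python
# C-scores of the selected I-TASSER models, as reported in the Results.
c_scores = {
    "Arabidopsis thaliana": -1.46,
    "Oryza sativa": -1.36,
    "Amborella trichopoda": -1.44,
    "Physcomitrella patens": -1.42,
}

C_SCORE_RANGE = (-5.0, 2.0)  # range quoted in the text
C_SCORE_CUTOFF = -1.5        # value the text compares the models against


def in_range(score: float, bounds: tuple = C_SCORE_RANGE) -> bool:
    """True if the C-score lies inside the expected range."""
    low, high = bounds
    return low <= score <= high


# Models that are in range and above the cutoff.
accepted = {
    species: score
    for species, score in c_scores.items()
    if in_range(score) and score > C_SCORE_CUTOFF
}

# A higher C-score signals higher confidence in the model.
highest = max(c_scores, key=c_scores.get)
print(sorted(accepted), highest)
```

All four models pass both conditions, and the Oryza sativa model has the highest C-score of the four.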
Homology modeling has become a crucial technique in structural biology. It facilitates narrowing down the gap between known protein sequences and experimentally determined protein structures.49 In our study, in the case of all SOG1-orthologues, the GMQE and QSQE values were found to be fitted within the range. Moreover, the calculated Z-score value for each protein was also within the permissible limit (Fig. 6E-H). Furthermore, the selected structural models of SOG1 orthologues from four plant species have been shown to form a homodimer in their oligo state (Fig. 6A-D). Together these analyses have provided a meaningful clue that SOG1 protein can form stable three-dimensional structures, which is important for performing their functional role. The quality of the models obtained by I-TASSER was verified in the ProSA program (Protein Structure Analysis), which is an established tool for the verification and refinement of structural protein models.51 The z-score obtained in ProSA program is indicative of the overall quality of the three-dimensional protein model and also estimates the total energy deviation regarding the energy distribution derived from random conformations.42, 43, 51 A group of structures from various sources including X-ray crystallography and NMR are distinguished by different colors. In our study, the z-score values for the SOG1-orthologues from all four plant species are clearly within the range of scores typically found for proteins of similar size belonging to one of these groups (Fig. 7A-D).
Transcription factors (TFs) bind to the promoters of their target genes through consensus DNA-binding sequences present in the promoter region and thereby control target gene expression.20 Previous studies have indicated that SOG1 binds to the -CTT(N)7AAG- consensus motif present within the 1-kb promoter region of its target genes, including BRCA1 and RAD51.31, 5 In addition, the PlantPAN 3.0 online software (https://PlantPAN.itps.ncku.edu.tw) revealed the presence of the conserved NAC domain-binding 5′-(N)4GTCAA(N)4-3′ sequence within the 1-kb promoter region of various SOG1 target genes. Previous studies have also shown that NAC domain family transcription factors, including SOG1, form homodimers or heterodimers that recognize DNA sequences in their target gene promoters.52, 12, 29, 53, 17, 31 In our study, the SWISS-MODEL Workspace data showed that the SOG1 protein forms a homodimer in its oligomeric state. Furthermore, DNA-protein molecular docking analyses revealed the formation of thermodynamically stable complexes between SOG1 and both the 5′-CTT(N)7AAG-3′ and 5′-(N)4GTCAA(N)4-3′ consensus sequences (Fig. 8C and D). Earlier studies have suggested that the interaction between a transcription factor and its target gene promoter is mainly facilitated by various non-bonded contacts, including hydrogen bonds, van der Waals interactions, hydrophobic interactions, and salt bridges.25, 46 In line with this, our DNA-protein complex simulation data also revealed that SOG1 formed stable DNA-protein complexes with both the 5′-CTT(N)7AAG-3′ and 5′-(N)4GTCAA(N)4-3′ consensus sequences, which were primarily stabilized by H-bonding and hydrophobic interactions (Table 2).
Earlier studies have indicated that in the knockout mutant line of the AtSOG1 gene (sog1-1), in which the Gly residue at position 155 is substituted with Arg, the SOG1 protein becomes non-functional.34, 55 Similarly, when the Gly residue at position 155 was substituted with Arg in the amino acid sequence of the SOG1 protein, the three-dimensional structure of the protein modeled through the SWISS-MODEL Workspace was not changed; interestingly, however, the DNA-protein complexes resulting from the interaction of both Cis1 and Cis2 with the mutated SOG1 protein were less stable, with increased HADDOCK scores and binding free energies and fewer non-bonded interactions than with the wild-type SOG1 protein (Fig. 8G; Table 2 and Supplementary Table 3). In addition, alterations in the nucleotide sequences of Cis1 and Cis2 decreased the binding affinity for the SOG1 protein, with the formation of less stable SOG1-Cis1 and SOG1-Cis2 complexes, indicating the binding specificity of SOG1 for the promoters of its target genes.
Transcription regulation is a complex and elaborate process involving several stages. Along with the direct binding of transcription factors to their target genes, interactions between transcription factors and several transcription accessory proteins are also crucial for transcription regulation. These proteins include dimerization partners, mediators, several cofactors, and TF activity-modulating enzymes (such as phosphatases and kinases), which enhance the transcription regulatory activity of a transcription factor.6, 24, 14, 36, 19 STRING analyses revealed that the SOG1 transcription factor is associated with a complex network of many proteins, including NAC103 and BRCA1 (Fig. 9A and B). Previous studies have indicated a functional interaction between NAC103 and the SOG1 transcription factor downstream of the SOG1-mediated DNA damage response signaling pathway.38 The same study showed that SOG1 possibly binds the promoter of the NAC103 gene to regulate its expression. However, a possible physical interaction of SOG1 with the NAC103 protein has not been investigated. Our protein-protein docking data showed the formation of a stable SOG1-NAC103 heterodimer complex (Fig. 9C-E; Supplementary Table 4). Moreover, the conserved N-terminal DNA-binding NAC domain of both transcription factors is involved in this physical interaction. The amino acid residues involved in this interaction were also identified for both proteins. NAC-domain family transcription factors are already known to form homodimers and heterodimers.52, 29, 53, 17 The possible SOG1-NAC103 heterodimer resulting from the physical interaction of the two proteins may therefore play a crucial role in regulating the expression of various DNA damage response genes under DNA-damaging conditions.
In mammalian cells, on the other hand, DNA damage leads to several molecular events that eventually phosphorylate and activate p53. In the mammalian system, BRCA1 physically interacts with p53 through the C-terminal transactivation domain and enhances its transcriptional activity.60 A plant homologue of mammalian BRCA1 has also been reported from Arabidopsis thaliana.23 In plants, BRCA1 has been shown to physically interact with BARD1.35 Our protein-protein docking data have shown that the RING domain of BRCA1 can form a thermodynamically stable protein–protein complex with the NAC domain of the SOG1 transcription factor, and that the interaction is stabilized by H-bonds and hydrophobic interactions (Fig. 9E-G; Supplementary Table 5). This physical interaction may therefore play a vital role in mediating the DNA damage response in plants, although further experimental validation of these predicted interactions is required.
Author contribution
KM conceived the idea, performed the in-silico studies, analyzed the data and wrote the manuscript.
Conflict of interest
The Author declares no conflict of interest.
Availability of data and materials
The amino acid sequences used in this study for computational analyses are available at NCBI database.
Funding
Not applicable.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
CRediT authorship contribution statement
Kalyan Mahapatra: Conceptualization, Data curation, Investigation, Writing – original draft, Writing – review & editing.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.jgeb.2023.100333.
References
- 1. Abrusán G., Marsh J.A. Alpha helices are more robust to mutations than Beta strands. PLoS Comput Biol. 2016;12:e1005242. doi: 10.1371/journal.pcbi.1005242.
- 2. Andel F., Ladurner A.G., Inouye C., et al. Three-dimensional structure of the human TFIID-IIA-IIB complex. Science. 1999;286:2153–2156. doi: 10.1126/science.286.5447.2153.
- 3. Bailey T.L., Johnson J., Grant C.E., Noble W.S. The MEME suite. Nucleic Acids Res. 2015;43:W39–W49. doi: 10.1093/nar/gkv416.
- 4. Benkert P., Tosatto S.C.E., Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins. 2008;71:261–277. doi: 10.1002/prot.21715.
- 5. Bourbousse C., Vegesna N., Law J.A. SOG1 activator and MYB3R repressors regulate a complex DNA damage network in Arabidopsis. Proc Natl Acad Sci USA. 2018;115. doi: 10.1073/pnas.1810582115.
- 6. Brivanlou A.H., Darnell J.E. Signal transduction and the control of gene expression. Science. 2002;295:813–818. doi: 10.1126/science.1066355.
- 7. Rahman Md.A., Chaturvedi N., et al. Computational protein structure modeling and analysis of UV-B stress protein in Synechocystis PCC 6803. Bioinformation. 2013;9:639–644. doi: 10.6026/97320630009639.
- 8. Chen J. The cell-cycle arrest and apoptotic functions of p53 in tumor initiation and progression. Cold Spring Harb Perspect Med. 2016;6. doi: 10.1101/cshperspect.a026104.
- 9. Colovos C., Yeates T.O. Verification of protein structures: Patterns of nonbonded atomic interactions. Protein Sci. 1993;2:1511–1519. doi: 10.1002/pro.5560020916.
- 10. Elengoe A., Naser M., Hamdan S. Modeling and docking studies on novel mutants (K71L and T204V) of the ATPase domain of human heat shock 70 kDa protein 1. IJMS. 2014;15:6797–6814. doi: 10.3390/ijms15046797.
- 11. Engelking L.R. Protein Structure. In: Textbook of Veterinary Physiological Chemistry. Elsevier; 2015. pp. 18–25.
- 12. Ernst H.A., Nina Olsen A., Skriver K., et al. Structure of the conserved domain of ANAC, a member of the NAC family of transcription factors. EMBO Rep. 2004;5:297–303. doi: 10.1038/sj.embor.7400093.
- 13. Fernández-Recio J., Totrov M., Abagyan R. Identification of protein-protein interaction sites from docking energy landscapes. J Mol Biol. 2004;335:843–865. doi: 10.1016/j.jmb.2003.10.069.
- 14. Fontaine F., Overman J., François M. Pharmacological manipulation of transcription factor protein-protein interactions: opportunities and obstacles. Cell Regeneration. 2015;4(4):2. doi: 10.1186/s13619-015-0015-x.
- 15. Gasteiger E., Hoogland C., Gattiker A., et al. Protein identification and analysis tools on the ExPASy server. In: Walker J.M., editor. The Proteomics Protocols Handbook. Humana Press; Totowa, NJ: 2005. pp. 571–607.
- 16. Geourjon C., Deléage G. SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Bioinformatics. 1995;11:681–684. doi: 10.1093/bioinformatics/11.6.681.
- 17. Gladman N.P., Marshall R.S., Lee K.-H., Vierstra R.D. The proteasome stress regulon is controlled by a pair of NAC transcription factors in Arabidopsis. Plant Cell. 2016;28:1279–1296. doi: 10.1105/tpc.15.01022.
- 18. Goffová I., Vágnerová R., Peška V., et al. Roles of RAD51 and RTEL1 in telomere and rDNA stability in Physcomitrella patens. Plant J. 2019. doi: 10.1111/tpj.14304.
- 19. Göös H., Kinnunen M., Salokas K., et al. Human transcription factor protein interaction networks. Nat Commun. 2022;13:766. doi: 10.1038/s41467-022-28341-5.
- 20. Hettich J., Gebhardt J.C.M. Transcription factor target site search and gene regulation in a background of unspecific binding sites. J Theor Biol. 2018;454:91–101. doi: 10.1016/j.jtbi.2018.05.037.
- 21. Ikai A. Thermostability and aliphatic index of globular proteins. The Journal of Biochemistry. 1980. doi: 10.1093/oxfordjournals.jbchem.a133168.
- 22. Kyte J., Doolittle R.F. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0.
- 23. Lafarge S. Characterization of Arabidopsis thaliana ortholog of the human breast cancer susceptibility gene 1: AtBRCA1, strongly induced by gamma rays. Nucleic Acids Res. 2003;31:1148–1155. doi: 10.1093/nar/gkg202.
- 24. Li X., Wang W., Wang J., et al. Proteomic analyses reveal distinct chromatin-associated and soluble transcription factor complexes. Mol Syst Biol. 2015;11(775). doi: 10.15252/msb.20145504.
- 25. Luscombe N.M. Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res. 2001;29:2860–2874. doi: 10.1093/nar/29.13.2860.
- 26. Mahapatra K., Roy S. An insight into the folding and stability of Arabidopsis thaliana SOG1 transcription factor under salinity stress in vitro. Biochem Biophys Res Commun. 2019;515:531–537. doi: 10.1016/j.bbrc.2019.05.183.
- 27. Mahapatra K., Roy S. An insight into the mechanism of DNA damage response in plants- role of SUPPRESSOR OF GAMMA RESPONSE 1: An overview. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis. 2020;819–820. doi: 10.1016/j.mrfmmm.2020.111689.
- 28. Mahapatra K., Roy S. SOG1 transcription factor promotes the onset of endoreduplication under salinity stress in Arabidopsis. Sci Rep. 2021;11:11659. doi: 10.1038/s41598-021-91293-1. [Retracted]
- 29. Mitsuda N., Hisabori T., Takeyasu K., Sato M.H. VOZ; isolation and characterization of novel vascular plant transcription factors with a one-zinc finger from Arabidopsis thaliana. Plant Cell Physiol. 2004;45:845–854. doi: 10.1093/pcp/pch101.
- 30. Ngan C.-H., Hall D.R., Zerbe B., et al. FTSite: high accuracy detection of ligand binding sites on unbound protein structures. Bioinformatics. 2012;28:286–287. doi: 10.1093/bioinformatics/btr651.
- 31. Ogita N., Okushima Y., Tokizawa M., et al. Identifying the target genes of SUPPRESSOR OF GAMMA RESPONSE 1, a master transcription factor controlling DNA damage response in Arabidopsis. Plant J. 2018;94:439–453. doi: 10.1111/tpj.13866.
- 32. Phillips T., Hoopes L. Transcription factors and transcriptional control in eukaryotic cells. Nature Education. 2008;1(1):119.
- 33. Planas-Iglesias J., Marin-Lopez M.A., Bonet J., et al. iLoops: a protein–protein interaction prediction server based on structural features. Bioinformatics. 2013;29:2360–2362. doi: 10.1093/bioinformatics/btt401.
- 34. Preuss S.B., Britt A.B. A DNA-damage-induced cell cycle checkpoint in Arabidopsis. Genetics. 2003;164:323–334. doi: 10.1093/genetics/164.1.323.
- 35. Reidt W., Wurz R., Wanieck K., et al. A homologue of the breast cancer-associated gene BARD1 is involved in DNA repair in plants. EMBO J. 2006;25:4326–4337. doi: 10.1038/sj.emboj.7601313.
- 36. Rivera-Reyes R., Kleppa M.-J., Kispert A. Proteomic analysis identifies transcriptional cofactors and homeobox transcription factors as TBX18 binding proteins. PLoS One. 2018;13:e0200964. doi: 10.1371/journal.pone.0200964.
- 37. Roy A., Kucukural A., Zhang Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 2010;5:725–738. doi: 10.1038/nprot.2010.5.
- 38. Ryu T.H., Go Y.S., Choi S.H., et al. SOG1-dependent NAC103 modulates the DNA damage response as a transcriptional regulator in Arabidopsis. Plant J. 2019;98:83–96. doi: 10.1111/tpj.14201.
- 39. Sakamoto A.N., Sakamoto T., Yokota Y., et al. SOG1, a plant-specific master regulator of DNA damage responses, originated from nonvascular land plants. Plant Direct. 2021;5. doi: 10.1002/pld3.370.
- 40. Schrodinger L. The PyMOL Molecular Graphics System, Version 1.3r1. 2010.
- 41. Schwede T. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 2003;31:3381–3385. doi: 10.1093/nar/gkg520.
- 42. Sippl M.J. Recognition of errors in three-dimensional structures of proteins. Proteins. 1993;17:355–362. doi: 10.1002/prot.340170404.
- 43. Sippl M.J. Knowledge-based potentials for proteins. Curr Opin Struct Biol. 1995;5:229–235. doi: 10.1016/0959-440X(95)80081-6.
- 44. Sjogren C.A., Bolaris S.C., Larsen P.B. Aluminum-dependent terminal differentiation of the Arabidopsis root tip is mediated through an ATR-, ALT2-, and SOG1-regulated transcriptional response. Plant Cell. 2015;27:2501–2515. doi: 10.1105/tpc.15.00172.
- 45. Stewart W.N., Rothwell G.W. Paleobotany and the evolution of plants. 2nd ed. Cambridge University Press; 1993.
- 46. Trerotola M., Antolini L., Beni L., et al. A deterministic code for transcription factor-DNA recognition through computation of binding interfaces. NAR Genomics and Bioinformatics. 2022;4(lqac008). doi: 10.1093/nargab/lqac008.
- 47. Vangone A., Rodrigues J.P.G.L.M., Xue L.C., et al. Sense and simplicity in HADDOCK scoring: Lessons from CASP-CAPRI round 1. Proteins. 2017;85:417–423. doi: 10.1002/prot.25198.
- 48. Wang J., Youkharibache P., Marchler-Bauer A., et al. iCn3D: From web-based 3D viewer to structural analysis tool in batch mode. Front Mol Biosci. 2022;9. doi: 10.3389/fmolb.2022.831740.
- 49. Waterhouse A., Bertoni M., Bienert S., et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46:W296–W303. doi: 10.1093/nar/gky427.
- 50. Waterworth W.M., Drury G.E., Bray C.M., West C.E. Repairing breaks in the plant genome: the importance of keeping it together. New Phytol. 2011;192:805–822. doi: 10.1111/j.1469-8137.2011.03926.x.
- 51. Wiederstein M., Sippl M.J. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007;35:W407–W410. doi: 10.1093/nar/gkm290.
- 52. Xie Q., Frugis G., Colgan D., Chua N.-H. Arabidopsis NAC1 transduces auxin signal downstream of TIR1 to promote lateral root development. Genes Dev. 2000;14:3024–3036. doi: 10.1101/gad.852200.
- 53. Yamaguchi M., Kubo M., Fukuda H., Demura T. VASCULAR-RELATED NAC-DOMAIN7 is involved in the differentiation of all types of xylem vessels in Arabidopsis roots and shoots. Plant J. 2008;55:652–664. doi: 10.1111/j.1365-313X.2008.03533.x.
- 54. Yang J., Zhang Y. Protein structure and function prediction using I-TASSER. Curr Protoc Bioinformatics. 2015;52. doi: 10.1002/0471250953.bi0508s52.
- 55. Yoshiyama K., Conklin P.A., Huefner N.D., Britt A.B. Suppressor of gamma response 1 (SOG1) encodes a putative transcription factor governing multiple responses to DNA damage. Proc Natl Acad Sci USA. 2009;106:12843–12848. doi: 10.1073/pnas.0810304106.
- 56. Yoshiyama K., Sakaguchi K., Kimura S. DNA damage response in plants: Conserved and variable response compared to animals. Biology. 2013;2:1338–1356. doi: 10.3390/biology2041338.
- 57. Yoshiyama K.O. SOG1: a master regulator of the DNA damage response in plants. Genes Genet Syst. 2015;90:209–216. doi: 10.1266/ggs.15-00011.
- 58. Yoshiyama K.O., Kaminoyama K., Sakamoto T., Kimura S. Increased phosphorylation of Ser-Gln sites on SUPPRESSOR OF GAMMA RESPONSE1 strengthens the DNA damage response in Arabidopsis thaliana. Plant Cell. 2017;29:3255–3268. doi: 10.1105/tpc.17.00267.
- 59. Yoshiyama K.O., Kimura S., Maki H., et al. The role of SOG1, a plant-specific transcriptional regulator, in the DNA damage response. Plant Signal Behav. 2014;9:e28889. doi: 10.4161/psb.28889.
- 60. Zhang H., Somasundaram K., Peng Y., et al. BRCA1 physically associates with p53 and stimulates its transcriptional activity. Oncogene. 1998;16:1713–1721. doi: 10.1038/sj.onc.1201932.