Abstract
The Cancer/Testis Antigens (CTAs) are a group of heterogeneous proteins that are typically expressed in the testis but aberrantly expressed in several types of cancer. Although overexpression of CTAs is frequently associated with advanced disease and poorer prognosis, the significance of this correlation is unclear since the functions of the CTAs in the disease process remain poorly understood. Here, employing a bioinformatics approach, we show that a majority of CTAs are intrinsically disordered proteins (IDPs). IDPs are proteins that, under physiological conditions in vitro, lack rigid 3D structures either along their entire length or in localized regions. Despite the lack of structure, most IDPs can transition from disorder to order upon binding to biological targets and often promote highly promiscuous interactions. IDPs play important roles in transcriptional regulation and signaling via regulatory protein networks and are often associated with dosage sensitivity. Consistent with these observations, we find that several CTAs can bind DNA, and their forced expression appears to increase cell growth implying a potential dosage-sensitive function. Furthermore, the CTAs appear to occupy ‘hub’ positions in protein regulatory networks that typically adopt a ‘scale-free’ power law distribution. Taken together, our data provide a novel perspective on the CTAs implicating them in processing and transducing information in altered physiological states in a dosage-sensitive manner. Identifying the CTAs that occupy hub positions in protein regulatory networks would allow a better understanding of their functions as well as the development of novel therapeutics to treat cancer.
Keywords: Cancer/Testis Antigens, Intrinsically Disordered Proteins, Dosage Sensitivity, Cancer
Introduction
Intrinsically disordered proteins (IDPs) are proteins that, under physiological conditions in vitro, lack rigid 3D structures either along their entire length or in localized regions. Despite the lack of structure, IDPs appear to play important biological roles in transcriptional regulation and signaling via cellular protein networks [Uversky and Dunker, 2010]. A comprehensive study of protein interaction networks in multiple eukaryotic organisms from yeast to human demonstrated that hub proteins, defined as those that interact with ≥5 partners in a protein interaction network, are significantly more disordered than end proteins, defined as those that interact with far fewer partners [Patil et al., 2010]. Furthermore, a binary classification of hubs and ends into ordered and disordered subclasses showed a significant enrichment of entirely disordered proteins and a significant depletion of entirely ordered proteins in hubs relative to ends [Haynes et al., 2006], underscoring the role of IDPs in signaling. Another interesting feature of IDPs is their ability to undergo disorder-to-order transitions upon binding to their biological target (coupled folding and binding) in order to perform their function [Tompa and Csermely, 2004]. Structural flexibility and plasticity are believed to represent a major functional advantage for IDPs, enabling them to interact with a broad range of binding partners such as proteins, nucleic acids, and small molecules [Tompa and Csermely, 2004].
Intrinsic disorder also appears to be an important determinant of dosage sensitivity. IDPs are prone to initiate promiscuous molecular interactions when overexpressed suggesting that this is the likely cause of the resulting toxicity/pathology. Indeed, recent studies in model organisms provide compelling evidence supporting this causality [Vavouri et al., 2009]. Interestingly, the same properties are strongly associated with dosage sensitive oncogenes, suggesting that mass action driven molecular interactions may be a frequent cause of cancer [Vavouri et al., 2009]. In fact, numerous IDPs are also associated with several other human diseases [Uversky et al., 2008] underscoring the tight association between intrinsic protein disorder, promiscuity, and dosage sensitivity.
The Cancer/Testis Antigens (CTAs) are a heterogeneous group of proteins that are typically expressed in the testis with little or no expression in most somatic tissues. However, they are aberrantly expressed in several cancers [Scanlan et al., 2004], and recent genetic studies in the fruit fly have demonstrated a causal link between CTA expression and tumorigenesis [Janic et al., 2010]. Based on their chromosomal location, the CTAs can be conveniently divided into two broad groups: the CT-X antigens, located on the X chromosome, and the non-X CT antigens, located on the autosomes. Interestingly, most, if not all, CT-X antigens lack orthologues in lower mammals and are found only in the primates, where they constitute several subfamilies of homologous genes organized in discrete clusters along the X chromosome [Stevenson et al., 2007]. However, unlike the non-X CT antigens, the functions of a majority of the CT-X antigens are poorly understood, although their overexpression is frequently associated with advanced disease and poorer prognosis [Suyama et al., 2010 and references therein].
Given that intrinsic disorder is an important determinant of dosage sensitivity, and that overexpression of the CTAs in advanced disease is perceived to be pathological, we asked whether the CTAs, particularly the CT-X antigens, are IDPs. The present study was further motivated by our recent observations that the CT-X antigen Prostate-Associated Gene Protein 4 (PAGE4), which is upregulated in prostate cancer, is an IDP and that its forced expression enhances cell proliferation, suggesting that it may be dosage sensitive [Zeng et al., 2011].
Materials and Methods
Disorder in the CTAs was predicted using the FoldIndex [Prilusky et al., 2005] and RONN [Yang et al., 2005] algorithms; in some cases, metaPrDOS [Ishida and Kinoshita, 2008] was also employed. To discern the effect of helical regions on protein disorder prediction, we compiled data using psiPred (http://bioinf.cs.ucl.ac.uk/psipred/) and JPred (http://www.compbio.dundee.ac.uk/www-jpred/) on a randomly selected set of CT-X and non-X CTAs before and after masking these regions. However, we found no difference in the prediction results, presumably because helical regions are scarce in the disordered portions, and we therefore did not mask them in any of the analyses presented here. Based on the fraction of the sequence that was predicted to be disordered, we classified the CT-X and non-X CTAs into one of three classes: highly ordered (0–10% of the sequence is disordered), moderately disordered (11%–30% of the sequence is disordered), and highly disordered (31%–100% of the sequence is disordered). To normalize for the varying protein lengths, we calculated the number of sequence motifs per 100 amino acids. PEST motifs were predicted using the epestfind algorithm of the EMBOSS package (http://emboss.bioinformatics.nl/cgi-bin/emboss/epestfind). Only motifs with a PEST score >5 were considered. Ubiquitylation sites were predicted using UbPred [Radivojac et al., 2010]. Only ubiquitylation sites with a high confidence score (0.84 ≤ s ≤ 1.00) were considered. A minimum of 2 predicted ubiquitylation sites per 100 amino acids was used as the cutoff. Phosphorylation sites were predicted using KinasePhos 2.0 [Wong et al., 2007], which predicts the location of phosphorylation sites on S, T and Y residues, at a prediction specificity of 100%. A minimum of 2 predicted phosphorylation sites per 100 amino acids was used as the cutoff.
Acetylation sites were predicted using PAIL [Li et al., 2006], which predicts acetylation sites on lysine residues, at high stringency with a threshold score ≥0.5. Again, a minimum of 3 predicted acetylation sites per 100 amino acids was used as the cutoff. The probability of binding DNA was predicted using DBS-Pred [Ahmad et al., 2004] with a sensitivity setting of ‘strict’. Arginine methylation sites were predicted using MEMO [Chen et al., 2006] and sumoylation sites were predicted using SUMOsp 2.0 [Ren et al., 2009]. Protein–protein interactions were predicted using the STRING interaction database [Jensen et al., 2009] at the medium confidence setting (0.4-0.7) with no more than 10 interactions. Statistical significance was estimated using the Wilcoxon rank-sum test, the two-sample T test, and the Chi square test, as described in the text. The TATA box in the CTA promoter regions and specific sequence motifs in the mRNAs representing various polyadenylation and stability signals were searched for using PERL scripts written for each motif. Data in the CIRCOS plots were displayed using specific PERL scripts.
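The three-way disorder classification and the per-100-residue normalization described above can be sketched as follows. This is a minimal illustration in Python, not the scripts used in the study; the function names are hypothetical, and the handling of fractional percentages that fall between the integer bins (for example 10.6%) is an assumption, since the text does not specify it.

```python
def disorder_class(percent_disordered: float) -> str:
    """Assign one of the three disorder classes used in the text."""
    if not 0 <= percent_disordered <= 100:
        raise ValueError("percent_disordered must be between 0 and 100")
    # Assumption: a fractional value such as 10.6% is treated as still
    # belonging to the 0-10% bin (and 30.6% to the 11-30% bin).
    if percent_disordered < 11:
        return "highly ordered"
    if percent_disordered < 31:
        return "moderately disordered"
    return "highly disordered"


def sites_per_100_residues(site_count: int, protein_length: int) -> float:
    """Normalize a count of predicted motifs/sites for protein length."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return 100.0 * site_count / protein_length


def passes_cutoff(site_count: int, protein_length: int,
                  minimum_per_100: float) -> bool:
    """True if the normalized site density reaches the stated minimum
    (2 for ubiquitylation and phosphorylation, 3 for acetylation)."""
    return sites_per_100_residues(site_count, protein_length) >= minimum_per_100
```

For instance, a hypothetical 200-residue protein with 4 predicted ubiquitylation sites has 2 sites per 100 residues and would just meet the ubiquitylation cutoff.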
Results
A catalogue of CTAs
As a first step, we constructed a comprehensive catalog of CTAs from the literature as well as the Cancer/Testis Antigen database (http://www.cta.lncc.br) [Almeida et al., 2009]. We identified 228 unique CTAs (Supplemental Table 1) and mapped them to their respective chromosomal locations. The CIRCOS plot in Fig. 1 provides a detailed visual image of the location and density of the CTAs on each chromosome. Of the 228 CTAs, 120 (52%) mapped to the X chromosome (the CT-X antigens), while the remainder (the non-X CT antigens) were distributed on the 22 autosomes and the Y chromosome. Among the autosomes, there are 10 CTAs (0.3 CTAs/100 genes) on chromosome 1, the most gene-rich chromosome. In contrast, chromosome 21, with only 425 genes, has 1.6 CTAs/100 genes (a 5-fold increase over chromosome 1), making it the most CTA-dense autosome, while chromosome 7, with 0.06 CTAs/100 genes, is the most CTA-poor chromosome. Among the sex chromosomes, only 1 CTA is present on the Y chromosome, whereas there are 7.5 CTAs/100 genes on the X chromosome, a 25-fold increase over chromosome 1 and a 125-fold increase over chromosome 7 (Fig. 1 and Supplemental Table 2).
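The density and fold-increase figures quoted above follow from simple ratios, which the short Python sketch below reproduces from the reported densities. The function names are illustrative, and the counts passed to `ctas_per_100_genes` in the usage note are hypothetical, not values from the study.

```python
def ctas_per_100_genes(cta_count: int, gene_count: int) -> float:
    """CTA density of a chromosome, expressed per 100 genes."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return 100.0 * cta_count / gene_count


def fold_increase(density: float, reference_density: float) -> float:
    """Ratio of one chromosome's CTA density to a reference chromosome's."""
    return density / reference_density


# Densities (CTAs/100 genes) as reported in the text.
REPORTED_DENSITY = {"chr1": 0.3, "chr7": 0.06, "chr21": 1.6, "chrX": 7.5}

x_vs_chr1 = fold_increase(REPORTED_DENSITY["chrX"], REPORTED_DENSITY["chr1"])
x_vs_chr7 = fold_increase(REPORTED_DENSITY["chrX"], REPORTED_DENSITY["chr7"])
chr21_vs_chr1 = fold_increase(REPORTED_DENSITY["chr21"], REPORTED_DENSITY["chr1"])
```

Rounded, these ratios are 25, 125 and about 5.3, matching the 25-fold, 125-fold and 5-fold increases stated in the text. As a hypothetical usage example, `ctas_per_100_genes(5, 1000)` gives 0.5 CTAs/100 genes.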
Figure 1. CIRCOS plot showing the organization and disorder content of the Cancer/Testis Antigens.

The following information is presented going from the outside to the inside of the CIRCOS circles: Text Track - shows the names of all the CTAs. The highly ordered Cancer/Testis Antigens (CTAs) (0-10% disorder) are indicated in red. The moderately disordered Cancer/Testis Antigens (CTAs) (11-30% disorder) are indicated in green and highly disordered CTAs (31-100%) are shown in blue. Each track is drawn in order of its position on the respective chromosome. Scale Track – scale is reduced to 1e-6 and is shown in multiples of 10. Ideogram Track – colored track with numbers of the chromosomes. Scatter Plot – the CTAs are represented as solid circles based on their position on the chromosomes and colored to correspond with the chromosome they belong. The Track is divided into 13 lines and 12 spaces between them to show position of chromosomes 1-22, and the X and Y (innermost line indicates 0). Highlight Track – the transparent colored track showing the number of CT genes and the total number of genes on each chromosome.
A majority of the CTAs are intrinsically disordered proteins
We applied two different algorithms, namely FoldIndex [Prilusky et al., 2005] and Regional Order Neural Network (RONN) [Yang et al., 2005], to predict protein disorder. FoldIndex predicts whether a given sequence is ordered or disordered from a calculation based on the average net charge and average hydrophobicity of the sequence. In contrast, RONN uses a neural network technique to predict whether any given residue is likely to be ordered or disordered in the context of the surrounding amino acid sequence. Although the physical properties of amino acids are the fundamental basis in determining disorder, the neural network used in RONN avoids explicit parameterization of amino acids in such a manner. Instead, it uses non-gapped sequence alignment to measure ‘distances’ between windows of sequence for the unknown protein and windowed sequences for known folded proteins derived from the Protein Data Bank (PDB). Therefore, FoldIndex and RONN represent two fundamentally different approaches to disorder prediction, and while both methods have their strengths and weaknesses, they perform well when compared to other disorder prediction methods. FoldIndex performs particularly well for fully ordered or fully disordered sequences, while RONN is more successful in identifying partially disordered sequences. Thus, employing these two prediction models, we classified the CTAs either separately or collectively into three groups based on the extent of disorder: highly ordered, moderately disordered and highly disordered (see Materials and Methods).
As shown in Fig. 2A, a vast majority of the CTAs (>90%) belong to the intrinsically disordered class of proteins regardless of the prediction method (χ2 = NS). When examined separately, both prediction methods (FoldIndex, Fig. 2B, and RONN, Fig. 2C) demonstrated that the CT-X antigens were significantly more disordered than the non-X CT antigens (χ2: p<0.0001). However, in either case, the majority of both the CT-X and non-X CT antigens were in the highly disordered group. The details of the disorder predictions for the CTAs in each group by both FoldIndex and RONN are presented in Supplemental Tables 3-6.
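To make the charge/hydrophobicity idea behind FoldIndex concrete, the Python sketch below computes a whole-sequence index of the form 2.785⟨H⟩ − |⟨R⟩| − 1.151, where ⟨H⟩ is the mean Kyte-Doolittle hydrophobicity rescaled to 0–1 and ⟨R⟩ is the mean net charge. These coefficients and the scale are stated here as assumptions about the published FoldIndex formulation and should be checked against Prilusky et al. [2005]; counting only K/R as positive and D/E as negative is likewise an assumption. The function names are illustrative, and the sliding-window analysis performed by the actual server is omitted.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq: str) -> float:
    """Mean hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Mean net charge per residue (assumed: K, R = +1; D, E = -1)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return (positive - negative) / len(seq)


def fold_index(seq: str) -> float:
    """Positive values suggest a folded sequence, negative values disorder."""
    seq = seq.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError("sequence must contain only the 20 standard residues")
    return 2.785 * mean_hydrophobicity(seq) - abs(mean_net_charge(seq)) - 1.151
```

Under these assumptions, a strongly hydrophobic, uncharged sequence scores positive (predicted folded), whereas a highly charged, hydrophilic sequence scores negative (predicted disordered).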
Figure 2. Predicting disorder in the Cancer/Testis Antigens.

Protein disorder was determined in all the Cancer/Testis Antigens (CTAs) using both Foldindex and RONN (A). The CTAs were divided into 3 groups based on the extent of disorder: highly ordered (0-10% disorder), moderately disordered (11-30% disorder), and highly disordered (31-100% disorder). Protein disorder prediction was also done separately on the CT-X and non-X groups using Foldindex (B) and RONN (C). Standard errors were calculated and all reported differences were found to be statistically significant (Chi square test: NS for A, p<0.0001 for B & C). NS = Not significant.
Despite the strong agreement between the prediction methods in the vast majority of cases, we observed some differences, either in the extent or in the regions of disorder, in a few instances. In such cases we used an additional prediction method, metaPrDOS, which takes a meta approach: it does not predict disordered regions directly from the amino acid sequence but instead integrates the results of eight distinct prediction methods [Ishida and Kinoshita, 2008]. The results predicted by metaPrDOS were similar to those predicted separately by FoldIndex or RONN; therefore, we applied FoldIndex to predict disorder in all subsequent analyses presented here. Data were also obtained by subjecting the CTAs to similar analyses by RONN and are presented in the Supplemental Online Material.
Regulation of intracellular CTA concentrations
Given that the altered abundance of several IDPs is associated with perturbed cellular signaling that may lead to pathological conditions such as cancer, it is important to understand how the cellular concentrations of IDPs are precisely regulated. Indeed, recent studies on the yeast and human proteome have revealed that there is an evolutionarily conserved tight control of synthesis and clearance of most IDPs [Edwards et al., 2009; Gsponer et al., 2008]. We therefore examined the CTAs at the genomic, transcript and protein levels to discern how their intracellular concentrations may be regulated and whether the regulation correlates with protein disorder.
CTA concentrations may not be regulated at the transcript level
We first examined genomic sequences encoding the CTAs for the presence of a TATA box in the promoter region. However, a preliminary analysis suggested that, contrary to previous observations [Gsponer et al., 2008], there did not appear to be a correlation between the presence/absence of the TATA box and protein disorder, and hence we did not undertake a detailed analysis. Next, we looked at the transcript level and examined the sequences associated with mRNA stability/turnover. For the presence of polyadenylation signals (PAS), we searched for the following motifs that have been reported in the literature: 5’AGUAAA3’ (PAS 1); 5’AAUAAA/AUUAAA/AAUAAA3’ (PAS 2); 5’UAUAAA3’ (PAS 3); 5’CAUAAA3’ (PAS 4); 5’GAUAAA3’ (PAS 5) [Beaudoing et al., 2000]. For RNA stability we searched for multiple signals, including PUM-binding sites (5’UGUACAUA/UAUA/AAUA3’) [Galgano et al., 2008], U-rich motif(s) (URM) (5’UUUUAAA/UUUGUUU3’) [Bolognani et al., 2010], the stability sequence (5’UAUUUAU3’) [Wiklund et al., 2002], the cytoplasmic polyadenylation element (CPE) (5’UUUUUAU3’) [Morgan et al., 2010], and the heptanucleotide AU-rich element (ARE) motif (5’UAUUUAU3’) [Barreau et al., 2005], both in the entire transcript and in the 3’ untranslated regions alone. Human PUM1 and PUM2 are members of the Puf family, an evolutionarily conserved family of RNA-binding proteins related to the Pumilio proteins of Drosophila and the fem-3 mRNA binding factor proteins of C. elegans. The encoded proteins contain a sequence-specific RNA binding domain and serve as translational regulators of specific mRNAs by binding to their 3’ untranslated regions [Spassov and Jurecic, 2002]. However, in contrast to previous observations [Gsponer et al., 2008], we did not observe any significant correlation between the presence/absence of these motifs in the mRNA and the extent of disorder in the CTAs encoded by them (Supplemental Tables 7-46).
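A motif search of the kind described above can be sketched with regular expressions. The study used PERL scripts that are not reproduced here; the Python version below is only an illustration, with hypothetical function and variable names. It covers the five polyadenylation signals listed in the text (the hexamer that appears twice under PAS 2 is included once), accepts either RNA or cDNA input by converting T to U, and counts overlapping occurrences.

```python
import re

# Polyadenylation signals as listed in the text; alternatives are joined with "|".
PAS_MOTIFS = {
    "PAS 1": "AGUAAA",
    "PAS 2": "AAUAAA|AUUAAA",
    "PAS 3": "UAUAAA",
    "PAS 4": "CAUAAA",
    "PAS 5": "GAUAAA",
}


def count_motifs(transcript: str, motifs: dict = PAS_MOTIFS) -> dict:
    """Count occurrences of each motif in a transcript (or a 3' UTR)."""
    rna = transcript.upper().replace("T", "U")
    counts = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead matches at every start position, so
        # overlapping occurrences are counted as well.
        counts[name] = len(re.findall(f"(?=(?:{pattern}))", rna))
    return counts
```

To restrict the search to the 3’ untranslated region, the caller would pass only that slice of the transcript; the same function could be reused for the stability motifs by supplying a different dictionary.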
CTA concentrations may be regulated at the protein level
Next, we examined the CTA protein sequences to discern sequence motifs characteristic of protein turnover/stability. The PEST sequence is thought to be a hallmark of protein degradation and stability [Rechsteiner and Rogers, 1996]. Employing the epestfind algorithm (http://emboss.bioinformatics.nl/cgi-bin/emboss/epestfind), we observed a significant increase in the number of PEST motifs that was directly proportional to the amount of disorder (χ2: p<0.0006) (Fig. 3A). Separating the CTAs into CT-X and non-X also showed a similar trend; there was a significant correlation between the number of PEST motifs and extent of disorder (Wilcoxon Rank Sum Test: p=0.0231 and p=0.0164, respectively) (Fig. 3B & C). Overall, however, the CT-X antigens appeared to have a significantly higher fraction of proteins with the PEST motif than did the non-X CTAs (T test: p<0.02) (Fig. 3D). We also performed similar analyses on the CTAs correlated with protein disorder predicted using RONN. Again, the results were comparable to those obtained with Foldindex (Supplemental Fig. 2A-D). The details of the PEST analyses by both disorder prediction methods are presented in Supplemental Tables 47-50.
Figure 3. Correlation between presence of PEST sequence and extent of disorder in the Cancer/Testis Antigens.

Percent Cancer/Testis Antigens (CTAs) with PEST sequence/100 amino acids (A). Percent CTAs with PEST sequences seen in the 3 disordered groups of CT-X (B), and non-X CT Antigens (C), respectively. CTAs were segregated into CT-X and non-X CT Antigens and percent CTAs with PEST sequences were plotted (D). The Foldindex algorithm was applied to group the CTAs. Standard errors were calculated and all reported differences were found to be statistically significant (Chi square test: p<0.001 for A, Wilcoxon Rank Sum test (RS): p<0.05 for B & C and T test: p<0.05 for D).
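The Chi square comparisons across the three disorder classes can be illustrated with a small contingency-table calculation. The Python sketch below is not the authors' analysis: the counts are hypothetical, the function names are illustrative, and the closed-form p-value applies only to two degrees of freedom (as for a 3 × 2 table of disorder class versus presence/absence of a motif).

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_two_dof(statistic: float) -> float:
    """Upper-tail p-value of the chi-square distribution with 2 degrees of
    freedom, for which the survival function is exactly exp(-x/2)."""
    return math.exp(-statistic / 2.0)


# HYPOTHETICAL counts (not data from the study): rows are highly ordered,
# moderately disordered and highly disordered proteins; columns are
# proteins with and without the motif of interest.
example_table = [[5, 15], [12, 28], [60, 40]]
statistic, dof = chi_square_independence(example_table)
p_value = p_value_two_dof(statistic)
```

For tables of other dimensions, a statistics library would be needed to obtain the p-value from the statistic and the degrees of freedom.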
Ubiquitylation is another covalent protein modification that is frequently associated with proteasome-mediated degradation [Welchman et al., 2005]. By employing the UbPred algorithm [Radivojac et al., 2010], we observed a significant correlation between the occurrence of the consensus ubiquitylation site and CTA disorder content (χ2: p=0.001) (Fig. 4A). When analyzed separately, both in CT-X and non-X CTAs, there was a significant association between the presence of the ubiquitylation site and extent of disorder (Wilcoxon Rank Sum Test: p<0.0001) (Fig. 4B & C). However, there was no difference between the two groups, CT-X and non-X CTAs, when considered in the absence of disorder content (T test: NS) (Fig. 4D). We also performed similar analyses on the CTAs correlated with protein disorder predicted using RONN. Again, the results were comparable to those obtained with Foldindex (Supplemental Fig. 3A-D). The details of the ubiquitylation analyses by both disorder prediction methods are presented in Supplemental Tables 51-54. Considered together, these data on the messenger RNA and protein turnover/stability suggested that unlike most IDPs [Edwards et al., 2009; Gsponer et al., 2008], the CTAs do not appear to be regulated at the mRNA synthesis or stability level but instead, appear to be regulated at the protein level.
Figure 4. Correlation between presence of ubiquitylation sites and disorder in the Cancer/Testis Antigens.

The percent CTAs with ≥ 2 ubiquitylation sites/100 amino acids are plotted as a function of disorder calculated by Foldindex (A). CT-X and non-X CT Antigens were then plotted separately with respect to disorder (B & C). CTAs were segregated into CT-X and non-X CT Antigens and percent CTAs with percent ubiquitylation sites ≥2 were plotted (D). Standard errors were calculated and all reported differences were found to be statistically significant (Chi square test: p<0.001 for A, Wilcoxon Rank Sum test (RS): p<0.001 for B & C and T test: Not significant for D).
Disordered CTAs are significantly more likely to be modified by phosphorylation and acetylation
Covalent modification by phosphorylation is also frequently observed in IDPs [Iakoucheva et al., 2004] and is thought to play a critical role in their functions [Galea et al., 2008]. Thus, employing the KinasePhos 2.0 algorithm [Wong et al., 2007], which predicts phosphorylation at S, T and Y residues, we examined the CTAs for the presence of the respective consensus motifs. As shown in Fig. 5A, although there was no difference between the highly ordered and moderately disordered CTAs, the highly disordered CTAs were significantly enriched for these motifs (χ2: p=0.0044). When the CT-X and non-X CT antigens were analyzed separately, the highly disordered CTAs in both groups were significantly enriched for these motifs (Wilcoxon Rank Sum Test: p=0.0166 and p=3×10⁻⁶, respectively) (Fig. 5B-C). Between the two groups, however, the CT-X antigens appeared to have significantly more phosphorylation sites than the non-X CT antigens (T test: p<0.02) (Fig. 5D). We also performed similar analyses on the CTAs correlated with protein disorder predicted using RONN. Again, the results were similar to those obtained with FoldIndex (Supplemental Fig. 4A-D). The details of the phosphorylation analyses by both disorder prediction methods are presented in Supplemental Tables 55-58.
Figure 5. Correlation between the presence of phosphorylation sites and disorder in the Cancer/Testis Antigens.

Percent Cancer/Testis Antigens (CTAs) with ≥ 2 phosphorylation sites/100 amino acids is plotted with respect to disorder (A). Percent CT-X (B) and non-X CT Antigens (C) with ≥ 2 phosphorylation sites/100 amino acids was plotted with respect to disorder, respectively. Phosphorylation sites were predicted for both CT-X and non-X CT Antigens (D). The Foldindex algorithm was applied to group the CTAs. Standard errors were calculated and all reported differences were found to be statistically significant (Chi square test: p<0.01 for A, Wilcoxon Rank Sum test (RS): p<0.05, p <0.001 for B & C, respectively, and T test: p <0.05 for D).
Protein acetylation at lysine residues, which plays an important role in various biological processes [Arif et al., 2010], also appears to be important in modulating the functions of many IDPs [Hansen, 2006; van Dieck et al., 2009]. Thus, we examined the CTAs for potential lysine acetylation employing the PAIL algorithm [Li et al., 2006]. As shown in Fig. 6A, we observed a significant correlation between CTA protein disorder and the presence of predicted acetylated lysines (χ2: p=0.0067). When the CT-X and non-X CT antigens were analyzed separately, the highly disordered CTAs in both groups were significantly enriched for these sites (Wilcoxon Rank Sum Test: p=0.0055 and p<0.0001, respectively) (Fig. 6B & C). Between the two groups, however, the CT-X antigens appeared to have significantly fewer acetylation sites than the non-X CT antigens (T test: p<0.03) (Fig. 6D). The details of the acetylation analyses by both disorder prediction methods are presented in Supplemental Tables 59-62. The results obtained using RONN are shown in Supplemental Fig. 5A-D. Considered together, covalent modifications of the CTAs by phosphorylation and acetylation may play critical roles in modulating their interactions by altering the local physicochemical properties of the intrinsically disordered CT proteins/regions.
Figure 6. Correlation between presence of acetylation sites and disorder in the Cancer/Testis Antigens.

The percent Cancer/Testis Antigens (CTAs) with ≥ 3 acetylation sites/100 amino acids is plotted with respect to disorder (A). Percent CT-X (B) and non-X CT Antigens (C) with ≥ 3 acetylation sites/100 amino acids was plotted with respect to disorder. Acetylation sites were predicted for both CT-X and non-X CT Antigens (D). The Foldindex algorithm was applied to group the CTAs. Standard errors were calculated and all reported differences were found to be statistically significant (Chi square test: p<0.01 for A, Wilcoxon Rank Sum test (RS): p<0.05, p <0.001 for B & C, respectively and T test: p <0.05 for D).
CTAs lack modifications by arginine methylation and sumoylation
Protein methylation, particularly lysine methylation, is frequently observed in many organisms. Although major attention has been focused on lysine methylation because of its role in chromatin remodeling and transcriptional regulation, emerging evidence suggests that arginine methylation may also play an important role in many physiological processes such as signal transduction, mRNA splicing, transcriptional control, DNA repair, and protein translocation [Bedford and Clarke, 2009]. Furthermore, since the covalent marking of proteins by arginine methylation can promote their recognition by binding partners or modulate their biological activity, it was of interest to interrogate the CTAs, many of which are implicated in similar functions, for arginine methylation. To this end, we applied the algorithm MEMO [Chen et al., 2006], which identifies specific arginine residues that are likely to be methylated by protein arginine methyltransferase (PRMT). Indeed, the program predicted several arginine residues as highly likely to be methylated by PRMT. However, we did not observe any significant correlation between the extent of arginine methylation and protein disorder (Supplemental Tables 63-66).
SUMOylation is a post-translational modification that is involved in various cellular processes, such as cell cycle regulation, gene transcription, differentiation, cellular localization, apoptosis, protein stability, and response to stress [Hannoun et al., 2010; Mooney et al., 2010]. Since a majority of CTAs are IDPs and may therefore participate in many of these physiological processes, we asked if there is a correlation between CTA disorder and SUMOylation employing the SUMOsp 2.0 algorithm [Ren et al., 2009]. However, we did not observe any significant correlation (Supplemental Tables 67-70).
Correlation between DNA-binding probability and protein disorder
Recently, Liu et al. presented a quantitative theory predicting the role of intrinsic disorder in protein structure and function by applying thermodynamic models of protein interactions in which IDPs are characterized by positive folding free energies [Liu et al., 2009]. The authors used the Gene Ontology classifications “protein binding”, “catalytic activity” and “transcription regulator activity”, and performed genome-wide surveys of both the amount of disorder and the binding affinities in these functional classes for prokaryotic and eukaryotic genomes. Specifically, without assuming any a priori structure-function relationship, their theory predicted that both catalytic and low-affinity binding (Kd ≥ 10⁻⁷ M) proteins prefer ordered structures, whereas only high-affinity binding proteins (found mostly in eukaryotes) can tolerate disorder. Furthermore, of particular relevance to both transcription and signal transduction, the theory also explained how increasing disorder can tune the binding affinity to maximize the specificity of promiscuous interactions [Liu et al., 2009].
Thus, in light of their disordered structure, we asked if the CTAs may also be associated with transcriptional regulation and hence bind DNA. To this end we employed the DBS-Pred algorithm [Ahmad et al., 2004] to predict the probability of DNA binding at two different stringencies. As shown in Fig. 7A, using a cutoff of 50%, we observed a significant correlation between DNA binding prediction probability and extent of protein disorder (χ2: p=0.0001). Similar results were obtained when the CTAs were divided into the CT-X and non-X groups (Fig. 7B & C, respectively). Increasing the stringency (>90% prediction probability) also yielded similar results; a majority of the highly disordered CTAs were predicted to bind DNA, with virtually none in the moderately disordered group (Fig. 7D). As expected, there were significantly fewer CTAs predicted to bind DNA when the stringency was increased. However, the drop was more pronounced in the non-X group than in the CT-X group (Fig. 7F & E, respectively). Consistent with their potential DNA-binding function, several studies have demonstrated that many of the CT-X antigens are localized in the nucleus [Bai et al., 2005; Westbrook et al., 2004; Zhao et al., 2011]. Taken together, these data suggested that the CTAs are likely to be involved in transcriptional regulation or in other processes that involve DNA, such as DNA damage/repair or chromatin remodeling. The details of the DBS-Pred analyses are presented in Supplemental Tables 71-74.
Figure 7. Correlation between DNA binding prediction probability and disorder in the Cancer/Testis Antigens.

Percent Cancer/Testis Antigens (CTAs) with probability of binding DNA was determined applying DBS-PRED with ≥ 50% probability (A-C) or ≥ 90% probability (D-F) of binding DNA without grouping (A & D) or with grouping the CTAs into CT-X (B & E) and non-X CTAs (C & F), respectively. The Foldindex algorithm was applied to group the CTAs. Standard errors were calculated and all reported differences were found to be statistically significant (Chi square test: p<0.001 for A & D, Wilcoxon Rank Sum test (RS): p<0.001 for B, C, E and F).
CTAs occupy hub positions in protein-protein interaction networks
As mentioned earlier, IDPs typically occupy hub positions in a protein interaction network [Patil et al., 2010]. To determine if indeed this was also the case with the CTAs that we predicted to be disordered, we selected representative members from the CT-X antigens with as yet unknown functions. We determined their putative interactions by querying STRING, a database dedicated to protein-protein interactions that include both physical and functional interactions through the so-called ‘genomic context’ or ‘nonhomology-based’ inference methods [Jensen et al., 2009]. As shown in Fig. 8A-F, most CT-X antigens occupy a hub position. Consistent with the propensity of IDPs to function in transcriptional regulation and/or cellular signaling, the data suggest, but do not necessarily prove, that the CT-X antigens may also fulfill such roles in the cell. Furthermore, many of the CTAs in these networks have previously been demonstrated to participate in transcriptional regulation making our conclusion more tenable.
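The notion of a hub can be made operational by counting distinct interaction partners. The Python sketch below applies the ≥5-partner definition quoted in the Introduction to an edge list such as one exported from an interaction database. The protein names and edges are hypothetical placeholders, not STRING results, and the function names are illustrative.

```python
from collections import defaultdict


def partner_counts(edges):
    """Number of distinct interaction partners for every protein."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}


def find_hubs(edges, min_partners=5):
    """Proteins with at least `min_partners` distinct partners."""
    counts = partner_counts(edges)
    return sorted(p for p, n in counts.items() if n >= min_partners)


# HYPOTHETICAL network: one query protein linked to six partners,
# two of which also interact with each other.
example_edges = [("QUERY", f"P{i}") for i in range(1, 7)] + [("P1", "P2")]
```

In this toy network only the query protein reaches the threshold, so `find_hubs(example_edges)` returns a single-element list; the partners themselves have one or two links each and would be classed as ends.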
Figure 8. Protein-protein interactions involving CT-X antigens.

Protein-protein interactions were derived by querying the STRING database. CT-X Antigens with disorder content ranging from 50-100% were randomly selected. As expected, each of the input CT-X antigens occupied hub positions in the network. In 5 of the 6 cases, the input CT-X antigens preferentially interact with other CTAs. (A) MAGEA11 = 49.6% disorder and 30.6% DNA-binding probability. (B) CSAG2 = 52.7% disorder and 58.5% DNA-binding probability. (C) SSX5 = 85% disorder and 98.6% DNA-binding probability. (D) LUZP4 = 91% disorder and 97.4% DNA-binding probability. (E) GAGE1 = 100% disorder and 96.5% DNA-binding probability. (F) PAGE4 = 100% disorder and 99.1% DNA-binding probability.
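The effect of the two DNA-binding stringency cutoffs (≥50% and ≥90%) can be seen on the six proteins listed in this legend. The Python sketch below uses only the disorder and DNA-binding values reported above; the data structure and function name are illustrative, and six selected proteins are of course not representative of the full data set.

```python
# (percent disorder, percent DNA-binding probability) as listed in Figure 8.
FIGURE_8_PROTEINS = {
    "MAGEA11": (49.6, 30.6),
    "CSAG2": (52.7, 58.5),
    "SSX5": (85.0, 98.6),
    "LUZP4": (91.0, 97.4),
    "GAGE1": (100.0, 96.5),
    "PAGE4": (100.0, 99.1),
}


def predicted_dna_binders(proteins, min_probability):
    """Names of proteins whose DNA-binding probability meets the cutoff."""
    return [name for name, (_, probability) in proteins.items()
            if probability >= min_probability]


binders_50 = predicted_dna_binders(FIGURE_8_PROTEINS, 50)
binders_90 = predicted_dna_binders(FIGURE_8_PROTEINS, 90)
```

At the 50% cutoff, five of the six proteins are retained (all except MAGEA11); at the 90% cutoff, four remain (SSX5, LUZP4, GAGE1 and PAGE4), which are also the four most disordered proteins in the list.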
CTAs and dosage sensitivity
To discern a potential causal link between aberrant CTA expression (increase in concentration) and dosage sensitivity (defined here as an increased cell growth phenotype), we examined data from the literature. We compiled data on 41 experiments reporting either siRNA-mediated silencing or overexpression of specific CTAs. As expected, silencing gene expression in cells overexpressing specific CTAs resulted in decreased cell growth, while forced expression in cells lacking expression increased the growth of the transfected cells (Supplemental Table 75). Together, these independent experiments on a variety of CT-X and non-X CT antigens provide evidence supporting causality between CTA overexpression and dosage sensitivity in cancer.
Discussion
Many proteins in living cells appear to be involved in the transfer and processing of information. Such proteins are functionally linked via networks to form biochemical ‘circuits’ that perform a variety of simple computational tasks including information amplification, integration, and storage [Bray, 1995]. Emerging evidence applying network theory suggests that the architecture of such networks is not random but instead is ‘scale-free’, with most proteins representing nodes having only a few connections and relatively few proteins occupying ‘hubs’ with tens, hundreds or more links [Almaas E, 2007; Dunker et al., 2005]. Scale-free networks are highly dynamic and grow incrementally. Interestingly, when ‘deciding’ where to establish a link, a new node ‘prefers’ an existing node that already has many connections (hub) over one with fewer links. These two basic mechanisms, growth and preferential attachment, will eventually lead to the system being dominated by hubs.
But what structural and functional attributes of a protein make it ‘desirable’ for recruitment to a hub position so that it can interact with a large number of diverse targets? A resounding answer appears to be an IDP, because of the unique thermodynamic advantage IDPs possess by existing as an ensemble of very different conformations in fast exchange [Uversky, 2002], and their capability to adapt to new demands. Furthermore, because they are typically dosage sensitive, IDPs are more likely to participate in a large number of promiscuous interactions when overexpressed, simply as a consequence of mass action [Marcotte and Tsechansky, 2009; Vavouri et al., 2009].
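The two mechanisms named above, growth and preferential attachment, can be sketched in a few lines. The following is a generic toy simulation under stated assumptions (a fixed number of links per new node, a small fully connected starting core, a hypothetical function name `grow_network`); it is not an analysis of any CTA network and is offered only to show how hubs can emerge from these two rules.

```python
import random
from collections import Counter


def grow_network(n_nodes, links_per_node=2, seed=0):
    """Grow a network one node at a time using preferential attachment."""
    rng = random.Random(seed)
    # Start from a small fully connected core of links_per_node + 1 nodes.
    core = links_per_node + 1
    edges = [(i, j) for i in range(core) for j in range(i + 1, core)]
    # 'stubs' lists each node once per link it holds, so a uniform draw
    # from it selects a node with probability proportional to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(stubs))  # well-connected nodes are favored
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


edges = grow_network(2000)
degree = Counter(node for edge in edges for node in edge)
print("most connected nodes:", degree.most_common(3))
print("nodes with exactly 2 links:", sum(1 for d in degree.values() if d == 2))
```

In such a run, most nodes keep only the links they were created with, while a handful of early nodes accumulate many more, which is the qualitative pattern described for scale-free networks.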
A hallmark of such inhomogeneous scale-free networks is their resilience. For example, in yeast, although proteins with five or fewer links constitute about 93% of the total number of proteins, only about 21% of them are essential. In contrast, only about 0.7% of the yeast proteins with known phenotypic profiles have more than 15 links, but single deletion of as many as 62% of these proves lethal, implying that highly connected proteins with a central role in the network’s architecture are three times more likely to be essential than proteins with only a small number of links to other proteins [Jeong et al., 2001]. A take-home lesson from these observations in yeast is that similar scale-free networks may be operational in cancer, making the disease so resilient. Perhaps our failure to combat the disease, in spite of decades of intense research and 40 years of declaring ‘war’ against cancer, is due to the fact that we are targeting common nodes rather than the critical hubs.
Taken together, our data suggest that the CTAs, by occupying hub positions in protein networks, could create new nodes with novel functions leading to the observed pathological phenotype in the absence of genetic changes. Further, the data provide a novel perspective on the CTAs, implicating them in processing and transducing information in altered physiological states in a dosage-sensitive manner. Identifying CTAs that occupy hub positions in protein regulatory networks would allow a better understanding of their functions as well as the development of novel therapeutics to treat cancer.
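The "three times more likely" figure follows directly from the two yeast percentages quoted above [Jeong et al., 2001]. A short worked calculation, using only those two numbers (the variable names are illustrative):

```python
# Fractions quoted in the text for yeast [Jeong et al., 2001].
essential_low_degree = 0.21   # essential among proteins with <= 5 links
essential_high_degree = 0.62  # essential among proteins with > 15 links

# Relative likelihood that a highly connected protein is essential.
ratio = essential_high_degree / essential_low_degree
print(round(ratio, 2))  # 2.95
```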
Supplementary Material
The following information is presented going from the outside to the inside of the CIRCOS circles: Text Track – shows the names of all the CTAs. The highly ordered Cancer/Testis Antigens (CTAs) (0-10% disorder) are indicated in red. The moderately disordered CTAs (11-30% disorder) are indicated in green and the highly disordered CTAs (31-100% disorder) are shown in blue. A ‘mouse over’ function reveals the following information: name of CTA, chromosome, start and end position of the CTA at nucleotide level, and percent disorder. Each track is drawn in order of its position on the respective chromosome. Scale Track – scale is reduced to 1e-6 and is shown in multiples of 10. Ideogram Track – colored track with numbers of the chromosomes. Mouse over gives the following information: chromosome and its size in megabases. Scatter Plot – the CTAs are represented as solid circles based on their position on the chromosomes and colored to correspond with the chromosome to which they belong. The track is divided into 13 lines and 12 spaces between them to show the position of chromosomes 1-22, and the X and Y (innermost line indicates 0). Mouse over shows the start and end positions of each CTA on the respective chromosome. Highlight Track – the transparent colored track showing the number of CT genes and the total number of genes on each chromosome.
Percent Cancer/Testis Antigens (CTAs) with PEST sequence/100 amino acids (A). Percent CTAs with PEST sequences seen in the 3 disordered groups of non-X (C), and CT-X Antigens (B), respectively. CTAs segregated into CT-X and non-X CT Antigens (D). The RONN algorithm was applied to group the CTAs. Standard errors were calculated and all reported differences were found to be statistically significant (Chi square test: p<0.001 for A, Wilcoxon Rank Sum test (RS): p<0.05 for B & C and T test: p<0.0001 for D).
The percent CTAs with ≥ 2 ubiquitylation sites/100 amino acids is plotted as a function of disorder calculated by RONN (A). CT-X and non-X CT Antigens were then plotted separately with respect to disorder (B & C). The CTAs were segregated into CT-X and non-X CT Antigens (D). Standard errors were calculated and all reported differences were found to be statistically significant except when we considered the non-X or CT-X status (D) (Chi square test: p<0.001 for A, Wilcoxon Rank Sum test (RS): p<0.001 for B & C and T test: NS for D. NS = Not significant).
Percent Cancer/Testis Antigens (CTAs) with ≥ 2 phosphorylation sites/100 amino acids is plotted with respect to disorder (A). Percent CT-X (B) and non-X CT Antigens (C) with ≥ 2 phosphorylation sites/100 amino acids was plotted with respect to disorder, respectively. Phosphorylation sites were predicted for both CT-X and non-X CT Antigens (D). The RONN algorithm was applied to group the CTAs. Standard errors were calculated and all reported differences were found to be statistically significant (Chi square test: p<0.05 for A, and Wilcoxon Rank Sum test (RS): p<0.05, p<0.001 for B & C, respectively and T test: p<0.05 for D).
The percent Cancer/Testis Antigens (CTAs) with ≥ 3 acetylation sites/100 amino acids is plotted with respect to disorder (A). Percent CT-X (B) and non-X CT Antigens (C) with ≥ 3 acetylation sites/100 amino acids was plotted with respect to disorder. Acetylation sites were predicted for both CT-X and non-X CT Antigens (D). The RONN algorithm was applied to group the CTAs. Standard errors were calculated and all reported differences were found to be statistically significant (Chi square test: NS for A, and Wilcoxon Rank Sum test (RS): NS, p <0.01 for B & C, respectively and T test: p <0.05 for D. NS = Not significant).
Acknowledgments
The authors would like to thank Dr Amita Behal for her help with the bioinformatics analyses. SMM is supported by an American Urological Association Foundation Research Scholarship. This work was supported by NCI SPORE Grant 2P50CA058236-16, the Patrick C Walsh Prostate Cancer Research Fund (PK), a NIDDK O’Brien Grant P50DK082998 and PSOC Grant NCI U54 CA143803 (RGH).
Abbreviations
- CTAs: Cancer/Testis Antigens
- CT-X antigens: Cancer/Testis Antigens on X chromosome
- non-X CT antigens: Cancer/Testis Antigens not on X chromosome
- IDPs: Intrinsically Disordered Proteins
- RONN: Regional Order Neural Network
- PRMT: protein arginine methyltransferase
References
- Ahmad S, Gromiha MM, Sarai A. Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics. 2004;20:477–86. doi: 10.1093/bioinformatics/btg432.
- Almaas E VAaBA. Complex systems and Interdisciplinary Science. Vol. 3. World Scientific publishing Co. Pte. Ltd.; 2007. pp. 1–20.
- Almeida LG, Sakabe NJ, deOliveira AR, Silva MC, Mundstein AS, Cohen T, Chen YT, Chua R, Gurung S, Gnjatic S, Jungbluth AA, Caballero OL, Bairoch A, Kiesler E, White SL, Simpson AJ, Old LJ, Camargo AA, Vasconcelos AT. CTdatabase: a knowledge-base of high-throughput and curated data on cancer-testis antigens. Nucleic Acids Res. 2009;37:D816–9. doi: 10.1093/nar/gkn673.
- Arif M, Senapati P, Shandilya J, Kundu TK. Protein lysine acetylation in cellular function and its role in cancer manifestation. Biochim Biophys Acta. 2010;1799:702–16. doi: 10.1016/j.bbagrm.2010.10.002.
- Bai S, He B, Wilson EM. Melanoma antigen gene protein MAGE-11 regulates androgen receptor function by modulating the interdomain interaction. Mol Cell Biol. 2005;25:1238–57. doi: 10.1128/MCB.25.4.1238-1257.2005.
- Barreau C, Paillard L, Osborne HB. AU-rich elements and associated factors: are there unifying principles? Nucleic Acids Res. 2005;33:7138–50. doi: 10.1093/nar/gki1012.
- Beaudoing E, Freier S, Wyatt JR, Claverie JM, Gautheret D. Patterns of variant polyadenylation signal usage in human genes. Genome Res. 2000;10:1001–10. doi: 10.1101/gr.10.7.1001.
- Bedford MT, Clarke SG. Protein arginine methylation in mammals: who, what, and why. Mol Cell. 2009;33:1–13. doi: 10.1016/j.molcel.2008.12.013.
- Bolognani F, Contente-Cuomo T, Perrone-Bizzozero NI. Novel recognition motifs and biological functions of the RNA-binding protein HuD revealed by genome-wide identification of its targets. Nucleic Acids Res. 2010;38:117–30. doi: 10.1093/nar/gkp863.
- Bray D. Protein molecules as computational elements in living cells. Nature. 1995;376:307–12. doi: 10.1038/376307a0.
- Chen H, Xue Y, Huang N, Yao X, Sun Z. MeMo: a web tool for prediction of protein methylation modifications. Nucleic Acids Res. 2006;34:W249–53. doi: 10.1093/nar/gkl233.
- Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 2005;272:5129–48. doi: 10.1111/j.1742-4658.2005.04948.x.
- Edwards YJ, Lobley AE, Pentony MM, Jones DT. Insights into the regulation of intrinsically disordered proteins in the human proteome by analyzing sequence and gene expression data. Genome Biol. 2009;10:R50. doi: 10.1186/gb-2009-10-5-r50.
- Galea CA, Nourse A, Wang Y, Sivakolundu SG, Heller WT, Kriwacki RW. Role of intrinsic flexibility in signal transduction mediated by the cell cycle regulator, p27 Kip1. J Mol Biol. 2008;376:827–38. doi: 10.1016/j.jmb.2007.12.016.
- Galgano A, Forrer M, Jaskiewicz L, Kanitz A, Zavolan M, Gerber AP. Comparative analysis of mRNA targets for human PUF-family proteins suggests extensive interaction with the miRNA regulatory system. PLoS One. 2008;3:e3164. doi: 10.1371/journal.pone.0003164.
- Gsponer J, Futschik ME, Teichmann SA, Babu MM. Tight regulation of unstructured proteins: from transcript synthesis to protein degradation. Science. 2008;322:1365–8. doi: 10.1126/science.1163581.
- Hannoun Z, Greenhough S, Jaffray E, Hay RT, Hay DC. Post-translational modification by SUMO. Toxicology. 2010;278:288–93. doi: 10.1016/j.tox.2010.07.013.
- Hansen JC. Linking genome structure and function through specific histone acetylation. ACS Chem Biol. 2006;1:69–72. doi: 10.1021/cb6000894.
- Haynes C, Oldfield CJ, Ji F, Klitgord N, Cusick ME, Radivojac P, Uversky VN, Vidal M, Iakoucheva LM. Intrinsic disorder is a common feature of hub proteins from four eukaryotic interactomes. PLoS Comput Biol. 2006;2:e100. doi: 10.1371/journal.pcbi.0020100.
- Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, Dunker AK. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004;32:1037–49. doi: 10.1093/nar/gkh253.
- Ishida T, Kinoshita K. Prediction of disordered regions in proteins based on the meta approach. Bioinformatics. 2008;24:1344–8. doi: 10.1093/bioinformatics/btn195.
- Janic A, Mendizabal L, Llamazares S, Rossell D, Gonzalez C. Ectopic expression of germline genes drives malignant brain tumor growth in Drosophila. Science. 2010;330:1824–7. doi: 10.1126/science.1195481.
- Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C. STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009;37:D412–6. doi: 10.1093/nar/gkn760.
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001;411:41–2. doi: 10.1038/35075138.
- Li A, Xue Y, Jin C, Wang M, Yao X. Prediction of Nepsilon-acetylation on internal lysines implemented in Bayesian Discriminant Method. Biochem Biophys Res Commun. 2006;350:818–24. doi: 10.1016/j.bbrc.2006.08.199.
- Liu J, Faeder JR, Camacho CJ. Toward a quantitative theory of intrinsically disordered proteins and their function. Proc Natl Acad Sci U S A. 2009;106:19819–23. doi: 10.1073/pnas.0907710106.
- Marcotte EM, Tsechansky M. Disorder, promiscuity, and toxic partnerships. Cell. 2009;138:16–8. doi: 10.1016/j.cell.2009.06.024.
- Mooney SM, Grande JP, Salisbury JL, Janknecht R. Sumoylation of p68 and p72 RNA helicases affects protein stability and transactivation potential. Biochemistry. 2010;49:1–10. doi: 10.1021/bi901263m.
- Morgan M, Iaconcig A, Muro AF. CPEB2, CPEB3 and CPEB4 are coordinately regulated by miRNAs recognizing conserved binding sites in paralog positions of their 3’-UTRs. Nucleic Acids Res. 2010;38:7698–710. doi: 10.1093/nar/gkq635.
- Patil A, Kinoshita K, Nakamura H. Hub promiscuity in protein-protein interaction networks. Int J Mol Sci. 2010;11:1930–43. doi: 10.3390/ijms11041930.
- Prilusky J, Felder CE, Zeev-Ben-Mordehai T, Rydberg EH, Man O, Beckmann JS, Silman I, Sussman JL. FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics. 2005;21:3435–8. doi: 10.1093/bioinformatics/bti537.
- Radivojac P, Vacic V, Haynes C, Cocklin RR, Mohan A, Heyen JW, Goebl MG, Iakoucheva LM. Identification, analysis, and prediction of protein ubiquitination sites. Proteins. 2010;78:365–80. doi: 10.1002/prot.22555.
- Rechsteiner M, Rogers SW. PEST sequences and regulation by proteolysis. Trends Biochem Sci. 1996;21:267–71.
- Ren J, Gao X, Jin C, Zhu M, Wang X, Shaw A, Wen L, Yao X, Xue Y. Systematic study of protein sumoylation: Development of a site-specific predictor of SUMOsp 2.0. Proteomics. 2009;9:3409–3412. doi: 10.1002/pmic.200800646.
- Scanlan MJ, Simpson AJ, Old LJ. The cancer/testis genes: review, standardization, and commentary. Cancer Immun. 2004;4:1.
- Spassov DS, Jurecic R. Cloning and comparative sequence analysis of PUM1 and PUM2 genes, human members of the Pumilio family of RNA-binding proteins. Gene. 2002;299:195–204. doi: 10.1016/s0378-1119(02)01060-0.
- Stevenson BJ, Iseli C, Panji S, Zahn-Zabal M, Hide W, Old LJ, Simpson AJ, Jongeneel CV. Rapid evolution of cancer/testis genes on the X chromosome. BMC Genomics. 2007;8:129. doi: 10.1186/1471-2164-8-129.
- Suyama T, Shiraishi T, Zeng Y, Yu W, Parekh N, Vessella RL, Luo J, Getzenberg RH, Kulkarni P. Expression of cancer/testis antigens in prostate cancer is associated with disease progression. Prostate. 2010;70:1778–87. doi: 10.1002/pros.21214.
- Tompa P, Csermely P. The role of structural disorder in the function of RNA and protein chaperones. FASEB J. 2004;18:1169–75. doi: 10.1096/fj.04-1584rev.
- Uversky VN. Natively unfolded proteins: a point where biology waits for physics. Protein Sci. 2002;11:739–56. doi: 10.1110/ps.4210102.
- Uversky VN, Dunker AK. Understanding protein non-folding. Biochim Biophys Acta. 2010;1804:1231–64. doi: 10.1016/j.bbapap.2010.01.017.
- Uversky VN, Oldfield CJ, Dunker AK. Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annu Rev Biophys. 2008;37:215–46. doi: 10.1146/annurev.biophys.37.032807.125924.
- van Dieck J, Teufel DP, Jaulent AM, Fernandez-Fernandez MR, Rutherford TJ, Wyslouch-Cieszynska A, Fersht AR. Posttranslational modifications affect the interaction of S100 proteins with tumor suppressor p53. J Mol Biol. 2009;394:922–30. doi: 10.1016/j.jmb.2009.10.002.
- Vavouri T, Semple JI, Garcia-Verdugo R, Lehner B. Intrinsic protein disorder and interaction promiscuity are widely associated with dosage sensitivity. Cell. 2009;138:198–208. doi: 10.1016/j.cell.2009.04.029.
- Welchman RL, Gordon C, Mayer RJ. Ubiquitin and ubiquitin-like proteins as multifunctional signals. Nat Rev Mol Cell Biol. 2005;6:599–609. doi: 10.1038/nrm1700.
- Westbrook VA, Schoppee PD, Diekman AB, Klotz KL, Allietta M, Hogan KT, Slingluff CL, Patterson JW, Frierson HF, Irvin WP, Jr, Flickinger CJ, Coppola MA, Herr JC. Genomic organization, incidence, and localization of the SPAN-x family of cancer-testis antigens in melanoma tumors and cell lines. Clin Cancer Res. 2004;10:101–12. doi: 10.1158/1078-0432.ccr-0647-3.
- Wiklund L, Sokolowski M, Carlsson A, Rush M, Schwartz S. Inhibition of translation by UAUUUAU and UAUUUUUAU motifs of the AU-rich RNA instability element in the HPV-1 late 3’ untranslated region. J Biol Chem. 2002;277:40462–71. doi: 10.1074/jbc.M205929200.
- Wong YH, Lee TY, Liang HK, Huang CM, Wang TY, Yang YH, Chu CH, Huang HD, Ko MT, Hwang JK. KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns. Nucleic Acids Res. 2007;35:W588–94. doi: 10.1093/nar/gkm322.
- Yang ZR, Thomson R, McNeil P, Esnouf RM. RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics. 2005;21:3369–76. doi: 10.1093/bioinformatics/bti534.
- Zeng Y, He Y, Yang F, Mooney SM, Getzenberg RH, Orban J, Kulkarni P. The Cancer/Testis Antigen Prostate-associated Gene 4 PAGE4 Is a Highly Intrinsically Disordered Protein. J Biol Chem. 2011;286:13985–94. doi: 10.1074/jbc.M110.210765.
- Zhao R, Tang B, Liu Y, Zhu N. NLS-dependent and insufficient nuclear localization of XAGE-1 splice variants. Oncol Rep. 2011;25:1083–9. doi: 10.3892/or.2011.1175.
