Summary
Peptidyl arginine deiminases (PADIs) catalyze protein citrullination, a post-translational conversion of arginine to citrulline. The most widely expressed member of this family, PADI2, regulates cellular processes that impact several diseases. We hypothesized that we could gain new insights into PADI2 function through a systematic evolutionary and structural analysis. Here, we identify 20 positively selected PADI2 residues, 16 of which are structurally exposed and maintain PADI2 interactions with cognate proteins. Many of these selected residues reside in non-catalytic regions of PADI2. We validate the importance of a prominent loop in the middle domain that encompasses PADI2 L162, a residue under positive selection. This site is essential for interaction with the transcription elongation factor (P-TEFb) and mediates the active transcription of the oncogenes c-MYC, and CCNB1, as well as impacting cellular proliferation. These insights could be key to understanding and addressing the role of the PADI2 c-MYC axis in cancer progression.
Subject areas: Natural sciences, Biological sciences, Biochemistry, Molecular biology, Evolutionary biology, Bioinformatics
Graphical abstract
Highlights
-
•
Positively evolved residues are mainly in the non-catalytic domain of the PADI2
-
•
L162 of PADI2 is a positively evolved residue at the structurally exposed loop
-
•
PADI2-L162 mediates PADI2’s effective interaction with the P-TEFb complex
-
•
PADI2-L162 dictates cell proliferation and c-MYC transcription
Natural sciences; Biological sciences; Biochemistry; Molecular biology; Evolutionary biology; Bioinformatics
Introduction
Members of the peptidyl arginine deiminase (PADI) family catalyze the post-translational calcium-dependent citrullination of arginines, converting them into the non-coded amino acid citrulline.1,2,3,4 Arginine citrullination is a widespread post-translational modification (PTM) that increases the hydrophobicity of proteins and can contribute to fine-tuning physiological processes. For instance, the citrullination of core histones weakens histone–nucleic acid interactions, impacting both chromatin organization and transcription.5 Citrullination can also affect protein folding and consequently protein function.6,7,8,9 The five members of the PADI family have distinct tissue-specific expression patterns: PADI1 is present only in the epidermis and uterus; PADI3, in the epidermis and hair follicles; PADI4, in immune cells, brain, uterus, and bone marrow; PADI6, in ovarian egg cells, embryonic tissues, and testicles; and PADI2, in brain, uterus, spleen, breast, pancreas, skin, and skeletal muscles.2,3 Notably, PADI2 has been associated with multiple pathological states, including autoimmune disorders10 and neurological diseases,11,12 as well as with several cancers (e.g., breast, cervical, liver, lung, ovarian, thyroid, gastric, and prostate cancer).13,14,15,16,17,18,19,20,21,22 The link between PADI2 and various diseases underscores an unmet need to elucidate its molecular mechanisms of action and to understand its pathophysiological function.
PADI family members have distinct substrates and target diverse arginine residues in different proteins.23 All family members are Ca2+-dependent, but PADI2 also relies on ordered calcium binding to its active site for substrate binding and catalysis, as demonstrated in the deimination of arginine 26 in histone H3 (H3R26).24,25,26,27 Importantly, H3R26 citrullination is critical for interaction with the chromatin remodeller SMARCAD1 (SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A containing DEAD/H box 1) and hence links PADI2 to gene activation in the context of naive pluripotency.28 PADI2 also contributes to chromatin modification that promotes the differentiation of oligodendrocyte precursors and efficient myelination, processes required for motor and cognitive functions.29 Recently, PADI2 was found to citrullinate MEK1, thereby promoting signaling by extracellular signal-regulated protein kinase 1/2 (ERK1/2) signaling in endometrium cancer.22
Recently, we identified another unique function of PADI2: the deimination of arginine 1810 (R1810) on repeat 31 of the C-terminal domain (CTD) of the large subunit of RNA polymerase 2 (RNAPII). This modification potentiates the interaction between the RNAPII-CTD and the positive transcription elongation factor b complex (P-TEFb) and consequently facilitates gene expression that is essential for cell identity.30 Indeed, we reported that the citrullination of R1810 in the CTD of RNAPII facilitates its interaction with the P-TEFb kinase complex and promotes the recruitment of CDK9 to transcription start sites (TSS).30 This contributes to overcoming the RNAPII pausing barrier, highlighting the functional connection between PADI2 and the P-TEFb kinase complex. These studies support the notion that PADI2 can selectively citrullinate arginine residues in specific proteins, which may explain its relevance to several pathophysiological conditions.
The PADI genes are ubiquitous in vertebrates but are absent from yeast, worms, and flies. A recent study focusing on the comprehensive identification of PADI homologs unveiled the evolutionary trajectory of PADIs within the animal lineage.31 PADIs appear to have been introduced from cyanobacteria into animals by horizontal gene transfer (HGT),31 supporting the previous hypothesis of HGT as a mechanism for introducing new genetic material into vertebrate genomes.32,33,34,35 Enzymes mediating citrullination in human parasites and microbes are highly divergent in sequence and have different substrate specificities. These include pPAD, which is an extended agmatine deiminase found in Porphyromonas gingivalis, and Giardia lamblia ADI, an extended form of the free L-arginine deiminase gADI found in this human parasite.36,37 Previous small-scale phylogenetic analyses of the PADI family have shown that PADI2 is the most conserved family member. Mammalian PADIs comprise three structural domains; the N-terminal domain (NTD) (PADI_N, Pfam annotation: PF08526), the middle domain (PADI_M, Pfam annotation: PF08527), and the catalytic C-terminal domain (PADI_C, Pfam annotation: PF03068).38,39,40,41 Of note, PADI1 and PADI3, and PADI4 and PADI6, are more closely related in evolutionary terms than PADI2 and any other family member.6 This finding suggests that PADI2 is most closely related to the ancestral PADI enzyme and that the other PADI enzymes arose through gene duplication more recently in evolution.
The enzymes responsible for essential PTMs (including phosphorylation, acetylation, and glycosylation) are found across all domains of life, suggesting that they were present in the Last Universal Common Ancestor (LUCA).42,43 Similarly, the PADI_C domain was present in the LUCA, indicating an ancient origin of citrullination. The PADI_M domain, which encompasses the sequential calcium-binding sites and maintains allosteric communication with PADI_C,25 is present in cyanobacteria. Indeed, recent work supports the degeneration of the PADI_N domain in cyanobacteria and demonstrates the existence of catalytically active PADI proteins in cyanobacteria.31 The relatively recent appearance of the NTD during the PADI family evolution suggests it has a functional relevance that is unique to higher organisms.
Previously, we found that PADI2 interacts with P-TEFb to maintain the expression of actively transcribed genes.30,44 The P-TEFb complex comprises cyclin-dependent kinase 9 (CDK9) and its regulatory partner cyclin T1 (CCNT1).45 Activation of the P-TEFb complex is required early in transcription to overcome RNAPII pausing and promote the productive phase of transcription elongation.45,46,47,48 Lack of recruitment of the P-TEFb complex and associated components has been linked to several disease conditions,49,50,51 highlighting the importance of this process in healthy physiology. While PADI2 could be a therapeutic target that specifically prevents p-TEFb-mediated expression of genes that promote disease states, more needs to be understood.
Here, we analyze the recent evolution of the PADI protein family. We performed a comparative genomics analysis of PADI2 and identified 20 putative amino acid substitutions, in different species, that might be important for PADI2 structure and functionality in mammals. The majority of the selected amino acids are exposed in the three-dimensional (3D) structure and belong to the non-catalytic NTD or to the middle domain of PADI2. These positions suggest that they can participate in protein-protein interactions. We investigated the functional relevance of positively selected exposed amino acids in the NTD and middle domain of PADI2 by analyzing their impact on PADI2 interactions with the P-TEFb complex. We found that leucine 162 (L162), in the exposed loop of the middle domain of PADI2, underwent positive selection during the initial divergence of primates. Indeed, this residue contributes to interactions between PADI2 and the P-TEFb complex. These results establish a link between the evolutionary selection of key residues in PADI2 and their role in regulating the protein-protein interactions required for functional transcriptional regulation.
Results
Evolutionary analysis of the peptidyl arginine deiminase family
To obtain a global picture of evolutionary relations within the PADI family, we first searched for all PADI family members with sequence homology to human PADI2 in the OMA (Orthologous MAtrix) database52 and collected all the available sequences in mammals (see STAR Methods). After close inspection, we discarded partial sequences and low-quality genomes, leaving 185 PADI orthologs that represent 25 mammalian species and three bird species (Figure S1). We used these sequences to build a multiple sequence alignment (MSA, File S1) and reconstructed a phylogenetic tree using the software RaxML, rooting the tree using the bird sequences as an outgroup (Figure S2, see STAR Methods).
The tree we built is consistent with five PADI genes that resulted from the duplication of a common ancestral sequence. Indeed, each of the five well-defined clusters in the reconstructed tree corresponds to the five different PADI genes (PADI1, green; PADI2 purple; PADI3, blue; PADI4, red; and PADI6, orange), with known species relationships being broadly recapitulated within each of these clusters (Figure S2). Robust bootstrap supports (82–100, Figure S3) for this tree strongly support the hypothesis that the genes within each cluster are orthologous. We observed clustering between PADI1 and PADI3 and between PADI4 and PADI6, with PADI2 being the most distant ortholog among family members. Notably, PADI2 also has shorter branch lengths in comparison to other family members, indicating a higher level of conservation among family members in this cluster. PADI6 family members have the longest branch lengths, indicating this cluster is the most divergent and suggesting these genes arose from an old duplication event. Interestingly, there are only three PADI genes in each bird species, and only one set of clusters within the PADI2 family, suggesting that one or more duplication events took place after the divergence from birds. PADI2 is also the closest subfamily to the remaining PADI outgroup species clusters, indicative of an ancestral position in this family.
Two bird genes (i.e., ANAPL06996 and ANAPL06995 in duck) in the unpainted cluster (Figure S2) did not exhibit clear orthology to specific PADI genes in either Ensembl or the OMA, though some were annotated as PADI1 orthologs. Our data suggest that there was a bird-specific duplication, possibly from PADI1, which is the closest gene in the synteny analysis (Figure S4A). Our species comparison also revealed an inversion of the entire PADI locus that appears to have occurred in the common ancestor of humans and gibbons (Hominoidea; Figure S4B). Most mammalian species have the same gene order as that found in mice (Figure S4C). Overall, our analyses revealed key evolutionary changes at the PADI locus, including gene re-arrangements and duplications within mammals.
Peptidyl arginine deiminase 2 is highly conserved across species
Since PADI2 is likely to be the ancestral copy, we decided to focus on the most recent changes occurring in its sequence. By focusing on a specific phylogenetic selection of species, we aimed to identify changes that have occurred since the common amniote ancestor that occur with good resolution in different mammals. Our dataset contains PADI2 representatives in all three major extant groups in class Mammalia and the principal orders or infraclasses: primates, rodents, carnivores, cetartiodactyla, lagomorphs, proboscidea, and marsupials. We built a multiple sequence alignment with all 28 PADI2 sequences (see STAR Methods, File S1) and computed the ratio of substitution rates at non-synonymous sites and synonymous sites (dN/dS; also termed Ka/Ks) to infer the direction and magnitude of natural selection acting on protein-coding genes.53 The difficulty of estimating the dN/dS ratio grows with the phylogenetic distance between sequences. Maximum likelihood methods were used to correct for multiple substitutions in the same site, and we estimated important parameters, such as the divergence between sequences or the transition/transversion ratio by deducing the most likely values to produce the input data.53
We calculated the dN/dS ratio for all internal branches and leaves in the tree using CodeML (Figure 1). We observed low dN/dS values whereby most of the branches had a value of <0.2. This was consistent with negative selection; in other words, changes in this genetic sequence were actively selected against. This analysis supports the idea that PADI2 is a highly conserved protein, suggesting that it performs critical functions in the organism (Figure 1). However, we observed a relatively high dN/dS in the common ancestor of all primates, possibly indicating the relaxation of selection or the fixation of a few beneficial mutations due to positive selection at this point in evolution. The dN/dS ratio requires a high number of changes to detect a selection effect and our data indicate that dN/dS values are very low in PADI2. Another limitation of this method is that it does not distinguish amino acid substitutions that are chemically similar from others with very different properties. However, this analysis suggests that intriguing selection processes operated during PADI2 evolution.
Detection of positively selected residues in peptidyl arginine deiminase 2
We used an orthogonal approach, the branch site positive selection test, to surpass the limitations of dN/dS analysis and identify specific residues that have undergone selection54,55 This test can discriminate between the relaxation of selection and positive selection and can detect individual residues that are under positive selection in a particular lineage. This test compares a null model in which codon-based dN/dS for all branches can only be ≤1; in the alternative model, the labeled foreground branch may include codons evolving at dN/dS > 1.54 This test can discriminate between the relaxation of selection or positive selection and can detect individual residues that are under positive selection in a particular lineage. After running the branch-site positive selection test on every branch of the PADI2 tree, we detected 20 individual relevant substitutions in different parts of the tree. We removed a few candidates for positively selected amino acids detected by CodeML in Loxodonta africana due to the alignment of a non-homologous region (238–278), possibly due to an assembly error in that species. The residues identified in the different species or branches are shown in Table 1 and Figure 2.
Table 1.
Domain | Alignment position | Sequence position (human) | Secondary structure | Flanking residues | MVORFFIP scores (0–1) | Accessibility |
---|---|---|---|---|---|---|
PADI_N | 70 | E16 | Beta strand | RVEAV | 0.8 | Exposed |
PADI_N | 79 | L25 | Beta strand | TYLWT | 0.0 | Buried |
PADI_N | 107 | V53 | Beta strand | EVVRD | 0.5 | Exposed |
PADI_N | 114 | E60 | Loop | AEEVA | 0.7 | Exposed |
PADI_M | 190 | K136 | Loop | NPKKA | 0.6 | Exposed |
PADI_M | 216 | L162 | Loop | PWLPK | 0.7 | Exposed |
PADI_M | 260 | V201 | Beta strand | EIVLY | 0.2 | Exposed |
PADI_M | 304 | S245 | Loop | GGSAE | 0.6 | Exposed |
PADI_M | 308 | L249 | Beta strand | ELLFF | 0.8 | Exposed |
PADI_M | 322 | S263 | Loop | GFSGL | 0.7 | Exposed |
PADI_M | 326 | S267 | Beta strand | LVSIH | 0.7 | Exposed |
PADI_M | 350 | T289 | Beta strand | TDTVI | 0.6 | Exposed |
PADI_C | 401 | L342 | Loop | QYLNR | 0.7 | Partially buried |
PADI_C | 439 | F380 | Loop | KDFPV | 0.8 | Exposed |
PADI_C | 460 | S401 | Loop | FESVT | 0.8 | Exposed |
PADI_C | 511 | K452 | Helix | FLKAQ | 0.7 | Exposed |
PADI_C | 567 | D507 | Helix capping | QKDGH | 0.5 | Exposed |
PADI_C | 596 | S536 | Helix | NESLV | 0.7 | Exposed |
PADI_C | 598 | V538 | Helix | SLVQE | 0.8 | Partially buried |
PADI_C | 723 | M663 | Helix capping | WHMVP | 0.7 | Partially buried |
The position of the amino acid in the domain, MSA, and PDB file (Supplementary material File S2) are shown in the domain, sequence position, and PDB number, respectively. The flanking residues are shown with the given positively selected amino acid depicted in bold. The type of secondary structure and solvent accessibility (i.e., exposure) is shown in the secondary structure and accessibility column. Finally, the MVORFFIP scores (see STAR Methods) are also indicated.
Positively selected residues in peptidyl arginine deiminase 2 may mediate protein-protein interactions
We next integrated our evolutionary insights into PADI2 with known structural information to study the properties of positively selected residues. To pursue this analysis, we mapped all the positively selected amino acids onto the structure of PADI2. 12 of the 20 positively selected amino acids mapped onto the PADI2_N and PADI2_M domains, while eight mapped to the PADI2_C domain. The latter contains the catalytic pocket and is responsible for the citrullination of arginine residues. (Table 1; Figures 3A and 3B). Of note, only one of the positively selected amino acids (L25) is fully buried (amino acids are numbered as shown in Table 1, File S2, Video S1); most of the other amino acids are fully exposed except for L342, V538, and M663, which are partially buried. In the secondary structure, approximately 50% of the amino acids are located in loops, while the others are either in beta strands or alpha helices (including helix-capping regions) (Table 1).
After evaluating the features of PADI2_N and PADI2_M domains and analyzing each particular amino acid and its potential role based on structural data, we concluded that the likely roles of E16, V53, E60, K136, L162, V201, S245, S249, S267, and T289 are to mediate interactions with putative binding partners. These amino acids are fully exposed, and their chemical properties correspond with classical protein interfaces.56,57 We thus characterized these positively selected amino acids using Multi-VORFFIP (MVORFFIP), a tool that predicts protein-, peptide-, DNA-, and RNA-binding sites.56 Notably, the MVORFFIP predictions for all except V201 and L25 yielded very high scores (>0.7, on a scale of 0–1), supporting the hypothesis that they are interface residues (Figures 3C and 3D; Table 1). S263 is located in a long loop close to the interface between domains, suggesting that it plays a role in hinge motions and geometrical orientation between domains. Given its position in the structure, the likely role of L25 (the only positively selected amino acid that is completely buried) may be to prevent the distortion of a nearby beta-sandwich. Indeed, a larger hydrophobic amino acid at this position would change the packing of the beta-strands.
We found that three positively selected residues in the PADI2_C domain, L342, F380, and S401, are close to the catalytic pocket. These residues are likely to play a role in substrate specificity (L342) and in the dynamics of catalysis (F380 and S401). K452, S536, and V538, however, are located in a helix that is distant from the active site pocket. Judging from the structural microenvironment, these residues could also play a role in protein-protein interactions, in line with the high scores assigned by MVORFFIP (Figure 3D; Table 1).56 Finally, D507 and M663 are both located in helix-capping positions. Conservation in helix capping is important for the stability and integrity of the helix. Residue M663 may also play a role in the packing of the C-terminal tail. A CAPS Database search58 showed that other helices with the same capping structure present a conserved small hydrophobic patch in the same position by including M, V, or L residues. It is noteworthy that M663 is indeed strictly conserved across all human PADIs. These observations suggest that most of these positively selected amino acids may function in facilitating and stabilizing protein-protein interactions.
The middle domain of peptidyl arginine deiminase 2 maintains interactions with the positive transcription elongation factor b complex
We next tested whether residues that have been positively selected in evolution but are not linked to the catalytic domain of PADI2, might interact with other proteins. For this analysis, we examined the P-TEFb kinase complex, which we previously found to interact with PADI2.30 The structures of PADI225 and CDK9-CCNT159 are known individually, and therefore it was possible to derive a structural model of the PADI2–CDK9/CCNT1 complex by protein docking. After deriving the docking ensemble of PADI2-CDK9/CCNT, we ranked the models. Importantly, we used a swarm-based approach (among other metrics) to select the putative model of interaction, reasoning that determining the region of PADI2 would be relevant to its interaction with CDK9/CCNT1 (Figure S4D, File S3, Video S2). Moreover, we derived the structural model using AlphaFold-Multimer.60,61 As shown in Figure S4E (File S4, Video S3), interacting regions in the structural model from AlphaFold-Multimer overlap substantially with those identified by docking (Figure S4D). More specifically, the predicted interface between PADI2 and CDK9-CCNT1 contained PADI2_M, and the hinge region was overrepresented by the number of positively selected residues. Importantly, L162 is present on the highly exposed loop (Figure 4A).
While AlphaFold-Multimer generates a single model, docking is represented by a range of docking poses. This range of models allowed us to identify regions of the predicted interface, i.e., PADI2_M and PADI2_C (Figure 4A), that are over-represented in the docking space. The distribution of the top 200 docking poses (represented using a unique point depicting the center of mass of CDK9/CCNT1) revealed that the region around the hinge between the PADI2_M and PADI2_C was over-represented (Figure 4B, File S5, Video S4). This particular region includes L162, a residue overrepresented in the docking conformers, along with T159 and W161. These three residues are located in a highly exposed region. We therefore selected these residues for experimental validation. Note that L162 in PADI2 was present among the top-ranking interface residues likely to mediate the interaction with CDK9/CCNT1 and is also a positively selected residue, suggesting that it is important for maintaining this interaction.
The loop encompassing positively selected Leu162 is important for cell proliferation and maintaining peptidyl arginine deiminase 2 interactions with the positive transcription elongation factor b complex
Since the P-TEFb kinase complex is involved in transcription elongation and cell proliferation,62,63,64 we experimentally tested whether the L162-encompassing loop in the middle domain of PADI2 affects cell proliferation. In addition to L162, we tested its neighboring amino acids, W161 and T159, which also ranked highly in our evolutionary and interaction prediction analysis and are located in a highly exposed loop region structurally close to the boundary between the PADI2_M and the catalytic domain. We selectively expressed the green fluorescence protein (GFP)-tagged wild-type (WT) PADI2 as well as the single (T159A or W161A or L162A), double (L162A/W161A), or triple (L162A/W161A/T159A) mutant of PADI2. The GFP-positive cells were sorted by FACS to ensure positive cell selection (Figures S5A and S5B). We monitored cell proliferation in HeLa cells expressing the GFP-tagged PADI2 WT and mutants (single, T159A or W161A or L162A, double, L162A/W161A, and triple, L162A/W161A/T159A). Strikingly, we observed that among the three single PADI2 mutants, L162A significantly decreased cell proliferation compared to WT, highlighting the functional relevance of L162 in the middle domain of PADI2 (Figures 5A and S5C). In the same manner, the PADI2 double mutant L162A/W161A, as well as the triple mutant L162A/W161A/T159A, also showed a significant reduction in cell proliferation. These observations highlight the functional role of L162A in this remarkably exposed loop in the middle domain of PADI2.
Therefore, we focused on the L162A, L162A/W161A, and L162A/W161A/T159A mutants along with WT PADI2 for further analysis. In the immunoprecipitation assay, using a GFP-tagged antibody, we observed that the P-TEFb complex (containing CCNT1 and CDK9) from HeLa cells was efficiently immunoprecipitated with GFP-WT PADI2 but not with GFP-L162A-PADI2 (Figure 5B). These results confirmed the functional role of the positively selected L162 in maintaining PADI2 interactions with the P-TEFb complex. Likewise, the P-TEFb complex was only weakly immunoprecipitated with either the double or triple PAD12 mutant, highlighting the function of the highly exposed region in the middle domain of PADI2 (PADI2_M) in this protein-protein interaction. Considering that the proper recruitment of the P-TEFb complex is required for efficient transcription of highly expressed genes relevant to cell proliferation, we validated the reduced expression levels of the c-MYC, CENPQ, and CCNB1 genes in cells expressing the single, double, or triple PADI2 mutant (Figure 5C). Of note, the levels of PADI2 expression did not differ significantly. We observed that the L162A single mutant (as well as double and triple mutants) significantly and specifically reduced the expression of highly expressed genes, including c-MYC, CCNB1, and CENPQ without affecting the levels of a control gene (the low-expressed LRRC39 gene). Next, we examined if the L162A mutant in PADI2 can mediate CDK9 recruitment to the promoter region of the target genes. We performed a chromatin immunoprecipitation (ChIP) assay using a CDK9-specific antibody in HeLa cells specifically expressing either WT PADI2, the L162A mutant, or the double and triple mutants. We found that CDK9 occupancy decreases in the presence of mutants compared to the WT PADI2 control (Figure 5D), suggesting that the positively selected L162 residue, along with W161A/T159A, is important to maintaining PADI2 interaction with the P-TEFb complex.
Given the fact that the L162-encompassing loop is important to maintaining PADI2 interaction with the P-TEFb complex, we next explored the effects of the PADI2 triple-mutant (L162A/W161A/T159A) on the transcriptome using single-cell RNA sequencing experiments. We compared the single-cell transcriptomics atlas of PADI2 triple-mutant to that of WT PADI2-expressing HeLa cells (Figures S6A and S6B). After quality control, 49,503 cells were subjected to downstream analysis, of which, 17,139 and 32,364 cells were derived from the WT and triple mutant PADI2 samples, respectively. We did not observe any significant differences in cell populations through unsupervised clustering and dimensionality reduction. TSNE plots organized the data into three clusters (Figure S6A). However, these clusters are not significantly different in triple mutants in comparison to WT (Figure S6B). Among the differentially expressed genes, we found c-MYC (Figures 6A and 6B). Within the top genes that are differentially expressed, we also found CCNB1 and CENPQ, which are related to cell cycle regulation (Figures 6A and 6B). Remarkably, in global differential expression (DEseq) analysis, the PADI2 triple mutant (L162A/W161A/T159A) affected the expression of over 469 genes (Log2FC > 0.58 or Log2FC < −0.58, p value <0.05). Of these, 162 genes were downregulated and 307 upregulated (Table S2). Gene ontology analysis of the top down-regulated genes (Log2FC < −1.0, p value <0.01) revealed enrichment in the mitochondrial function, cell cycle, RNA binding, and chromosome organization. The up-regulated genes were instead enriched for genes linked to the inflammatory response, cell-cell signaling, and cell adhesion processes (Log2FC > 1.0, p value <0.01) (Figure 6C; Table S3). We also investigated the regulatory gene sets using the gene ontology tool. The promoter regions of downregulated genes were significantly enriched for GTF2A2 (UniProt: P52657), GTF2E2 (UniProt: P29084), FOXE1 (UniProt: O00358), MZF1 (UniProt: P28698), ZFHX3_(UniProt: Q15911) binding motifs (Figure 6D; Table S4). Meanwhile, the upregulated genes showed significant enrichment in the promoter regions for motifs that can bind ZNF92 (UniProt: Q03936), METTL14 (UniProt: Q9HCE5), GFI1 and CTTTGA motif (Figure 6E; Table S4). This analysis highlights a connection between regulatory elements and differentially expressed (WT vs. PADI2 triple mutant) genes.
Importantly, c-MYC, CCNB1, and CENPQ overexpression have been linked to oncogenesis.65,66,67,68,69,70 These results support the functional relevance of the positively selected L162-encompassing loop in the middle domain of PADI2 in the expression of important oncogenic genes. Modulation of c-MYC expression by epigenetic regulation in cancer cells has been suggested to increase heterogeneity in the transcriptional landscape that promotes tumorigenesis.71 These observations suggest a potential function for the L162-encompassing loop of PADI2 in the regulation of the c-MYC/CCNB1/CENPQ axis which has been linked to tumorigenesis.
Discussion
We characterized the evolution of the PADI gene family using the most complete dataset of mammalian orthologous sequences to date. The three major extant groups in the class Mammalia are organized into two subclasses: Prototheria which includes monotremes (platypus and echidnas), and Theria, which includes the infraclasses Methatheria (marsupials), and Eutheria (placental mammals). According to Jones et al. 2009,72 there are 5416 mammalian species divided into 29 orders. Most known species belong to the 7 biggest orders (4865 species), all of which have at least one representative in our phylogenetic analysis.
The functional relevance of PADI2 is highlighted by its high sequence conservation across species and very low dN/dS values. We took advantage of the natural experiments performed by evolution and applied sensitive methodologies such as the branch-site test of positive selection. Comparing the amino acid sequences of 25 mammalian species, we identified 20 positively selected residues predominantly located in the non-catalytic domain of the PADI2 in different species and phylogenetic branches (Figures 1, 2 and S1–S4). Since these selected residues could have a strong functional impact on PADI2, we studied them further using the human sequence.
By examining the location of the positively selected residues in the structure of human PADI2, modeled in a complex with R1810 in the CTD of RNAPII, we determined that the majority of positively selected PADI2 residues were structurally highly exposed in the non-catalytic domains of PADI2 (Figure 3; Table 1). Notably, the highly exposed position and chemical properties of these positively selected residues suggested they play important functions in cellular processes. In addition, our MVORFFIP analysis revealed that these interface residues may play a significant role in protein-protein interactions (Figure 3D). Therefore, it could be postulated that the appearance of these positively selected residues contributed to the evolution of PADI2 function by modulating its interaction with an essential cognate set of proteins. Further work will be required to investigate these possibilities.
Specifically, the P-TEFb complex, comprising CCNT1 and CDK9, interacts with PADI2 and facilitates the effects of the citrullination of RNAPII-R1810 on transcription.30 Hence, we used structure modeling approaches to derive the tertiary structure of the PADI2–CCNT1/CDK9 complex. To identify the PADI2 residues that contribute to forming the PADI2–CCNT1/CDK9 complex, we analyzed structural models and identified a highly exposed loop in the PADI2_M domain as a potential interacting region. Strikingly, this loop includes the positively selected residue L162, along with T159 and W161. This observation supports the notion that positively evolved residues tend to coincide with the important residues on the protein surface, and that positively selected structural clusters are important for cellular function (Figure S6C).
Our data in HeLa cells overexpressing the GFP-tagged WT and a mutant PADI2 also unveiled an essential function for the L162-containing loop in interaction with the P-TEFb kinase complex and cell proliferation. Notably, the mutation of L162, alone or in combination with mutated W161 and/or T159, reduced (i) cell proliferation, (ii) PADI2 interaction with P-TEFb kinase complex, and (iii) reduced the PADI2-dependent expression of highly expressed genes that are relevant for cell growth.
We focused on the interactions between PADI2 and the P-TEFb complex, specifically with the positively selected L162-encompassing loop. However, in the future, it would be interesting to investigate the functional role of other positively selected PADI2 amino acids in cellular processes. Nonetheless, our analysis shows the importance of using a multi-disciplinary approach. Here, we used comparative genomics, evolutionary analysis, structural modeling, and genetic perturbations to analyze the functional relevance of a recently evolved non-catalytic domain in a highly conserved enzyme family. Overall, this evolutionary approach led to the identification of a PADI2 structural loop that is under positive selection and specifically supports an important role for PADI2 in the modulation of transcription elongation.
Limitations of the study
Note that one limitation of our study was that we could not mutate the endogenous PADI2 gene due to the hyperpolyploid nature of the HeLa genome. Therefore, we based our analysis on the overexpression of GFP-tagged wild-type and mutant versions of PADI2 in HeLa cells.
STAR★Methods
Key resources table
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Antibodies | ||
Mouse monoclonal anti-GFP | Roche | 11814460001 |
Rabbit polyclonal anti-CCNT1 | Bethyl Labs | A303-499A |
Mouse monoclonal anti-GFP | Merck | 11814460001 |
Rabbit polyclonal anti-CDK9 (D-7) | Santa Cruz Biot. | SC-13130 |
Rabbit polyclonal anti-CCNT1 | Bethyl Labs | A303-499A |
IgG mouse | Merck | 12-371 |
IgG Rabbit | Cell Signaling | 2729S |
Chemicals, peptides, and recombinant proteins | ||
Lipofectamine 3000 | Invitrogen | L3000008 |
Proteinase K | ThermoFisher Scientific |
AM2546 |
TURBO™ DNase | ThermoFisher Scientific |
AM2239 |
DNase I (RNase-free) | ThermoFisher Scientific |
AM2222 |
Cell proliferation ELISA BrdU Colorimetric assay | Roche | 11647229001 |
Dynabeads™ M-280 Sheep Anti-Mouse IgG | ThermoFisher | 11201D |
Protein G Plus / Protein A Agarose | Millipore | IP05 |
Deposited data | ||
Single Cell RNA-seq | This study | GSE246420 |
Experimental models: Cell lines | ||
HeLa cells | ATCC | CCL-2 |
Oligonucleotides | ||
PADI2- T159A Forward 5'TGGTGAACTGTGACCGAGAGGCACCCT GGTTGCCCAAGGAGGACTGCCGTGATG 3' PADI2- T159A Reverse 5'CATCACGGCAGTCCTCCTTGGGCAACCA GGGTGCCTCTCGGTCACAGTTCACCA 3'. |
This study | |
PADI2- W161A Forward 5'TGGTGAACTGTGACCGAGAGACACCCGC GTTGCCCAAGGAGGACTGCCGTGATG 3' PADI2- W161A Reverse 5'CATCACGGCAGTCCTCCTTGGGCAACGC GGGTGTCTCTCGGTCACAGTTCACCA 3'. |
This study | |
PADI2- L162A Forward 5'TGGTGAACTGTGACCGAGAGACACCCTG GGCACCCAAGGAGGACTGCCGTGATG3' PADI2- L162A Reverse 5'CATCACGGCAGTCCTCCTTGGGTGCCCA GGGTGTCTCTCGGTCACAGTTCACCA3'. |
This study | |
PADI2- L162A/ W161A Forward 5'TGGTGAACTGTGACCGAGAGACACCCGCC GCACCCAAGGAGGACTGCCGTGATG 3' PADI2- L162A/ W161A Reverse 5'CATCACGGCAGTCCTCCTTGGGTGCGGCGG GTGTCTCTCGGTCACAGTTCACCA 3'. |
This study | |
PADI2- L162A/W161A/T159A Forward 5'TGGTGAACTGTGACCGAGAGGCACCC GCCGCACCCAAGGAGGACTGCCGTGATG -3' PADI2- L162A/W161A/T159A Reverse 5'CATCACGGCAGTCCTCCTTGGGTGCGGCGG GTGCCTCTCGGTCACAGTTCACCA3' |
This study | |
Recombinant DNA | ||
Human PADI2 wild type gene cloned in pCPR0032 vector | This study | WT-PADI2 |
L162A mutation in WT-PADI2 | This study | L162A-PADI2 |
W161A mutation in L162A-PADI2 | This study | L162A/W161A-PADI2 |
T159A mutation in L162A/W161A-PADI2 | This study | L162A/W161A/T159A-PADI2 |
Software and algorithms | ||
MVORFFIP | Segura et al.56,58 | http://www.bioinsilico.org/MVORFFIP |
OMA Database | Altenhoff et al.52 | https://omabrowser.org/oma/home/ |
GSEA | Subramanian et al.73 | http://software.broadinstitute.org/gsea/index.jsp |
Wasabi | Veidenberg et al.74 | http://wasabiapp.org/ |
MAFFT | Katoh et al.75 | https://mafft.cbrc.jp/alignment/server/index.html |
AliView | Larsson et al.76 | http://ormbunkar.se/aliview/ |
trimAl | Capella-Gutiérrez et al.77 | http://trimal.cgenomics.org/ |
PAL2NAL | Suyama et al.78 | https://www.bork.embl.de/pal2nal/ |
RAxML | Stamatakis et al.79 | https://cme.h-its.org/exelixis//web/software/raxml/ |
ETE 3 | Huerta-Cepas et al.80 | http://etetoolkit.org/ |
GROMACS | Pronk et al.81 | http://www.gromacs.org |
VD2OCK | Segura et al.82 | http://www.bioinsilico.org/cgi-bin/VD2OCK/staticHTML/home |
Resource availability
Lead contact
Further information and requests for resources and results should be directed to the lead contact Priyanka Sharma (priyanka.sharma@ipbs.fr).
Materials availability
Further information and requests for plasmids generated in this study will be available upon reasonable request to the Lead Contact.
Data and code availability
-
•
Single-cell RNA sequencing data performed and used in this study have been deposited at GEO under the accession code GSE246420 and publically available.
-
•
No original code was generated in this paper.
-
•
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Experimental model and subject details
Cell lines
HeLa cells (ATCC CCL-2) were grown in DMEM with 10% FBS and 100U/ml penicillin-streptomycin according to the ATCC’s recommendations. Cells were transfected using Lipofectamine 3000 (Invitrogen) according to the manufacturer’s instructions.
Method details
PADI2-GFP plasmids and fluorescence-activated cell sorting (FACS) sorting
PADI2 was cloned into the pCPR0032 GFP-tagged vector using the forward primer (F) 5'- AGAACCTGTACTTCCAATCCATGCTGCGCGAGCGGAC-3' and the reverse primer (R) 5'- GATCCGTATCCACCTTTACTTTAGGGCACCATGTGCCACC-3′ using Gibson assembly.83 The selected plasmid sequence was verified by Sanger sequencing. Next, PADI2 wild-type (WT) was used to generate a single mutant (T159A or W161A or L162A), a double mutant (L162A/W161A), and a triple mutant (L162A/ W161A/ T159A), using the following primers:
PADI2- T159A Forward
5'- TGGTGAACTGTGACCGAGAGGCACCCTGGTTGCCCAAGGAGGACTGCCGTGATG -3'
PADI2- T159A Reverse
5'- CATCACGGCAGTCCTCCTTGGGCAACCAGGGTGCCTCTCGGTCACAGTTCACCA -3'.
PADI2- W161A Forward
5'- TGGTGAACTGTGACCGAGAGACACCCGCGTTGCCCAAGGAGGACTGCCGTGATG -3'
PADI2- W161A Reverse
5'- CATCACGGCAGTCCTCCTTGGGCAACGCGGGTGTCTCTCGGTCACAGTTCACCA -3'.
PADI2- L162A Forward
5'- TGGTGAACTGTGACCGAGAGACACCCTGGGCACCCAAGGAGGACTGCCGTGATG-3'
PADI2- L162A Reverse
5'-CATCACGGCAGTCCTCCTTGGGTGCCCAGGGTGTCTCTCGGTCACAGTTCACCA-3'.
PADI2- L162A/ W161A Forward
5'- TGGTGAACTGTGACCGAGAGACACCCGCCGCACCCAAGGAGGACTGCCGTGATG -3'
PADI2- L162A/ W161A Reverse
5'- CATCACGGCAGTCCTCCTTGGGTGCGGCGGGTGTCTCTCGGTCACAGTTCACCA -3'.
PADI2- L162A/W161A/T159A Forward
5'- TGGTGAACTGTGACCGAGAGGCACCCGCCGCACCCAAGGAGGACTGCCGTGATG -3'
PADI2- L162A/W161A/T159A Reverse
5'- CATCACGGCAGTCCTCCTTGGGTGCGGCGGGTGCCTCTCGGTCACAGTTCACCA-3'.
The corresponding fragments were generated by each pair of primers using oligonucleotide assembly and introduced by Gibson assembly.83 All the generated mutants were confirmed by Sanger sequencing.
For transfection, 2 × 106 HeLa cells were seeded in 10-cm plates, and 4 ug of the plasmid being tested was transfected using Lipofectamine 3000 (Invitrogen) for 24 hours according to the manufacturer's instructions. Cells were trypsinized, and GFP-positive live cells were sorted using BD influx (Becton and Dickinson, San Jose, CA). Briefly, cells were stained with 1μg/mL concentration of DAPI (4′, 6-diamidino-2-phenylindole) before FACS sorting. A SSC-H (side scatter height) versus FSC-H (forward scatter height), morphological-related parameters dot-plot was used to exclude debris by gating cells; doublets were then excluded using a FSC-H versus FSC-A (forward scatter area) by gating and dead cells were excluded using DAPI versus FSC-A dot-plot by gating living cells. GFP-positive cells were identified and isolated using a G4 gate in GFP versus autofluorescence (AF) dot-plot. Obtained data were analyzed using Flow Jo 10. 6. GFP-positive cells were used for experiments. Cells were centrifuged and stored as pellets at –80°C prior to RNA extraction and immunoprecipitation experiments.
Cell proliferation assay
BrdU (5'-bromo-2'-deoxyuridine) cell proliferation assay
HeLa cells (0.3 × 103) were transfected with either wild-type PADI2 (WT) or mutant PADI2 (with L642A or T159 or W161A or L642A/W161A, or L642A/W161A/T159A) using Lipofectamine 3000 (Invitrogen) in 96-well plates. The cell proliferation ELISA BrdU (5'-bromo-2'-deoxyuridine) colorimetric assay (Roche,11647229001) was performed as per the manufacturer's instructions. The experiments were performed on at least eight biological replicates.
Incucyte ® proliferation assays for live-cell analysis
HeLa cells seeded in 96-well plates with 300 cells per well, transfected with either wild-type PADI2 (WT) or mutant PADI2 (with L642A or T159 or W161A or L642A/W161A, or L642A/W161A/T159A). After 24hours of transfection, imaging was performed using the IncuCyte live cell imaging system (Essen BioScience). Scans at 4× magnification were taken every 8 hours for 5 days. Cell confluence was calculated from microscopy images using the Incucyte software algorithm to generate a proliferation index corresponding to the change in confluence for each well. These measurements are the mean ± SEM of at least six replicates.
RNA extraction and RT-qPCRs
RNA from HeLa cells transfected with either wild-type PADI2 (WT) or mutant PADI2 (with L642A, or L642A/W161A, or L642A/W161A/T159A) was extracted using RNeasy (Qiagen) according to the manufacturer's instructions. Purified RNA (1μg) was used for DNase treatment (Thermo Scientific) and quantified with a Qubit 3.0 Fluorometer (Life Technologies).
Reverse transcription of RNA was performed using a qScript™ cDNA Synthesis Kit (Quanta Bioscience 95047-100) according to the manufacturer's instructions. Complementary DNA was quantified by qPCR using Roche Lightcycler (Roche) as described.84 For each gene product, relative RNA abundance was calculated using the standard curve method and expressed as relative RNA abundance after normalizing against the human GAPDH gene level. All gene expression data generated by RT-qPCR represented the average and ± SEM of at least three biological replicates. Primers used for RT-qPCR are listed in Table S1.
Single-cell samples, library preparation, and sequencing
HeLa cells (0.6 × 106) were transfected with either wild-type PADI2 (WT) or the PADI2 triple mutant (with L642A/W161A/T159A) using Lipofectamine 3000 (Invitrogen) in 60mm plates. GFP-positive singlet cells and live cells were suspended in 1ml of PBS+BSA 0.05%. Cells were centrifuged at 400 rcf for 5 min at 4°C in order to bring the cell concentration to 300-1000 cells/μl. Cell concentration and viability were determined by manual counting using a Neubauer chamber and staining the cells with Trypan blue. Cells were partitioned into Gel Bead-In-Emulsions (GEMs) using the Chromium Controller system (10X Genomics), with a target recovery of 5000 total cells per sample. cDNA sequencing libraries were prepared using the Next GEM Single Cell 3’ Reagent Kits v3.1 (10X Genomics, PN-1000268), following the manufacturer's instructions. Briefly, after GEM-RT clean-up, cDNA was amplified for 11 cycles, and cDNA QC and quantification were performed on an Agilent Bioanalyzer High Sensitivity chip (Agilent Technologies). cDNA libraries were indexed by PCR using the PN-1000215 Dual Index Kit TT Set A Plate. Size distribution and concentration of 3' cDNA libraries were verified on an Agilent Bioanalyzer High Sensitivity chip (Agilent Technologies). Finally, sequencing of cDNA libraries was carried out using the Illumina NovaSeq 6000.
Single-cell RNA sequencing data processing and analysis
Transcriptome-wide analysis of human L162A/W161A/T159A-PADI2 mutant and WT cells was performed at single-cell resolution using NovaSeq 6000 pipeline. Sequencing data were aligned to the human reference genome Grch38. Data with at least 500 unique molecular identifiers (UMIs), log10 genes per UMI >0.8, >250 genes per cell and a mitochondrial ratio of less than 20% were extracted, normalized, and integrated using the Seurat package v4.0.3 in R4.0.2. After quality control and integration, we performed a modularity-optimized Louvain clustering. After quality control, 17,139 WT and 32,364 PADI2mut cells remained. Altogether, we used two biological replicates (PADI2_mut1: 16,177 cells, PADI2mut2: 16,187 cells, WT1: 8,568 cells, WT2: 8,571 cells).
Gene ontology (GO) and regulatory gene set analysis
GO Annotation and regulatory gene set analysis were performed using the online tool Gene Set Enrichment Analysis (GSEA, http://software.broadinstitute.org/gsea/index.jsp) collection database v5.73,85 The significant cut-off p-value and FDR q-value < 0.05. Plots were done with the use of Prism (GraphPad Prism 10.0.3 for MacOS), Cytoscape86 ( version 3.9.1) and R4.0.2.
ChIP-qPCRs
For ChIP assays,30 4 x 106 of FACS sorted GFP-positive HeLa cells (WT-PADI2, L162A-PADI2, L162A/W161A-PADI2, L162A/W161A/T159A-PADI2) were cross-linked for 10 min with 1% formaldehyde at 37°C. The chromatin lysate was sonicated to a DNA fragment size range of 100-200bp using a Biorupter sonicator (Diagenode). CDK9 was immunoprecipitated with 15μg of an anti-CDK9 antibody (D-7, sc-13130, lot no # B1422) or control mouse IgG (12-371, Merck) in IP Buffer with 2X SDS buffer (100mM NaCl, 50mM Tris-HCl, pH8, 5mM EDTA and 0.5% SDS) and 1X Triton buffer (100mM Tris-HCl, pH8.8, 100mM NaCl, 5mM EDTA and 5% Triton-X) with protease inhibitors (11836170001, Roche) for 16 hours at 4°C. This step was followed by incubation with 50μl of Dynabeads® M-280 sheep anti-mouse IgG (11201D, Thermo Scientific) for 3 hours. Beads were washed 3 times with low salt buffer (140mM NaCl, 50mM HEPES, pH 7.4, 1% Triton-X 100), 2 times with high salt buffer (500 mM NaCl, 50mM HEPES, pH 7.4, 1% Triton-X 100) followed by single wash with LiCl Buffer (10mM Tris HCl pH 8.0, 250 mM LiCl, 1% NP-40, 1% sodium deoxycholic acid and 1mM EDTA) and 1× TE buffer at 4°C. Subsequently, crosslinks were reversed at 65°C overnight, followed by RNase treatment for 1.5 hours, and bound DNA was purified by Phenol-Chloroform extraction. The resulting eluted DNA was quantified by Qubit 3.0 Fluorometer (Life Technologies) and followed by real-time qPCR analysis. Data are represented as fold-change over input fraction from at least 3 biological replicate experiments. Primers used for qPCR are listed in Table S1.
GFP-tagged PADI2 immunoprecipitation, western blot
Briefly, 3 × 106 FACS sorted GFP-positive (WT-PADI2, L162A-PADI2, L162A/W161A-PADI2, L162A/W161A/T159A-PADI2) cells were lysed on ice for 30 min in lysis buffer (1% Triton X-100 in 50mM Tris pH 7.4–7.6, 130 mM NaCl) containing protease inhibitors (11836170001, Roche) with rotation, followed by sonication for 7 min with every 30 sec on / 30 sec off. After centrifugation at 4°C and 13,000 rpm for 10 min, extracts were used for protein quantitation. For the immunoprecipitation (IP) assay, 2mg of extract was incubated for 12 hours with 100μl Dynabeads Protein A (10002D, Thermo Scientific). The monoclonal antibody anti-GFP (polyclonal rabbit, A-11122, Invitrogen) or a control rabbit IgG (2729S, Cell Signaling) was coupled with Dynabeads before incubation with extract at 4°C. The samples were washed 10 times with lysis buffer and boiled for 5 min in SDS gel sample buffer. Proteins were visualized by 4% to 12% SDS-PAGE gels and western blotting, using anti-GFP (11814460001, Roche), anti-CDK9 (sc-13130, Santa Cruz), or anti-CCNT1 (A303-499A, Bethyl Labs) were used for western blots.
Multiple sequence alignments
All of the homolog PADI sequences available in the OMA database were downloaded.52 The species for the study were selected using three criteria: (i) a good representation of the different mammalian lineages, (ii) good sequence quality, and (iii) the presence of a full PADI2 sequence. Three bird species (chicken, turkey, and duck) were used as outgroups. The protein sequences were aligned using the software PRANK74 and a pruned mammalian guide tree with branch distances.87 This program uses an evolutionary model to place insertions and deletions, minimizing the over-alignment of non-homologous regions, and has been shown to improve dN/dS estimates.88 Local realignments of two regions were done using MAFFT75 within AliView.76 Another MSA using only one-to-one orthologous PADI2 sequences was also built. TrimAI77 was applied to the PADI family MSA, enabling the -automated1 option, as recommended before a phylogenetic reconstruction. The PADI2 protein MSA was then converted into a nucleotide alignment using the script Pal2Nal. The corresponding cDNA sequences for each species were gathered from the OMA database and converted using the script Pal2Nal.78 The MSAs generated and the trees used are available as Supplementary Material S1.
Phylogenetic reconstruction
We reconstructed the PADI family tree using RaxML79 with 200 bootstrap values (-N 200), with an optimal amino acid substitution model chosen automatically (-m PROTGAMMAAUTO), and a rapid bootstrap algorithm (-f a) along with reproducible seed values (-p 12345, -x 345). Two reconstructions were performed: one with the original PADI family MSA and another one with the trimmed version of the alignment.
Estimation of dN/dS and positive selection
Estimates were made for both the number of non-synonymous substitutions per non-synonymous site (dN) and the number of synonymous substitutions per synonymous site (dS) using the free-ratio model in CodeML.89 For each branch and leaf in the tree, we performed a branch-site test of positive selection,54 implemented in the Phylogenetic Analysis by Maximum Likelihood (PAML) software package89 and available in the environment for tree exploration (ETE3) framework as bsA/bsA1.80 A likelihood ratio (LRT) was calculated as 2∗(L1-L0), where L1 and L0 are the maximum likelihood value for the alternative hypothesis and the null hypothesis respectively. A chi-squared distribution with 1 degree of freedom was used to calculate the p-values. Only the positions for which the p-value (< 0.05) is significant for positive selection are reported.
Characterization of positively selected residues using MVORFFIP
The prediction of interface residues was done using MVORFFIP.56 MVORFFIP is a structure-based prediction method that identifies protein-, peptide-, RNA- and DNA interfaces based on a range of structural, evolutionary, experimental, and energy-based information integrated by a Random Forest classifier The structure of PADI2 was submitted to the MVORFFIP server (http://www.bioinsilico.org/MVORFFIP)56 and protein interface scores were assigned to individual amino acids.
Structural modeling of PADI2-CDK9/CCNT1 and selection of putative interface residues
The structure of the trimer complex, PADI2-CDK9, was derived using protein docking as follows. The crystal structure of PADI2 (PDB code: 4n2a)25 and CDK9/CCNT1 (PDB code: 3blr)59 were available. The protein docking was performed using VD2OCK,82 generating the ensemble of over 10000 docking poses. Docking conformers were clustered using the GROMACS81 and sorted by the ES3DC potential.90 The top 200 poses were selected to identify the top 10 putative interface residues: for this, a composite score was calculated using the docking information, the MVORFFIP score and sequence conservation. Among the top 200 docking poses, as discussed above, a normalized score (Z score) was calculated for each exposed residue in PADI2. The mean and standard deviations to calculate the Z score were computed over the entire docking space, i.e., considering all docking poses. The MVORFFIP score was also computed as shown above. Finally, the conservation of each residue was computed using al2co.91
AlphaFold-multimer modeling of PADI2-CDK9/CCNT1 complex
Structural models of the trimer PADI2-CDK9/CCNT1 were generated using AlphaFold-Multimer60 as follows. The sequences were retrieved as mentioned above, PADI2 (PDB code: 4n2a)25 and CDK9/CCNT1 (PDB code: 3blr).59 The structure was derived using model parameters corresponding to “model_1_multimer”.
Quantification and statistical analysis
For RT-qPCR, ChIP-qPCR, and cell proliferation experiments, a Two-tailed unpaired Student's t-test was used to determine statistical significance between the groups. For cell proliferation experiments, the significance between groups calculated by Wilcoxon-Mann-Whitney test. Plots and indicated statistical analysis were done with the use of Prism (GraphPad Prism 10.0.3 for MacOS) unless otherwise stated. If exact p-values are not shown or indicated in legend then p-values are represented in all figures as follows: ∗, p-value ≤ 0.05; ∗∗, p-value ≤ 0.01; ∗∗∗, p-value ≤ 0.001; º, p-value > 0.05.
Acknowledgments
We thank Fátima Gebauer and François Le Dily from CRG for their constructive criticism and advice on this article. We acknowledge Veronica A. Raker for article editing. We thank the CNAG (National Center for Genomic Analysis, Barcelona) Genomic facility and CRG flow Cytometry unit for all technical support. This work was supported by grants to P.S. from the French National Research Agency (ANR) Young Investigator grant (ANR-21-CE12-0010), French Cancer Research Foundation (ARCPJA22021050003683), La Ligue Foundation for Cancer, France (Haute-Garonne, 285458), National Institute of Health and Medical Research (INSERM) Young Recruitment Support (U1194SHA), Cancéropôle Grand Sud-Ouest collaboration grant (R21031FF), Centre National de la Recherche Scientifique (CNRS), Université Toulouse III - Paul Sabatier, Toulouse, France. This work was also supported by a grant to M.B. from the European Research Council Synergy Grant “4DGenome” (609989).
Author contributions
Conceptualization, P.S.; methodology, J.L.V., N.F.F., and P.S.; investigation, J.L.V., N.F.F., B.O., M.B., and P.S.; formal analysis, J.L.V., N.F.F., and P.S.; data curation, J.L.V., N.F.F., M.E.R., C.T., F.P.P., B.O., C.N., and P.S.; project administration, P.S., writing-original draft, P.S. writing-review and editing, J.L.V., N.F.F., B.O., M.E.R., C.T., F.P.P., M.B., C.N., and P.S.; funding acquisition, P.S.; supervision, P.S.
Declaration of interests
The authors declare no competing interests.
Published: March 27, 2024
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.isci.2024.109584.
Supplemental information
References
- 1.van Venrooij W.J., Pruijn G.J.M. Citrullination: A small change for a protein with great consequences for rheumatoid arthritis. Arthritis Res. 2000;2:249–251. doi: 10.1186/ar95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Wang S., Wang Y. Peptidylarginine deiminases in citrullination, gene regulation, health and pathogenesis. Biochim Biophys Acta - Gene Regul Mech. 2013;1829:1126–1135. doi: 10.1016/j.bbagrm.2013.07.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Fuhrmann J., Clancy K.W., Thompson P.R. Chemical Biology of Protein Arginine Modifications in Epigenetic Regulation. Chem. Rev. 2015;115:5413–5461. doi: 10.1021/acs.chemrev.5b00003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Christophorou M.A., Sharma P., Zhang X., Christophorou M.A. 2023. Citrullination : New Tricks for an Old Mod; pp. 4–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Christophorou M.A. The virtues and vices of protein citrullination. R. Soc. Open Sci. 2022;9:220125–220195. doi: 10.1098/rsos.220125. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Vossenaar E.R., Zendman A.J.W., Van Venrooij W.J., Pruijn G.J.M. PAD, a growing family of citrullinating enzymes: Genes, features and involvement in disease. Bioessays. 2003;25:1106–1118. doi: 10.1002/bies.10357. [DOI] [PubMed] [Google Scholar]
- 7.Tanikawa C., Ueda K., Suzuki A., Iida A., Nakamura R., Atsuta N., Tohnai G., Sobue G., Saichi N., Momozawa Y., et al. Citrullination of RGG Motifs in FET Proteins by PAD4 Regulates Protein Aggregation and ALS Susceptibility. Cell Rep. 2018;22:1473–1483. doi: 10.1016/j.celrep.2018.01.031. [DOI] [PubMed] [Google Scholar]
- 8.Christophorou M.A., Castelo-Branco G., Halley-Stott R.P., Oliveira C.S., Loos R., Radzisheuskaya A., Mowen K.A., Bertone P., Silva J.C.R., Zernicka-Goetz M., et al. Citrullination regulates pluripotency and histone H1 binding to chromatin. Nature. 2014;507:104–108. doi: 10.1038/nature12942. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Sharma P., Azebi S., England P., Christensen T., Møller-Larsen A., Petersen T., Batsché E., Muchardt C. Citrullination of Histone H3 Interferes with HP1-Mediated Transcriptional Repression. PLoS Genet. 2012;8:10029344–e1003015. doi: 10.1371/journal.pgen.1002934. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Chang X., Xia Y., Pan J., Meng Q., Zhao Y., Yan X. PADI2 is significantly associated with rheumatoid arthritis. PLoS One. 2013;8 doi: 10.1371/journal.pone.0081259. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Ishigami A., Ohsawa T., Hiratsuka M., Taguchi H., Kobayashi S., Saito Y., Murayama S., Asaga H., Toda T., Kimura N., Maruyama N. Abnormal accumulation of citrullinated proteins catalyzed by peptidylarginine deiminase in hippocampal extracts from patients with Alzheimer’s disease. J. Neurosci. Res. 2005;80:120–128. doi: 10.1002/jnr.20431. [DOI] [PubMed] [Google Scholar]
- 12.Arif M., Kato T. Increased expression of PAD2 after repeated intracerebroventricular infusions of soluble Aβ25-35 in the Alzheimer’s disease model rat brain: Effect of memantine. Cell. Mol. Biol. Lett. 2009;14:703–714. doi: 10.2478/s11658-009-0029-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Cherrington B.D., Zhang X., Mcelwee J.L., Morency E., Anguish L.J., Coonrod S.A. Potential Role for PAD2 in Gene Regulation in Breast Cancer Cells. PLoS One. 2012;7 doi: 10.1371/journal.pone.0041242. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Mohanan S., Cherrington B.D., Horibata S., Mcelwee J.L., Thompson P.R., Coonrod S.A. Potential Role of Peptidylarginine Deiminase Enzymes and Protein Citrullination in Cancer Pathogenesis. Biochem. Res. Int. 2012;2012 doi: 10.1155/2012/895343. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Guo W., Zheng Y., Xu B., Ma F., Li C., Zhang X., Wang Y., Chang X. Investigating the expression , effect and tumorigenic pathway of PADI2 in tumors. OncoTargets Ther. 2017;10:1475–1485. doi: 10.2147/OTT.S92389. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Wang L., Song G., Zhang X., Feng T., Pan J., Chen W., Yang M., Bai X., Pang Y., Yu J., et al. PADI2-mediated citrullination promotes prostate cancer progression. Cancer Res. 2017;77:5755–5768. doi: 10.1158/0008-5472.CAN-17-0150. [DOI] [PubMed] [Google Scholar]
- 17.Horibata S., Rogers K.E., Sadegh D., Anguish L.J., Mcelwee J.L., Shah P., Thompson P.R., Coonrod S.A. Role of peptidylarginine deiminase 2 ( PAD2 ) in mammary carcinoma cell migration. BMC Cancer. 2017;17:378. doi: 10.1186/s12885-017-3354-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Song S., Yu Y. Progression on citrullination of proteins in gastrointestinal cancers. Front. Oncol. 2019;9:15–16. doi: 10.3389/fonc.2019.00015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Yuzhalin A.E. Citrullination in cancer. Cancer Res. 2019;79:1274–1284. doi: 10.1158/0008-5472.CAN-18-2797. [DOI] [PubMed] [Google Scholar]
- 20.Beato M., Sharma P. Peptidyl arginine deiminase 2 (PADI2)-mediated arginine citrullination modulates transcription in cancer. Int. J. Mol. Sci. 2020;21:1351–1416. doi: 10.3390/ijms21041351. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Gao B.S., Rong C.S., Xu H.M., Sun T., Hou J., Xu Y. Peptidyl Arginine Deiminase, Type II (PADI2) Is Involved in Urothelial Bladder Cancer. Pathol. Oncol. Res. 2020;26:1279–1285. doi: 10.1007/s12253-019-00687-0. [DOI] [PubMed] [Google Scholar]
- 22.Xue T., Liu X., Zhang M., Qiukai E., Liu S., Zou M., Li Y., Ma Z., Han Y., Thompson P., Zhang X. PADI2-Catalyzed MEK1 Citrullination Activates ERK1/2 and Promotes IGF2BP1-Mediated SOX2 mRNA Stability in Endometrial Cancer. Adv. Sci. 2021;8:2002831–2002917. doi: 10.1002/advs.202002831. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Darrah E. Peptidylarginine deiminase 2, 3 and 4 have distinct specificities against cellular substrates: Novel insights into autoantigen selection in rheumatoid arthritis Erika. Ann. Rheum. Dis. 2012;71:92–98. doi: 10.1136/ard.2011.151712.Peptidylarginine. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Dreyton C.J., Knuckley B., Jones J.E., Lewallen D.M., Thompson P.R. Mechanistic studies of protein arginine deiminase 2: Evidence for a substrate-assisted mechanism. Biochemistry. 2014;53:4426–4433. doi: 10.1021/bi500554b. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Slade D.J., Fang P., Dreyton C.J., Zhang Y., Fuhrmann J., Rempel D., Bax B.D., Coonrod S.A., Lewis H.D., Guo M., et al. Protein arginine deiminase 2 binds calcium in an ordered fashion: Implications for inhibitor design. ACS Chem. Biol. 2015;10:1043–1053. doi: 10.1021/cb500933j. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Zhang X., Bolt M., Guertin M.J., Chen W., Zhang S., Cherrington B.D., Slade D.J., Dreyton C.J., Subramanian V., Bicker K.L., et al. Peptidylarginine deiminase 2-catalyzed histone H3 arginine 26 citrullination facilitates estrogen receptor α target gene activation. Proc. Natl. Acad. Sci. USA. 2012;109:13331–13336. doi: 10.1073/pnas.1203280109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Guertin M.J., Zhang X., Anguish L., Kim S., Varticovski L., Lis J.T., Hager G.L., Coonrod S.A. Targeted H3R26 Deimination Specifically Facilitates Estrogen Receptor Binding by Modifying Nucleosome Structure. PLoS Genet. 2014;10:10046133–e1004712. doi: 10.1371/journal.pgen.1004613. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Xiao S., Lu J., Sridhar B., Cao X., Yu P., Zhao T., Chen C.C., McDee D., Sloofman L., Wang Y., et al. SMARCAD1 Contributes to the Regulation of Naive Pluripotency by Interacting with Histone Citrullination. Cell Rep. 2017;18:3117–3128. doi: 10.1016/j.celrep.2017.02.070. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Falcão A.M., Meijer M., Scaglione A., Rinwa P., Agirre E., Liang J., Larsen S.C., Heskol A., Frawley R., Klingener M., et al. PAD2-Mediated Citrullination Contributes to Efficient Oligodendrocyte Differentiation and Myelination. Cell Rep. 2019;27:1090–1102.e10. doi: 10.1016/j.celrep.2019.03.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Sharma P., Lioutas A., Fernandez-Fuentes N., Quilez J., Carbonell-Caballero J., Wright R.H.G., Di Vona C., Le Dily F., Schüller R., Eick D., et al. Arginine Citrullination at the C-Terminal Domain Controls RNA Polymerase II Transcription. Mol. Cell. 2019;73:84–96.e7. doi: 10.1016/j.molcel.2018.10.016. [DOI] [PubMed] [Google Scholar]
- 31.Cummings T.F.M., Gori K., Sanchez-Pulido L., Gavriilidis G., Moi D., Wilson A.R., Murchison E., Dessimoz C., Ponting C.P., Christophorou M.A. Citrullination Was Introduced into Animals by Horizontal Gene Transfer from Cyanobacteria. Mol. Biol. Evol. 2022;39 doi: 10.1093/molbev/msab317. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Crisp A., Boschetti C., Perry M., Tunnacliffe A., Micklem G. Expression of multiple horizontally acquired genes is a hallmark of both vertebrate and invertebrate genomes. Genome Biol. 2015;16:50. doi: 10.1186/s13059-015-0607-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Emamalipour M., Seidi K., Zununi Vahed S., Jahanban-Esfahlan A., Jaymand M., Majdi H., Amoozgar Z., Chitkushev L.T., Javaheri T., Jahanban-Esfahlan R., Zare P. Horizontal Gene Transfer: From Evolutionary Flexibility to Disease Progression. Front. Cell Dev. Biol. 2020;8 doi: 10.3389/fcell.2020.00229. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Soucy S.M., Huang J., Gogarten J.P. Horizontal gene transfer: Building the web of life. Nat. Rev. Genet. 2015;16:472–482. doi: 10.1038/nrg3962. [DOI] [PubMed] [Google Scholar]
- 35.Husnik F., McCutcheon J.P. Functional horizontal gene transfer from bacteria to eukaryotes. Nat. Rev. Microbiol. 2018;16:67–79. doi: 10.1038/nrmicro.2017.137. [DOI] [PubMed] [Google Scholar]
- 36.Goulas T., Mizgalska D., Garcia-Ferrer I., Kantyka T., Guevara T., Szmigielski B., Sroka A., Millán C., Usón I., Veillard F., et al. Structure and mechanism of a bacterial host-protein citrullinating virulence factor, Porphyromonas gingivalis peptidylarginine deiminase. Sci. Rep. 2015;5:11969–12017. doi: 10.1038/srep11969. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Touz M.C., Rópolo A.S., Rivero M.R., Vranych C.V., Conrad J.T., Svard S.G., Nash T.E. Arginine deiminase has multiple regulatory roles in the biology of Giardia lamblia. J. Cell Sci. 2008;121:2930–2938. doi: 10.1242/jcs.026963. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Mistry J., Chuguransky S., Williams L., Qureshi M., Salazar G.A., Sonnhammer E.L.L., Tosatto S.C.E., Paladin L., Raj S., Richardson L.J., et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 2021;49:D412–D419. doi: 10.1093/nar/gkaa913. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Asaga H., Ishigami A. Protein deimination in the rat brain after kainate administration: Citrulline-containing proteins as a novel marker of neurodegeneration. Neurosci. Lett. 2001;299:5–8. doi: 10.1016/S0304-3940(00)01735-3. [DOI] [PubMed] [Google Scholar]
- 40.Rogers G., Winter B., McLaughlan C., Powell B., Nesci T. Peptidylarginine deiminase of the hair follicle: Characterization, localization, and function in keratinizing tissues. J. Invest. Dermatol. 1997;108:700–707. doi: 10.1111/1523-1747.ep12292083. [DOI] [PubMed] [Google Scholar]
- 41.Rus’d A.A., Ikejiri Y., Ono H., Yonekawa T., Shiraiwa M., Kawada A., Takahara H. Molecular cloning of cDNAs of mouse peptidylarginine deiminase type I, type III and type IV, and the expression pattern of type I in mouse. Eur. J. Biochem. 1999;259:660–669. doi: 10.1046/j.1432-1327.1999.00083.x. [DOI] [PubMed] [Google Scholar]
- 42.Beltrao P., Bork P., Krogan N.J., Van Noort V. Evolution and functional cross-talk of protein post-translational modifications. Mol. Syst. Biol. 2013;9:714–813. doi: 10.1002/msb.201304521. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Kumar S., Stecher G., Suleski M., Hedges S.B. TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol. Biol. Evol. 2017;34:1812–1819. doi: 10.1093/molbev/msx116. [DOI] [PubMed] [Google Scholar]
- 44.Corden J.L. An Arginine Nexus in the RNA Polymerase II CTD. Mol. Cell. 2019;73:3–4. doi: 10.1016/j.molcel.2018.12.013. [DOI] [PubMed] [Google Scholar]
- 45.Marshall N.F., Price D.H. Purification of P-TEFb, a transcription factor required for the transition into productive elongation. J. Biol. Chem. 1995;270:12335–12338. doi: 10.1074/jbc.270.21.12335. [DOI] [PubMed] [Google Scholar]
- 46.Adelman K., Lis J.T. Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nat. Rev. Genet. 2012;13:720–731. doi: 10.1038/nrg3293. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Jonkers I., Lis J.T. Getting up to speed with transcription elongation by RNA polymerase II. Nat. Rev. Mol. Cell Biol. 2015;16:167–177. doi: 10.1038/nrm3953. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Core L., Adelman K. Promoter-proximal pausing of RNA polymerase II: A nexus of gene regulation. Genes Dev. 2019;33:960–982. doi: 10.1101/gad.325142.119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Chen F.X., Smith E.R., Shilatifard A. Born to run: Control of transcription elongation by RNA polymerase II. Nat. Rev. Mol. Cell Biol. 2018;19:464–478. doi: 10.1038/s41580-018-0010-5. [DOI] [PubMed] [Google Scholar]
- 50.Martin R.D., Hébert T.E., Tanny J.C. Therapeutic targeting of the general RNA polymerase II transcription machinery. Int. J. Mol. Sci. 2020;21 doi: 10.3390/ijms21093354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Muniz L., Nicolas E., Trouche D. RNA polymerase II speed: a key player in controlling and adapting transcriptome composition. EMBO J. 2021;40:1057400–e105821. doi: 10.15252/embj.2020105740. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Altenhoff A.M., Glover N.M., Train C.M., Kaleb K., Warwick Vesztrocy A., Dylus D., de Farias T.M., Zile K., Stevenson C., Long J., et al. The OMA orthology database in 2018: Retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces. Nucleic Acids Res. 2018;46:D477–D485. doi: 10.1093/nar/gkx1019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Yang Z., Bielawski J. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 2000;15:496–503. doi: 10.1016/S0169-5347(00)01994-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Zhang J., Nielsen R., Yang Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol. Biol. Evol. 2005;22:2472–2479. doi: 10.1093/molbev/msi237. [DOI] [PubMed] [Google Scholar]
- 55.Kosiol C., Vinař T., Da Fonseca R.R., Hubisz M.J., Bustamante C.D., Nielsen R., Siepel A. Patterns of positive selection in six mammalian genomes. PLoS Genet. 2008;4 doi: 10.1371/journal.pgen.1000144. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Segura J., Jones P.F., Fernandez-Fuentes N. A holistic in silico approach to predict functional sites in protein structures. Bioinformatics. 2012;28:1845–1850. doi: 10.1093/bioinformatics/bts269. [DOI] [PubMed] [Google Scholar]
- 57.Jones S., Heyningen P.V., Berman H.M., Thornton J.M. 1999. Arcturus_Summer_Hba. [Google Scholar]
- 58.Segura J., Oliva B., Fernandez-Fuentes N. CAPS-DB: A structural classification of helix-capping motifs. Nucleic Acids Res. 2012;40:479–485. doi: 10.1093/nar/gkr879. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Baumli S., Lolli G., Lowe E.D., Troiani S., Rusconi L., Bullock A.N., Debreczeni J.E., Knapp S., Johnson L.N. The structure of P-TEFb (CDK9/cyclin T1), its complex with flavopiridol and regulation by phosphorylation. EMBO J. 2008;27:1907–1918. doi: 10.1038/emboj.2008.121. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Evans R., O’Neill M., Pritzel A., Antropova N., Senior A.G.T., Žídek A., Bates R., Blackwell S., Yim J., Ronneberger O.B.S., et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv. 2022 doi: 10.1101/2021.10.04.463034. Preprint at. [DOI] [Google Scholar]
- 61.Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Lin F.R., Niparko J.K., Ferrucci L. 基因的改变NIH Public Access. Bone. 2014;23:1–7. doi: 10.1146/annurev-biochem-052610-095910.RNA. [DOI] [Google Scholar]
- 63.Kohoutek J. P-TEFb- The final frontier. Cell Div. 2009;4:19. doi: 10.1186/1747-1028-4-19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Manuscript A. multi-tasking P-TEFb. 2009;20:334–340. doi: 10.1016/j.ceb.2008.04.008. [DOI] [Google Scholar]
- 65.Gabay M., Li Y., Felsher D.W. MYC activation is a hallmark of cancer initiation and maintenance. Cold Spring Harb. Perspect. Med. 2014;4:a014241. doi: 10.1101/cshperspect.a014241. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Mollaoglu G., Guthrie M.R., Böhm S., Brägelmann J., Can I., Ballieu P.M., Marx A., George J., Heinen C., Chalishazar M.D., et al. MYC Drives Progression of Small Cell Lung Cancer to a Variant Neuroendocrine Subtype with Vulnerability to Aurora Kinase Inhibition. Cancer Cell. 2017;31:270–285. doi: 10.1016/j.ccell.2016.12.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Qiu X., Boufaied N., Hallal T., Feit A., de Polo A., Luoma A.M., Alahmadi W., Larocque J., Zadra G., Xie Y., et al. MYC drives aggressive prostate cancer by disrupting transcriptional pause release at androgen receptor targets. Nat. Commun. 2022;13:2559. doi: 10.1038/s41467-022-30257-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Cui Z., Xiao L., Chen F., Wang J., Lin H., Li D., Wu Z. High mRNA Expression of CENPL and Its Significance in Prognosis of Hepatocellular Carcinoma Patients. Dis. Markers. 2021;2021 doi: 10.1155/2021/9971799. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Zhang H., Zhang X., Li X., Meng W.B., Bai Z.T., Rui S.Z., Wang Z.F., Zhou W.C., Jin X.D. Effect of CCNB1 silencing on cell cycle, senescence, and apoptosis through the p53 signaling pathway in pancreatic cancer. J. Cell. Physiol. 2018;234:619–631. doi: 10.1002/jcp.26816. [DOI] [PubMed] [Google Scholar]
- 70.Chen E.B., Qin X., Peng K., Li Q., Tang C., Wei Y.C., Yu S., Gan L., Liu T.S. HnRNPR-CCNB1/CENPF axis contributes to gastric cancer proliferation and metastasis. Aging (Albany NY) 2019;11:7473–7491. doi: 10.18632/aging.102254. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Liu C., Kudo T., Ye X., Gascoigne K. Cell-to-cell variability in Myc dynamics drives transcriptional heterogeneity in cancer cells. Cell Rep. 2023;42 doi: 10.1016/j.celrep.2023.112401. [DOI] [PubMed] [Google Scholar]
- 72.Jones K.E., Bielby J., Cardillo M., Fritz S.A., O’Dell J., Orme C.D.L., Safi K., Sechrest W., Boakes E.H., Carbone C., et al. PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology. 2009;90:2648. doi: 10.1890/08-1494.1. [DOI] [Google Scholar]
- 73.Subramanian A., Tamayo P., Mootha V.K., Mukherjee S., Ebert B.L., Gillette M.A., Paulovich A., Pomeroy S.L., Golub T.R., Lander E.S., Mesirov J.P. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Veidenberg A., Medlar A., Löytynoja A. Wasabi: An integrated platform for evolutionary sequence analysis and data visualization. Mol. Biol. Evol. 2016;33:1126–1130. doi: 10.1093/molbev/msv333. [DOI] [PubMed] [Google Scholar]
- 75.Katoh K., Misawa K., Kuma K.I., Miyata T. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–3066. doi: 10.1093/nar/gkf436. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Larsson A. AliView: A fast and lightweight alignment viewer and editor for large datasets. Bioinformatics. 2014;30:3276–3278. doi: 10.1093/bioinformatics/btu531. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Capella-Gutiérrez S., Silla-Martínez J.M., Gabaldón T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. doi: 10.1093/bioinformatics/btp348. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Suyama M., Torrents D., Bork P. PAL2NAL: Robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:609–612. doi: 10.1093/nar/gkl315. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Huerta-Cepas J., Serra F., Bork P. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol. Biol. Evol. 2016;33:1635–1638. doi: 10.1093/molbev/msw046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Pronk S., Páll S., Schulz R., Larsson P., Bjelkmar P., Apostolov R., Shirts M.R., Smith J.C., Kasson P.M., van der Spoel D., et al. GROMACS 4.5: A high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics. 2013;29:845–854. doi: 10.1093/bioinformatics/btt055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Segura J., Marín-López M.A., Jones P.F., Oliva B., Fernandez-Fuentes N. VORFFIP-driven dock: V-D2OCK, a fast, accurate protein docking strategy. PLoS One. 2015;10:1–12. doi: 10.1371/journal.pone.0118107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Gibson D.G., Young L., Chuang R.Y., Venter J.C., Hutchison C.A., Smith H.O. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods. 2009;6:343–345. doi: 10.1038/nmeth.1318. [DOI] [PubMed] [Google Scholar]
- 84.Vicent G.P., Zaurin R., Nacht A.S., Li A., Font-Mateu J., Le Dily F., Vermeulen M., Mann M., Beato M. Two chromatin remodeling activities cooperate during activation of hormone responsive promoters. PLoS Genet. 2009;5 doi: 10.1371/journal.pgen.1000567. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Daly M.J., Patterson N., Mesirov J.P., Golub T.R., Tamayo P., Spiegelman B. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003;34:267–273. doi: 10.1038/ng1180. [DOI] [PubMed] [Google Scholar]
- 86.Shannon P., Markiel A., Owen O., Baliga N.S., Jonathan T., Wang D.R., Amin N., Schwikowski B., Cytoscape T.I. A Software Environment for Integrated Models. Genome Res. 2003;13:426. doi: 10.1101/gr.1239303.metabolite. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Villanueva-Cañas J.L., Ruiz-Orera J., Agea M.I., Gallo M., Andreu D., Albà M.M. New genes and functional innovation in mammals. Genome Biol Evol. 2017;9:1886–1900. doi: 10.1093/gbe/evx136. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Villanueva-Cañas J.L., Laurie S., Albà M.M. Improving genome-wide scans of positive selection by using protein isoforms of similar length. Genome Biol. Evol. 2013;5:457–467. doi: 10.1093/gbe/evt017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Yang Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007;24:1586–1591. doi: 10.1093/molbev/msm088. [DOI] [PubMed] [Google Scholar]
- 90.Feliu E., Aloy P., Oliva B. On the analysis of protein-protein interactions via knowledge-based potentials for the prediction of protein-protein docking. Protein Sci. 2011;20:529–541. doi: 10.1002/pro.585. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91.Pei J., Grishin N.V. AL2CO: Calculation of positional conservation in a protein sequence alignment. Bioinformatics. 2001;17:700–712. doi: 10.1093/bioinformatics/17.8.700. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
-
•
Single-cell RNA sequencing data performed and used in this study have been deposited at GEO under the accession code GSE246420 and publically available.
-
•
No original code was generated in this paper.
-
•
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.