eLife. 2025 Sep 10;14:RP103608. doi: 10.7554/eLife.103608

Regulatory networks of KRAB zinc finger genes and transposable elements changed during human brain evolution and disease

Yao-Chung Chen 1,2, Arnaud Maupas 3, Katja Nowick 1,2
Editors: Detlef Weigel4, Detlef Weigel5
PMCID: PMC12422733  PMID: 40928843

Abstract

Evidence indicates that transposable elements (TEs) can contribute to the evolution of new traits, with some TEs acting as deleterious elements while others are repurposed for beneficial roles in evolution. In mammals, some KRAB-ZNF proteins can serve as a key defense mechanism to repress TEs, offering genomic protection. Notably, the family of KRAB-ZNF genes evolves rapidly and exhibits diverse expression patterns in primate brains, where some TEs, including autonomous LINE-1 and non-autonomous Alu and SVA elements, remain mobile. This prompts questions about their interactions in primate brains and potential roles in human brain evolution and disease. For a systematic comparative analysis of TE interactions with other genes, we developed the tool TEKRABber and focused on strong and experimentally validated cases. Our bipartite network analysis revealed significantly more interactions between KRAB-ZNF genes and TEs in humans than in other primates, especially with recently evolved, i.e., Simiiformes-specific, TEs. Notably, ZNF528, under positive selection in humans, shows numerous human-specific TE interactions. Most negative interactions in our network, indicative of repression by KRAB-ZNF proteins, entail Alu TEs, while links to other TEs are generally positive. In Alzheimer’s patients, a subnetwork involving 21 interactions with an Alu module appears diminished or lost. Our findings suggest that KRAB-ZNF and TE interactions vary across TE families, have increased throughout human evolution, and may influence susceptibility to Alzheimer’s disease.

Research organism: Human, Rhesus macaque, Other

Introduction

Transposable elements (TEs) are repetitive DNA sequences capable of migrating and inserting into new locations within the host genome. When transposed, TEs may induce phenotypic changes in the organism (Bourque et al., 2018; Schrader and Schmitz, 2019). One well-known textbook example is the development of an industrial melanization mutant phenotype in the British peppered moth, Biston betularia, which has been linked to increased expression of the cort transcript due to a TE insertion in its first intron (van Hof et al., 2016). Another classic example for primate evolution is the retrovirus insertion in the salivary amylase gene, AMY1C, with this sequence being essential for primates to evolve as one of the few mammals capable of expressing amylase in saliva (Ting et al., 1992). A further example is the evolutionary loss of tails in humans and apes, which is associated with the insertion of an Alu retrotransposon into the intronic region of the TBXT gene. This insertion creates an alternative splice site with an ancestral Alu retrotransposon in the sequence, leading to the excision of the middle exon and the formation of an isoform (Xia et al., 2024).

TEs comprise about half or even more of the mammalian genome (Platt et al., 2018; Qu et al., 2023). Their positions in the genome can be annotated with tools like RepeatMasker (Lawson et al., 2023). TE insertions can be deleterious, such as the insertion of an L1 element initiating colorectal cancer in humans (Scott et al., 2016). Therefore, a variety of host factors are employed to regulate TE expression, such as small RNAs, chromatin and DNA modification pathways, and KRAB zinc finger (KRAB-ZNF) proteins (Colonna Romano and Fanti, 2022). KRAB-ZNF proteins are the largest family of transcription factors in higher vertebrates (Huntley et al., 2006; Ecco et al., 2017), characterized by fast evolution and contributing to gene expression differences between primates (Nowick et al., 2009; Nowick et al., 2013). KRAB-ZNF proteins bind to the interspersed DNA sequences of TEs and repress their expression upon recruiting the cofactor KAP1 (Groner et al., 2010). Fast evolution was also reported for TEs, which led to hypothesizing an evolutionary arms-race model, in which mutated TEs have the chance to escape the repression from KRAB-ZNF proteins until the KRAB-ZNF proteins evolve again with a suitable recognition ability (Jacobs et al., 2014; Imbeault et al., 2017). For instance, in primates, coevolution between a family of TEs, retroelements in endogenous retroviruses (ERVs), and the tandem repeats in KRAB-ZNF genes has been observed (Thomas and Schneider, 2011).

Studying the expression of KRAB-ZNF genes and TEs in the evolution of the human brain has crucial relevance, since some TEs are actively transcribed in the developing human brain (Bodea et al., 2018), including L1 subfamilies, HERV-K subfamilies, and primate-specific Alu subfamilies (Hancks and Kazazian, 2010; Larsen et al., 2018; Dembny et al., 2020). Furthermore, TE-derived promoters play a role in gene transcription within the human brain, which in turn suggests the significance of TEs in gene regulation during neurodevelopment (Playfoot et al., 2021). Interestingly, a substantial number of differentially expressed primate-specific KRAB-ZNF genes were detected in the adult prefrontal cortex, comparing humans to chimpanzees (Nowick et al., 2009), and newly evolved KRAB-ZNF genes were predominantly detected in the developing human brain (Zhang et al., 2011). Therefore, analyzing the interplay between TEs and KRAB-ZNF genes in the context of primate brain evolution could provide valuable insights into the complex mechanisms shaping human brain development and function.

The dysregulation of TE expression has also been linked to neurodegenerative diseases, including Alzheimer’s disease (AD), the primary cause of dementia (Ravel-Godreuil et al., 2021). One of the biomarkers of AD, the Tau protein, appears to induce the expression of at least some TEs (Guo et al., 2018), and the presence of TE products in the cytosol and endosomes induces neuroinflammation in AD patients (Evering et al., 2023). Previous studies have suggested that cognitive skills are related to zinc finger genes and proteins. For instance, patients carrying the schizophrenia-risk allele of ZNF804A showed differences in their reading and spelling performance (Becker et al., 2012). Additionally, the single nucleotide polymorphism of a KRAB-ZNF gene, ZNF224, is associated with AD neuropathology and cognitive functions (Shulman et al., 2010).

Taken together, the study of the regulatory networks involving KRAB-ZNF genes and TEs is expected to provide insights into the evolution of the human brain and the development of neurodegenerative diseases. Here, we present results from two independent RNA-seq datasets: one comparing different brain regions between humans and multiple nonhuman primates (NHPs) (Primate Brain Data) and the other comparing human control samples with AD samples (Mayo Data). These datasets collectively encompass a total of 514 samples from different species, brain regions, and disease states. Since it is very challenging to quantify and normalize expression of TEs for across-species comparison, we developed the TEKRABber R package for the systematic and comparative analysis of TE subfamilies (abbreviated as TEs in the following content). TEKRABber can further be used to explore expression correlations between TEs and any genes in any species of interest. Here, we used it to explore correlations with KRAB-ZNF genes in primates. Our work reveals an intricate network of TEs and KRAB-ZNF genes in the brain that changed during evolution and is modified in AD brains.

Results

TEKRABber: a software for cross-species comparative analysis of orthologs, TEs, and their co-expression

While computational tools for the analysis of TE expression in samples of a given species have already been developed (Table 1), to our knowledge, no tool yet exists that can compare TE expression across species, which hampers investigation of the impact of TEs on species evolution. Whereas the expression of orthologous genes can be compared between closely related species relatively easily, this task is more challenging for TEs, because the same TEs can be located in different chromosomal regions and have different sequence lengths. Differences in TE copy number between species further complicate these comparisons. To gain functional evolutionary insights, it is also desirable to estimate pairwise correlations between TEs and genes, which can be used to derive and compare regulatory networks involving TEs across species. We are aware of one method, TEffectR (Karakülah et al., 2019), that calculates TE and gene expression from mapped BAM files and specified locus regions. Using a linear regression model with TE expression values, it subsequently predicts the impact of the TE on proximal gene expression in one species. However, it is not designed for contrasting correlations between pairwise orthologous genes and TEs across species directly from RNA-seq expression data. To better enable evolutionary studies on TEs, we developed an R Bioconductor package called TEKRABber (DOI: 10.18129/B9.bioc.TEKRABber). As a first use case for this new software, we investigated the interplay between TEs and KRAB-ZNF proteins, which can repress TE expression; hence the name TEKRABber. In a broader scope, TEKRABber addresses two primary challenges: comparing TE expression across species and efficiently calculating pairwise correlations between selected orthologous genes and TEs. With these features, it provides functionality not yet implemented in other tools (Table 1). It can also be used to explore correlations of TEs with any other genes in any other species whose genome has TE annotations.

Table 1. Comparison of transposable element (TE) expression analysis software.

Software name | Description | Comparison feature | References
RepEnrich | Combines different mapping strategies for differentially expressed TE analysis using RNA-seq and ChIP-seq data | Different conditions (same species) | Criscione et al., 2014
TETools | Compares TE expression from RNA-seq data | Different conditions (same species) | Lerat et al., 2016
Telescope | Estimates TEs in specific genomic locations using RNA-seq data | One condition in one species | Bendall et al., 2019
TE Density | Provides a metric showing the presence of TEs relative to genes within flexible genomic distance | One condition in one species | Teresi et al., 2022
PlanTEenrichment | Calculates TE enrichment upon inputting a differentially expressed gene list and selection of a specific plant species | Different conditions (same species) | Eskier et al., 2023
GeneTEFlow | A nextflow pipeline for analyzing differential expression of genes and TEs | Different conditions (same species) | Liu et al., 2020
TEffectR | Estimates the proximal TE effects on gene expression using a linear regression model | Different conditions (same species) | Karakülah et al., 2019
TEKRABber | Computes differentially expressed genes/TEs and one-to-one correlations using RNA-seq data | Different conditions (same species); across-species comparison (different species) | Method presented here

TEKRABber is designed to handle various types of transcriptomic read counts and offers two distinct modes of analysis (Figure 1A). In the first mode, tailored for interspecies comparison, we utilized the Primate Brain Data as a demonstration. Initially, it retrieved annotations from Ensembl (Harrison et al., 2024) for orthologs and from RepeatMasker (Smit et al., 2013) for TEs to estimate normalizing factors, ensuring comparable expression levels between species. This approach minimizes the likelihood that differences in fold change for differential expression (DE) are caused by TE or gene length variations. It also guarantees that only orthologs and TEs with high orthology confidence are included in the comparison, avoiding bias toward any particular species (Figure 1—figure supplement 1). Subsequently, users can employ the output data object to conduct DE analysis and identify one-to-one correlations based on selected parameters. We demonstrate that the impact of scaling is most pronounced for comparisons between the most distantly related species in our study, with about 30% of TEs being detected as DE or not between humans and rhesus macaques, depending on whether the expression data were scaled or not for the comparison (Figure 1—figure supplement 1). The second mode is designed for comparing different conditions, such as control and disease states within the same species. In this scenario, we used the Mayo Data as an example. Users can bypass the interspecies normalization steps and directly generate data objects for DE and correlation analyses. Notably, TEKRABber (from version 1.8, Bioc3.19) includes a parallel computing option, significantly enhancing computational efficiency based on the number of cores a device can provide. Furthermore, TEKRABber offers an interactive user interface, providing users with an initial overview of their results before delving into the details (Figure 1B).
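The comparison of scaled and non-scaled DE calls can be illustrated with a minimal Python sketch. It is not TEKRABber code (the package is written in R). It assumes that DE results are already available as a fold change and adjusted p-value per TE, applies the thresholds stated for Figure 1—figure supplement 1 (absolute log2 fold change larger than 1.5, adjusted p-value < 0.05), and expresses the percentages relative to the union of both DE sets, which is our reading of the figure description. The function names and input layout are hypothetical.

```python
def is_de(log2fc, padj, lfc_cutoff=1.5, alpha=0.05):
    """Return True when a TE passes both DE thresholds."""
    return abs(log2fc) > lfc_cutoff and padj < alpha


def scaling_impact(scaled, unscaled):
    """Percentage of DE TEs called only with scaled or only with unscaled data.

    Both arguments map a TE name to (log2 fold change, adjusted p-value).
    Percentages are relative to the union of the two DE sets (an assumption).
    """
    de_scaled = {te for te, (lfc, p) in scaled.items() if is_de(lfc, p)}
    de_unscaled = {te for te, (lfc, p) in unscaled.items() if is_de(lfc, p)}
    union = de_scaled | de_unscaled
    if not union:
        return {"scaled_only": 0.0, "unscaled_only": 0.0, "both": 0.0}
    return {
        "scaled_only": 100 * len(de_scaled - de_unscaled) / len(union),
        "unscaled_only": 100 * len(de_unscaled - de_scaled) / len(union),
        "both": 100 * len(de_scaled & de_unscaled) / len(union),
    }


# Toy values for illustration only, not results from the study.
toy_scaled = {"AluY": (2.0, 0.01), "L1HS": (1.8, 0.03),
              "SVA_D": (0.5, 0.20), "MER4": (-1.7, 0.001)}
toy_unscaled = {"AluY": (2.1, 0.01), "L1HS": (1.2, 0.03),
                "SVA_D": (1.9, 0.01), "MER4": (-1.6, 0.04)}
impact = scaling_impact(toy_scaled, toy_unscaled)
```

In this toy input, one TE is DE only after scaling and one only without scaling, so each category accounts for a quarter of the union.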

Figure 1. TEKRABber and the overview of the analysis workflow.

(A) Two independent RNA-seq datasets, Primate Brain Data and Mayo Data, were analyzed in this study. (1) Transcriptomic data were first preprocessed by removing adapters and low-quality reads and then mapped to their reference genome using STAR to generate BAM files. (2) TEtranscript was used to quantify the expression of genes and transposable elements (TEs). (3) Expression profiles were normalized across different species. (4) Differential expression (DE) analysis and pairwise correlations were calculated. Steps (3) and (4) were developed together in an R Bioconductor package, TEKRABber. (B) The user interface of TEKRABber features a dashboard layout that allows users to explore one-to-one gene-TE interactions, including correlation and differential expression results (more details in Materials and methods section).

Figure 1—figure supplement 1. Comparison of differentially expressed (DE) transposable elements (TEs) with and without scaling.

Across-species comparison between humans and the NHPs indicated on the x-axis, using expression data from the primary and secondary cortices as an example. The y-axis shows the percentage of TEs that were called DE only with scaled or only with non-scaled data (the remainder up to 100% is the overlap of DE TEs between both methods). To be called DE, a TE needed to show an absolute log2 fold change larger than 1.5 and an adjusted p-value < 0.05. The impact of scaling is highest for the most distantly related species, the rhesus macaque, where more than 30% of TEs changed their assignment as DE or not depending on whether scaling was applied.
Figure 1—figure supplement 2. Graphical abstract of the analysis.

In our study, we used TEKRABber to explore the putative functional connections between KRAB-ZNFs and TEs in the context of human brain evolution and AD. We analyzed KRAB-ZNF gene and TE expression patterns and networks using two independent RNA-seq datasets: Primate Brain Data and Mayo Data (Figure 1A, Supplementary file 1, tables S1–S3). The Primate Brain Data contains genome-wide expression information from 33 brain regions classified into seven groups (Khrameeva et al., 2020) of four primate species, while the Mayo Data includes data from two brain regions (temporal cortex and cerebellum) of AD patients and controls. In brief, expression of all genes and TEs was quantified and normalized across samples of each dataset, and subsequently DE and correlations between KRAB-ZNFs and TEs were calculated using TEKRABber (Figure 1A, Figure 1—figure supplement 2).

Dynamics of expression across species and brain regions: strong species differences especially of evolutionary young KRAB-ZNF genes and TEs

We obtained normalized expression values of KRAB-ZNF genes and TEs and assessed their variance across different species using t-SNE clustering (Figure 2A). The data labeled by species revealed distinct boundaries of variance, demonstrating clustering based on species. Specifically, humans and macaques formed their own clusters, while chimpanzees and bonobos were grouped in the same cluster, which agrees with phylogenetic distances among species. This finding indicated that clear differences in KRAB-ZNF genes and TE expression can be detected across species. For example, there were 12 upregulated and 42 downregulated KRAB-ZNF genes, along with 31 upregulated and 22 downregulated TEs in humans compared to chimpanzees in the primary and secondary cortices (Figure 2B; see Supplementary file 1, table S2 for information on Brodmann areas included in the group ‘primary and secondary cortices’). In contrast to the clear species differences, expression patterns of KRAB-ZNFs and TEs differed less across brain regions within the same species.

Figure 2. Expression of KRAB-ZNF genes and transposable elements (TEs) in Primate Brain Data.

(A) t-SNE plots of the expression of KRAB-ZNF genes and TEs from all 422 samples, including humans, chimpanzees, bonobos, and macaques, labeled by species and by brain region. (B) Differentially expressed KRAB-ZNF genes and TEs comparing human and chimpanzee in primary and secondary cortices. (C) Species tree with the inferred numbers of TEs and KRAB-ZNFs that have evolved per branch. (Note: There are 247 relatively old TEs and 52 KRAB-ZNFs that were difficult to place into a specific branch. Thus, they are not presented in this panel.) (D) Expression of KRAB-ZNF genes and TEs in primary and secondary cortices across species. Both KRAB-ZNF genes and TEs were divided into two groups based on their inferred evolutionary age, old (>44.2 million years ago [mya]) and young (≤44.2 mya). Young KRAB-ZNFs and young TEs have lower expression levels (Wilcoxon rank sum test, p<0.05). Expression in all brain regions can be found in Figure 2—figure supplement 3. (E) Percentage of differentially expressed KRAB-ZNF genes and TEs in humans compared to chimpanzees in primary and secondary cortices and cerebellar white matter. (F) Human-specific differentially expressed (DE) KRAB-ZNF genes and TEs (i.e., changed specifically in humans) in primary and secondary cortices compared to nonhuman primates (NHPs). Gray indicates no expression information. The colors for age inferences in (C), (D), (E), and (F) are the same: blue for evolutionary old and orange for evolutionary young KRAB-ZNFs and TEs, respectively.

Figure 2—figure supplement 1. Distribution of KRAB-ZNFs evolutionary age inference.

The evolutionary age of KRAB-ZNFs was inferred from GenTree (Shao et al., 2019) and primate orthologous annotations (Jovanovic et al., 2021). The evolutionary young group (≤ 44.2 mya) is in orange and the evolutionary old group (> 44.2 mya) is in blue.
Figure 2—figure supplement 2. Distribution of transposable elements (TEs) evolutionary age inference.

The age of TEs was estimated using Dfam subfamily species annotation. The evolutionary young group (≤ 44.2 mya) is in orange and the evolutionary old group (> 44.2 mya) is in blue.
Figure 2—figure supplement 3. Expression of KRAB-ZNFs and transposable elements (TEs) among brain regions.

Both KRAB-ZNF genes and TEs were divided into two groups based on their inferred evolutionary age, old (> 44.2 mya) and young (≤ 44.2 mya). Young KRAB-ZNFs and young TEs have lower expression levels (Wilcoxon rank sum test, p < 0.05).

Certain TEs are primate-specific and have been extensively used in phylogenetic studies. For example, a subset of recently evolved Alu subfamilies is found only in Simiiformes (Xing et al., 2007; Williams et al., 2010). We next investigated whether expression patterns differ between Simiiformes-specific KRAB-ZNF genes and TEs (called evolutionary young from here on), and KRAB-ZNFs and TEs that have also orthologs outside Simiiformes (called evolutionary old from here on). To this end, we dated these genomic elements and classified them into the two groups based on their inferred evolutionary age (see Materials and methods). The old group consists of 234 KRAB-ZNF genes and 955 TEs that evolved prior to the emergence of Simiiformes (>44.2 million years ago [mya]), while the young group consists of 103 KRAB-ZNF genes and 309 TEs that evolved less than 44.2 mya (Figure 2C, Figure 2—figure supplements 1 and 2). We first found that the young KRAB-ZNFs and young TEs exhibited significantly lower expression levels across all brain regions, regardless of species (Figure 2D, Figure 2—figure supplement 3). Next, there were proportionally more young KRAB-ZNF genes and young TEs differentially expressed between humans and chimpanzees compared to old KRAB-ZNF genes and old TEs (Figure 2E). The same holds true when comparing humans to all NHPs, implying that young TEs and young KRAB-ZNFs are still more dynamically changed between different primates.
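The age split and the comparison of DE proportions between age groups can be sketched as follows. This is a minimal Python illustration under the assumption that each element already carries a numeric age estimate in million years. The study itself infers ages from GenTree, primate ortholog annotations, and Dfam rather than from a single number, so the input format and function names here are hypothetical.

```python
SIMIIFORMES_SPLIT_MYA = 44.2  # boundary between young and old used in the text


def age_group(age_mya):
    """Classify an element as 'young' (<= 44.2 mya) or 'old' (> 44.2 mya)."""
    return "young" if age_mya <= SIMIIFORMES_SPLIT_MYA else "old"


def de_fraction_by_age(elements):
    """Fraction of DE elements within each age group.

    elements: iterable of (name, age_mya, is_de) tuples.
    """
    totals = {"young": 0, "old": 0}
    de_counts = {"young": 0, "old": 0}
    for _name, age_mya, flag in elements:
        group = age_group(age_mya)
        totals[group] += 1
        de_counts[group] += 1 if flag else 0
    return {g: (de_counts[g] / totals[g] if totals[g] else 0.0) for g in totals}


# Toy elements with made-up ages and DE flags, for illustration only.
toy_elements = [
    ("element_a", 20.0, True),
    ("element_b", 44.2, False),
    ("element_c", 90.0, False),
    ("element_d", 150.0, False),
    ("element_e", 60.0, True),
    ("element_f", 30.0, True),
]
fractions = de_fraction_by_age(toy_elements)
```

A higher value for the young group than for the old group would correspond to the pattern described for Figure 2E.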

We further investigated whether there are KRAB-ZNFs and TEs that are specifically changed in humans compared to the three NHPs. We detected 36 such KRAB-ZNF genes and 18 such TEs in the primary and secondary cortices. KRAB-ZNFs showed a trend toward more downregulation in humans, such as ZNF337 and ZNF394 (Figure 2F). In contrast to this trend, some KRAB-ZNFs linked to cognitive disorders were specifically upregulated in humans, e.g., ZNF778, a candidate gene for autism spectrum disorder and cognitive impairment (Willemsen et al., 2010), and ZNF267, which is upregulated in the prefrontal cortex of AD patients (Patel et al., 2021). TEs tended to be more upregulated in humans compared to NHPs. On the other hand, the LTR12B subfamily is one of the most downregulated TEs in humans. LTR12-related ERV subfamilies have been reported to be repressed by ZNF676 and ZNF728 in early human development (Iouranova et al., 2022).

Changes in correlations between TEs and KRAB-ZNF genes: increased connectivity in the human brain co-expression network with an enrichment for evolutionary young correlations

To systematically analyze the putative functional relationships between TEs and KRAB-ZNF genes, we conducted pairwise Pearson’s correlation analysis using normalized expression levels in seven clustered brain regions (Supplementary file 1, table S2 and Figure 1 in Khrameeva et al., 2020). We first analyzed the human samples. There were 324 KRAB-ZNFs and 895 TEs (subfamily level) expressed in Primate Brain Data (copy number provided in Supplementary file 1, table S4). In humans, we found 100,987 positive and 26,810 negative significant correlations between TEs and KRAB-ZNFs in the primary and secondary cortices and 38,295 positive and 11,475 negative significant correlations in the limbic and association cortices (adjusted p-value<0.01), while the other clusters have fewer or no correlations detected (Supplementary file 1, table S5). We will thus mainly focus on the primary and secondary cortices and the limbic and association cortices for our subsequent analyses.
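A minimal Python sketch of this counting step is given below. It computes a Pearson coefficient, adjusts a set of p-values with the Benjamini-Hochberg procedure (the correction named for the thresholding step in this section), and counts positive and negative correlations at adjusted p < 0.01. The raw p-values are assumed to come from a statistics library and are passed in directly. The function names are hypothetical, and this is not the TEKRABber implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(coefficients, pvalues, alpha=0.01):
    """Count positive and negative correlations with adjusted p < alpha."""
    adjusted = bh_adjust(pvalues)
    positive = sum(1 for r, q in zip(coefficients, adjusted) if q < alpha and r > 0)
    negative = sum(1 for r, q in zip(coefficients, adjusted) if q < alpha and r < 0)
    return positive, negative
```

For example, `count_significant([0.8, -0.7, 0.2, -0.1], [0.001, 0.002, 0.5, 0.9])` keeps one positive and one negative correlation, because only the first two adjusted p-values fall below 0.01.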

The numbers of correlations between TEs and KRAB-ZNFs are significantly more than expected by chance, as gauged by repeating the correlation analysis with randomly picked genes and TEs (p<0.001; see Materials and methods; Figure 3A, Figure 3—figure supplement 1), indicating that putative functional relationships between TEs and KRAB-ZNFs can be detected in the data. The high number of positive correlations might be surprising, given that KRAB-ZNFs are considered to repress TEs. However, the dataset contains in general more positive correlations, even when choosing random genes, and KRAB-ZNFs still have more negative correlations to TEs than random genes. It is also plausible that some relationships between older KRAB-ZNFs and no longer harmful TEs are positive, e.g., due to embedding into functional pathways, when a repression is no longer needed.
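The comparison against random gene sets can be sketched as a resampling test. The Python example below assumes that the number of significant TE correlations per gene has already been computed, draws random gene sets of the same size as the KRAB-ZNF set, and reports an empirical p-value. The add-one correction and the function name are assumptions for illustration. The exact procedure used in the study is described in its Materials and methods.

```python
import random


def empirical_p(observed, counts_by_gene, set_size, iterations=1000, seed=0):
    """Empirical p-value for an observed correlation count.

    observed: number of significant correlations for the gene set of interest.
    counts_by_gene: maps each background gene to its number of significant
        correlations with TEs.
    set_size: number of genes drawn in each random set.
    """
    rng = random.Random(seed)
    genes = list(counts_by_gene)
    at_least_as_large = 0
    for _ in range(iterations):
        sample = rng.sample(genes, set_size)
        total = sum(counts_by_gene[g] for g in sample)
        if total >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero (an assumption).
    return (at_least_as_large + 1) / (iterations + 1)


# Toy background in which every gene has exactly one correlation.
toy_background = {f"gene_{i}": 1 for i in range(50)}
p_high = empirical_p(observed=100, counts_by_gene=toy_background, set_size=5)
p_low = empirical_p(observed=0, counts_by_gene=toy_background, set_size=5)
```

With 1000 iterations, the smallest attainable value is 1/1001, which is consistent with reporting p < 0.001 when no random set reaches the observed count.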

Figure 3. TE:KRAB-ZNF in human primary and secondary cortices.

(A) and (B) demonstrate the workflow for checking significant TE:KRAB-ZNF using the primary and secondary cortices as an example. (A) We used randomly selected gene sets and KRAB-ZNFs to calculate correlations with transposable elements (TEs). The violet dots indicate the correlation counts of TE:KRAB-ZNF based on comparing all correlations, positive correlations, and negative correlations. They are significantly higher than for random gene sets (boxplots below, 1000 iterations, p<0.001). (B) Overlaps between TE:KRAB-ZNF (y-axis) and the KRAB-ZNF protein ChIP-exo data (Imbeault et al., 2017) (x-axis). Note that we use absolute coefficient values for negative correlations. Correlations under the yellow area are selected. (C) Jaccard similarity, demonstrating that the correlations between TEs and KRAB-ZNFs overlapped significantly more with ChIP-exo data than randomly selected TEs and KRAB-ZNFs (p<0.001). The points indicate the overlap with actual correlations and ChIP-exo data. The boxplots indicate the overlap between randomly selected TE and KRAB-ZNF pairs with ChIP-exo data. (D) Subsets of the number of positive and negative TE:KRAB-ZNF in the primary and secondary cortices (c1_p and c1_n) and the limbic and association cortices (c2_p and c2_n). (E) TE:KRAB-ZNF network in the primary and secondary cortices with five modules. Nodes are colored in five colors representing the five modules, and nodes in white do not belong to any module. Young links are in orange and old links are in blue. (F) Distribution of the normalized degree counts in TE and KRAB-ZNF nodes in the TE:KRAB-ZNF network. (G) The subnetwork colored in pink from (E), showing that this module mainly consists of Alu subfamilies. (H) The log count of correlations classified by TEs and the categories of links, including positive-old (P-O), positive-young (P-Y), negative-old (N-O), and negative-young (N-Y). Red stars indicate that the class distribution of the TEs is significantly different (Chi-squared test, p<0.001). The right-hand side barplot shows the exact count of Alu subfamilies from the first row in the heatmap.

Figure 3—figure supplement 1. Distribution of correlation in limbic and association cortices.

This is the same method as in Figure 3A and B. (A) KRAB-ZNFs (pink dot) have a higher number of correlations than randomly selected genes (p < 0.001). (B) We use thresholds of adjusted p-value < 0.01 and absolute coefficient larger than 0.4 to select TE:KRAB-ZNF for downstream analysis.
Figure 3—figure supplement 2. Example of correlations (TE:KRAB-ZNF).

We define a one-to-one correlation between a TE and a KRAB-ZNF as significant when the absolute coefficient is larger than 0.4 and the adjusted p-value is smaller than 0.01. We classify it as a young correlation when at least one of the components (TE or KRAB-ZNF gene) is evolutionary young. (A) A negative young example (young TE correlated with young KRAB-ZNF). (B) A negative young example (young TE correlated with old KRAB-ZNF). (C) An old example (old TE correlated with old KRAB-ZNF). The x-axis is the log expression level of the KRAB-ZNF and the y-axis is the log expression level of the TE.
Figure 3—figure supplement 3. TE:KRAB-ZNF network in human limbic and association cortices.

Nodes are colored based on 13 different modules. Nodes in white do not belong to any module. Links demonstrate the evolutionary age of the interaction (young link in orange; old link in blue). (A) primary and secondary cortices (cluster1) and (B) limbic and association cortices (cluster2). Violet dots indicating numbers of significant correlations of TE:KRAB-ZNF (adjusted p < 0.01). Box plots indicate the distribution of 1000 iterations of random selected genes correlated with TEs (hm: human, pt: chimpanzee, pp: bonobo, mm: macaque, all: positive and negative correlations, negative: negative correlations, positive: positive correlations).

To remove weak correlations, we set a threshold with absolute correlation coefficients greater than 0.4 and adjusted p-value below 0.01 (Benjamini-Hochberg correction) (Figure 3B, Figure 3—figure supplements 1 and 2). In addition, we sought independent experimental confirmation of our detected putative relationships by overlapping the correlations with experimental ChIP-exo data obtained from KRAB-ZNF proteins in human embryonic stem cells (Imbeault et al., 2017). Note that although ChIP-exo was performed in different cell types than the brain samples we analyzed, the overlap was significant by calculating their Jaccard similarity (p<0.001, Figure 3C). This result supports that our correlations have significantly captured the interplay between TEs and KRAB-ZNFs. Focusing on the ChIP-exo-confirmed correlations, we obtained 869 correlations in the primary and secondary cortices and 399 correlations in limbic and association cortices. Of those, 201 positive correlations and 166 negative correlations overlapped between these two brain region groups (Figure 3D).
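The overlap measure can be written down in a few lines. The Python sketch below treats the filtered correlations and the ChIP-exo binding data as sets of (TE, KRAB-ZNF) pairs and computes their Jaccard similarity together with the confirmed subset. The pair representation, the function names, and the placeholder gene labels are assumptions for illustration.

```python
def jaccard(pairs_a, pairs_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(pairs_a), set(pairs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def chip_confirmed(correlated_pairs, chip_pairs):
    """Correlated TE:KRAB-ZNF pairs that also appear in the binding data."""
    return set(correlated_pairs) & set(chip_pairs)


# Toy pairs with placeholder gene labels, for illustration only.
toy_correlated = {("AluY", "ZNF_A"), ("L1HS", "ZNF_A"), ("SVA_D", "ZNF_B")}
toy_chip = {("AluY", "ZNF_A"), ("SVA_D", "ZNF_B"), ("MER4", "ZNF_C")}
similarity = jaccard(toy_correlated, toy_chip)
confirmed = chip_confirmed(toy_correlated, toy_chip)
```

Significance can then be assessed by recomputing the same similarity for randomly drawn TE and KRAB-ZNF pairs, as described for Figure 3C.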

To better understand the complexity of the interactions between TEs and KRAB-ZNFs, we represented the one-to-one pairwise correlations of TE and KRAB-ZNF (denoted as TE:KRAB-ZNF in the following content) in a bipartite network. In this bipartite network, nodes belong to one of two classes, TEs or KRAB-ZNFs, and only links between nodes of different classes are presented. Our bipartite network of human primary and secondary cortices reveals 354 nodes connected by 869 links (Figure 3E), and the human limbic and association cortices have 247 nodes with 399 links (Figure 3—figure supplement 3). To identify the hubs in the network, we first calculated their normalized degree distributions, considering the unequal numbers of TEs and KRAB-ZNFs. Our analysis revealed that KRAB-ZNF nodes connect to more TEs than TE nodes connect to KRAB-ZNFs, indicating that KRAB-ZNF nodes are more likely to act as hubs with higher connectivity (Figure 3F). For investigating the structure of the network, we calculated the bipartite modularities based on the correlation coefficients. Our findings revealed that the network of the primary and secondary cortices is clustered into 5 modules (Figure 3E), while the network of the limbic and association cortices clusters into 13 modules (Figure 3—figure supplement 3).
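One common way to normalize degree in a bipartite network is to divide the degree of each node by the number of nodes in the opposite class, that is, by the number of partners it could have. The Python sketch below uses this definition as an assumption, since the exact normalization is not spelled out in this section. The function name and the placeholder labels are hypothetical.

```python
from collections import Counter


def normalized_degrees(links):
    """Normalized degree for both node classes of a bipartite network.

    links: iterable of (te, znf) pairs. Each degree is divided by the
    number of nodes in the opposite class (assumed definition).
    """
    links = set(links)
    tes = {te for te, _ in links}
    znfs = {znf for _, znf in links}
    te_degree = Counter(te for te, _ in links)
    znf_degree = Counter(znf for _, znf in links)
    te_norm = {te: d / len(znfs) for te, d in te_degree.items()}
    znf_norm = {znf: d / len(tes) for znf, d in znf_degree.items()}
    return te_norm, znf_norm


# Toy network with placeholder gene labels, for illustration only.
toy_links = [("AluY", "ZNF_A"), ("AluSx", "ZNF_A"),
             ("L1HS", "ZNF_A"), ("L1HS", "ZNF_B")]
te_norm, znf_norm = normalized_degrees(toy_links)
```

Dividing by the size of the opposite class makes values comparable between the two classes even though the network contains many more TEs than KRAB-ZNFs.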

We divided TE:KRAB-ZNF into evolutionary young and old, calling a correlation young when at least one of the nodes is evolutionary young and old when both nodes are evolutionary old (Figure 3—figure supplement 2). Taking also the sign of the correlation coefficients into account, all links can be classified into four categories: positive-old (P-O), positive-young (P-Y), negative-old (N-O), and negative-young (N-Y). We found that the links are nonrandomly distributed in the network. In particular, Alu, ERV1, ERVK, ERVL, L1, TcMar-Tigger, and SVA have significantly different interactions with KRAB-ZNFs than the other TE subfamilies (Chi-squared test, p<0.001). For example, in a module with Alu subfamilies (Figure 3G), we observed that most of the correlations belong to N-Y, while SVA had many P-Y correlations with KRAB-ZNFs (Figure 3H).
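The four link categories follow directly from the sign of the coefficient and the age of the two nodes. A minimal Python sketch, with a hypothetical function name, is:

```python
def classify_link(coefficient, te_is_young, znf_is_young):
    """Return 'P-O', 'P-Y', 'N-O', or 'N-Y' for one TE:KRAB-ZNF link.

    A link is young when at least one node is young and old when both
    nodes are old. Links are assumed to have passed the |r| > 0.4 filter,
    so the coefficient is never zero.
    """
    sign = "P" if coefficient > 0 else "N"
    age = "Y" if (te_is_young or znf_is_young) else "O"
    return f"{sign}-{age}"


# A young TE negatively correlated with an old KRAB-ZNF is a negative-young link.
example = classify_link(-0.62, te_is_young=True, znf_is_young=False)
```

Counting these labels per TE family gives the kind of contingency table on which a Chi-squared test can be run.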

Next, we compared the TE:KRAB-ZNF between species and only considered the 178 KRAB-ZNFs and 836 TEs that were expressed in all four species. Since the original study (Khrameeva et al., 2020) included four human individuals but only three individuals per NHP species, we performed a leave-one-out analysis of the human samples. For a fair comparison across species, we required that a correlation between KRAB-ZNFs and TEs in humans be significantly detected in all human leave-one-out combinations (Figure 4A, Supplementary file 1, table S6). We then repeated our test of whether KRAB-ZNFs are more likely to correlate with TEs compared to randomly selected genes and found that human KRAB-ZNFs still have a significantly higher number of correlations with TEs. In contrast, in NHPs, KRAB-ZNFs did not have more correlations to TEs than randomly selected genes (Figure 4—figure supplement 1).
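The leave-one-out requirement amounts to intersecting the significant links found in every subset that omits one human individual. In the Python sketch below, `significant_links_for` is a hypothetical callable standing in for the correlation analysis on a given subset of individuals. The individual labels and link names are made up for illustration.

```python
from itertools import combinations


def robust_links(individuals, significant_links_for):
    """Links that are significant in every leave-one-out subset.

    individuals: list of individual identifiers.
    significant_links_for: callable taking a frozenset of individuals and
        returning the significant (TE, KRAB-ZNF) links for those samples
        (hypothetical stand-in for the real correlation analysis).
    """
    subsets = combinations(individuals, len(individuals) - 1)
    link_sets = [set(significant_links_for(frozenset(s))) for s in subsets]
    return set.intersection(*link_sets) if link_sets else set()


# Toy lookup table in place of the real analysis, for illustration only.
humans = ["h1", "h2", "h3", "h4"]
toy_results = {
    frozenset(["h1", "h2", "h3"]): {("AluY", "ZNF_A"), ("L1HS", "ZNF_B")},
    frozenset(["h1", "h2", "h4"]): {("AluY", "ZNF_A"), ("SVA_D", "ZNF_B")},
    frozenset(["h1", "h3", "h4"]): {("AluY", "ZNF_A"), ("L1HS", "ZNF_B")},
    frozenset(["h2", "h3", "h4"]): {("AluY", "ZNF_A")},
}
kept = robust_links(humans, lambda subset: toy_results[subset])
```

Each subset contains three individuals, which matches the sample size available for every NHP species.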

Figure 4. Comparison of TE:KRAB-ZNF in human and nonhuman primates (NHPs).

(A) Workflow for selecting TE:KRAB-ZNF for comparison between species. First, 178 KRAB-ZNFs and 836 transposable elements (TEs) were detected in all four species. Second, the leave-one-out test in the human sample was performed for a fair comparison since humans had four repeats and NHPs only had three (adjusted p<0.01 and absolute coefficient >0.4). (B) Number of positive and negative correlations in human and NHPs. Red brackets indicate the change of correlation between two species. For example, there are 276 human positive correlations which are negatively correlated in bonobos (suffix n: negative, p: positive; hs: Homo sapiens, pt: Pan troglodytes, pp: Pan paniscus, mm: Macaca mulatta). (C) Network of 276 TE:KRAB-ZNF that were all positively correlated in humans but negatively correlated in bonobos. This network demonstrates two hubs, ZNF528 and ZNF112, connecting to multiple TE subfamilies. Node size of TEs refers to the relative abundance of connections to the hubs. Details of this network can be found in Figure 4—figure supplement 2. (D) ZNF528 protein sequence difference in a zinc finger domain (ZF), where humans have glutamine (Q) while bonobos have histidine (H) at the –1 finger position. The lower part of the illustration indicates that the zinc finger domain binds to the DNA sequence using the –1, 3, and 6 finger positions. (E) Number of different TE subfamily nodes which form evolutionary old and young correlations in (C) network comparing humans to bonobos. (F) Distribution counts of human-specific correlations categorized based on TE subfamilies showing only young links. N-Y: negative and young correlations; P-Y: positive and young correlations.

Figure 4—figure supplement 1. Comparing KRAB-ZNFs and randomly selected genes in Primate Brain Data.

Figure 4—figure supplement 2. 276 opposite TE:KRAB-ZNF regulatory network comparing humans to bonobos.

This bipartite network refers to the 276 TE:KRAB-ZNF network mentioned in Figure 4C. KRAB-ZNF nodes are in violet and TE nodes are in green. The colors of the border of nodes and the edges represent their evolutionary age. The evolutionary young nodes and edges are in orange and the evolutionary old nodes and edges are in blue.

We subsequently determined human-specific and conserved TE:KRAB-ZNF interactions by assessing whether an interaction seen in humans also existed in any NHP. Remarkably, humans have a higher number of correlations and higher connectivity than NHPs (Figure 4B). This finding is confounded neither by a lack of gene/TE annotations nor by sample size: we only included KRAB-ZNF genes and TEs expressed in all four species, and we used the same number of individuals per species. We were even more conservative for humans by requiring that all four leave-one-out combinations of three human individuals show a significant correlation.

Interestingly, we found that some correlations had opposite signs between species. For example, 276 TE:KRAB-ZNF with positive correlations in humans were negatively correlated in bonobos (the seventh column in Figure 4B). Although not significant, out of the 276 TE:KRAB-ZNF, 104 were also negatively correlated in chimpanzees and rhesus macaques and might represent human-specific changes in the sign of the correlation. Constructing a network of those 276 TE:KRAB-ZNF, we detected that ZNF112 and ZNF528 are two hubs connected to TEs with mostly old links and young links, respectively (Figure 4C). We asked whether sequence differences exist between the orthologous zinc finger proteins that might explain this putative functional change. For ZNF112, there were 21 zinc finger domains. However, among 14 amino acid differences, none of them affected a position within the zinc finger domains (–1, 3, and 6) that directly contacts the DNA. Between the orthologs of human and bonobo ZNF528, we discovered an amino acid difference in one position that directly contacts nucleotides of the DNA (–1 position of the 15th zinc finger domain) (Figure 4D). While in bonobo ZNF528, there is a histidine at this position, it is a glutamine in human ZNF528. For other KRAB-ZNF proteins, it has been demonstrated that replacing DNA-contacting amino acids with alanine or glutamine reduces their repressor potency (Nunez et al., 2011). Therefore, we speculate that variations in the zinc finger domain of ZNF528, specifically at position 588, may explain why human ZNF528 is not as negatively correlated with TEs as bonobo ZNF528 (Figure 4D). Interestingly, this changed position represents a very rare human polymorphism (rs373201614), which seems to be under positive selection in humans, including Denisovans and Neanderthals (Harrison et al., 2024).

The distribution of young and old links was not random in the 276 TE:KRAB-ZNF bipartite network (Figure 4C). For instance, TcMar-Tc2 and Gypsy had only old interaction with KRAB-ZNFs, while most of the ERVK correlations were classified as young (Figure 4E).

Last, we checked the species-specific correlations based on TE subfamilies. For example, there are 24,063 positive and 2455 negative correlations in humans that were not detected in NHPs (the first column and the fifth column in Figure 4B). Interestingly, these human-specific correlations were all evolutionary young links, and many of them represented negative correlations involving TEs of the Alu subfamilies (Figure 4F).

Alterations in TE and KRAB-ZNF expression in brains of Alzheimer’s patients reflect brain region differences

Several KRAB-ZNFs and TEs with human-specific expression patterns or correlations have been associated with AD. For example, ZNF267, which was upregulated in the human cortex compared to NHPs (Figure 2F), is a clear transcriptomic signature for the diagnosis of AD (Fehlbaum-Beurdeley et al., 2012), and the expression of AluYa5 subfamily leads to genetic dysregulation in AD (Kim et al., 2016). We thus hypothesized that the regulatory network of KRAB-ZNFs and TEs might be severely altered in the brains of AD patients. We utilized the Mayo Data to test this hypothesis (Table 2).

Table 2. Datasets.

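Dataset | Category (biological replicates) | Total number of samples
Primate Brain Data (GSE127898) | Human (4) | 132
Primate Brain Data (GSE127898) | Chimpanzee (3) | 96
Primate Brain Data (GSE127898) | Macaque (3) | 96
Primate Brain Data (GSE127898) | Bonobo (3) | 98
Mayo Data (syn5550404) | Control temporal cortex | 23
Mayo Data (syn5550404) | AD temporal cortex | 24
Mayo Data (syn5550404) | Control cerebellum | 23
Mayo Data (syn5550404) | AD cerebellum | 22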
22

To investigate if there were differences in the expression patterns of TEs and KRAB-ZNFs comparing different brain regions and disease status, we conducted t-SNE analysis using their expression levels from the temporal cortex and cerebellum of human control individuals and AD patients. Results showed that variances were primarily influenced by brain regions rather than AD status (Figure 5A). Similar to our previous results with the Primate Brain Data (Figure 2D), we found that evolutionary young KRAB-ZNF genes and young TEs were expressed at lower levels than their older counterparts (Figure 5B). Comparing expression levels between AD and control samples, we obtained for the temporal cortex 6 KRAB-ZNF genes and 4 TEs that were upregulated in AD and 6 TEs that were downregulated in AD (Figure 5C). In the cerebellum, there were 6 upregulated and 1 downregulated TEs, and 4 downregulated and 10 upregulated KRAB-ZNF genes (Figure 5D).

Figure 5. Expression of transposable elements (TEs) and KRAB-ZNF genes in Mayo Data.

(A) Variations in the expression of KRAB-ZNF genes and TEs using t-SNE analysis. cbe: cerebellum; tcx: temporal cortex. (B) Distributions of the expression of evolutionary old and young KRAB-ZNF genes and TEs. (C) and (D) Differentially expressed KRAB-ZNF genes and TEs (absolute log2FoldChange>0.5, p<0.05) in temporal cortex (tcx) and cerebellum (cbe). The expression of KRAB-ZNF genes and TEs in the cerebellum shared the same log expression scale.

Human-specific correlations limited to the healthy human temporal cortex: 21 TE:KRAB-ZNFs not detected in AD

To select significant correlations (TE:KRAB-ZNF) between control and AD samples in the temporal cortex and cerebellum, we employed the same filtering criteria as described in the Primate Brain Data analysis (Figure 3A–C), requiring an adjusted p-value less than 0.01, absolute correlation coefficients higher than 0.4, and TE:KRAB-ZNF pairs detected in ChIP-exo data (Imbeault et al., 2017). The overlaps of TE:KRAB-ZNF pairs are depicted in Figure 6A, demonstrating a higher number of correlations in the control group. Next, we selected 21 TE:KRAB-ZNF, which are detected from the temporal cortex both in human Primate Brain Data and the healthy controls in Mayo Data, but not detected in any NHPs in Primate Brain Data. These correlations represented a subset of TE:KRAB-ZNF, which were specific to healthy adult human brain samples but not detected in AD progression (Figure 6B). Among these 21 TE:KRAB-ZNF, there are 14 evolutionary young and 7 evolutionary old interactions, and we found that Alu subfamilies accounted for 11 of these 21 interactions (Figure 6C). We further investigated why these TE:KRAB-ZNF were not detected in AD samples and found that most of the correlations were not significant based on our defined threshold. For instance, AluYc:ZNF182 exhibited a positive correlation in both control and AD samples in the temporal cortex. However, according to our selection criteria, this TE:KRAB-ZNF pair was not deemed significant in the AD sample (FDR = 0.34) (Figure 6D). The sole exception, showing opposite direction of correlation between groups, was L1MA6:ZNF211, which displayed a negative correlation in the control group but had a nonsignificant very weak positive correlation in the AD group (Figure 6D). This indicates weakening and loss of some correlations between TEs and KRAB-ZNFs in AD.
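The selection logic in this paragraph combines a threshold filter with set operations. The following sketch shows one way to express it, assuming that each correlation result is available as a record with a coefficient and an adjusted p-value; all function names, field names, and toy values are hypothetical.

```python
def significant_pairs(results, chip_exo_pairs, max_padj=0.01, min_abs_r=0.4):
    """Keep (te, znf) pairs passing both thresholds and supported by ChIP-exo."""
    return {
        (rec["te"], rec["znf"])
        for rec in results
        if rec["padj"] < max_padj
        and abs(rec["r"]) > min_abs_r
        and (rec["te"], rec["znf"]) in chip_exo_pairs
    }


def human_control_specific(human_primate, nhp_sets, control_mayo, ad_mayo):
    """Pairs found in human Primate Brain Data and Mayo controls,
    but in no NHP species and not in AD samples."""
    in_any_nhp = set().union(*nhp_sets)
    return (human_primate & control_mayo) - in_any_nhp - ad_mayo


# Hypothetical toy records and binding evidence.
toy_chip_exo = {("TE1", "ZNF_A"), ("TE2", "ZNF_B")}
toy_results = [
    {"te": "TE1", "znf": "ZNF_A", "r": -0.62, "padj": 0.001},   # passes all filters
    {"te": "TE2", "znf": "ZNF_B", "r": 0.35, "padj": 0.001},    # coefficient too weak
    {"te": "TE3", "znf": "ZNF_C", "r": 0.80, "padj": 0.0001},   # no ChIP-exo support
]
kept = significant_pairs(toy_results, toy_chip_exo)

a, b, c = ("TE1", "ZNF_A"), ("TE2", "ZNF_B"), ("TE4", "ZNF_D")
specific = human_control_specific(
    human_primate={a, b, c},
    nhp_sets=[{b}, set(), set()],
    control_mayo={a, b, c},
    ad_mayo={c},
)
```

In the toy data, pair `b` is excluded because it is also seen in one NHP species and pair `c` because it is still detected in AD, leaving only pair `a`.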

Figure 6. TE:KRAB-ZNF analysis in Mayo Data.

(A) Overlaps of TE:KRAB-ZNF between control and Alzheimer’s disease (AD) condition in temporal cortex and cerebellum (denoted as cbe_control, tcx_control, cbe_AD, and tcx_AD). (B) 21 human-control-specific TE:KRAB-ZNFs were selected from the intersection of human-specific TE:KRAB-ZNFs from Primate Brain Data (not detected in the other nonhuman primates [NHPs]) and control-specific TE:KRAB-ZNFs from Mayo Data in temporal cortex (not detected in AD samples). (C) Distribution of transposable element (TE) families counts among the 21 control-specific TE:KRAB-ZNF in the temporal cortex. (D) Comparison of the expression and correlation results of AluYc:ZNF182 and L1MA6:ZNF211 in the temporal cortex. (E) and (F) show the bipartite network of 21 TE:KRAB-ZNF in the temporal cortex. (E) Coloring TE nodes in green and KRAB-ZNF nodes in violet. Evolutionary young links are in orange, and evolutionary old links are in blue. Orange border specified that this TE or KRAB-ZNF evolved recently. (F) There are two modules in the network colored in pink and gray based on their bipartite modularity. Brown links indicate negative correlations, and green links are positive correlations.

Subsequently, we conducted a bipartite network analysis using the 21 TE:KRAB-ZNF pairs specific to the healthy human samples in the temporal cortex. Integrating evolutionary age inference, we identified a module of young Alu subfamilies in the temporal cortex characterized by negative correlations to KRAB-ZNFs (Figure 6E–F). Interestingly, the negative young TE:KRAB-ZNF pairs in this Alu module are not significantly detected in AD brains, suggesting that the regulation of the involved TEs is lost or at least impaired in the disease condition.

Discussion

In this study, we systematically examined the expression and correlation patterns of TEs and KRAB-ZNFs in the brain and compared them between humans and NHPs, as well as between healthy human and AD samples. We demonstrated high divergence in the expression of TEs and KRAB-ZNF genes between species, suggesting a prominent evolutionary signature in the regulation of TEs and KRAB-ZNF genes in primates (Figure 2A). We also found that evolutionary younger members of both TEs and KRAB-ZNF genes exhibit significantly lower expression levels compared to older members (Figure 2D), suggesting that newly emerged genetic elements might be subjected to more stringent regulatory mechanisms, potentially avoiding disruption at the organismal level. To represent the complexity of interactions between TEs and KRAB-ZNFs, we derived bipartite networks. We showed that humans had higher connectivity in these networks compared to NHPs (Figure 4B). This observation is in line with previous findings that humans also have higher connectivity in the transcription factor networks of the brain compared to NHPs (Nowick et al., 2009; Bakken et al., 2016; Berto et al., 2018). A substantial proportion of young TE:KRAB-ZNF seems to be human-specific, pointing to recent additions to the TE:KRAB-ZNF network in the human lineage and presumably an increase in the complexity of gene regulation (Figure 4F). The network of the 21 TE:KRAB-ZNF that were discovered only in the healthy human temporal cortex, but not in NHPs or AD samples (Figure 6E and F), indicates potential alterations or loss of control of some TE expression in AD brains. These collective findings suggest an important role of the TE:KRAB-ZNF network in shaping the evolution and in maintaining the functionality of the healthy human brain, and particularly highlight the co-expression involving evolutionary young TEs and young KRAB-ZNFs.

Given the generally acknowledged role of KRAB-ZNF proteins as repressors of TE expression, we expected to observe many negative correlations between KRAB-ZNFs and TEs. Indeed, many evolutionary young and human-specific correlations, especially those involving TEs of the Alu subfamilies, are negative. However, we also observed numerous positive correlations between KRAB-ZNFs and TEs. Keeping in mind that correlations do not indicate the direction of causation and can also be caused by indirect relationships and other factors, several plausible explanations could be offered for these positive interactions. First, some TEs can function as cis-acting regulatory elements capable of influencing the expression of nearby genes, and such TEs have been described as being co-opted in a tissue-specific manner (Sentmanat and Elgin, 2012; Wolf et al., 2020; Coronado-Zamora and González, 2023). Second, it is conceivable that the expression levels of KRAB-ZNF genes themselves are influenced by the presence and activity of TEs within the genome (Pontis et al., 2019; Senft and Macfarlan, 2021). Another perspective is that primate-specific KRAB-ZNFs can bind to gene promoters, regulating gene expression in the primate brain without necessarily targeting TEs (Farmiloe et al., 2020); observed correlations could then stem from indirect regulation. We should note, however, that 19.43% (446/2,296) of positively correlated pairs overlap significantly with the ChIP-exo results, i.e., exist in the presence of a KRAB-ZNF protein binding to the TE (Figure 3B). Furthermore, we discovered that all the human-specific correlations are evolutionary young correlations. Notably, different TE subfamilies interact with KRAB-ZNFs differently, e.g., Alu subfamilies are the only TEs that have more negative correlations than positive ones, and LTR subfamilies have only positive correlations (Figure 4F). Therefore, we encourage future research focusing on primate evolution to take these factors into account.

TEs have also been observed to be dysregulated in certain diseases. Here, we observed that the variances in expression levels of TEs and KRAB-ZNFs were more distinct across various brain regions than between control and AD samples (Figure 5A). This finding points out the difficulties in identifying differences between control and AD samples solely through the consideration of DE. This is also illustrated by the detection of only 6 and 10 differentially expressed KRAB-ZNFs and TEs, respectively (Figure 5C). Upon combining expression data with correlation results and an evolutionary perspective, we determined 21 human-specific TE:KRAB-ZNFs that were not detectable in AD brains (Figure 6B, E, and F), suggesting considerable alterations of connections in the context of a human-specific disease. By comparing healthy and disease samples, we discovered evolutionary young Alu subfamilies that are downregulated in AD patients, such as AluYh9 and AluSp subfamilies, which have been found to be negatively associated with Tau pathologic burden in the human dorsolateral prefrontal cortex (Guo et al., 2018). Alu-mediated in-frame deletions of exons 9 and 10 of PSEN1, derived from the AluY and AluSx subfamilies, have been associated with early-onset AD (Le Guennec et al., 2017). Alu insertions have further been shown to regulate the expression of ACE in neurons (Wu et al., 2013), and Alu subfamilies play a role in A-to-I RNA editing during neurogenesis in the central nervous system and in mitochondrial homeostasis (Larsen et al., 2018). Thus, we suggest that the dysregulation of modules of Alu subfamilies with their connected KRAB-ZNFs in the regulatory network (Figure 6F) plays a role in AD progression.

With respect to the arms-race hypothesis (Jacobs et al., 2014), we observed a higher number of negative correlations between young TEs and young KRAB-ZNF genes in humans compared to NHPs. This suggests increased repression from young KRAB-ZNFs on young TEs in the human brain (Figure 4B and F). Still, our findings only weakly support the arms-race hypothesis. First, we noted that young TEs and KRAB-ZNFs exhibit lower expression levels than their old counterparts (Figures 2D and 5B). This pattern may not align with the expectation that young TEs have recently escaped repression, which would likely result in higher expression of the young TE or the young KRAB-ZNF to keep the TE in check. However, it remains challenging to discern at which stage of a potential arms race the TE:KRAB-ZNF pair presently is, i.e., whether the TE is on the rise with its expression or already repressed by the KRAB-ZNF. In addition, older TEs might be allowed to be expressed more highly as they are not harmful anymore. Not in line with the arms-race model is further evidence that some young TEs were also negatively correlated with old KRAB-ZNF genes, leading to weak assortativity regarding age inference. We acknowledge, however, that this is not a contradiction, since old KRAB-ZNF genes might be repurposed to repress young TEs, which could be more ‘cost-effective’ than evolving a new KRAB-ZNF gene. Potentially, restricting the analysis to KRAB-ZNFs and TEs younger than 25 mya would be more suitable for investigating the arms-race hypothesis in humans. During this period, we might expect a more direct arms-race scenario, as the evolutionary pressures between rapidly evolving TEs and their regulatory mechanisms should be more pronounced. However, it is important to note that such an analysis would have been limited by the detectable correlations of only 3 out of 9 KRAB-ZNF genes and 9 out of 92 TE subfamilies in our dataset (Supplementary file 1, table S7).

In response to the need for systematically comparing expression profiles of orthologous genes and TEs across diverse species or conditions, we developed TEKRABber, an R Bioconductor package whose pipeline users can readily adapt for their own analyses. TEKRABber efficiently processes mapped raw counts from various TE quantification tools and normalizes transcriptomic data across species by extending the usage of the scaling-based normalization method (Zhou et al., 2019). To optimize computational efficiency and parallel computing during the calculations of pairwise correlations, we harnessed the capabilities of Rcpp (Eddelbuettel and François, 2011) and doParallel (Corporation and Weston, 2011). This approach allows for effective scaling, accommodating larger datasets and leveraging additional computational resources as needed. With these advantages, users can adapt the TEKRABber pipeline to analyze selected orthologous genes and TEs from their expression data within an acceptable time frame. For example, in this study, calculating pairwise correlations of 178 orthologs with 836 TEs (neglecting sample number) can be executed in approximately 5 min on standard hardware (Intel Core i7 2.6 GHz, 16 GB memory). Future studies could use TEKRABber, for instance, to investigate correlations between TEs and all genes, to identify candidates of genes whose expression is influenced by certain TEs. Another option would be to explore which factors might repress TE expression in invertebrates, which lack KRAB-ZNF proteins.

Nonetheless, it is crucial to recognize that our present approach is constrained by certain limitations. These constraints primarily stem from variations in the lengths and positions of the same TE across individual genomes of the same species, leaving some room for improving the normalization of TE expression levels. We only used the available reference genomes for our analysis; however, it becomes increasingly clear that substantial individual differences in the presence of TEs exist, which are not covered by a single reference genome of a species. Currently, there are emerging methods designed to detect TE expression, including tools for advancements in sequencing resolution like long-read sequencing (Marx, 2023), de novo TE annotations (Storer et al., 2022; Orozco-Arias et al., 2024), and locus-specific expression detection in single-cell RNA-seq analysis (Rodríguez-Quiroz and Valdebenito-Maturana, 2022). These developments hold promise for achieving a more precise depiction of the variations between samples and the reference, ultimately enhancing our understanding of TE-associated expression patterns. Hence, we acknowledge the significant potential of integrating improved TE orthologous information into our method.

Conclusion

In summary, our findings underscore the intricate network of interactions between TEs and KRAB-ZNFs in both human evolution and neurodegenerative disease. To achieve a comprehensive understanding of TEs and KRAB-ZNFs functions, it is not enough to only examine expression levels, but network analysis as facilitated by TEKRABber needs to be leveraged. We found that the human brain exhibits a notably denser TE:KRAB-ZNF network compared to NHPs, particularly for more recently evolved TEs and KRAB-ZNFs. The healthy human brain TE:KRAB-ZNF network contains a distinct module composed exclusively of Alu subfamilies, which is an evolutionary novelty not observed in AD brains. These insights highlight the nuanced dynamics of TE:KRAB-ZNF interactions and their relevance in both evolutionary and disease contexts. We emphasize that TEs can have a role in species evolution and provide a tool, TEKRABber, to further investigate this across a larger number of taxa.

Materials and methods

Primate Brain Data

For comparing humans with NHPs, we used published RNA-seq data (GSE127898) including 422 brain samples from biological replicates that consisted of 4 humans, 3 chimpanzees, 3 bonobos, and 3 macaques (Table 2). Samples from each individual were from 33 different brain regions, and total RNA was sequenced on the Illumina HiSeq 4000 system with a 150 bp paired-end sequencing protocol. More details about samples and the preparation steps can be found in Supplementary file 1, table S1 and Khrameeva et al., 2020. RNA-seq FASTQ data on Gene Expression Omnibus (Barrett et al., 2012) were retrieved using NCBI SRA Toolkit v3.0.3.

Mayo Data

The Mayo RNA-seq study (Allen et al., 2016) was utilized to compare human control and AD samples. Total RNA was sequenced from both control and AD samples collected from the temporal cortex and cerebellum. The preprocessed FASTQ files, which include only control and AD samples in the temporal cortex and cerebellum, were downloaded via Synapse consortium studies (accession: syn5550404). The number of samples can be found in Table 2 and more details, including biological sex, age, and Braak staging, are in Supplementary file 1, table S3.

Transcriptome analysis

Adapters and low-quality reads were removed from the FASTQ files using fastp v0.12.4 (Chen et al., 2018) with default parameters. The selected FASTQ files from both datasets were then mapped to their respective references downloaded from UCSC Table Browser, including hg38, panTro6, panPan3, and rheMac10 (Karolchik et al., 2004), using STAR v2.7.10b (Dobin et al., 2013) with the parameters ‘--outFilterMultimapNmax 100 --winAnchorMultimapNmax 100’. These parameters were chosen to increase the likelihood of capturing TE mapping by allowing for multiple alignments of reads and more loci anchors for mapping. The resulting BAM files were used to quantify counts of TEs and genes using TEtranscripts v2.2.3 (Jin et al., 2015) with the ‘--sortByPos’ parameter to determine the expression levels of genes and TEs. Gene and TE indices were created using UCSC gene annotations and the RepeatMasker track, enabling reads to match with these intervals. If a read overlapped with both a gene exon and a TE, we determined whether it had a unique alignment or multiple locations in the genome. If an annotation existed, the uniquely aligned read was assigned to the gene. Otherwise, it was assigned to the TE. For reads with ambiguous mappings, they were evenly weighted across TE or gene annotations using the expectation maximization algorithm (Dempster et al., 1977). Differentially expressed genes and TEs were quantified using DESeq2 v1.4.4 (Love et al., 2014), utilizing an absolute log2FoldChange threshold greater than 1.5 (adjusted p<0.05). However, the Mayo Data adopted an absolute log2FoldChange threshold greater than 0.5 due to a lower number of detected differentially expressed genes and TEs. Heatmaps and upset plots were visualized using ComplexHeatmap v2.2.0 (Gu et al., 2016; Gu, 2022).
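The two differential expression thresholds can be summarized in a small helper. This is only a sketch of the filtering rule applied to DESeq2-style output values; the function name and the example values are hypothetical.

```python
def classify_de(log2_fold_change, padj, min_abs_lfc=1.5, max_padj=0.05):
    """Label a gene or TE as 'up', 'down', or 'not significant'.

    Defaults follow the Primate Brain Data thresholds; for the Mayo Data,
    pass min_abs_lfc=0.5 as described in the text.
    """
    if padj >= max_padj or abs(log2_fold_change) <= min_abs_lfc:
        return "not significant"
    return "up" if log2_fold_change > 0 else "down"


# Hypothetical values: the same change is filtered out with the default
# threshold but retained with the more permissive Mayo Data threshold.
primate_call = classify_de(-0.8, 0.01)
mayo_call = classify_de(-0.8, 0.01, min_abs_lfc=0.5)
```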

Inferences about the evolutionary age of KRAB-ZNFs and TEs

A total of 337 KRAB-ZNF genes were identified using the comprehensive KRAB-ZNF catalog (Huntley et al., 2006). The evolutionary age of these genes was inferred through annotations provided by GenTree (Shao et al., 2019), which employed a synteny-based pipeline to date primate-specific protein-coding genes and depicted their origins using a branch view. For KRAB-ZNF genes lacking direct dating annotations in GenTree, we incorporated complementary information from annotations of primate orthologs across 27 species, including humans (Jovanovic et al., 2021). This approach allowed us to assign evolutionary ages to all 337 KRAB-ZNF genes for our downstream analysis (Figure 2—figure supplement 1). The evolutionary age of TEs was derived from Dfam (Storer et al., 2021) by extracting species-specific information for each TE subfamily (Figure 2—figure supplement 2). Alu elements, which are primate-specific, have been extensively used in phylogenetic studies, and a subset of recently evolved Alu subfamilies is found in all Simiiformes (Xing et al., 2007; Williams et al., 2010). For downstream analyses, we classified both KRAB-ZNF genes and TEs into young and old groups based on their emergence around the divergence of Simiiformes (Figure 2C).

Development of the TEKRABber software

To compare the expression of orthologous genes and TEs across species, we concatenated steps including normalization, DE, and correlation analysis into an R Bioconductor package, TEKRABber. The name was derived from the idea that TEs are bound (‘grabbed’) by KRAB-ZNF proteins. TEKRABber adapted the scale-based normalization method (Zhou et al., 2019) to normalize orthologous genes for comparison between two species. In brief, the conserved orthologous gene lengths from two species were selected from Ensembl data (Harrison et al., 2024) and combined with the expression data to find an optimal scaling factor that can normalize the data to achieve a minimization of deviation between empirical and nominal type I error in a hypothesis testing framework. The normalization step for TEs followed a similar concept. However, instead of using orthologous genes, we used the subset of TEs including LTR, LINE, SINE, SVAs, and DNA transposons (as defined by RepeatMasker, Smit et al., 2013), for which homologs can be found in other species, for scaling to normalize the expression of TEs (Figure 1—figure supplement 1). After normalization, differentially expressed orthologous genes and TEs were analyzed using DESeq2 v1.4.4 (Love et al., 2014). The one-to-one correlations between selected orthologous genes and TEs from each species were estimated. In this analysis, we used Pearson’s correlations with an adjusted p-value using the Benjamini-Hochberg correction (Benjamini and Hochberg, 1995) to obtain significant correlations.
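The correlation step can be sketched in plain Python as a Pearson coefficient followed by the Benjamini-Hochberg adjustment. This is an illustration of the two standard procedures, not the TEKRABber implementation (which is written in R); the p-values below are hypothetical inputs, since deriving them from a coefficient requires the t-distribution, which is outside the scope of this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical normalized expression values and raw p-values.
r_toy = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
padj_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```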

Correlation analysis

To obtain significant correlations of KRAB-ZNFs and TEs for downstream analysis, we used a workflow including the following steps. We first applied Pearson’s correlations on one-to-one gene and TE, using their normalized expression levels. Then, we tested whether KRAB-ZNFs statistically had more correlations with TEs than randomly selected genes (1000 iterations, p-value<0.001). Next, we investigated the relationships between the number of correlations and correlation coefficient values. From the distribution, we decided on a threshold with an absolute value for the coefficient of larger than 0.4 and with an adjusted p-value less than 0.01. Last, we cross-validated our results by comparing the pairs found with experimentally determined TE and KRAB-ZNF relationships using ChIP-exo data from a human embryonic stem cell line (Imbeault et al., 2017) and testing different correlation strengths. For notation, we use TE:KRAB-ZNF to signify a correlation between a TE and a KRAB-ZNF gene, e.g., AluYc:ZNF441 indicates a significant correlation between the expression of AluYc and ZNF441.
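The comparison against randomly selected genes can be sketched as a resampling test. The sketch assumes that random gene sets have the same size as the KRAB-ZNF set, are drawn from the remaining genes, and that the empirical p-value uses the common "+1" correction; none of these details is specified in the text, and all names and counts are hypothetical.

```python
import random


def empirical_p_value(n_links_per_gene, focal_genes, n_iterations=1000, seed=0):
    """Empirical p-value that random gene sets reach the focal set's link count.

    n_links_per_gene: dict mapping each gene to its number of significant
    correlations with TEs. focal_genes: the gene set of interest.
    """
    rng = random.Random(seed)
    background = sorted(set(n_links_per_gene) - set(focal_genes))
    observed = sum(n_links_per_gene[g] for g in focal_genes)
    at_least_as_many = 0
    for _ in range(n_iterations):
        sample = rng.sample(background, len(focal_genes))
        if sum(n_links_per_gene[g] for g in sample) >= observed:
            at_least_as_many += 1
    return (at_least_as_many + 1) / (n_iterations + 1)


# Hypothetical counts: three focal genes with many links, twenty background
# genes with at most five links each.
toy_counts = {f"KZNF{i}": 50 for i in range(3)}
toy_counts.update({f"GENE{i}": i % 6 for i in range(20)})
p_toy = empirical_p_value(toy_counts, ["KZNF0", "KZNF1", "KZNF2"])
```

Because no random set of three background genes can reach the focal total in this toy example, the result is the smallest attainable value, 1/1001.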

Constructing TE:KRAB-ZNF regulatory network

Nodes and edges were selected from the TE:KRAB-ZNF results and processed into a dataframe object. The network structure was created using RCy3 v2.16 (Gustavsen et al., 2019) to import into Cytoscape v3.10.1 (Shannon et al., 2003) for visualization and editing. To visualize the regulatory networks, we applied the yFiles Organic Layout for an undirected graph by assigning nodes as objects that had repulsive or attractive forces between them (https://www.yworks.com).

Network analysis

TE:KRAB-ZNF networks were analyzed as bipartite networks, consisting of two classes of nodes, KRAB-ZNF genes and TEs, with connections existing only between the two classes and no edges between elements from the same class. Node properties were first checked using a bipartite module in NetworkX software v2.8.4 (Hagberg et al., 2008). These values included bipartite degree centrality, bipartite strength centrality, and bipartite betweenness centrality for each KRAB-ZNF and TE node. The top 5% scoring genes were selected as hubs (bipartite degree and bipartite strength centrality). For cluster detection, leading eigenvector community methods (Csárdi et al., 2023) were used to specify unipartite community structures, and then bipartite modularity (Barber, 2007) was calculated with CONDOR v1.1.1 (Platig et al., 2016). To conduct enrichment analysis among clusters, a Fisher’s exact test using 1 million simulations with a p-value<0.05 was used.
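For orientation, bipartite modularity in the sense of Barber (2007) can be computed for a given module assignment as follows. This is a minimal unweighted sketch, whereas the analysis above used CONDOR and correlation coefficients; the node names and module labels are hypothetical.

```python
def barber_modularity(links, module_of):
    """Barber's bipartite modularity for an unweighted bipartite network.

    Q = (1/m) * sum over TE i and KRAB-ZNF j in the same module of
        (A_ij - k_i * d_j / m),
    where m is the number of links, k_i the TE degree, d_j the KRAB-ZNF degree.
    """
    links = set(links)
    m = len(links)
    te_degree, znf_degree = {}, {}
    for te, znf in links:
        te_degree[te] = te_degree.get(te, 0) + 1
        znf_degree[znf] = znf_degree.get(znf, 0) + 1
    q = 0.0
    for te, k in te_degree.items():
        for znf, d in znf_degree.items():
            if module_of[te] == module_of[znf]:
                a = 1 if (te, znf) in links else 0
                q += a - k * d / m
    return q / m


# Hypothetical network made of two disconnected stars.
toy_links = [("TE1", "ZNF_A"), ("TE2", "ZNF_A"), ("TE3", "ZNF_B"), ("TE4", "ZNF_B")]
two_modules = {"TE1": 0, "TE2": 0, "ZNF_A": 0, "TE3": 1, "TE4": 1, "ZNF_B": 1}
one_module = {node: 0 for node in two_modules}
q_two = barber_modularity(toy_links, two_modules)
q_one = barber_modularity(toy_links, one_module)
```

Splitting the toy network into its two stars gives a modularity of 0.5, whereas placing all nodes in a single module gives 0.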

Acknowledgements

We extend our sincere thanks to the data providers and all members of the research team involved in The Mayo RNA-seq study. We also appreciate the invaluable data resources provided (https://doi.org/10.7303/syn5550404), which have supported the progression of this research. Our sincere thanks go to Vladimir M Jovanovic, Rebecca S Saager, Jeong-Eun Költzow, Fatemeh Zebardast, Melanie Sarfert, Marula Mathew, and Vanessa H Schulmann for their contributions through critical reading and feedback.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Yao-Chung Chen, Email: yao-chung.chen@fu-berlin.de.

Katja Nowick, Email: katja.nowick@fu-berlin.de.

Detlef Weigel, Max Planck Institute for Biology, Tübingen, Germany.

Funding Information

This paper was supported by the following grants:

  • Deutsche Forschungsgemeinschaft NO 920/8-1 to Katja Nowick.

  • Deutsche Forschungsgemeinschaft NO 920/10-1 to Katja Nowick.

Additional information

Competing interests

No competing interests declared.

Author contributions

Yao-Chung Chen: Conceptualization, Data curation, Software, Formal analysis, Validation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Arnaud Maupas: Formal analysis, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Katja Nowick: Conceptualization, Supervision, Funding acquisition, Writing – original draft, Project administration, Writing – review and editing.

Ethics

Human subjects: This study was conducted using de-identified human sequencing data. The datasets were obtained from publicly available sources, specifically the NCBI Gene Expression Omnibus (GEO) (GSE127898) and the Synapse database (SYN5550404), for which appropriate data access approval was obtained. As the data were de-identified and no identifiable personal information was used or accessed, informed consent and ethical approval were not required for this analysis.

Additional files

Supplementary file 1. Supplementary Tables.
elife-103608-supp1.xlsx (237.8KB, xlsx)
MDAR checklist

Data availability

The current manuscript is a computational study, so no data have been generated for this manuscript. Source code for the analysis has been uploaded to GitHub (copy archived at Chen, 2024).

The following previously published datasets were used:

Khrameeva E, Kurochkin I, Mazin P, Khaitovich P. 2020. Transcriptome map of the human brain at the single-cell resolution. NCBI Gene Expression Omnibus. GSE127898

SageNeuroCommunityAdmin 2016. Mayo RNAseq Study. Synapse.

References

  1. Allen M, Carrasquillo MM, Funk C, Heavner BD, Zou F, Younkin CS, Burgess JD, Chai H-S, Crook J, Eddy JA, Li H, Logsdon B, Peters MA, Dang KK, Wang X, Serie D, Wang C, Nguyen T, Lincoln S, Malphrus K, Bisceglio G, Li M, Golde TE, Mangravite LM, Asmann Y, Price ND, Petersen RC, Graff-Radford NR, Dickson DW, Younkin SG, Ertekin-Taner N. Human whole genome genotype and transcriptome data for Alzheimer’s and other neurodegenerative diseases. Scientific Data. 2016;3:160089. doi: 10.1038/sdata.2016.89.
  2. Bakken TE, Miller JA, Ding S-L, Sunkin SM, Smith KA, Ng L, Szafer A, Dalley RA, Royall JJ, Lemon T, Shapouri S, Aiona K, Arnold J, Bennett JL, Bertagnolli D, Bickley K, Boe A, Brouner K, Butler S, Byrnes E, Caldejon S, Carey A, Cate S, Chapin M, Chen J, Dee N, Desta T, Dolbeare TA, Dotson N, Ebbert A, Fulfs E, Gee G, Gilbert TL, Goldy J, Gourley L, Gregor B, Gu G, Hall J, Haradon Z, Haynor DR, Hejazinia N, Hoerder-Suabedissen A, Howard R, Jochim J, Kinnunen M, Kriedberg A, Kuan CL, Lau C, Lee C-K, Lee F, Luong L, Mastan N, May R, Melchor J, Mosqueda N, Mott E, Ngo K, Nyhus J, Oldre A, Olson E, Parente J, Parker PD, Parry S, Pendergraft J, Potekhina L, Reding M, Riley ZL, Roberts T, Rogers B, Roll K, Rosen D, Sandman D, Sarreal M, Shapovalova N, Shi S, Sjoquist N, Sodt AJ, Townsend R, Velasquez L, Wagley U, Wakeman WB, White C, Bennett C, Wu J, Young R, Youngstrom BL, Wohnoutka P, Gibbs RA, Rogers J, Hohmann JG, Hawrylycz MJ, Hevner RF, Molnár Z, Phillips JW, Dang C, Jones AR, Amaral DG, Bernard A, Lein ES. A comprehensive transcriptional map of primate brain development. Nature. 2016;535:367–375. doi: 10.1038/nature18637.
  3. Barber MJ. Modularity and community detection in bipartite networks. Physical Review E. 2007;76:066102. doi: 10.1103/PhysRevE.76.066102.
  4. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Research. 2012;41:D991–D995. doi: 10.1093/nar/gks1193.
  5. Becker J, Czamara D, Hoffmann P, Landerl K, Blomert L, Brandeis D, Vaessen A, Maurer U, Moll K, Ludwig KU, Müller-Myhsok B, Nöthen MM, Schulte-Körne G, Schumacher J. Evidence for the involvement of ZNF804A in cognitive processes of relevance to reading and spelling. Translational Psychiatry. 2012;2:e136. doi: 10.1038/tp.2012.62.
  6. Bendall ML, de Mulder M, Iñiguez LP, Lecanda-Sánchez A, Pérez-Losada M, Ostrowski MA, Jones RB, Mulder LCF, Reyes-Terán G, Crandall KA, Ormsby CE, Nixon DF. Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression. PLOS Computational Biology. 2019;15:e1006453. doi: 10.1371/journal.pcbi.1006453.
  7. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B. 1995;57:289–300. doi: 10.1111/j.2517-6161.1995.tb02031.x.
  8. Berto S, Nowick K. Species-specific changes in a primate transcription factor network provide insights into the molecular evolution of the primate prefrontal cortex. Genome Biology and Evolution. 2018;10:2023–2036. doi: 10.1093/gbe/evy149.
  9. Bodea GO, McKelvey EGZ, Faulkner GJ. Retrotransposon-induced mosaicism in the neural genome. Open Biology. 2018;8:180074. doi: 10.1098/rsob.180074.
  10. Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvák Z, Levin HL, Macfarlan TS, Mager DL, Feschotte C. Ten things you should know about transposable elements. Genome Biology. 2018;19:199. doi: 10.1186/s13059-018-1577-z.
  11. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
  12. Chen YC. PrimateBrain_TEKRABZNF. swh:1:rev:2f2625cd83245023f622ddbd838107f6ecb57b13. Software Heritage. 2024. https://archive.softwareheritage.org/swh:1:dir:6e5533c045b4949b36f31e126402fa2f3d7afc9f;origin=https://github.com/ferygood/primateBrain_TEKRABZNF;visit=swh:1:snp:3ef9fa9eedbb9084fb8272fa489430b7f5e6de01;anchor=swh:1:rev:2f2625cd83245023f622ddbd838107f6ecb57b13
  13. Colonna Romano N, Fanti L. Transposable elements: major players in shaping genomic and evolutionary patterns. Cells. 2022;11:1048. doi: 10.3390/cells11061048.
  14. Coronado-Zamora M, González J. Transposons contribute to the functional diversification of the head, gut, and ovary transcriptomes across Drosophila natural strains. Genome Research. 2023;33:1541–1553. doi: 10.1101/gr.277565.122.
  15. Microsoft Corporation, Weston S. doParallel: foreach parallel adaptor for the “parallel” package. Version 1.0.17. CRAN. 2011. https://CRAN.R-project.org/package=doParallel
  16. Criscione SW, Zhang Y, Thompson W, Sedivy JM, Neretti N. Transcriptional landscape of repetitive elements in normal and cancer human cells. BMC Genomics. 2014;15:583. doi: 10.1186/1471-2164-15-583.
  17. Csárdi G, Nepusz T, Müller K, Horvát S, Traag V, Zanini F, Noom D. igraph for R: R interface of the igraph library for graph theory and network analysis. v2.1.4. Zenodo. 2023. doi: 10.5281/zenodo.14736815.
  18. Dembny P, Newman AG, Singh M, Hinz M, Szczepek M, Krüger C, Adalbert R, Dzaye O, Trimbuch T, Wallach T, Kleinau G, Derkow K, Richard BC, Schipke C, Scheidereit C, Stachelscheid H, Golenbock D, Peters O, Coleman M, Heppner FL, Scheerer P, Tarabykin V, Ruprecht K, Izsvák Z, Mayer J, Lehnardt S. Human endogenous retrovirus HERV-K(HML-2) RNA causes neurodegeneration through Toll-like receptors. JCI Insight. 2020;5:131093. doi: 10.1172/jci.insight.131093.
  19. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B. 1977;39:1–22. doi: 10.1111/j.2517-6161.1977.tb01600.x.
  20. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
  21. Ecco G, Imbeault M, Trono D. KRAB zinc finger proteins. Development. 2017;144:2719–2729. doi: 10.1242/dev.132605.
  22. Eddelbuettel D, François R. Rcpp: seamless R and C++ integration. Journal of Statistical Software. 2011;40:1–18. doi: 10.18637/jss.v040.i08.
  23. Eskier D, Arıbaş A, Karakülah G. In: Plant Genomic and Cytogenetic Databases. Garcia S, Nualart N, editors. Springer US; 2023. PlanTEnrichment: a how-to guide on rapid identification of transposable elements associated with regions of interest in select plant genomes; pp. 59–70.
  24. Evering TH, Marston JL, Gan L, Nixon DF. Transposable elements and Alzheimer’s disease pathogenesis. Trends in Neurosciences. 2023;46:170–172. doi: 10.1016/j.tins.2022.12.003.
  25. Farmiloe G, Lodewijk GA, Robben SF, van Bree EJ, Jacobs FMJ. Widespread correlation of KRAB zinc finger protein binding with brain-developmental gene expression patterns. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2020;375:20190333. doi: 10.1098/rstb.2019.0333.
  26. Fehlbaum-Beurdeley P, Sol O, Désiré L, Touchon J, Dantoine T, Vercelletto M, Gabelle A, Jarrige AC, Haddad R, Lemarié JC, Zhou W, Hampel H, Einstein R, Vellas B, EHTAD/002 study group. Validation of AclarusDx, a blood-based transcriptomic signature for the diagnosis of Alzheimer’s disease. Journal of Alzheimer’s Disease. 2012;32:169–181. doi: 10.3233/JAD-2012-120637.
  27. Groner AC, Meylan S, Ciuffi A, Zangger N, Ambrosini G, Dénervaud N, Bucher P, Trono D. KRAB–Zinc finger proteins and KAP1 can mediate long-range transcriptional repression through heterochromatin spreading. PLOS Genetics. 2010;6:e1000869. doi: 10.1371/journal.pgen.1000869.
  28. Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–2849. doi: 10.1093/bioinformatics/btw313.
  29. Gu Z. Complex heatmap visualization. iMeta. 2022;1:e43. doi: 10.1002/imt2.43.
  30. Guo C, Jeong H-H, Hsieh Y-C, Klein H-U, Bennett DA, De Jager PL, Liu Z, Shulman JM. Tau activates transposable elements in Alzheimer’s disease. Cell Reports. 2018;23:2874–2880. doi: 10.1016/j.celrep.2018.05.004.
  31. Gustavsen JA, Pai S, Isserlin R, Demchak B, Pico AR. RCy3: Network biology using Cytoscape from within R. F1000Research. 2019;8:1774. doi: 10.12688/f1000research.20887.3.
  32. Hagberg AA, Schult DA, Swart PJ. Exploring network structure, dynamics, and function using NetworkX. Python in Science Conference; 2008. pp. 11–15.
  33. Hancks DC, Kazazian HH. SVA retrotransposons: Evolution and genetic instability. Seminars in Cancer Biology. 2010;20:234–245. doi: 10.1016/j.semcancer.2010.04.001.
  34. Harrison PW, Amode MR, Austine-Orimoloye O, Azov AG, Barba M, Barnes I, Becker A, Bennett R, Berry A, Bhai J, Bhurji SK, Boddu S, Branco Lins PR, Brooks L, Ramaraju SB, Campbell LI, Martinez MC, Charkhchi M, Chougule K, Cockburn A, Davidson C, De Silva NH, Dodiya K, Donaldson S, El Houdaigui B, Naboulsi TE, Fatima R, Giron CG, Genez T, Grigoriadis D, Ghattaoraya GS, Martinez JG, Gurbich TA, Hardy M, Hollis Z, Hourlier T, Hunt T, Kay M, Kaykala V, Le T, Lemos D, Lodha D, Marques-Coelho D, Maslen G, Merino GA, Mirabueno LP, Mushtaq A, Hossain SN, Ogeh DN, Sakthivel MP, Parker A, Perry M, Piližota I, Poppleton D, Prosovetskaia I, Raj S, Pérez-Silva JG, Salam AIA, Saraf S, Saraiva-Agostinho N, Sheppard D, Sinha S, Sipos B, Sitnik V, Stark W, Steed E, Suner M-M, Surapaneni L, Sutinen K, Tricomi FF, Urbina-Gómez D, Veidenberg A, Walsh TA, Ware D, Wass E, Willhoft NL, Allen J, Alvarez-Jarreta J, Chakiachvili M, Flint B, Giorgetti S, Haggerty L, Ilsley GR, Keatley J, Loveland JE, Moore B, Mudge JM, Naamati G, Tate J, Trevanion SJ, Winterbottom A, Frankish A, Hunt SE, Cunningham F, Dyer S, Finn RD, Martin FJ, Yates AD. Ensembl 2024. Nucleic Acids Research. 2024;52:D891–D899. doi: 10.1093/nar/gkad1049.
  35. Huntley S, Baggott DM, Hamilton AT, Tran-Gyamfi M, Yang S, Kim J, Gordon L, Branscomb E, Stubbs L. A comprehensive catalog of human KRAB-associated zinc finger genes: Insights into the evolutionary history of a large family of transcriptional repressors. Genome Research. 2006;16:669–677. doi: 10.1101/gr.4842106.
  36. Imbeault M, Helleboid PY, Trono D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature. 2017;543:550–554. doi: 10.1038/nature21683.
  37. Iouranova A, Grun D, Rossy T, Duc J, Coudray A, Imbeault M, de Tribolet-Hardy J, Turelli P, Persat A, Trono D. KRAB zinc finger protein ZNF676 controls the transcriptional influence of LTR12-related endogenous retrovirus sequences. Mobile DNA. 2022;13:4. doi: 10.1186/s13100-021-00260-0.
  38. Jacobs FMJ, Greenberg D, Nguyen N, Haeussler M, Ewing AD, Katzman S, Paten B, Salama SR, Haussler D. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature. 2014;516:242–245. doi: 10.1038/nature13760.
  39. Jin Y, Tam OH, Paniagua E, Hammell M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics. 2015;31:3593–3599. doi: 10.1093/bioinformatics/btv422.
  40. Jovanovic VM, Sarfert M, Reyna-Blanco CS, Indrischek H, Valdivia DI, Shelest E, Nowick K. Positive selection in gene regulatory factors suggests adaptive pleiotropic changes during human evolution. Frontiers in Genetics. 2021;12:662239. doi: 10.3389/fgene.2021.662239.
  41. Karakülah G, Arslan N, Yandım C, Suner A. TEffectR: an R package for studying the potential effects of transposable elements on gene expression with linear regression model. PeerJ. 2019;7:e8192. doi: 10.7717/peerj.8192.
  42. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ. The UCSC table browser data retrieval tool. Nucleic Acids Research. 2004;32:D493–D6. doi: 10.1093/nar/gkh103.
  43. Khrameeva E, Kurochkin I, Han D, Guijarro P, Kanton S, Santel M, Qian Z, Rong S, Mazin P, Sabirov M, Bulat M, Efimova O, Tkachev A, Guo S, Sherwood CC, Camp JG, Pääbo S, Treutlein B, Khaitovich P. Single-cell-resolution transcriptome map of human, chimpanzee, bonobo, and macaque brains. Genome Research. 2020;30:776–789. doi: 10.1101/gr.256958.119.
  44. Kim S, Cho CS, Han K, Lee J. Structural variation of Alu element and human disease. Genomics & Informatics. 2016;14:70–77. doi: 10.5808/GI.2016.14.3.70.
  45. Larsen PA, Hunnicutt KE, Larsen RJ, Yoder AD, Saunders AM. Warning SINEs: Alu elements, evolution of the human brain, and the spectrum of neurological disease. Chromosome Research. 2018;26:93–111. doi: 10.1007/s10577-018-9573-4.
  46. Lawson HA, Liang Y, Wang T. Transposable elements in mammalian chromatin organization. Nature Reviews. Genetics. 2023;24:712–723. doi: 10.1038/s41576-023-00609-6.
  47. Le Guennec K, Veugelen S, Quenez O, Szaruga M, Rousseau S, Nicolas G, Wallon D, Fluchere F, Frébourg T, De Strooper B, Campion D, Chávez-Gutiérrez L, Rovelet-Lecrux A. Deletion of exons 9 and 10 of the Presenilin 1 gene in a patient with Early-onset Alzheimer Disease generates longer amyloid seeds. Neurobiology of Disease. 2017;104:97–103. doi: 10.1016/j.nbd.2017.04.020.
  48. Lerat E, Fablet M, Modolo L, Lopez-Maestre H, Vieira C. TEtools facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes. Nucleic Acids Research. 2016;1:gkw953. doi: 10.1093/nar/gkw953.
  49. Liu X, Bienkowska JR, Zhong W. GeneTEFlow: A Nextflow-based pipeline for analysing gene and transposable elements expression from RNA-Seq data. PLOS ONE. 2020;15:e0232994. doi: 10.1371/journal.pone.0232994.
  50. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
  51. Marx V. Method of the year: long-read sequencing. Nature Methods. 2023;20:6–11. doi: 10.1038/s41592-022-01730-w.
  52. Nowick K, Gernat T, Almaas E, Stubbs L. Differences in human and chimpanzee gene expression patterns define an evolving network of transcription factors in brain. PNAS. 2009;106:22358–22363. doi: 10.1073/pnas.0911376106.
  53. Nowick K, Carneiro M, Faria R. A prominent role of KRAB-ZNF transcription factors in mammalian speciation? Trends in Genetics. 2013;29:130–139. doi: 10.1016/j.tig.2012.11.007.
  54. Nunez N, Clifton MMK, Funnell APW, Artuz C, Hallal S, Quinlan KGR, Font J, Vandevenne M, Setiyaputra S, Pearson RCM, Mackay JP, Crossley M. The multi-zinc finger protein ZNF217 contacts DNA through a two-finger domain. Journal of Biological Chemistry. 2011;286:38190–38201. doi: 10.1074/jbc.M111.301234.
  55. Orozco-Arias S, Sierra P, Durbin R, González J. MCHelper automatically curates transposable element libraries across eukaryotic species. Genome Research. 2024;34:2256–2268. doi: 10.1101/gr.278821.123.
  56. Patel S, Howard D, Chowdhury N, Derieux C, Wellslager B, Yilmaz Ö, French L. Characterization of human genes modulated by Porphyromonas gingivalis highlights the ribosome, hypothalamus, and cholinergic neurons. Frontiers in Immunology. 2021;12:646259. doi: 10.3389/fimmu.2021.646259.
  57. Platig J, Castaldi PJ, DeMeo D, Quackenbush J. Bipartite community structure of eQTLs. PLOS Computational Biology. 2016;12:e1005033. doi: 10.1371/journal.pcbi.1005033.
  58. Platt RN, Vandewege MW, Ray DA. Mammalian transposable elements and their impacts on genome evolution. Chromosome Research. 2018;26:25–43. doi: 10.1007/s10577-017-9570-z.
  59. Playfoot CJ, Duc J, Sheppard S, Dind S, Coudray A, Planet E, Trono D. Transposable elements and their KZFP controllers are drivers of transcriptional innovation in the developing human brain. Genome Research. 2021;31:1531–1545. doi: 10.1101/gr.275133.120.
  60. Pontis J, Planet E, Offner S, Turelli P, Duc J, Coudray A, Theunissen TW, Jaenisch R, Trono D. Hominoid-specific transposable elements and KZFPs facilitate human embryonic genome activation and control transcription in naive human ESCs. Cell Stem Cell. 2019;24:724–735. doi: 10.1016/j.stem.2019.03.012.
  61. Qu Y, Izsvák Z, Wang J. Retrotransposon: a versatile player in human preimplantation development and health. Life Medicine. 2023;2:lnac041. doi: 10.1093/lifemedi/lnac041.
  62. Ravel-Godreuil C, Znaidi R, Bonnifet T, Joshi RL, Fuchs J. Transposable elements as new players in neurodegenerative diseases. FEBS Letters. 2021;595:2733–2755. doi: 10.1002/1873-3468.14205.
  63. Rodríguez-Quiroz R, Valdebenito-Maturana B. SoloTE for improved analysis of transposable elements in single-cell RNA-Seq data using locus-specific expression. Communications Biology. 2022;5:1063. doi: 10.1038/s42003-022-04020-5.
  64. Schrader L, Schmitz J. The impact of transposable elements in adaptive evolution. Molecular Ecology. 2019;28:1537–1549. doi: 10.1111/mec.14794.
  65. Scott EC, Gardner EJ, Masood A, Chuang NT, Vertino PM, Devine SE. A hot L1 retrotransposon evades somatic repression and initiates human colorectal cancer. Genome Research. 2016;26:745–755. doi: 10.1101/gr.201814.115.
  66. Senft AD, Macfarlan TS. Transposable elements shape the evolution of mammalian development. Nature Reviews. Genetics. 2021;22:691–711. doi: 10.1038/s41576-021-00385-1.
  67. Sentmanat MF, Elgin SCR. Ectopic assembly of heterochromatin in Drosophila melanogaster triggered by transposable elements. PNAS. 2012;109:14104–14109. doi: 10.1073/pnas.1207036109.
  68. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
  69. Shao Y, Chen C, Shen H, He BZ, Yu D, Jiang S, Zhao S, Gao Z, Zhu Z, Chen X, Fu Y, Chen H, Gao G, Long M, Zhang YE. GenTree, an integrated resource for analyzing the evolution and function of primate-specific coding genes. Genome Research. 2019;29:682–696. doi: 10.1101/gr.238733.118.
  70. Shulman JM, Chibnik LB, Aubin C, Schneider JA, Bennett DA, De Jager PL. Intermediate phenotypes identify divergent pathways to Alzheimer’s disease. PLOS ONE. 2010;5:e11244. doi: 10.1371/journal.pone.0011244.
  71. Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. Version 4.2.1. ISB. 2013. http://www.repeatmasker.org
  72. Storer J, Hubley R, Rosen J, Wheeler TJ, Smit AF. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mobile DNA. 2021;12:2. doi: 10.1186/s13100-020-00230-y.
  73. Storer J, Hubley R, Rosen J, Smit A. Methodologies for the de novo discovery of transposable element families. Genes. 2022;13:709. doi: 10.3390/genes13040709.
  74. Teresi SJ, Teresi MB, Edger PP. TE Density: a tool to investigate the biology of transposable elements. Mobile DNA. 2022;13:11. doi: 10.1186/s13100-022-00264-4.
  75. Thomas JH, Schneider S. Coevolution of retroelements and tandem zinc finger genes. Genome Research. 2011;21:1800–1812. doi: 10.1101/gr.121749.111.
  76. Ting CN, Rosenberg MP, Snow CM, Samuelson LC, Meisler MH. Endogenous retroviral sequences are required for tissue-specific expression of a human salivary amylase gene. Genes & Development. 1992;6:1457–1465. doi: 10.1101/gad.6.8.1457.
  77. van Hof A, Campagne P, Rigden DJ, Yung CJ, Lingley J, Quail MA, Hall N, Darby AC, Saccheri IJ. The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016;534:102–105. doi: 10.1038/nature17951.
  78. Willemsen MH, Fernandez BA, Bacino CA, Gerkes E, de Brouwer APM, Pfundt R, Sikkema-Raddatz B, Scherer SW, Marshall CR, Potocki L, van Bokhoven H, Kleefstra T. Identification of ANKRD11 and ZNF778 as candidate genes for autism and variable cognitive impairment in the novel 16q24.3 microdeletion syndrome. European Journal of Human Genetics. 2010;18:429–435. doi: 10.1038/ejhg.2009.192.
  79. Williams BA, Kay RF, Kirk EC. New perspectives on anthropoid origins. PNAS. 2010;107:4797–4804. doi: 10.1073/pnas.0908320107.
  80. Wolf G, de Iaco A, Sun MA, Bruno M, Tinkham M, Hoang D, Mitra A, Ralls S, Trono D, Macfarlan TS. KRAB-zinc finger protein gene expansion in response to active retrotransposons in the murine lineage. eLife. 2020;9:e56337. doi: 10.7554/eLife.56337.
  81. Wu SJ, Hsieh TJ, Kuo MC, Tsai ML, Tsai KL, Chen CH, Yang YH. Functional regulation of Alu element of human angiotensin-converting enzyme gene in neuron cells. Neurobiology of Aging. 2013;34:1921. doi: 10.1016/j.neurobiolaging.2013.01.003.
  82. Xia B, Zhang W, Zhao G, Zhang X, Bai J, Brosh R, Wudzinska A, Huang E, Ashe H, Ellis G, Pour M, Zhao Y, Coelho C, Zhu Y, Miller A, Dasen JS, Maurano MT, Kim SY, Boeke JD, Yanai I. On the genetic basis of tail-loss evolution in humans and apes. Nature. 2024;626:1042–1048. doi: 10.1038/s41586-024-07095-8.
  83. Xing J, Witherspoon DJ, Ray DA, Batzer MA, Jorde LB. Mobile DNA elements in primate and human evolution. American Journal of Physical Anthropology. 2007;Suppl 45:2–19. doi: 10.1002/ajpa.20722.
  84. Zhang YE, Landback P, Vibranovski MD, Long M. Accelerated recruitment of new brain development genes into the human genome. PLOS Biology. 2011;9:e1001179. doi: 10.1371/journal.pbio.1001179.
  85. Zhou Y, Zhu J, Tong T, Wang J, Lin B, Zhang J. A statistical normalization method and differential expression analysis for RNA-seq data between different species. BMC Bioinformatics. 2019;20:163. doi: 10.1186/s12859-019-2745-1.

eLife Assessment

Detlef Weigel 1

The authors present software (TEKRABber) to analyze how the expression of transposable elements (TEs) and of TE silencing factors, the KRAB zinc finger (KRAB-ZNF) genes, is correlated in experimentally validated datasets. TEKRABber is used to reconstruct regulatory networks of KRAB-ZNFs and TEs during human brain evolution and in Alzheimer's disease. The direction of the work is important, with potentially significant interest from others looking for a tool for correlative gene expression analysis across individual genomes and species. However, the reviews identified biases and shortcomings in the pipeline that could lead to an unacceptable number of false positive and negative signals and thus impact the conclusions, leaving the work in its current form incomplete.

Reviewer #1 (Public review):

Anonymous

The authors present their new bioinformatic tool called TEKRABber, and use it to correlate expression between KRAB ZNFs and TEs across different brain tissues, and across species. While the aims of the authors are clear and there would be significant interest from other researchers in the field for a program that can do such correlative gene expression analysis across individual genomes and species, the presented approach and work display significant shortcomings. In the current state of the analysis pipeline, the biases and shortcomings mentioned below, for which I have seen no proof that they are accounted for by the authors, severely impact the presented results and conclusions. It is therefore essential that the points below are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals that would severely affect the conclusions one can draw from the analysis.

My main concerns are provided below:

One important shortcoming of the biocomputational approach is that most TEs are not actually expressed, and others (Alus) are not a proxy of the activity of the TE class at all. I will explain: While specific TE classes can act as (species-specific) promoters for genes (such as LTRs) or are expressed as TE-derived transcripts (LINEs, SVAs), the majority of other, older TE classes do not behave this way and are either neutral to the genome or may have some enhancer activity, as mapped in the program they refer to, 'TEffectR'. A big focus is on Alus, but Alus contribute to a transcriptome in a different way too: They often become part of transcripts due to alternative splicing. As such, the presence of Alu-derived transcripts is not a proxy for the expression/activity of the Alu class, but rather a result of some Alus being part of gene transcripts (see also next point). The bottom line is that the TEKRABber software/approach is heavily prone to picking up both false positives (TEs being part of transcribed loci) and false negatives (TEs not producing any transcripts at all), which has big implications for how reads from TEs, as used in this study, should be interpreted: The TE expression used to correlate with the KRAB ZNF expression simply does not represent the species-specific influences of TEs that the authors are after.

With the strategy as described, a lot of TE expression is misinterpreted: TEs can be part of gene-derived transcripts due to alternative splicing (which often happens for Alus) or as a result of the TE being present in an inefficiently spliced-out intron (which happens a lot), which leads to TE-derived reads as a result of that TE being part of that intron, rather than that TE being actively expressed. As a result, the data as analysed do not reliably indicate the expression of TEs (as the authors intend) and should be filtered for any reads that come from the above scenarios: These reads have nothing to do with KRAB ZNF control, do not represent actively expressed TEs, and therefore should be removed. Given that, from my lab's experience in brain (and other) tissues, the proportion of RNA sequencing reads that are actually derived from active TEs is a stark minority compared to reads derived from TEs that happen to be in any of the many transcribed loci, applying this filtering is expected to have a huge impact on the results and conclusions of this study.

Another potential problem that I don't see addressed is that, due to the high level of similarity of the many hundreds of KRAB ZNF genes in primates and the reads derived from them, and the inaccurate annotations of many KZNFs in non-human genomes, the expression data derived from RNA-seq datasets cannot simply be used to plot KZNF expression values without significant work and manual curation to safeguard proper cross-species ortholog annotation: The work of Thomas and Schneider (2011) has studied this in great detail, but genome assemblies of non-human primates tend to be highly inaccurate in assigning the right ortholog of human ZNF genes. The problem becomes even bigger when RNA-sequencing reads are analyzed: RNA-sequencing reads from a human ZNF that emerged in great apes by duplication from an older parental gene (we have a decent number of those in the human genome) may be mapped to that older parental gene in the macaque genome: So, the expression of human-specific ZNF-B, which derived from the parental ZNF-A, is likely to be compared in their DESeq2 analysis to the expression of ZNF-A in macaque RNA-seq data. In other words, without a significant amount of manual curation, the DESeq2 analysis is prone to lead to false comparisons, which makes the strategy and TEKRABber software approach described highly biased and unreliable.

There is no doubt that there are differences in expression and activity of KRAB-ZNFs and TEs respectively that may have had important evolutionary consequences. However, because all of the network analyses in this paper rely on the analyses of RNA-seq data and the processing through the TE-KRABber software with the shortcomings and potential biases that I mentioned above, I need to emphasize that the results and conclusions are likely to be significantly different if the appropriate measures are taken to get more accurate and curated TE and KRAB ZNF expression data.

Finally, there are some minor but important notes I want to share:

The association with certain variations in ZNF genes with neurological disorders such as AD, as reported in the introduction is not entirely convincing without further functional support. Such associations could merely happen by chance, given the high number of ZNF genes in the human genome and the high chance that variations in these loci happen to associate with certain disease-associated traits. So using these associations as an argument that changes in TEs and KRAB ZNF networks are important for diseases like AD should be used with much more caution.

There are a number of papers where KRAB ZNF and TE expression are analysed in parallel in human brain tissues. So the novelty of that aspect of the presented study may be limited.

Additional note after reviewing the revised version of the manuscript:

After reviewing the revised version of the manuscript, my criticism and concerns with this study are still evenly high and unchanged. To clarify, the revised version did not differ in essence from the original version; it seems that unfortunately, no efforts were taken to address the concerns raised on the original version of the manuscript, the results section as well as the discussion section are virtually unchanged.

eLife. 2025 Sep 10;14:RP103608. doi: 10.7554/eLife.103608.3.sa2

Author response

Yao-Chung Chen 1, Arnaud Maupas 2, Katja Nowick 3

The following is the authors’ response to the current reviews.

Reviewer #1 (Public review):

The authors present their new bioinformatic tool called TEKRABber, and use it to correlate expression between KRAB ZNFs and TEs across different brain tissues, and across species. While the aims of the authors are clear and there would be significant interest from other researchers in the field for a program that can do such correlative gene expression analysis across individual genomes and species, the presented approach and work display significant shortcomings. In the current state of the analysis pipeline, the biases and shortcomings mentioned below, for which I have seen no proof that they are accounted for by the authors, are severely impacting the presented results and conclusions. It is therefore essential that the points below are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals, that would severely affect the conclusions one can raise about the analysis.

Thank you very much for the insightful review of our manuscript. Since most of the comments on our revised version are not different from the comments on our first version, we repeated our previous answer, but wrote a new reply to the new concerns (please see the last two paragraphs).

We would also like to reiterate here that most of the critique of the reviewer concerns the performance of other tools and not TEKRABber presented in our manuscript. We consider it out of scope for this manuscript to improve other tools.

My main concerns are provided below:

One important shortcoming of the biocomputational approach is that most TEs are not actually expressed, and others (Alus) are not a proxy of the activity of the TE class at all. I will explain: While specific TE classes can act as (species-specific) promoters for genes (such as LTRs) or are expressed as TE derived transcripts (LINEs, SVAs), the majority of other older TE classes do not have such behavior and are either neutral to the genome or may have some enhancer activity as mapped in the program they refer to 'TEffectR'. A big focus is on Alus, but Alus contribute to a transcriptome in a different way too: They often become part of transcripts due to alternative splicing. As such, the presence of Alu derived transcripts is not a proxy for the expression/activity of the Alu class, but rather a result of some Alus being part of gene transcripts (see also next point). Bottom line is that the TEKRABber software/approach is heavily prone to picking up both false positives (TEs being part of transcribed loci) and false negatives (TEs not producing any transcripts at all), which has a big implication for how reads from TEs as done in this study should be interpreted: The TE expression used to correlate the KRAB ZNF expression is simply not representing the species-specific influences of TEs that the authors are after.

With the strategy as described, a lot of TE expression is misinterpreted: TEs can be part of gene-derived transcripts due to alternative splicing (often happens for Alus) or as a result of the TE being present in an inefficiently spliced out intron (happens a lot) which leads to TE-derived reads as a result of that TE being part of that intron, rather than that TE being actively expressed. As a result, the data as analysed is not reliably indicating the expression of TEs (as the authors intend to) and should be filtered for any reads that are coming from the above scenarios: These reads have nothing to do with KRAB ZNF control, and are not representing actively expressed TEs and therefore should be removed. Given that from my lab's experience in brain (and other) tissues, the proportion of RNA sequencing reads that are actually derived from active TEs is a stark minority compared to reads derived from TEs that happen to be in any of the many transcribed loci, applying this filtering is expected to have a huge impact on the results and conclusions of this study.

We sincerely thank the reviewer for highlighting the potential issues of false positives and negatives in TE quantification. The reviewer provided valuable examples of how different TE classes, such as Alus, LTRs, LINEs, and SVAs, exhibit distinct behaviors in the genome. To our knowledge, specific tools like ERVmap (Tokuyama et al., 2018), which annotates ERVs, and LtrDetector (Joseph et al., 2019), which uses k-mer distributions to quantify LTRs, could indeed enhance precision by treating specific TE classes individually. We acknowledge that such approaches may yield more accurate results and appreciate the suggestion.

In our study, we used TEtranscripts (Jin et al., 2015) prior to TEKRABber. TEtranscripts applies the Expectation Maximization (EM) algorithm to assign ambiguous reads in the following steps. Uniquely mapped reads are first assigned to genes, and reads overlapping genes and TEs are assigned to TEs only if they do not uniquely match an annotated gene. The remaining ambiguous reads are distributed based on EM iterations. While this approach may not be as specialized as the latest tools for specific TE classes, it provides a general overview of TE activity. TEtranscripts outputs subfamily-level TE expression data, which we used as input for TEKRABber to perform downstream analyses such as differential expression and correlation studies.
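As an illustration only, the final step described above (distributing ambiguous reads by EM iterations) can be sketched in Python. This is a toy model written for this note, not TEtranscripts' actual code: the function name `em_assign`, its arguments, and the example counts are all hypothetical, and the gene-versus-TE priority rules are not modelled.

```python
def em_assign(unique_counts, ambiguous_reads, n_iter=50):
    """Toy EM: split each ambiguous read among its candidate features
    in proportion to the current abundance estimates.

    unique_counts   -- dict: feature name -> number of uniquely mapped reads
    ambiguous_reads -- list of tuples, each holding the candidate features of one read
    (Hypothetical interface, for illustration only.)
    """
    features = set(unique_counts)
    for candidates in ambiguous_reads:
        features.update(candidates)

    # Initialise: unique counts plus an equal split of every ambiguous read.
    abundance = {f: float(unique_counts.get(f, 0)) for f in features}
    for candidates in ambiguous_reads:
        for f in candidates:
            abundance[f] += 1.0 / len(candidates)

    for _ in range(n_iter):
        # E-step: fractional assignment of each ambiguous read.
        expected = {f: float(unique_counts.get(f, 0)) for f in features}
        for candidates in ambiguous_reads:
            total = sum(abundance[f] for f in candidates)
            for f in candidates:
                if total > 0:
                    expected[f] += abundance[f] / total
                else:
                    expected[f] += 1.0 / len(candidates)
        # M-step: the expected counts become the new abundance estimates.
        abundance = expected
    return abundance


# Made-up example: two TE subfamilies share four ambiguous reads.
estimates = em_assign({"AluY": 30, "AluSx": 10}, [("AluY", "AluSx")] * 4)
```

In this made-up example the subfamily with more uniquely mapped reads receives the larger share of the ambiguous reads, and the total read count is conserved.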

We understand the importance of adapting tools to specific research objectives, including focusing on particular TE classes. TEKRABber is designed not to refine TE quantification at the mapping stage but to flexibly handle outputs from various TE quantification tools. It accepts raw TE counts as input in the form of dataframes, enabling diverse analytical pipelines. We would also like to clarify that, since the input data is transcriptomic, our primary focus is on expressed TEs, rather than the effects of non-expressed TEs in the genome. In the revised version of our manuscript, we emphasize this distinction in the discussion and provide examples of how TEKRABber can integrate with other tools to enhance specificity and accuracy.

Another potential problem that I don't see addressed is that due to the high level of similarity of the many hundreds of KRAB ZNF genes in primates and the reads derived from them, and the inaccurate annotations of many KZNFs in non-human genomes, the expression data derived from RNA-seq datasets cannot be simply used to plot KZNF expression values, without significant work and manual curation to safeguard proper cross species ortholog-annotation: The work of Thomas and Schneider (2011) has studied this in great detail but genome-assemblies of non-human primates tend to be highly inaccurate in appointing the right ortholog of human ZNF genes. The problem becomes even bigger when RNA-sequencing reads are analyzed: RNA-sequencing reads from a human ZNF that emerged in great apes by duplication from an older parental gene (we have a decent number of those in the human genome) may be mapped to that older parental gene in Macaque genome: So, the expression of human-specific ZNF-B, that derived from the parental ZNF-A, is likely to be compared in their DESeq to the expression of ZNF-A in Macaque RNA-seq data. In other words, without a significant amount of manual curation, the DE-seq analysis is prone to lead to false comparisons which make the strategy and KRABber software approach described highly biased and unreliable.

There is no doubt that there are differences in expression and activity of KRAB-ZNFs and TEs respectively that may have had important evolutionary consequences. However, because all of the network analyses in this paper rely on the analyses of RNA-seq data and the processing through the TE-KRABber software with the shortcomings and potential biases that I mentioned above, I need to emphasize that the results and conclusions are likely to be significantly different if the appropriate measures are taken to get more accurate and curated TE and KRAB ZNF expression data.

We thank the reviewer for raising the important issue of accurately annotating the expanded repertoire of KRAB-ZNFs in primates, particularly the challenges of cross-species orthology and potential biases in RNA-seq data analysis. Indeed, we have also addressed this challenge in some of our previous papers (Nowick et al., 2010, Nowick et al., 2011 and Jovanovic et al., 2021).

In the revised manuscript, we include more details about our two-step strategy to ensure accurate KRAB-ZNF ortholog assignments. First, we employed the Gene Order Conservation (GOC) score from Ensembl BioMart as a primary filter, selecting only one-to-one orthologs with a GOC score above 75% across primates. This threshold, recommended in Ensembl’s ortholog quality control guidelines (http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html#goc), ensures high-confidence orthology relationships.

Second, we incorporated data from Jovanovic et al. (2021), which independently validated KRAB-ZNF orthologs across 27 primate genomes. This additional layer of validation allowed us to refine our dataset, resulting in the identification of 337 orthologous KRAB-ZNFs for differential expression analysis (Figure S2).
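The two-step selection can be pictured with a short Python sketch. It is a simplified illustration under stated assumptions, not the pipeline used in the manuscript: the record fields (`gene`, `species`, `type`, `goc`), the function name `filter_orthologs`, and the example gene names are hypothetical, and treating the second step as an intersection with an independently validated gene set is an assumption about how the two sources were combined.

```python
def filter_orthologs(records, species, validated, min_goc=75):
    """Keep genes that (1) have a one-to-one ortholog with a GOC score
    above min_goc in every listed species and (2) also appear in an
    independently validated gene set.

    records   -- list of dicts with keys 'gene', 'species', 'type', 'goc'
                 (hypothetical field names, not Ensembl BioMart columns)
    species   -- species that must all pass the first filter
    validated -- set of gene names from the independent validation
    """
    passing = {}
    for rec in records:
        # Step 1: one-to-one orthology and GOC score above the threshold.
        if rec["type"] == "one2one" and rec["goc"] > min_goc:
            passing.setdefault(rec["gene"], set()).add(rec["species"])

    required = set(species)
    # Step 2 (assumed to be an intersection): keep validated genes only.
    return sorted(
        gene for gene, seen in passing.items()
        if required <= seen and gene in validated
    )


# Made-up records for three placeholder genes in two species.
example_records = [
    {"gene": "ZNF_A", "species": "chimpanzee", "type": "one2one", "goc": 100},
    {"gene": "ZNF_A", "species": "macaque", "type": "one2one", "goc": 100},
    {"gene": "ZNF_B", "species": "chimpanzee", "type": "one2one", "goc": 100},
    {"gene": "ZNF_B", "species": "macaque", "type": "one2one", "goc": 50},
    {"gene": "ZNF_C", "species": "chimpanzee", "type": "one2many", "goc": 100},
    {"gene": "ZNF_C", "species": "macaque", "type": "one2one", "goc": 100},
]
kept = filter_orthologs(
    example_records, ["chimpanzee", "macaque"], {"ZNF_A", "ZNF_B"}
)
```

Here only `ZNF_A` survives: `ZNF_B` falls below the GOC threshold in one species and `ZNF_C` is not one-to-one in the other.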

We acknowledge that different annotation methods or criteria may for some genes yield variations in the identified orthologs. However, we believe that this combination provides a robust starting point for addressing the challenges raised, while we remain open to additional refinements in future analyses.

Finally, there are some minor but important notes I want to share:

The association with certain variations in ZNF genes with neurological disorders such as AD, as reported in the introduction is not entirely convincing without further functional support. Such associations could merely happen by chance, given the high number of ZNF genes in the human genome and the high chance that variations in these loci happen to associate with certain disease-associated traits. So using these associations as an argument that changes in TEs and KRAB ZNF networks are important for diseases like AD should be used with much more caution.

We fully acknowledge the concern that, given the large number of KRAB-ZNFs and their inherent variability, some associations with AD or other neurological disorders could occur by chance. This highlights the importance of additional functional studies to validate the causal role of KRAB-ZNF and TE interactions in disease contexts. While previous studies have indeed analyzed KRAB-ZNF and TE expression in human brain tissues, our study seeks to expand on this foundation by incorporating interspecies comparisons across primates. This approach enabled us to identify TE:KRAB-ZNF pairs that are uniquely present in healthy human brains, which may provide insights into their potential evolutionary significance and relevance to diseases like AD.

In addition to analyzing RNA-seq data (GSE127898 and syn5550404), we have cross-validated our findings using ChIP-exo data for 159 KRAB-ZNF proteins and their TE binding regions in humans (Imbeault et al., 2017). This allowed us to identify specific binding events between KRAB-ZNF and TE pairs, providing further support for the observed associations. We agree with the reviewer that additional experimental validations, such as functional studies, are critical to further establish the role of KRAB-ZNF and TE networks in AD. We hope that future research can build upon our findings to explore these associations in greater detail.

There are a number of papers where KRAB ZNF and TE expression are analysed in parallel in human brain tissues. So the novelty of that aspect of the presented study may be limited.

We agree with the reviewer that many studies have examined the expression levels of KRAB-ZNFs and TEs in developing human brain tissues (Farmiloe et al., 2020; Turelli et al., 2020; Playfoot et al., 2021, among others). However, the novelty of our study lies in comparing KRAB-ZNF and TE expression across primate species, as well as in adult human brain tissues from both control individuals and those with Alzheimer’s disease. To our knowledge, no previous study has analyzed these data in this context. We therefore believe that our results will be of interest to evolutionary biologists and neurobiologists focusing on Alzheimer’s disease.

Additional note after reviewing the revised version of the manuscript:

After reviewing the revised version of the manuscript, my criticism and concerns with this study are still evenly high and unchanged. To clarify, the revised version did not differ in essence from the original version; it seems that unfortunately, no efforts were taken to address the concerns raised on the original version of the manuscript, the results section as well as the discussion section are virtually unchanged.

We regret that this reviewer was not satisfied with our changes. In fact, many of the points raised by this reviewer are important, but concern weaknesses of other tools. In our opinion, validating other tools would be out of scope for this paper. We want to emphasize that TEKRABber is not a quantification tool for sequencing data, but a software for comparative analysis across species. We provided a detailed answer to the reviewer and readers can refer to that answer in the public review above for further information.

The following is the authors’ response to the original reviews.

Reviewer #1 (Public review):

The authors present their new bioinformatic tool called TEKRABber, and use it to correlate expression between KRAB ZNFs and TEs across different brain tissues, and across species. While the aims of the authors are clear and there would be significant interest from other researchers in the field for a program that can do such correlative gene expression analysis across individual genomes and species, the presented approach and work display significant shortcomings. In the current state of the analysis pipeline, the biases and shortcomings mentioned below, for which I have seen no proof that they are accounted for by the authors, are severely impacting the presented results and conclusions. It is therefore essential that the points below are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals, that would severely affect the conclusions one can raise about the analysis.

Thank you very much for the insightful review of our manuscript.

My main concerns are provided below:

(1) One important shortcoming of the biocomputational approach is that most TEs are not actually expressed, and others (Alus) are not a proxy of the activity of the TE class at all. I will explain: While specific TE classes can act as (species-specific) promoters for genes (such as LTRs) or are expressed as TE derived transcripts (LINEs, SVAs), the majority of other older TE classes do not have such behavior and are either neutral to the genome or may have some enhancer activity as mapped in the program they refer to 'TEffectR'. A big focus is on Alus, but Alus contribute to a transcriptome in a different way too: They often become part of transcripts due to alternative splicing. As such, the presence of Alu derived transcripts is not a proxy for the expression/activity of the Alu class, but rather a result of some Alus being part of gene transcripts (see also next point). The bottom line is that the TEKRABber software/approach is heavily prone to picking up both false positives (TEs being part of transcribed loci) and false negatives (TEs not producing any transcripts at all), which has a big implication for how reads from TEs as done in this study should be interpreted: The TE expression used to correlate the KRAB ZNF expression is simply not representing the species-specific influences of TEs that the authors are after.

With the strategy as described, a lot of TE expression is misinterpreted: TEs can be part of gene-derived transcripts due to alternative splicing (often happens for Alus) or as a result of the TE being present in an inefficiently spliced out intron (happens a lot) which leads to TE-derived reads as a result of that TE being part of that intron, rather than that TE being actively expressed. As a result, the data as analysed is not reliably indicating the expression of TEs (as the authors intend to) and should be filtered for any reads that are coming from the above scenarios: These reads have nothing to do with KRAB ZNF control, and are not representing actively expressed TEs and therefore should be removed. Given that from my lab's experience in the brain (and other) tissues, the proportion of RNA sequencing reads that are actually derived from active TEs is a stark minority compared to reads derived from TEs that happen to be in any of the many transcribed loci, applying this filtering is expected to have a huge impact on the results and conclusions of this study.

We sincerely thank the reviewer for highlighting the potential issues of false positives and negatives in TE quantification. The reviewer provided valuable examples of how different TE classes, such as Alus, LTRs, LINEs, and SVAs, exhibit distinct behaviors in the genome. To our knowledge, specific tools like ERVmap (Tokuyama et al., 2018), which annotates ERVs, and LtrDetector (Joseph et al., 2019), which uses k-mer distributions to quantify LTRs, could indeed enhance precision by treating specific TE classes individually. We acknowledge that such approaches may yield more accurate results and appreciate the suggestion.

In our study, we used TEtranscripts (Jin et al., 2015) prior to TEKRABber. TEtranscripts applies the Expectation Maximization (EM) algorithm to assign ambiguous reads in the following steps. Uniquely mapped reads are first assigned to genes, and reads overlapping genes and TEs are assigned to TEs only if they do not uniquely match an annotated gene. The remaining ambiguous reads are distributed based on EM iterations. While this approach may not be as specialized as the latest tools for specific TE classes, it provides a general overview of TE activity. TEtranscripts outputs subfamily-level TE expression data, which we used as input for TEKRABber to perform downstream analyses such as differential expression and correlation studies.

We understand the importance of adapting tools to specific research objectives, including focusing on particular TE classes. TEKRABber is designed not to refine TE quantification at the mapping stage but to flexibly handle outputs from various TE quantification tools. It accepts raw TE counts as input in the form of dataframes, enabling diverse analytical pipelines. We would also like to clarify that, since the input data is transcriptomic, our primary focus is on expressed TEs, rather than the effects of non-expressed TEs in the genome. In the revised version of our manuscript, we emphasize this distinction in the discussion and provide examples of how TEKRABber can integrate with other tools to enhance specificity and accuracy.

(2) Another potential problem that I don't see addressed is that due to the high level of similarity of the many hundreds of KRAB ZNF genes in primates and the reads derived from them, and the inaccurate annotations of many KZNFs in non-human genomes, the expression data derived from RNA-seq datasets cannot be simply used to plot KZNF expression values, without significant work and manual curation to safeguard proper cross species ortholog-annotation: The work of Thomas and Schneider (2011) has studied this in great detail but genome-assemblies of non-human primates tend to be highly inaccurate in appointing the right ortholog of human ZNF genes. The problem becomes even bigger when RNA-sequencing reads are analyzed: RNA-sequencing reads from a human ZNF that emerged in great apes by duplication from an older parental gene (we have a decent number of those in the human genome) may be mapped to that older parental gene in Macaque genome: So, the expression of human-specific ZNF-B, that derived from the parental ZNF-A, is likely to be compared in their DESeq to the expression of ZNF-A in Macaque RNA-seq data. In other words, without a significant amount of manual curation, the DE-seq analysis is prone to lead to false comparisons which make the strategy and KRABber software approach described highly biased and unreliable.

There is no doubt that there are differences in expression and activity of KRAB-ZNFs and TEs respectively that may have had important evolutionary consequences. However, because all of the network analyses in this paper rely on the analyses of RNA-seq data and the processing through the TE-KRABber software with the shortcomings and potential biases that I mentioned above, I need to emphasize that the results and conclusions are likely to be significantly different if the appropriate measures are taken to get more accurate and curated TE and KRAB ZNF expression data.

We thank the reviewer for raising the important issue of accurately annotating the expanded repertoire of KRAB-ZNFs in primates, particularly the challenges of cross-species orthology and potential biases in RNA-seq data analysis. Indeed, we have also addressed this challenge in some of our previous papers (Nowick et al., 2010, Nowick et al., 2011 and Jovanovic et al., 2021).

In the revised manuscript, we include more details about our two-step strategy to ensure accurate KRAB-ZNF ortholog assignments. First, we employed the Gene Order Conservation (GOC) score from Ensembl BioMart as a primary filter, selecting only one-to-one orthologs with a GOC score above 75% across primates. This threshold, recommended in Ensembl’s ortholog quality control guidelines (http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html#goc), ensures high-confidence orthology relationships.

Second, we incorporated data from Jovanovic et al. (2021), which independently validated KRAB-ZNF orthologs across 27 primate genomes. This additional layer of validation allowed us to refine our dataset, resulting in the identification of 337 orthologous KRAB-ZNFs for differential expression analysis (Figure S2).

We acknowledge that different annotation methods or criteria may for some genes yield variations in the identified orthologs. However, we believe that this combination provides a robust starting point for addressing the challenges raised, while we remain open to additional refinements in future analyses.

(3) The association with certain variations in ZNF genes with neurological disorders such as AD, as reported in the introduction is not entirely convincing without further functional support. Such associations could merely happen by chance, given the high number of ZNF genes in the human genome and the high chance that variations in these loci happen to associate with certain disease-associated traits. So using these associations as an argument that changes in TEs and KRAB ZNF networks are important for diseases like AD should be used with much more caution.

There are a number of papers where KRAB ZNF and TE expression are analysed in parallel in human brain tissues. So the novelty of that aspect of the presented study may be limited.

We fully acknowledge the concern that, given the large number of KRAB-ZNFs and their inherent variability, some associations with AD or other neurological disorders could occur by chance. This highlights the importance of additional functional studies to validate the causal role of KRAB-ZNF and TE interactions in disease contexts. While previous studies have indeed analyzed KRAB-ZNF and TE expression in human brain tissues, our study seeks to expand on this foundation by incorporating interspecies comparisons across primates. This approach enabled us to identify TE:KRAB-ZNF pairs that are uniquely present in healthy human brains, which may provide insights into their potential evolutionary significance and relevance to diseases like AD.

In addition to analyzing RNA-seq data (GSE127898 and syn5550404), we have cross-validated our findings using ChIP-exo data for 159 KRAB-ZNF proteins and their TE binding regions in humans (Imbeault et al., 2017). This allowed us to identify specific binding events between KRAB-ZNF and TE pairs, providing further support for the observed associations. We agree with the reviewer that additional experimental validations, such as functional studies, are critical to further establish the role of KRAB-ZNF and TE networks in AD. We hope that future research can build upon our findings to explore these associations in greater detail.

Reviewer #1 (Recommendations for the authors):

It is essential before this work can be considered for publication, that the points above are addressed, involving significant changes in the TEKRABber program as well as the analysis pipeline, to prevent the identification of false positive and negative signals, that would severely affect the conclusions one can raise about the analysis.

We sincerely appreciate the reviewer’s insightful recommendations and constructive feedback. Each specific point has been carefully addressed in detail in the public reviews section above.

Reviewer #2 (Public review)

Summary:

The aim was to decipher the regulatory networks of KRAB-ZNFs and TEs that have changed during human brain evolution and in Alzheimer's disease.

Strengths:

This solid study presents a valuable analysis and successfully confirms previous assumptions, but also goes beyond the current state of the art.

Weaknesses:

The design of the analysis needs to be slightly modified and a more in-depth analysis of the positive correlation cases would be beneficial. Some of the conclusions need to be reinterpreted.

We sincerely thank the reviewer for the thoughtful summary, positive evaluation of our study, and constructive feedback. We appreciate the recognition of the strengths in our analysis and the valuable suggestions for improving its design and interpretation.

We would like to briefly comment on the suggested modifications to the design here and will provide a detailed point-by-point review later with our revised manuscript.

The reviewer recommended considering a more recent timepoint, such as less than 25 million years ago (mya), to define the "evolutionary young group" of KRAB-ZNF genes and TEs when discussing the arms-race theory. This is indeed a valuable perspective, as the TE repressing functions by KRAB-ZNF proteins may have evolved more recently than the split between Old World Monkeys (OWM) and New World Monkeys (NWM) at 44.2 mya we used.

Our rationale for selecting 44.2 mya is based on certain primate-specific TEs such as the Alu subfamilies, which emerged after the rise of Simiiformes and have been used in phylogenetic studies (Xing et al., 2007 and Williams et al., 2010). This timeframe allowed us to investigate the potential co-evolution of KRAB-ZNFs and TEs in species that emerged after the OWM-NWM split (e.g., humans, chimpanzees, bonobos, and macaques used for this study). However, focusing only on KRAB-ZNFs and TEs younger than 25 million years would limit the analysis to just 9 KRAB-ZNFs and 92 TEs expressed in our datasets. While we will not conduct a reanalysis using this more recent timepoint, we will integrate the recommendation into the discussion section of the revised manuscript.

Furthermore, we greatly appreciate the reviewer's detailed insights and suggestions for refining specific descriptions and interpretations in our manuscript. We will address these points in the revised version to ensure the content is presented with greater precision and clarity.

Once again, we thank both reviewers for their valuable feedback, which provides significant input for strengthening our study.

Reviewer #2 (Recommendations for the authors):

We thank the reviewer for the very insightful comments, which helped a lot in our interpretation and discussion of our results and in improving some of our statements.

The present study seeks to uncover how the repression of transposable elements (TEs) by rapidly evolving KRAB-ZNF genes, which are known for their role in TE suppression, may influence human brain evolution and contribute to Alzheimer's disease (AD). Utilizing their previously developed tool, TEKRABber, the researchers analyze transcriptome datasets from the brains of four species of Old World Monkeys (OWM) alongside samples from healthy human individuals and AD patients.

Through bipartite network analysis, they identify KRAB-ZNF/Alu-TE interactions as the most negatively correlated in the network, highlighting the repression of Alu elements by KRAB-ZNF proteins. In AD patient samples, they observe a reduction in a subnetwork comprising 21 interactions within an Alu TE module. These findings are consistent with earlier evidence that: (1) KRAB-ZNFs are involved in suppressing evolutionarily young Alu TEs; and (2) specific Alu elements have been reported to be deregulated in AD. The study also validates previous experimental ChIP-exo data on KRAB-ZNF proteins obtained in a different cell type (Imbeault et al., 2017).

As a novelty, the study identifies a human-specific amino acid variation in ZNF528, which directly contacts DNA nucleotides, showing signs of positive selection in humans and several human-specific TE interactions.

Interestingly, in addition to the negative links, the researchers observed predominantly positive connections with other TEs, suggesting that while their approach is consistent with some previous observations, the authors conclude that it provides limited support for the 'genetic arms race' hypothesis.

The reviewer is a specialist in TE and evolutionary research.

Major issues:

The study demonstrates the usefulness of the TEKRABber tool, which can support and successfully validate previous observations. However, there are several misconceptions and problems with the interpretation of the results.

KRAB-ZNF proteins in repressing TEs in vertebrates

In the Abstract: "In vertebrates, some KRAB-ZNF proteins repress TEs, offering genomic protection."

Although some KRAB-ZNF proteins exist in vertebrates, their TE-suppression role is not as prominent or specialized as it is in mammals, where it serves as a key defense mechanism against the mobilization of TEs.

We appreciate the reviewer’s clarification regarding the role of KRAB-ZNF proteins in vertebrates. To improve accuracy and precision, we have revised the wording to specify that this mechanism is primarily observed in mammals rather than vertebrates.

The definition of young and old

The study considers the evolutionary age of young (≤ 44.2 mya) and old (> 44.2 mya). This is the time of the Old World Monkey (OWM) and New World Monkey (NWM) split. Importantly, however, the KRAB-ZNF / KAP1 suppression system primarily suppresses evolutionarily younger TEs (< 25 MY old). These TEs are relatively new additions to the genome, i.e. they are specific to certain lineages (such as primates or hominins) and are more likely to be actively transcribed (and recognized as foreign by innate immunity) or have residual activity upon transposition. Examples include certain subfamilies of LINE-1, Alu (Y, S, less effective for J), SVA and younger human endogenous retroviruses (HERVs) such as HERV-K. The KRAB-ZNF / KAP1 system therefore focuses primarily on TEs that have evolved more recently in primates, in the last few million years (within the last 25 million years). Older TEs are controlled by broader epigenetic mechanisms such as DNA methylation, histone modifications, etc. Therefore, the age (≤ 44.2 mya) is not suitable to define it as young.

In this context, the specific TEs of the Simiiformes cannot be considered as 'recently evolved' (in the Abstract). The Simiiformes contain both OWM and NWM. Notably, the study includes four species, all of which belong to the OWMs.

The 'genetic arms race' theory

Unfortunately, the problematic definition of young and old could also explain why the authors conclude that their data only weakly support the 'genetic arms race' hypothesis.

The KRAB-ZNF proteins evolve rapidly, similar to TEs, which raises the 'genetic arms race' hypothesis. This hypothesis refers to the constant evolutionary struggle between organisms and TEs. TEs constantly evolve to overcome host defences, while host genomes develop mechanisms to suppress these potentially harmful elements. Indeed, in mammals, an important example is the KRAB-ZNF/TE interaction. The KRAB-ZNF proteins rapidly evolve to target specific TEs, creating a 'genetic arms race' in which each side - TEs and the KRAB-ZNF/KAP1 (alias TRIM28) repressor complex - drives the evolution of the other in response to adaptive pressure. Importantly, the 'genetic arms race' hypothesis describes the evolutionary process that occurs between TE and host when the TE is deleterious. Again, this includes the young TEs (< 25 MY old) with residual transposition activity or those that are actively transcribed and exacerbate cellular stress and inflammatory responses. Approximately 25 million years ago, the superfamilies Hominoidea (apes) and Cercopithecoidea (Old World monkeys, i.e., macaque) split.

Just to clarify, our initial study aim was to examine whether TEs exhibit any evolutionary relationships with KRAB-ZNFs across the four studied species (human, chimpanzee, bonobo, and macaque). For investigating the arms-race hypothesis, we really appreciate the reviewer suggesting a more recent time point, such as less than 25 million years ago (mya), to define the "evolutionary young group" of TEs and KRAB-ZNF genes. This is indeed a valuable recommendation, as 25 mya marks the emergence of Hominoidea (Figure 2C in the manuscript), making it a meaningful reference point for studying recently evolved KRAB-ZNFs and TEs. However, restricting the analysis to elements younger than 25 mya would reduce the dataset to only 9 KRAB-ZNFs and 92 TEs. Nevertheless, we provide here our results for those elements in Table S7:

We observed that among the correlations in the < 25 mya subset, negative correlations (7) outnumbered positive ones (2). However, these correlations were derived from only 3 out of 9 KRAB-ZNFs and 9 out of 92 TE subfamilies. Therefore, based on our data, while the < 25 mya group shows a higher proportion of negative correlations, the sample size is too limited to derive networks or draw robust conclusions in our analysis, especially when compared to our original evolutionary age threshold of 44.2 mya. For this reason, we chose not to reanalyze the data but rather to acknowledge that our current definition of “young” may not be optimal for testing the arms-race model in humans. While previous studies (Jacobs et al., 2014; Bruno et al., 2019; Zuo et al., 2023) have explored relevant KRAB-ZNF and TE interactions, our review of the KRAB-ZNFs and TEs highlighted in those works suggests that a specific focus on elements <25 mya has not been a primary emphasis.
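The age-threshold comparison described above can be illustrated with a minimal sketch. It assumes that a pair counts as "young" only when both the KRAB-ZNF and the TE fall at or below the threshold; the record layout, the function names, and the toy values are hypothetical and are not taken from TEKRABber or Table S7. Only the counts of 7 negative and 2 positive correlations come from the text.

```python
from collections import Counter

def subset_by_age(pairs, max_age_mya):
    """Keep pairs whose KRAB-ZNF and TE are both at most max_age_mya old (assumed rule)."""
    return [
        p for p in pairs
        if p["znf_age"] <= max_age_mya and p["te_age"] <= max_age_mya
    ]

def tally_signs(pairs):
    """Return (negative, positive) counts of correlation coefficients."""
    counts = Counter("negative" if p["coef"] < 0 else "positive" for p in pairs)
    return counts["negative"], counts["positive"]

def negative_fraction(n_negative, n_positive):
    """Share of negative correlations among all significant correlations."""
    return n_negative / (n_negative + n_positive)

# Hypothetical toy records, for illustration only (not real data).
toy_pairs = [
    {"znf": "ZNF_A", "te": "TE_1", "znf_age": 20.0, "te_age": 15.0, "coef": -0.6},
    {"znf": "ZNF_A", "te": "TE_2", "znf_age": 20.0, "te_age": 40.0, "coef": 0.5},
    {"znf": "ZNF_B", "te": "TE_3", "znf_age": 43.0, "te_age": 10.0, "coef": -0.4},
    {"znf": "ZNF_C", "te": "TE_1", "znf_age": 8.0, "te_age": 15.0, "coef": 0.7},
]

young_25 = subset_by_age(toy_pairs, 25.0)   # stricter threshold suggested by the reviewer
young_44 = subset_by_age(toy_pairs, 44.2)   # threshold used in the manuscript

# Counts reported in the response for the < 25 mya subset: 7 negative, 2 positive.
reported_fraction = negative_fraction(7, 2)
print(tally_signs(young_25), tally_signs(young_44), round(reported_fraction, 3))
```

With the reported counts, negative correlations make up 7 of 9 significant correlations, but they involve only 3 of 9 KRAB-ZNFs and 9 of 92 TE subfamilies, which is why the subset is considered too small for network analysis.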

"our findings only weakly support the arms-race hypothesis. Firstly, we noted that young TEs exhibit lower expression levels than old TEs (Figure 2D and 5B), which might not be expected if they had recently escaped repression". - This is a misinterpretation. These old TEs are no longer harmful. This is not the case of the 'genetic arms race'.

We sincerely appreciate the reviewer’s comments, which have helped us refine our interpretation to prevent potential misunderstandings. Our initial expectation, based on the arms-race hypothesis, was that young TEs would exhibit higher expression levels due to a recent escape from repression, while young KRAB-ZNFs would show increased expression as a counter-adaptive response. However, our findings indicate that both young TEs and young KRAB-ZNFs exhibit lower expression levels. This observation does not align with the classical arms-race model, which typically predicts an ongoing cycle of adaptive upregulation. We have rephrased the corresponding sentences in our discussion to make our idea clearer. In addition, we added the notion that older TEs might no longer be harmful, with which we agree.

"Additionally, some young TEs were also negatively correlated with old KRAB-ZNF genes, leading to weak assortativity regarding age inference, which would also not be in line with the arms-race idea."

This is not a contradiction, as an old KRAB-ZNF gene could be 'reactivated' to protect against young TEs. It might be cheaper for the host than developing a brand new KRAB-ZNF gene.

We agree with the reviewer's point that older KRAB-ZNFs may be reactivated to suppress young TEs, potentially as a more cost-effective evolutionary strategy than the emergence of entirely new KRAB-ZNFs. We have incorporated this perspective into the revised manuscript to provide a more detailed discussion of our findings.

TEs remain active

In the abstract: "Notably, KRAB-ZNF genes evolve rapidly and exhibit diverse expression patterns in primate brains, where TEs remain active."

This is not precise. TEs do not generally remain active in the brain. It is only the autonomous LINE-1 (young) and non-autonomous Alu (young) and SVA (young) elements that can be mobilized by LINE-1. In addition, the evolutionary young HERV-K is recognized as foreign and alerts the innate immune system (DOI: 10.1172/jci.insight.131093) and is a target of the KRAB-ZNF/KAP1 suppression system.

In the abstract: "Evidence indicates that transposable elements (TEs) can contribute to the evolution of new traits, despite often being considered deleterious."

Oversimplification: harmful and repurposed TEs are lumped together.

We appreciate the reviewer’s detailed suggestions for improving the precision of our abstract. While we previously mentioned LINE-1 and Alu elements in the introduction, we now explicitly specify in the abstract that only certain TE subfamilies, such as autonomous LINE-1 and non-autonomous Alu and SVA elements, remain active in the primate brain. Additionally, we have refined the phrasing regarding the role of TEs in evolution to clearly distinguish between their deleterious effects and their potential for functional repurposing. These clarifications have been incorporated into the revised abstract to ensure greater accuracy and nuance.

Positive links

"The high number of positive correlations might be surprising, given that KRAB-ZNFs are considered to repress TEs."

Based on the above, it is not surprising that negative associations are only found with young (< 25 my) TEs. In fact, the relationship between old KRAB-ZNF proteins and old (non-damaging) TEs could be neutral/positive. The case of ZNF528 could be a valuable example of this.

We thank the reviewer for providing this plausible interpretation and added it to the manuscript.

"276 TE:KRAB-ZNF with positive correlations in humans were negatively correlated in bonobos" It would be important to characterise the positive correlations in more detail. Could it be that the old KRAB-ZNF proteins lost their ability to recruit KAP1/TRIM28? Demonstrate it.

The strategy of developing sequence-specific DNA recognition domains that can specifically recognise TEs is expensive for the host. Recent studies suggest that when the TE is no longer harmful, these proteins/connections can be occasionally repurposed. The repurposed function would probably differ from the original suppressive function.

In my opinion, the TEKRABber tool could be useful in identifying co-option events:

We appreciate the reviewer’s suggestion regarding the characterization of positive correlations. While it is possible that some old KRAB-ZNF proteins have lost their ability to recruit KAP1/TRIM28, we cannot conclude this definitively for all cases. To address this, we examined ChIP-exo data from Imbeault et al. (2017) (Accession: GSE78099) and analyzed the overlap of binding sites between KRAB-ZNFs, KAP1/TRIM28, and RepeatMasker-annotated TEs. Our results indicate that some old KRAB-ZNFs still exhibit binding overlap with KAP1 at TE regions, suggesting that their repressive function may be at least partially retained (Author response image 1).

Author response image 1. Overlap of KAP1, Zinc finger proteins, and RepeatMasker annotation.

Here we detect the overlap of KAP1/TRIM28 ChIP-exo binding events with those of KRAB-ZNF proteins (one at a time) and with RepeatMasker annotation (115 old and 58 young KRAB-ZNFs; Mann-Whitney test, p < 0.01).
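As a rough illustration of this kind of three-way overlap, the sketch below computes, for one KRAB-ZNF at a time, the fraction of its peaks that overlap both a KAP1 peak and a TE annotation. This is an assumed metric, not necessarily the one behind Author response image 1; the interval format (chromosome, start, end; half-open), the function names, and the toy coordinates are all hypothetical.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def triple_overlap_fraction(znf_peaks, kap1_peaks, te_annotations):
    """Fraction of ZNF peaks overlapping both a KAP1 peak and a TE annotation."""
    if not znf_peaks:
        return 0.0
    hits = sum(
        1
        for peak in znf_peaks
        if any(overlaps(peak, k) for k in kap1_peaks)
        and any(overlaps(peak, t) for t in te_annotations)
    )
    return hits / len(znf_peaks)

# Hypothetical toy intervals, for illustration only (not real ChIP-exo data).
znf_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
kap1_peaks = [("chr1", 150, 250), ("chr2", 150, 180)]
te_annotations = [("chr1", 180, 400), ("chr1", 550, 700)]

frac = triple_overlap_fraction(znf_peaks, kap1_peaks, te_annotations)
print(f"{frac:.3f}")
```

The pairwise scan is quadratic and only suitable for small inputs; for genome-scale peak sets a sorted sweep or an interval tree would be the usual choice. Per-gene fractions for old and young KRAB-ZNFs could then be compared with a rank-based test such as Mann-Whitney, as in the legend above.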

Minor

"Lead poisoning causes lead ions to compete with zinc ions in zinc finger proteins, affecting proteins such as DNMT1, which are related to the progression of AD (Ordemann and Austin 2016)."

Not precise: While DNMT1 does contain zinc-binding domains, it is not categorized as a zinc finger protein.

We appreciate the reviewer’s insight regarding the classification of DNMT1. After careful consideration, we have removed this sentence from the introduction to maintain focus on KRAB zinc finger proteins.

Definition of TEs

"There were 324 KRAB-ZNFs and 895 TEs expressed in Primate Brain Data." Define it more precisely. It is not clear, what the authors mean by TEs: Are these TE families, subfamilies? Provide information on copy numbers of each in the analysed four species.

We appreciate the reviewer’s suggestion to clarify our definition of TEs. To improve precision, we have specified that the analysis was conducted at the subfamily level. Additionally, we have provided the copy numbers of TEs for the four analyzed species in Table S4.

Occupancy of TEs in the genome

"TEs comprise (i) one third to one half of the mammalian genome and are (ii) not randomly distributed..."

(i) The most accepted number is 45%. However, some more recent reports estimate over 50%, thus the one third is an underestimation.

(ii) Not randomly distributed among the mammalian species?

(i) We thank the reviewer for pointing out that our statement about the abundance of TEs was outdated. We have updated the estimate to reflect that TEs can occupy more than half of the genome, based on recent publications.

(ii) We acknowledge the reviewer’s concern regarding the distribution of TEs. Although TEs are interspersed throughout the genome, their insertion sites are not entirely random, as they tend to exhibit preferences for certain genomic regions. To clarify this, we have revised the wording in the paragraph accordingly.

We would like to express our sincere gratitude to both reviewers for their insightful feedback, which has been instrumental in enhancing the quality of our study.

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Data Citations

    1. Khrameeva E, Kurochkin I, Mazin P, Khaitovich P. 2020. Transcriptome map of the human brain at the single-cell resolution. NCBI Gene Expression Omnibus. GSE127898
    2. SageNeuroCommunityAdmin 2016. Mayo RNAseq Study. Synapse.

    Supplementary Materials

    Supplementary file 1. Supplementary Tables.
    elife-103608-supp1.xlsx (237.8KB, xlsx)
    MDAR checklist

    Data Availability Statement

    The current manuscript is a computational study, so no data have been generated for this manuscript. Source code for the analysis has been uploaded to GitHub (copy archived at Chen, 2024).

    The following previously published datasets were used:

    Khrameeva E, Kurochkin I, Mazin P, Khaitovich P. 2020. Transcriptome map of the human brain at the single-cell resolution. NCBI Gene Expression Omnibus. GSE127898

    SageNeuroCommunityAdmin 2016. Mayo RNAseq Study. Synapse.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
