Figure - PMC

Skip to main content

An official website of the United States government

Here's how you know

Here's how you know

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

View full-text article in PMC

. 2024 May 8;25:455. doi: 10.1186/s12864-024-10344-9

Search in PMC
Search in PubMed
View in NLM Catalog
Add to search

© The Author(s) 2024

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

PMC Copyright notice

Fig. 1 — Discarding multimappers leads to epigenetic mischaracterization of young TEs. A Percentage of uni- and multimapper reads mapping to portions of the human genome annotated as TEs and not annotated as TEs (non-TE) for dataset 1 containing two ChIP-seq libraries generated by the ENCODE consortium using single-end 50 bp (“SE50”) and 100 bp (“SE100”) reads. Mapping was performed with two different mapping tools: with BBMap and Bwa mem. TEs are the major source of ChIP-seq multimappers in the human genome. B Length distribution of TE individual copies. Only TEs shorter than 500 bp are shown. Note that ~ 74% (856 out of 1,160) of the TEs in the human genome are longer than 500 bp, spanning up to 153,104 bp. The bin width is 10 bp. TEs were classified as DNA, LTR (long terminal repeat), SINE (short interspersed nuclear element), LINE (long interspersed nuclear element) and Others (e.g., rolling-circle (RC), unknown classification). Standard NGS reads are too short to fully cover most TE copies, explaining why TEs often give rise to multimappers. C Read coverage (bar plot on the left y-axis; see Methods) per clade for the SE100 ChIP-seq library (dataset 1) for uni- and multimappers. Reads were mapped using Bwa mem. TEs found in multiple clades (e.g., L1HS and L1P1) were assigned to the “younger” clade (Homo and Hominoidea, respectively). Only 30 out of 1,160 different TEs in the human genome (~ 3%) have not been annotated to any clade and were not represented. The number of TE copies for each clade (line plot on the right y-axis) shows that the majority of TEs are Primates and Eutherian-specific. Evolutionary young TEs are prone to be underrepresented when excluding multimappers from ChIP-seq analysis