. 2023 Feb 24;21:41. doi: 10.1186/s12915-023-01527-z

Table 3.

Sources of coordinate locations of various DNA features used in this study

| Feature | Source |
| --- | --- |
| RFD profiles | RFD profile data for HeLa and GM06990 cells were downloaded from the database described in [31]. Positions of replication origins, marked with red rectangles, are based on BED files with ORI positions provided by the authors. |
| Known genes | Exon locations for hg19, used to make gene visualizations, were obtained from the UCSC FTP server: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz. Alternative splice variants of a given gene were combined into a single entry containing all possible exons. |
| GC content | GC content was calculated in 1-kb intervals for the hg19 reference genome using the BSgenome.Hsapiens.UCSC.hg19 and seqinr [74] R libraries. |
| Skew profile (S) and replication origins | Compositional skew was calculated according to the specifications from [21], in 1-kb intervals across the entire hg19 reference genome. |
| Replication time | Replication time data for hg19 from 15 cell lines, obtained in the ENCODE project, were downloaded from GEO (ID: GSE34399) in bigWig format. The data represent smoothed wave signals for 1-kb windows, obtained in the Repli-seq experiment. |
| Sequence conservation | Sequence conservation data were obtained from the phastCons100way UCSC track using the phastCons100way.UCSC.hg19 Bioconductor library [75]. |
| Histone marks | Histone marks, including H2az, H3k27ac, H3k27me3, H3k36me3, H3k4me1, H3k4me2, H3k4me3, H3k79me2, H3k9ac, H3k9me3, and H4k20me1, were downloaded from the UCSC table browser for the K562 cell line. |
| CpG islands | CpG island locations for hg19 were downloaded from AnnotationHub (AH5086 track) using the AnnotationHub Bioconductor library [76]. |
| Isochores | Isochore locations for hg19 were downloaded from the database at https://bioinfo2.ugr.es/isochores [32] and assigned to one of five groups (L1, L2, H1, H2, H3) based on their average GC content (%), according to the following thresholds: L1 ∈ [0, 37); L2 ∈ [37, 41); H1 ∈ [41, 46); H2 ∈ [46, 53); H3 ∈ [53, 100). |
| DNase hypersensitivity sites | DNase hypersensitivity peaks originating from the ENCODE project were downloaded from the UCSC table browser for the K562 cell line [77]. |
| Repeats | Repeat sequence locations for hg19 were downloaded from AnnotationHub (AH5122 track) using the AnnotationHub Bioconductor library [76]. |
| Simple repeats | Locations of simple repeats for hg19 were downloaded from AnnotationHub (AH5124 track) using the AnnotationHub Bioconductor library [76]. |
| Alu sequences | Alu sequences are the subset of the Repeats track that contains all repeats from the Alu family (37 types). |
| G-quadruplexes | G-quadruplex locations for the hg19 reference genome were downloaded from GEO (ID: GSE110582). The G4 locations originate from a study based on G4-seq [59]. |
| Transcription start sites (TSS) | Locations of transcription start sites were determined from the UCSC gene annotation file downloaded from the FTP server: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz. |
| Chromatin loops | Chromatin loop data for K562 cells were obtained from the GEO database (ID: GSE63525) [61]. |
| S/MARs | Locations of scaffold/nuclear matrix attached regions (S/MARs) for hg19 were downloaded from the MARome database in BED format [62]. |
| TADs | Genomic coordinates (hg19) of TADs mapped in 8 cell lines were downloaded from the ENCODE project website at https://www.encodeproject.org/search/?type=Experiment&assay_title=Hi-C [48]. Genomic coordinates (hg19) of TADs mapped in human hESC and IMR90 cells were downloaded from the RenLab website at http://chromosome.sdsc.edu/mouse/hi-c/download.html [49]. |
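
Two of the derived features in Table 3, GC content in 1-kb intervals and the isochore grouping, can be illustrated with a minimal sketch. It is not the authors' code: the study used the BSgenome.Hsapiens.UCSC.hg19 and seqinr R libraries, whereas this Python version is only an illustration. The function names `gc_content_windows` and `isochore_class` are hypothetical, and two details are assumptions not stated in the table: uncalled bases (such as N) are excluded from the denominator, and a trailing interval shorter than the window is still reported.

```python
from bisect import bisect_right
from typing import List

# Lower bounds (percent GC) of the five isochore groups, as listed in Table 3.
ISOCHORE_BOUNDS = [0.0, 37.0, 41.0, 46.0, 53.0]
ISOCHORE_NAMES = ["L1", "L2", "H1", "H2", "H3"]


def gc_content_windows(seq: str, window: int = 1000) -> List[float]:
    """Percent GC for consecutive, non-overlapping windows of `seq`.

    Assumption: only A, C, G, T count toward the denominator, so a window
    made up entirely of other symbols (e.g. N) yields NaN.
    """
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        called = gc + chunk.count("A") + chunk.count("T")
        values.append(100.0 * gc / called if called else float("nan"))
    return values


def isochore_class(gc_percent: float) -> str:
    """Map an average GC content (%) to L1, L2, H1, H2 or H3.

    Intervals are half-open, [lower, upper), matching the table thresholds.
    """
    if not 0.0 <= gc_percent < 100.0:
        raise ValueError("GC content must lie in [0, 100)")
    # bisect_right counts the lower bounds that are <= gc_percent.
    return ISOCHORE_NAMES[bisect_right(ISOCHORE_BOUNDS, gc_percent) - 1]


if __name__ == "__main__":
    # Toy sequence with a 4-bp window instead of 1 kb, for readability.
    print(gc_content_windows("GGCCAATTGCAT", window=4))
    print(isochore_class(37.0))
```

Because the intervals are half-open, a value that sits exactly on a boundary (for example 37% GC) falls into the higher group, here L2.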