Table 3.
Sources of coordinate locations of various DNA features used in this study
| Feature | Source |
|---|---|
| RFD profiles | RFD profile data for HeLa and GM06990 cells was downloaded from the database described in [31]. Positions of replication origins marked with red rectangles are based on BED files with ORI positions provided by the authors. |
| Known genes | Exon locations for hg19 used to make gene visualizations were obtained from the UCSC FTP server: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz. Alternative splice variants for a particular gene were combined into a single entry that contained all possible exons. |
| GC content | GC content was calculated in 1-kb intervals for the hg19 reference genome based on the BSgenome.Hsapiens.UCSC.hg19 and seqinr [74] R libraries. |
| Skew profile (S) and replication origins | Compositional skew was calculated according to the specifications from [21], in 1-kb intervals across the entire hg19 reference genome. |
| Replication time | Replication time data for hg19, from 15 cell lines, obtained in the ENCODE project, were downloaded from GEO (ID: GSE34399) in a bigWig file format. The data represent smoothed wave signals for 1-kb windows, obtained in the Repli-seq experiment. |
| Sequence conservation | Sequence conservation data were obtained from the phastCons100way UCSC track using the phastCons100way.UCSC.hg19 Bioconductor library [75]. |
| Histone marks | Histone marks, including H2az, H3k27ac, H3k27me3, H3k36me3, H3k4me1, H3k4me2, H3k4me3, H3k79me2, H3k9ac, H3k9me3, and H4k20me1, were downloaded from the UCSC table browser, for the K562 cell line. |
| CpG islands | CpG island locations for hg19 were downloaded from AnnotationHub (AH5086 track) using the AnnotationHub Bioconductor library [76]. |
| Isochores | Isochore locations for hg19 were downloaded from the isochores database at https://bioinfo2.ugr.es/isochores [32]. Each isochore was assigned to one of five groups (L1, L2, H1, H2, H3) based on its average GC content (%), using the following thresholds: L1 ∈ [0, 37); L2 ∈ [37, 41); H1 ∈ [41, 46); H2 ∈ [46, 53); H3 ∈ [53, 100). |
| DNase hypersensitivity sites | DNase hypersensitivity peaks originating from the ENCODE project were downloaded from the UCSC table browser for the K562 cell line [77]. |
| Repeats | Repeat sequence locations for hg19 were downloaded from AnnotationHub (AH5122 track) using the AnnotationHub Bioconductor library [76]. |
| Simple repeats | Locations of simple repeats for hg19 were downloaded from AnnotationHub (AH5124 track) using the AnnotationHub Bioconductor library [76]. |
| Alu sequences | Alu sequences are a subset of the Repeats track which contains all repeats from the Alu family (37 types). |
| G-quadruplexes | G-quadruplex (G4) locations for the hg19 reference genome were downloaded from GEO (ID: GSE110582). The G4 locations originate from a study based on G4-seq [59]. |
| Transcription start sites (TSS) | Locations of transcription start sites (TSS) were determined based on the UCSC gene annotation file downloaded from the FTP server: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz |
| Chromatin loops | Chromatin loop data were obtained from the GEO database (ID: GSE63525) for K562 cells [61]. |
| S/MARs | Locations of scaffold/matrix attachment regions (S/MARs) for hg19 were downloaded from the MARome database in BED file format [62]. |
| TADs | Genomic coordinates (hg19) of TADs mapped in 8 cell lines were downloaded from the ENCODE project website at https://www.encodeproject.org/search/?type=Experiment&assay_title=Hi-C [48]. |
| TADs | Genomic coordinates (hg19) of TADs mapped in human hESC and IMR90 cells were downloaded from the RenLab website at http://chromosome.sdsc.edu/mouse/hi-c/download.html [49]. |
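
The GC content and Isochores rows describe two simple computations: GC content in 1-kb intervals, and assignment of a region to an isochore family from its average GC content. The following is a minimal Python sketch of both, not the code used in the study (which relied on the BSgenome.Hsapiens.UCSC.hg19 and seqinr R libraries). The function names are hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption that the table does not specify.

```python
from bisect import bisect_right

# Lower bounds (% GC) of the isochore families, taken from the table:
# L1 [0, 37), L2 [37, 41), H1 [41, 46), H2 [46, 53), H3 [53, 100).
ISOCHORE_BOUNDS = [0, 37, 41, 46, 53]
ISOCHORE_NAMES = ["L1", "L2", "H1", "H2", "H3"]


def gc_percent(seq):
    """Return the GC percentage of a sequence, or None if it has no A/C/G/T.

    Assumption: ambiguous bases (e.g. N) are ignored in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else None


def gc_windows(seq, window=1000):
    """Yield (start, end, gc_percent) for consecutive non-overlapping windows.

    Coordinates are 0-based and half-open; the last window may be shorter.
    """
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        yield start, start + len(chunk), gc_percent(chunk)


def isochore_class(gc):
    """Map an average GC percentage in [0, 100) to L1, L2, H1, H2 or H3."""
    if not 0 <= gc < 100:
        raise ValueError("GC percentage must lie in [0, 100)")
    # bisect_right counts the lower bounds that are <= gc, so a value equal
    # to a threshold falls into the higher family (intervals are half-open).
    return ISOCHORE_NAMES[bisect_right(ISOCHORE_BOUNDS, gc) - 1]
```

For example, `isochore_class(37)` returns `"L2"` because each interval is closed on the left, and `gc_windows(seq)` can be applied to any chromosome sequence held as a plain string.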