Genome-Scale Analysis of Replication Timing: from Bench to Bioinformatics

Tyrone Ryba; Dana Battaglia; Benjamin D Pope; Ichiro Hiratani; David M Gilbert

doi:10.1038/nprot.2011.328

Author manuscript; available in PMC: 2011 Dec 2.

Published in final edited form as: Nat Protoc. 2011 Jun 2;6(6):870–895. doi: 10.1038/nprot.2011.328

PMCID: PMC3111951 NIHMSID: NIHMS271929 PMID: 21637205

SUMMARY

Replication timing profiles are cell type-specific and reflect genome organization changes upon differentiation. In this protocol we describe how to analyze replication timing genome-wide in mammalian cells. Asynchronously cycling cells are pulse labeled with the nucleotide analog 5-bromo-2-deoxyuridine (BrdU) and sorted into S-phase fractions based on DNA content using flow cytometry. BrdU-labeled DNA from each fraction is immunoprecipitated, amplified, differentially labeled, and co-hybridized to a whole-genome CGH microarray, which is currently more cost effective than high-throughput sequencing and equally capable of resolving features at the biologically relevant level of tens to hundreds of kilobases. We also present a guide to analyzing the resulting datasets, based on methods we use routinely. Subjects include normalization, scaling, and data quality measures, loess (local polynomial) smoothing of replication timing values, segmentation of data into domains, and assignment of timing values to gene promoters. Finally, we cover clustering methods and means to relate changes in the replication program to gene expression and other genetic and epigenetic datasets. Some experience with R or similar programming languages is assumed. Altogether, the protocol takes approximately 3 weeks to complete.

Keywords: RT, genome-wide, FACS

INTRODUCTION

Although the mechanisms that specify the timing and placement of origin firing in higher eukaryotes remain a mystery, all eukaryotes have a defined replication timing program that is largely conserved between closely related species¹, including human and mouse²^,³. Analyses of replication timing in various cell types have yielded insights into genome organization and repackaging events during development, suggesting an important role for the timing program itself or 3D genome organization in regulating developmental gene expression¹^,³^,⁴. In this protocol, we describe approaches for measuring replication timing genome-wide. As data processing and analysis are often a bottleneck in these studies, the protocol also covers methods used routinely in our lab for downstream analysis³^,⁵^,⁶. Although this protocol emphasizes mammalian cells as applied to analyze replication timing changes in various mouse and human cell types³^,⁵^,⁶, it can be adapted to any proliferating cell type, and such variations have been used to analyze replication timing in Drosophila⁷^–⁹, Arabidopsis¹⁰, and budding yeast¹¹.

Overview of the Procedure: Generating experimental data (steps 1–61)

This first portion of the protocol describes how to derive raw data for genome-wide replication timing analysis. Given that the protocol measures the timing of events during the cell cycle, some form of synchronization is required. Synchronization can be achieved either prospectively, prior to cell collection, or retroactively, after the cells have been collected. In yeasts, prospective synchrony methods are well established, and in many cases the same method can be used to compare different strains¹²^,¹³. However, most synchronization schemes for multi-cellular organisms are cumbersome and optimized for specific cell lines¹⁴^–¹⁶, and most require the use of metabolic inhibitors that can interfere with normal regulation of replication¹⁷^,¹⁸. By contrast, retroactive synchronization using a fluorescence activated cell sorter (FACS) to select cells based upon the increase in DNA content during S-phase can be applied to any proliferating cell population without the need for any prior manipulation beyond dissociation of cells into a single-cell suspension¹⁹. Moreover, most prospective synchronization regimes for studying replication timing verify the quality of synchronization by FACS analysis of DNA content; since DNA content defines S phase interval, selection of cells for DNA content is the most direct means to the desired end. The resolution of S phase intervals is determined by the fineness of DNA content windows selected. The only situations in which the above synchronization alternatives may need to be considered are for cells that are very difficult to dissociate or those that are severely aneuploid, such that DNA content does not reflect the time during S phase.

In the original method²⁰^,²¹, cells were labeled with BrdU for a fraction of S-phase and sorted into several different time points during S-phase. BrdU-substituted DNA could then be isolated either based on its increased density or using anti-BrdU antibodies, and specific loci could be examined by hybridization or PCR²⁰^–²². With microarray analysis, replication of the entire genome can be queried in a single array hybridization by limiting the analysis to two differentially labeled samples, allowing all probes to be assigned one internally normalized relative replication timing value and rapid comparison of many samples³^,⁵^,⁶^,²³^,²⁴. One limitation of assigning one replication timing value per map position is that it cannot distinguish cases where homologous loci replicate asynchronously, a situation that is estimated to occur for a few percent of the genome¹⁹. However, the protocol can be readily adapted for analysis of these genomic segments by dividing and sorting S phase into finer fractions¹⁹.

The two most popular variations of retroactive synchronization by FACS are described in the Procedure below. In the first method, BrdU-labeled cells are divided into early and late S-phase fractions, and BrdU-labeled DNA synthesized either early or late can then be labeled and hybridized to a microarray. This method produces a high signal to noise ratio since immunoprecipitation (BrdU-IP) substantially enriches for DNA synthesized in each half of S-phase. However, BrdU-IP efficacy can fluctuate and must be closely monitored. In the second method, unlabeled cells are sorted into total S-phase vs. G1-phase populations and DNA from these stages is differentially labeled and used as the target. This obviates BrdU-IP, but the dynamic range is limited to the 2-fold copy number increase during S-phase. Both methods give similar results, evidenced by a direct comparison in the same cell line in one study⁶. In both methods, DNA from each fraction is differentially labeled with Cy3 and Cy5 dyes and then cohybridized to a whole-genome oligonucleotide microarray. The ratio of the abundance of each probe in each fraction is then used to generate a replication-timing profile.

Overview of the Procedure: Normalization and computational analysis of replication timing datasets (steps 62–88)

In this section of the protocol we focus on methods specifically useful for replication timing analysis using whole-genome comparative genome hybridization (CGH) microarrays²⁵, which we have used to investigate the type, degree, and mechanism of replication timing changes in mouse and human cell lines³^,⁵^,⁶^,²³^,²⁴. General methods for normalizing and analyzing microarray experiments for chromatin modifications or transcription at gene promoters have been described in detail in other works²⁶^–²⁹. Similar to two-color microarray designs comparing an experimental sample to a reference, our replication timing experiments employ a two-channel design comparing early versus late fraction enrichment for each target. Typically, we include two dye-swap replicates per sample to address bias due to dye-specific effects, such as more rapid photobleaching of Cy5 dye than Cy3. Our philosophy is to minimize the number of transformations applied to the data and apply only minimally invasive global methods for removing bias and scaling datasets to allow comparisons between them.
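To illustrate why dye-swap replicates help, the following minimal sketch uses made-up intensities and assumes a constant dye bias on the log2 scale; the variable names and values are hypothetical and are not part of the protocol. When the early fraction is labeled with Cy5 in one replicate and with Cy3 in the other, taking half the difference of the two log ratios cancels the constant bias.

```r
# Hypothetical true log2(early/late) values for five probes
true_rt  <- c(1.2, 0.4, 0.0, -0.6, -1.1)
dye_bias <- 0.3                       # assumed constant log2 bias of Cy5 over Cy3

cy3 <- rep(2^10, 5)                   # reference-channel intensity (arbitrary)
cy5_rep1 <- cy3 * 2^( true_rt + dye_bias)  # replicate 1: early fraction in Cy5
cy5_rep2 <- cy3 * 2^(-true_rt + dye_bias)  # replicate 2: late fraction in Cy5 (dye swap)

M1 <- log2(cy5_rep1 / cy3)            # equals  true_rt + dye_bias
M2 <- log2(cy5_rep2 / cy3)            # equals -true_rt + dye_bias

rt_est <- (M1 - M2) / 2               # constant dye bias cancels
```

A bias that varies with signal intensity does not cancel this way, which is one reason intensity-dependent normalization is still applied to each array.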

All of the analysis described here uses the R framework for statistical computing³⁰^–³². Through user-submitted packages that facilitate a wide variety of methods, R has become an indispensable tool for many common computational tasks. Although R has an initially steep learning curve due to its command line interface, help is available in many locations and forms, including books³³^–³⁵, online manuals (http://cran.r-project.org/), and mailing lists aggregated in the R Mailing Lists Archive (http://tolstoy.newcastle.edu.au/R/). Help can also be found within R itself; str() is often helpful for viewing the structure of variables and datasets, and the ? operator (e.g., ?data.frame() ) provides a help page for the corresponding function. We use the R package LIMMA (linear models for microarray data), also available with a user interface through the limmaGUI package, for normalization and scaling²⁷^,³⁶. The steps for this process are straightforward, and illustrated using two biological replicate datasets of mouse L1210 lymphoblast cells, which are available in raw form in Supplementary Data and after normalization and smoothing at www.ReplicationDomain.org.

We provide this section as a verified route for extracting information from the microarray experiments described in the Procedure; however, users with sufficient experience with R or having different requirements for their data are free to modify the analysis as needed, and a wide array of alternative and additional methods are available through Bioconductor³¹. While our methods for downstream analyses were tested primarily with NimbleGen CGH microarrays, most are applicable to any data format containing chromosome, genomic position, and replication timing information for each probe.

EXPERIMENTAL DESIGN

BrdU Incorporation

The nucleotide analog 5-bromo-2′-deoxyuridine (BrdU) can be used to pulse-label newly synthesized DNA during S-phase. For mammalian cell types that have 8–12 hour S phases, incubation with BrdU for two hours has been empirically determined to provide sufficient incorporation to ensure successful BrdU-IP in subsequent steps yet be short enough to identify even subtle differences in replication timing, such as between female cells with one vs. two early replicating X chromosomes⁵. Success has also been achieved with BrdU labeling times as short as one hour, but subsequent BrdU-IP can be problematic as there is very little substituted DNA relative to background of unsubstituted DNA that will contribute to noise in the BrdU-IP⁶. The BrdU-labeling times for cells with S-phase lengths significantly different to mammalian cells, such as amphibian²⁰ or fly⁸ cells, should be adjusted appropriately.

FACS sorting fractions of S-phase

For first time users, it is recommended that at least 5 × 10⁶ cells be used; however, with experience and a sufficient fraction of S-phase cells, as few as 0.5–1 × 10⁶ starting cells can be successfully profiled. The important parameter is to obtain 20,000 – 30,000 cells in each of the early and late S-phase fractions. As described in Procedure step 1A, ethanol-fixed cells can be stained with propidium iodide (PI) and sorted based on DNA content. Alternative fluorochromes that do not require RNase digestion, such as chromomycin A3, can also be used with ethanol fixed cells²⁰^,²¹. Some cell types tend to clump or produce a lot of cellular debris when fixed in ethanol. For these cell types, the fixation step can be skipped and DNA can be stained with DAPI in permeabilized cells, as described in Procedure step 1B. The advantage of the method described in step 1A is that cells fixed in ethanol can be stored at −20°C (empirically determined to be the optimal temperature) or shipped to collaborators. Shipping should be done on dry ice, with a partition between the tube and the dry ice to prevent cell freezing. All steps, particularly storage, should be performed in the dark since BrdU-substituted DNA is light sensitive.

During FACS analysis forward and side scatter analyses should be used to select an appropriate population of cells free of doublets or cell debris, both of which can hinder accurate sorting of desired populations. Lasers used in this protocol include 488 Blue to detect PI or 407 Violet to detect DAPI in cells that have been stained for DNA content. Two separate fractions of S phase, early and late, are chosen to be collected, but more can be collected if desired²⁰^,²¹.

Immunoprecipitation of BrdU labeled DNA

DNA from BrdU-labeled cells should be sonicated into fragments ranging from 250bp to 2kb and then immunoprecipitated using an anti-BrdU antibody. Sonication into fragments of this size helps eliminate immunoprecipitation of DNA that has not been BrdU labeled. If samples have been stored at −20°C prior to beginning the immunoprecipitation, thaw samples in a 56°C water bath to completely dissolve SDS and add 200μl of SDS-PK Buffer pre-warmed to 56°C with 0.05mg/mL glycogen to each sample prior to performing the phenol-chloroform extraction in Procedure step 13.

Quality control check of S-phase DNA

Due to the sensitivity and large number of steps involved, BrdU-IP is one of the trickiest parts of the protocol. To ensure quality, screen BrdU-IPs by PCR amplification using primers specific to DNA markers of known relative replication time (i.e. early or late). Although real-time PCR can be performed, we find gel electrophoresis to be sufficient to evaluate enrichment of DNA in each IP sample. Importantly, as PCR results can vary between aliquots of the same sample, and replication timing can vary between cell types³^,⁵, consistency across multiple samples from the same cell type is the best way to verify quality. Use the primer sets listed in Table 1 for mouse or human cell types, or substitute suitable alternatives to screen several IPs from both early and late S phase fractions.

TABLE 1.

Primers used for human and mouse BrdU IP screen

Name	Sequence	Base pairs
*Human test regions*
Mitochondrial DNA	Forward 5′-CCTAGGAATCACCTCCCATTCC-3′ Reverse 5′-GTGTTTAAGGGGTTGGCTAGGG-3′	168 bp
α-globin	Forward 5′-GACCCTCTTCTCTGCACAGCTC-3′ Reverse 5′-GCTACCGAGGCTCCAGCTTAAC-3′	257 bp
β-globin	Forward 5′-CCTGAGGAGAAGTCTGCCGTTA-3′ Reverse 5′-GAACCTCTGGGTCCAAGGGTAG-3′	241 bp
MMP15	Forward 5′-CAGGCCTCTGGTCTCTGTCATT-3′ Reverse 5′-AGAGCTGAGAAACCACCACCAG-3′	249 bp
BMP1	Forward 5′-GATGAAGCCTCGACCCCTAGAT-3′ Reverse 5′-ACCCGTCAGAGACGAACTTGAG-3′	177 bp
hPTGS2	Forward 5′-GTTCTAGGCTGGTGTCCCATTG-3′ Reverse 5′-CTTTCTGTACTGCGGGTGGAAC-3′	230 bp
hNETO1	Forward 5′-GGAGGTGGAATGCTAGGGACTT-3′ Reverse 5′-GCTGAGTGTGGCCTTAAGAGGA-3′	286 bp
hSLITRK6	Forward 5′-GGAGAACATGCCTCCACAGTCT-3′ Reverse 5′-GTCCTGGAAGTTGAGTGGATGG-3′	281 bp
hZFP42	Forward 5′-CTTGTGGGGACACCCAGATAAG-3′ Reverse 5′-AACCACCTCCAGGCAGTAGTGA-3′	233 bp
hDPPA2	Forward 5′-AGGTGGACAGCGAAGACAGAAC-3′ Reverse 5′-GGCCATCAGCAGTGTCCTAAAC-3′	168 bp
*Mouse test regions*
Mitochondrial DNA	Forward 5′-GACATCTGGTTCTTACTTCA-3′ Reverse 5′-GTTTTTGGGGTTTGGCATTA-3′	346 bp
α-globin	Forward 5′-AAGGGGAGCAGAGGCATCA-3′ Reverse 5′-AGGGCTTGGGAGGGACTG-3′	439 bp
β-globin	Forward 5′-CAGTAAGCCACAGATCCTATTG-3′ Reverse 5′-CCCATAGTGACTATTGACTGTG-3′	369 bp
Pou5f1	Forward 5′-CCCTCCCTAAGTGCCAGTTTCT-3′ Reverse 5′-GTAATCGCCCTCAGCAGTGTCT-3′	194 bp
Mmp15	Forward 5′-AACAGAAGGCCTGCCTTGAC-3′ Reverse 5′-TGCATAGCACGACAGCATTG-3′	360 bp
Zfp42	Forward 5′-TGAGATTAGCCCCGAGACTGAG-3′ Reverse 5′-CGTCCCCTTTGTCATGTACTCC-3′	211 bp
Dppa2	Forward 5′-CCACAGGAAGACAGGAAGCAGT-3′ Reverse 5′-AGCCAGACAGGAGCCCTAGAGT-3′	199 bp
Ptn	Forward 5′-CTGGAATGAGTTACTGACGGGG-3′ Reverse 5′-CTGGCCCCACTGTGTAATAAGC-3′	230 bp
Mash1	Forward 5′-GAAGATGAGCAAGGTGGAGACG-3′ Reverse 5′-AGTAGGACGAGACCGGAGAACC-3′	182 bp
Akt3	Forward 5′-GAAGTGTGGGTTGAACCTCTGG-3′ Reverse 5′-GCACCCTCTCCACTGTTCTGAT-3′	173 bp

Component	Amount per reaction (μL)	Final
10X Taq buffer	1.25	1X
10mM dNTPs	0.25	0.2 mM
20 U/μL Taq Polymerase	0.06	1.2 U
F/R 20 μM combined primers	0.31	0.5 μM
Nuclease free water	to 12.5

Cycle number	Denature	Anneal	Extend
1	94°C, 2 min
2–39	94°C, 45 s	60°C, 45 s	72°C, 2 min
40			72°C, 5 min
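The final concentrations in the reaction mix above follow from stock concentration × volume / total volume. The sketch below checks that arithmetic and scales the volumes to a master mix; the number of reactions and the 10% overage are illustrative assumptions, not values from the protocol.

```r
total_ul <- 12.5                      # total reaction volume (uL)

mix <- data.frame(
  component = c("Taq buffer", "dNTPs", "Taq polymerase", "primers"),
  stock     = c(10, 10, 20, 20),      # units: X, mM, U/uL, uM
  volume_ul = c(1.25, 0.25, 0.06, 0.31)
)

# Final concentration of each component in the 12.5 uL reaction
mix$final_conc <- mix$stock * mix$volume_ul / total_ul

# Total Taq units per reaction (the table lists units, not a concentration)
taq_units <- mix$stock[3] * mix$volume_ul[3]

# Master mix volumes: n_rxn and overage are hypothetical example values
n_rxn   <- 24
overage <- 1.1
mix$master_ul <- mix$volume_ul * n_rxn * overage

print(mix)
```

Template and water volumes are left out because the table gives water only as the volume needed to reach 12.5 μL.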

> plotDensities(r)	# Raw data
> plotDensities(MA.l)	# After within-array normalization
> plotDensities(MA.q)	# After between-array normalization

> plotMA(r, array=1)	# Raw data, replicate 1
> plotMA(MA.l, array=1)	# After within-array normalization
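An MA plot displays the log ratio M = log2(R/G) against the average log intensity A = (log2 R + log2 G)/2, and within-array loess normalization removes any trend of M on A. The base R sketch below illustrates that idea on simulated values, assuming an intensity-dependent correction of this general kind; it is not the LIMMA implementation, and the bias curve, span, and variable names are hypothetical.

```r
set.seed(42)
n <- 2000
A <- runif(n, 7, 15)                        # simulated average log2 intensity
bias <- 0.05 * (A - 11)^2 - 0.3             # hypothetical intensity-dependent dye bias
M <- rnorm(n, sd = 0.2) + bias              # simulated observed log ratios

# Fit the trend of M on A and subtract it
fit <- loess(M ~ A, span = 0.3, degree = 1)
M_norm <- M - fitted(fit)

# Before/after comparison, analogous to inspecting MA plots
par(mfrow = c(1, 2))
plot(A, M, pch = 19, cex = 0.2, main = "Raw")
plot(A, M_norm, pch = 19, cex = 0.2, main = "After loess correction")
```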

> x = strsplit(as.character(a$PROBE_ID), "FS")
> y = unlist(x)	# chr [1:770156] "CHR01" "003001832" ...
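Because unlist() returns chromosome and position values in alternating order, the two fields can be recovered by taking odd and even elements. The sketch below shows one way to do this on a small example; the second and third probe IDs are invented, and the conversion of "CHR01" to "chr1" is an assumption made to match the chromosome names used elsewhere in the analysis.

```r
# Small example; only the first ID follows the format shown above, the rest are invented
a <- data.frame(PROBE_ID = c("CHR01FS003001832", "CHR01FS003007580", "CHR02FS003000451"))

x <- strsplit(as.character(a$PROBE_ID), "FS", fixed = TRUE)
y <- unlist(x)                                   # "CHR01" "003001832" "CHR01" ...

a$CHR      <- y[seq(1, length(y), by = 2)]       # odd elements: chromosome
a$POSITION <- as.numeric(y[seq(2, length(y), by = 2)])  # even elements: position

# Assumed naming convention: "CHR01" -> "chr1", "CHR10" -> "chr10"
a$CHR <- sub("^CHR0?", "chr", a$CHR)
print(a)
```

This relies on each probe ID containing exactly one "FS" separator; IDs that split into more or fewer than two parts would shift the alternation and should be checked first, for example with table(lengths(x)).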

> RTb = subset(RT, RT$CHR == "chr1")	# Create a subset of timing values in chromosome 1
> par(mar=c(3.1,4.1,1,1),mfrow=c(2,1))	# Set plot margins; include two rows in layout
> plot(RTb[,1]~RTb$POSITION,pch=19,cex=0.2,col="grey",ylim=c(-3,3))	# Plot replicate 1
> plot(RTb[,2]~RTb$POSITION,pch=19,cex=0.2,col="grey",ylim=c(-3,3))	# Plot replicate 2

> acf(RT[,1],lag=1000)$acf[2]	# Replicate 1: R = 0.742
> acf(RT[,2],lag=1000)$acf[2]	# Replicate 2: R = 0.665
> acf(RT$mLymphAve, lag=1000)$acf[2]	# Averaged 1 and 2: R = 0.762
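The second element of the acf output is the lag-1 autocorrelation, that is, the correlation between each probe and its immediate neighbor. Because neighboring probes usually lie in the same replication domain, a higher value indicates less probe-level noise. The following sketch demonstrates this on simulated data; the domain structure and noise levels are arbitrary assumptions.

```r
set.seed(7)
n <- 5000
# Piecewise-constant "domains" of 100 probes each (simulated)
domains <- rep(rnorm(n / 100), each = 100)

clean <- domains + rnorm(n, sd = 0.3)   # low probe-level noise
noisy <- domains + rnorm(n, sd = 1.5)   # high probe-level noise

# Element [1] is lag 0 (always 1); element [2] is lag 1
acf_clean <- acf(clean, lag.max = 1000, plot = FALSE)$acf[2]
acf_noisy <- acf(noisy, lag.max = 1000, plot = FALSE)$acf[2]
c(clean = acf_clean, noisy = acf_noisy)
```

The measure is only meaningful when probes are sorted by chromosome and position, since it depends on adjacent rows being adjacent in the genome.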

> chrs = levels(RT$CHR); str(chrs)	# Create a list of all chromosomes
> AllLoess = NULL	# Initialize a variable to store all loess-smoothed data
> for (chr in chrs) {	# For each chromosome,
> RTl = NULL	# Create a variable to store loess-smoothed values
> RTb = subset(RT, RT$CHR == chr)	# Subset the dataset to a single chromosome
> lspan = 300000/(max(RTb$POSITION)-min(RTb$POSITION))	# Set smoothing span
> cat("Current chromosome: ", chr, "\n")	# Output current chromosome to console
> RTla = loess(RTb$mLymphR1 ~ RTb$POSITION, span = lspan)	# Smooth dataset 1
> RTlb = loess(RTb$mLymphR2 ~ RTb$POSITION, span = lspan)	# Smooth dataset 2
> RTlc = loess(RTb$mLymphAve ~ RTb$POSITION, span = lspan)	# Smooth dataset 3
> RTl = data.frame(CHR=RTb$CHR, POSITION=RTb$POSITION, RTla$fitted, RTlb$fitted, RTlc$fitted)	# Combine the datasets for the current chromosome
> AllLoess = rbind(AllLoess, RTl)	# Combine current chromosome with overall dataset
> }	# End for loop
> x = as.data.frame(AllLoess)	# Reformat the smoothed datasets as a data frame
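The span passed to loess is a fraction of the data points on the chromosome, so dividing 300000 by the chromosome's covered length makes each local fit draw on probes spanning roughly 300 kb regardless of chromosome size. The arithmetic can be checked with a short sketch; the probe positions below are simulated at a uniform 5.8 kb spacing, which is an idealization.

```r
# Simulated probe positions: 20 Mb region at 5.8 kb spacing (idealized)
pos <- seq(3e6, 23e6, by = 5800)

# Same span definition as in the smoothing loop above
lspan <- 300000 / (max(pos) - min(pos))

# Approximate number of probes contributing to each local fit
probes_per_fit <- round(lspan * length(pos))
cat("span =", signif(lspan, 4), "; probes per local fit ~", probes_per_fit, "\n")
```

With uneven probe spacing the genomic width covered by a local fit varies along the chromosome, because the span fixes the number of neighboring probes rather than a distance in base pairs.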

> RTc = subset(RT, CHR == "chr1")	# Subset of raw timing data in chr1
> LSc = subset(LS, CHR == "chr1")	# Subset of smoothed data in chr1
> par(mar=c(2.2,5.1,1,1), mfrow=c(3,1), col="grey", pch=19, cex=0.5, cex.lab=1.8, xaxs="i")
> plot(RTc$mLymphR1~RTc$POSITION, ylab="mLymph R1", xaxt="n")	# Plot raw data points
> lines(LSc$x300smo_mLymphR1~LSc$POSITION, col="blue3", lwd=3)	# Overlay loess line
> plot(RTc$mLymphR2~RTc$POSITION, ylab="mLymph R2", xaxt="n")
> lines(LSc$x300smo_mLymphR2~LSc$POSITION, col="blue3", lwd=3)
> plot(RTc$mLymphAve~RTc$POSITION, xlab="Coordinate (bp)", ylab="mLymph ave")
> lines(LSc$x300smo_mLymphAve~LSc$POSITION, col="blue3", lwd=3)

	Rep1	Rep2	Ave
Lymphoblast Rep1	1.000	0.978	0.995
Lymphoblast Rep2	0.978	1.000	0.994
Lymphoblast Ave	0.995	0.994	1.000
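A correlation table of this form can be produced with cor() on the columns of smoothed values. The sketch below uses simulated data and hypothetical column names; it also shows why the averaged dataset correlates more strongly with each replicate than the replicates do with each other, as in the table above.

```r
set.seed(3)
n <- 1000
signal <- cumsum(rnorm(n, sd = 0.1))          # simulated shared timing profile

LS <- data.frame(
  Rep1 = signal + rnorm(n, sd = 0.05),        # replicate-specific noise
  Rep2 = signal + rnorm(n, sd = 0.05)
)
LS$Ave <- rowMeans(LS[, c("Rep1", "Rep2")])   # average of the two replicates

# Pairwise Pearson correlations, ignoring probes with missing values
cors <- cor(LS, use = "pairwise.complete.obs")
round(cors, 3)
```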

> par(ask=T,mar=c(3.1,4.1,1,1))	# Set figure margins; ask before replotting
> plot(Seg.mLymph, plot.type="c")	# Plot each chromosome separately
> plot(Seg.mLymph, plot.type="s")	# Plot overview of all chromosomes
> plot(subset(Seg.mLymph,chromlist="chr2"), pch=19, pt.cols=c("gray","gray"), xmaploc=T, ylim=c(-3,3))	# Plot a single chromosome with alternate format

> Lymph = Seg.mLymphR1$output	# Extract domain information
> Lymph$size = Lymph$loc.end - Lymph$loc.start	# Calculate domain sizes
> LymphEarly = subset(Lymph, Lymph$seg.mean > 0)	# Create subset of early domains
> LymphLate = subset(Lymph, Lymph$seg.mean < 0)	# Create subset of late domains
> boxplot(LymphEarly$size, LymphLate$size)	# Distribution of early/late domain sizes

> RTd1 = RT$mLymphR1 - RT$mLymphR2	# Calculate timing differences between datasets
> mLength = length(RTd1)	# Determine total number of probes
> s = 0.67	# Set cutoff for significant changes
> sum(abs(RTd1)>s)/mLength	# Percentage changing, R1 vs. R2
> sum(RTd1 < -s)/mLength	# Early to Late changes: 1.6% of all probes
> sum(RTd1 > s)/mLength	# Late to Early changes: 1.3% of all probes

> mLymph.R1 = NULL; mLymph.R2 = NULL	# Initialize variables to store averaged data
> nWin = 35	# 5.8kb median probe spacing * 35 = 203kb
> mLength = floor(nrow(RT)/nWin)	# Calculate number of complete windows
> for (x in 1:mLength) {	# For each window,
> z1 = (x-1) * nWin + 1	# Determine probe number at window start
> z2 = x * nWin	# Determine probe number at window end
> mLymph.R1[x] = mean(RT$mLymphR1[z1:z2])	# Average replicate 1 across window
> mLymph.R2[x] = mean(RT$mLymphR2[z1:z2])	# Average replicate 2 across window
> cat("Current window: ", x, "/", mLength, "\n")	# Write the current window to the console
> }	# End for loop
> RTWind = data.frame(mLymph.R1, mLymph.R2)	# Write the results to a new data frame
> plot(cluster.bootstrap)	# Plot overall dendrogram
> pvrect(cluster.bootstrap)	# Outline datasets that cluster at a significant level
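The dendrogram underlying such a plot comes from hierarchical clustering of the datasets. As a minimal sketch of that step only, the example below clusters simulated window-averaged profiles with base R hclust() and a correlation-based distance; it does not compute the bootstrap support that the plots above display, and the dataset names, distance measure, and linkage method are assumptions.

```r
set.seed(5)
n <- 500
typeA <- rnorm(n)                       # simulated profile of cell type A
typeB <- rnorm(n)                       # simulated profile of cell type B

# Two replicates per cell type, each with independent noise
RTWind <- data.frame(
  A.R1 = typeA + rnorm(n, sd = 0.2),
  A.R2 = typeA + rnorm(n, sd = 0.2),
  B.R1 = typeB + rnorm(n, sd = 0.2),
  B.R2 = typeB + rnorm(n, sd = 0.2)
)

d  <- as.dist(1 - cor(RTWind))          # distance = 1 - Pearson correlation
hc <- hclust(d, method = "average")     # average-linkage clustering
plot(hc)                                # dendrogram of datasets

groups <- cutree(hc, k = 2)             # split into two clusters
groups
```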

> quantile(dRTdom$seg.mean, probs = c(0.05, 0.95))	# Top 5% of changes to early/late
> quantile(dRTdom$seg.mean, probs = c(0.40, 0.60))	# Middle 20% of smallest changes
> LtoEdom = subset(dRTdom, dRTdom$seg.mean > 1.28552)	# Isolate late-to-early domains
> EtoLdom = subset(dRTdom, dRTdom$seg.mean < -1.32328)	# Isolate early-to-late domains
> middleDom = subset(dRTdom, dRTdom$seg.mean > -0.14808 & dRTdom$seg.mean < 0.23698)	# Isolate non-switching domains
> boxplot(middleDom$size, LtoEdom$size, EtoLdom$size)	# Plot distributions of domain sizes
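The numeric cutoffs above are specific to one dataset. A sketch of the same selection with thresholds taken directly from the quantile() output, so that they follow the data, is given below; the domain table is simulated and its values are arbitrary.

```r
set.seed(9)
# Simulated stand-in for the table of timing-change domains
dRTdom <- data.frame(
  seg.mean = rnorm(400, sd = 0.8),            # mean timing change per domain
  size     = rexp(400, rate = 1 / 1e6)        # domain size in bp
)

# 5th, 40th, 60th and 95th percentiles of timing change
q <- unname(quantile(dRTdom$seg.mean, probs = c(0.05, 0.40, 0.60, 0.95)))

EtoLdom   <- subset(dRTdom, seg.mean < q[1])                     # strongest early-to-late
LtoEdom   <- subset(dRTdom, seg.mean > q[4])                     # strongest late-to-early
middleDom <- subset(dRTdom, seg.mean > q[2] & seg.mean < q[3])   # smallest changes

boxplot(middleDom$size, LtoEdom$size, EtoLdom$size,
        names = c("Middle", "LtoE", "EtoL"))
```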

> AllSm = NULL	# Initialize a variable to store values for all chromosomes
> for(chr in chrs) {	# For each chromosome,
> RTc = subset(RT, CHR == chr)	# Create subset of timing values in the chromosome
> RSc = subset(RefSeq, CHR == chr)	# Create subset of RefSeq genes in the chromosome
> cat("Current chromosome: ", chr, "\n")	# Output current chromosome to console
> lspan = 300000/(max(RTc$POSITION)-min(RTc$POSITION))	# Set smoothing span
> smLym1 = loess(RTc$mLymphR1 ~ RTc$POSITION, span = lspan)	# Smooth dataset 1
> smLym2 = loess(RTc$mLymphR2 ~ RTc$POSITION, span = lspan)	# Smooth dataset 2
> smLym3 = loess(RTc$mLymphAve ~ RTc$POSITION, span = lspan)	# Smooth dataset 3
> Lym1 = predict(smLym1, RSc$TSS)	# Predict (interpolate) values at transcription start sites
> Lym2 = predict(smLym2, RSc$TSS)	# Predict values for dataset 2
> Lym3 = predict(smLym3, RSc$TSS)	# Predict values for dataset 3
> ChrSm = data.frame(CHR=chr,POSITION= RSc$TSS, Lym1, Lym2, Lym3)
> AllSm = rbind(AllSm, ChrSm)	# Combine information for all experiments/chromosomes
> }	# End for loop
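The interpolation step can be tried on a small simulated chromosome before running it genome-wide. The sketch below fits the loess model with a data argument and predicts at three hypothetical transcription start sites; the profile, noise level, and TSS coordinates are invented for illustration.

```r
set.seed(21)
# Simulated single-chromosome timing data at 5.8 kb probe spacing
RTc <- data.frame(POSITION = seq(3e6, 23e6, by = 5800))
RTc$mLymphAve <- sin(RTc$POSITION / 2e6) + rnorm(nrow(RTc), sd = 0.4)

lspan <- 300000 / (max(RTc$POSITION) - min(RTc$POSITION))
fit   <- loess(mLymphAve ~ POSITION, data = RTc, span = lspan)

# Hypothetical TSS coordinates; the last one lies beyond the last probe
tss <- c(5.0e6, 12.3456e6, 30e6)
rt_at_tss <- predict(fit, newdata = data.frame(POSITION = tss))
rt_at_tss
```

With the default loess settings, positions outside the range of the probes return NA rather than an extrapolated value, so genes near unassayed chromosome ends should be expected to have missing timing values.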

> AllHist = NULL; AllGenes = NULL	# Initialize variables to store mark values and gene names
> for (chr in chrs) {	# For each chromosome,
> RSc = subset(RefSeq, CHR == chr)	# Create subset of RefSeq genes in the chromosome
> MKc = subset(Marks, CHR == chr)	# Create subset of mark values in the chromosome
> for(m in 1:nrow(RSc)) {	# For each gene in the chromosome,
> if(RSc[m,]$Strand == "+") {	# If the gene is in the forward orientation,
> MKcSub = subset(MKc, (MKc$Start < RSc[m,]$txStart + 500) & (MKc$Start >
RSc[m,]$txStart - 2500))	# Collect values from txStart +500 to -2500 bp
> AllHist = rbind(AllHist, apply(MKcSub[,3:12], 2, max))	# Assign max value to gene
> AllGenes = rbind(AllGenes, RSc[m,]$Gene)	# Combine with overall list
> }	# End if
> if(RSc[m,]$Strand == "-") {	# If the gene is in the reverse orientation,
> MKcSub = subset(MKc, (MKc$Start < RSc[m,]$txEnd + 2500) & (MKc$Start >
RSc[m,]$txEnd - 500))	# Collect values from txEnd +2500 to -500 bp
> AllHist = rbind(AllHist, apply(MKcSub[,3:12], 2, max))	# Assign max value to gene
> AllGenes = rbind(AllGenes, RSc[m,]$Gene)	# Combine with overall list
> }	# End if
> cat("Chromosome:", chr, "Gene:", m, "/", nrow(RSc), "\n")	# Output current gene
> }	# End gene loop
> }	# End chromosome loop
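The strand-dependent promoter window is the part of this loop that is easiest to get wrong, so it can help to compute the window bounds once per gene and inspect them. The sketch below does this on a two-gene toy table and a single mark column; all gene names, coordinates, the mark name, and its values are hypothetical.

```r
# Toy gene table: one forward-strand and one reverse-strand gene
RefSeq <- data.frame(
  Gene    = c("geneA", "geneB"),
  Strand  = c("+", "-"),
  txStart = c(10000, 50000),
  txEnd   = c(20000, 60000)
)

# Promoter window: 2500 bp upstream to 500 bp downstream of the TSS.
# For reverse-strand genes the TSS is txEnd and upstream means higher coordinates.
fwd <- RefSeq$Strand == "+"
RefSeq$winStart <- ifelse(fwd, RefSeq$txStart - 2500, RefSeq$txEnd - 500)
RefSeq$winEnd   <- ifelse(fwd, RefSeq$txStart + 500,  RefSeq$txEnd + 2500)

# Toy mark data (a single hypothetical mark column)
Marks <- data.frame(
  Start   = c(8000, 9900, 10400, 59800, 61000, 70000),
  H3K4me3 = c(0.1, 2.0, 1.5, 0.7, 3.1, 0.2)
)

# Maximum mark value within each promoter window; NA if no probe falls in the window
RefSeq$maxMark <- sapply(seq_len(nrow(RefSeq)), function(i) {
  inWin <- Marks$Start > RefSeq$winStart[i] & Marks$Start < RefSeq$winEnd[i]
  if (any(inWin)) max(Marks$H3K4me3[inWin]) else NA_real_
})
RefSeq
```

Returning NA for empty windows avoids the -Inf (and accompanying warning) that max() produces when no probes fall within a promoter.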

> dom = 0	# Initialize domain number to 0
> for(chr in chrs) {	# For each chromosome,
> Seg.RTb = subset(Seg.RT, Seg.RT$chrom == chr)	# Get timing domains in chromosome
> MarksB = subset(Marks, Marks$CHR == chr)	# Get mark data in chromosome
> for (i in 1:dim(Seg.RTb)[1]) {	# For each domain,
> cat("Current chr:", chr, "Domain:", dom, "\n")	# Output current domain
> MarksD = subset(MarksB, MarksB$Start > Seg.RTb[i,]$loc.start & MarksB$Start <
Seg.RTb[i,]$loc.end)	# Find subset of marks in domain
> MarksD = MarksD[,3:12]	# Exclude chr/pos from mark data
> MarksD[,1:10] = MarksD[,1:10] - MarksD[,1]	# Subtract control values, if needed
> MarksData = rbind(MarksData, apply(MarksD,2, "mean"))	# Average mark data in domain
> dom = dom + 1	# Increment domain number
> }	# End domain loop
> }	# End chromosome loop

Step	Problem	Possible Reason	Solution
6	Cell aggregation or debris accumulation prevents accurate cell sorting	Failure to achieve single cell suspension with certain problematic cell types	Incubate with enzyme treatment, such as Trypsin-EDTA or accutase, for a longer period of time. Use gentle trituration to ensure that cell aggregates are broken apart prior to fixation and/or sorting. Occasional pausing and filtering of cell samples during FACS may help.
6		Vortexing during ethanol fixation was too harsh	Use the lowest vortex setting available while adding ethanol dropwise
49	Inconsistent PCR bands between aliquots of the same sample	Contamination between fractions during FACS, likely due to problems in cell fixation	Switch from PI staining (with fixation) to DAPI staining (without fixation)
		Inconsistent number of cells aliquoted to each tube	Mix contents thoroughly before aliquoting and freezing for storage. Aliquot 20,000 cells while the samples are hot, to avoid pipetting errors due to SDS formation in the solution
		BrdU labeling time was insufficient	Incubate growing cells with BrdU for a longer period of time. Cells with longer S-phases require longer BrdU incubation times.
		Varying efficiency of BrdU-immunoprecipitation between samples caused by loss of DNA-protein pellet.	Use caution when removing supernatant from the loose DNA-protein pellet. Centrifuge the sample multiple times as needed to remove supernatant without disturbing the pellet.
57	Samples do not pass screening	Bias created during WGA	Increase the amount of starting material for WGA. For instance, start with 100 μl of IP sample pool instead of 50 μl at step 52.
81	Skew towards early or late values	Bias created during WGA or labeling, or excessive photobleaching during scanning.	Check early vs. late WGA yields, and avoid multiple scans of the array
84	Low autocorrelation (high level of noise)	Values are not properly sorted by chromosomal location	Ensure that chromosome and position columns are properly assigned to experimental values, and sorted as in step 80
84	Low autocorrelation (high level of noise)	Low signal intensity (step 74)	Check yield after labeling and amplification steps, as well as scanner settings.
86c(v)	Large difference in domain numbers between similar datasets	Sensitivity of segmentation algorithms to differences in data quality	Either adjust the parameter undo.SD (using similar autocorrelation-level datasets as a guide), or add Gaussian noise to higher-quality datasets to equalize their acf (step 84) prior to segmentation.
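The last remedy in the table, adding Gaussian noise to a higher-quality dataset until its autocorrelation matches that of a lower-quality one, can be sketched as a one-dimensional root search over the noise standard deviation. The example below uses simulated data; the helper function lag1, the noise levels, and the search interval are assumptions made for illustration.

```r
set.seed(13)
n <- 5000
domains <- rep(rnorm(n / 100), each = 100)     # simulated shared domain structure

highQ <- domains + rnorm(n, sd = 0.2)          # higher-quality dataset
lowQ  <- domains + rnorm(n, sd = 0.8)          # lower-quality dataset

# Hypothetical helper: lag-1 autocorrelation of a vector
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]
target <- lag1(lowQ)

# Use one fixed noise vector so the autocorrelation changes smoothly with its scale
noise <- rnorm(n)
f <- function(s) lag1(highQ + s * noise) - target

# Find the noise standard deviation at which the two autocorrelations agree
s_opt   <- uniroot(f, interval = c(0, 5))$root
matched <- highQ + s_opt * noise

c(target = target, matched = lag1(matched), noise_sd = s_opt)
```

The noise-equalized copy would be used only as input to segmentation, so that domain counts are comparable between datasets; the original values remain the ones to report and plot.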
