Examining cellular heterogeneity in human DNA methylation studies: Overview and recommendations

Maggie Po-Yuan Fu; Sarah Martin Merrill; Keegan Korthauer; Michael Steffen Kobor

doi:10.1016/j.xpro.2025.103638

2025 Feb 12;6(1):103638.



PMCID: PMC11969412 PMID: 39951379

Summary

Intersample cellular heterogeneity (ISCH) is one of the largest contributors to DNA methylation (DNAme) variability. It is imperative to account for ISCH to accurately interpret analysis results in epigenome-wide association studies. We compiled this primer based on the current literature to guide researchers through the process of estimating and accounting for ISCH in DNA methylation studies. This primer outlines the procedure of bioinformatic ISCH prediction, including reference-based and reference-free algorithms. It then describes several methods to account for ISCH in downstream analyses, including robust linear regression and principal-component-analysis-based adjustments. Finally, we outline three methods for estimating differential DNAme signals in a cell-type-specific manner. Throughout the primer, we provide statistical and biological justification for our recommendations, as well as R code examples for ease of implementation.

Subject area: Genomics, Molecular Biology

Introduction

Epigenetics often refers to the study of mitotically heritable molecular marks that can perpetuate distinct gene activity states with the same underlying DNA sequence.¹ These marks are already highly relevant in early development because of their roles in establishing tissue- and cell-population-specific functions and maintaining gene expression profiles.²^,³ Disruption of normal epigenetic regulation reshapes the cellular differentiation landscape,⁴ highlighting the role of epigenetic memory in defining cell types and their functions. For example, as naive CD8+ T cells differentiate into effector cells, they activate functional effector genes through DNA methylation (DNAme) and histone modification changes.⁵ Many studies have highlighted the cell-type-specific nature of epigenetic regulation; it is therefore unsurprising that tissues, and even their constituent cell types, are the most relevant factors driving epigenetic variability in healthy human populations.⁶^,⁷

Of the many interesting epigenetic marks,⁸ DNAme is by far the most studied in the context of epigenome-wide association studies (EWAS),⁹ which generally refers to the population-level investigation of associations between variables of interest and DNAme differences across the epigenome.⁸^,¹⁰^,¹¹ As EWAS have matured over the past decade, the link between cell type and DNAme patterns has come to be regularly accounted for. Because the cellular composition of samples may vary in heterogeneous tissues such as blood, DNAme differences measured between two groups may either reflect average DNAme changes across all cell types or simply represent differences in cell type composition between groups.⁶ This difference in cell type composition across samples, or intersample cellular heterogeneity (ISCH), can be measured directly or predicted with bioinformatic methods. Sometimes, ISCH is a concomitant variable that epigeneticists want to control for in their studies, akin to other contributors to interindividual DNAme variability, such as genetic background, age, and biological sex.⁷^,¹²^,¹³^,¹⁴ Because ISCH can confound the relation between DNAme and the variable of interest, failure to account for it in EWAS can inflate false-positive and false-negative rates.⁶^,¹³ Other times, ISCH can be the outcome of interest, as changes in cellular composition can inform a range of biological signals, from physiological stress and immune development to disease risks.⁷^,¹⁵^,¹⁶^,¹⁷

This primer addresses the presence of DNAme-based cellular heterogeneity in human studies and methods to account for this variability. While the methods referenced in this primer were often developed specifically for population-based studies, they are generally applicable to human DNAme studies. When researchers set out to examine cellular heterogeneity in epigenomic studies, the process typically involves first estimating ISCH in the cohort and then either adjusting for it in downstream analysis or estimating the cell-type-specific differential DNAme signature. In this primer, we outline the bioinformatic deconvolution tools currently employed to estimate cell type proportions from DNAme measures. We discuss the applicability of both reference-based and reference-free deconvolution methods. In addition, we provide code examples for ISCH prediction in four commonly used tissues: blood, buccal swabs, saliva, and brain. Then, we present methods developed to adjust for ISCH in the context of EWAS or other analyses. Finally, we discuss three algorithms to estimate cell-type-specific DNAme associations, as well as the limitations of these methods. A decision scheme is provided in Figure 1 to guide readers through this process, depending on the ultimate goal of their study and the measures available to them.

Decision scheme for examining ISCH in DNAme studies

Starting from the cellular composition of the dataset, if the samples are composed of heterogeneous cell populations, follow the decision tree to determine the optimal strategy for accounting for ISCH. Blue boxes signify decision points, and yellow circles indicate the sections in this article that contain relevant detail. The purple circles represent different analysis strategies depending on the user’s dataset characteristics and desired analysis. EWAS, epigenome-wide association study. Created with BioRender.com.

All pseudocode in this article is based on R and was tested in RStudio.¹⁸^,¹⁹ We have included a vignette with an example dataset for readers to follow along and examine the output format at https://rpubs.com/mfu/1217877. To ensure that the packages can be installed and run as presented here, we recommend R version 4.1.0 or later. As general setup, several R packages are required for the code example in each section; these are listed at the beginning of each code block. Some of these are available on CRAN, and others are on Bioconductor.²⁰ The example code in Box 1 shows how the packages can be installed from the respective repositories.

Box 1. Pseudocode for package installation in R.

# Install CRAN packages

install.packages("dplyr")

# Install Bioconductor packages

if (!require("BiocManager", quietly = TRUE))

install.packages("BiocManager")

BiocManager::install("minfi")

The code examples and the packages used are also optimized for the most commonly used Illumina Infinium BeadChip microarrays: the 450K array and its successor, the EPIC (850K) array.²¹^,²² The methods can be adapted to the latest EPIC v2 array by specifying the appropriate annotation, and to sequencing-based DNAme profiling methods with packages such as methyLiftover.²³^,²⁴ Estimation of ISCH usually requires some preprocessing of raw DNAme array data to obtain either an RGChannelSet object or a beta matrix, i.e., a table of beta values, each approximating the proportion of methylated copies of a CpG site in a sample and ranging from 0 to 1. For ease of data processing, we present a curated list of cell type prediction tools compatible with each input type in Table 1. In most cases, preprocessing, including quality control, background and dye-bias (color) correction, and probe type normalization, should be applied prior to extraction of the beta matrix (Box 2).²⁵ A preprocessed beta matrix is the expected input for methods that account for ISCH or estimate cell-type-specific DNAme signatures.
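To make the beta value concrete, the base R sketch below computes it from methylated and unmethylated signal intensities as methylated / (methylated + unmethylated + offset). The helper `compute_beta()` and the toy intensity matrices are hypothetical. The offset of 100, which keeps low-intensity probes from producing unstable ratios, follows a common Illumina convention and is an assumption here; minfi's getBeta() exposes a comparable `offset` argument.

```r
# Hypothetical helper (not a minfi function):
# beta = methylated / (methylated + unmethylated + offset), which lies in [0, 1).
compute_beta <- function(meth, unmeth, offset = 100) {
  stopifnot(identical(dim(meth), dim(unmeth)))
  meth / (meth + unmeth + offset)
}

# Toy intensities: probes in rows, samples in columns.
meth <- matrix(c(900, 100, 4950, 50), nrow = 2,
               dimnames = list(c("cg1", "cg2"), c("s1", "s2")))
unmeth <- matrix(c(0, 800, 0, 4850), nrow = 2,
                 dimnames = dimnames(meth))

betas <- compute_beta(meth, unmeth)
round(betas, 3)
```

With these toy values, cg1 in sample s1 has a beta value of 900 / (900 + 0 + 100) = 0.9, and cg2 in sample s2 has 50 / (50 + 4850 + 100) = 0.01.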

Table 1.

Available reference-based ISCH prediction algorithms and reference datasets with corresponding packages, compatible tissues, and required input types

Tissue	Package/Source	Function	Input type
Methods available; reference dataset available

Blood, cord blood, brain	minfi²⁶^,²⁷^,²⁸	estimateCellCounts	RGChannelSet
Blood, cord blood	ewastools²⁹	estimateLC	beta matrix (preprocessed)
Blood, cord blood, brain	FlowSorted.Blood.EPIC³⁰^,³¹^,²⁶	estimateCellCounts2	RGChannelSet
Blood, buccal swab, saliva, breast, cervix, lung^a	EpiDISH³²	hepidish	beta matrix (preprocessed)
Blood	methylCC³³	estimatecc	RGChannelSet or BSseq
Brain	HiBED³⁴	HiBED_deconvolution	beta matrix (preprocessed)
Brain	scMD³⁵	scMD	beta matrix (preprocessed)
Brain	cets³⁶	cetsBrain	beta matrix (preprocessed)
Placenta	planet³⁷	plCellCpGsFirst	beta matrix (preprocessed)
Blood, cord blood, brain, liver, muscle, and others	CimpleG³⁸	CimpleG	beta matrix (preprocessed)
Tissues with matched expression data	scDeconv³⁹	refDeconv	beta matrix (preprocessed)
Cell-free DNA (to deconvolute tissues of origin with reference from 29 major human tissues)	cfSort⁴⁰	see below^b	sequencing data (bismark-aligned BAM file)
Cell-free DNA (to deconvolute tissues of origin with reference from 10 tissues including blood, brain, heart, and others)	CelFiE⁴¹	see below^b	processed sequencing data
Cell-free DNAme from cancer and healthy individuals	MethylBERT⁴²	see below^b	N/A
Solid tumors across 33 cancer types^a	MethylResolver⁴³	MethylResolver	beta matrix (preprocessed)
20 solid tumor types and immune cells	HiTIMED⁴⁴	HiTIMED_deconvolution	beta matrix (preprocessed)
Tumor-infiltrating lymphocytes	methylCibersort⁴⁵	CIBERSORT	beta matrix (preprocessed)
Tumor-infiltrating lymphocytes	EMeth⁴⁶	cv.emeth	beta matrix (preprocessed)
Immune cells, and other unknown cell types are estimated with reference-free method	PRMeth⁴⁷	prmeth	beta matrix (preprocessed)

Only reference dataset available^c

Saliva	BeadSorted.Saliva.EPIC⁴⁸		N/A
13 tissues, including brain, lung, heart, liver, and others	EpiSCORE⁴⁹		N/A
Breast tissue, breast milk	GSE74877, GSE67024, and GSE110554 ⁵⁰		N/A
More than 17 tissues, including blood, brain, muscle, heart, digestive system, and others	ENCODE project⁵¹		N/A
Compilation of more than 23 tissues of origin from public datasets and adipocytes, hepatocytes, alveolar lung cells, and others	Moss et al.⁵²		N/A

Only methods available

	EnsDeconv⁵³	EnsDeconv	beta matrix (preprocessed)


^aAlthough the MethylResolver package states that it estimates ISCH in 33 solid tumors, the package estimates the relative proportion of tumor and immune cells.⁴³ Similarly, the EpiDISH reference dataset includes only immune cells, epithelial cells, fibroblasts, and adipocytes. Although these are the major cell types in tissues such as the lung, there are also minor populations of unaccounted-for cell types, including muscle cells, nerve cells, and other supporting cells.³²

^bcfSort,⁴⁰ CelFiE,⁴¹ and MethylBERT⁴² were developed in Python for next-generation-sequencing-based data such as whole-genome bisulfite sequencing or reduced representation bisulfite sequencing. We included them here because they use novel machine-learning-based algorithms for cell type deconvolution.

^cA few of the reference datasets listed do not have corresponding functions for ISCH estimation. For these, cell-type-specific probes can be selected with the IDOL or t test methods, followed by regression using, for example, the EpiDISH package. Some of these comprehensive DNAme tissue datasets were specifically used in studies of cell-free DNA to deduce the tissue of origin.⁴⁰^,⁵²^,⁵⁴

Box 2. Pseudocode for reading raw IDAT files into R, and minimal preprocessing for beta matrix extraction.

library(minfi)

library(dplyr)

yourRGChannelSet <- read.metharray.exp("GSE112618_RAW")

# Minimal preprocessing - only background color correction and inter-sample normalization

yourBetaMatrix <- preprocessFunnorm(yourRGChannelSet) %>%

getBeta()

Obtaining ISCH information

ISCH information of a dataset can be obtained via several methods, broadly divided into direct and indirect types. Direct methods include staining and microscopy of histological samples, clinical reports of complete blood count (CBC), flow cytometry, and cytometry by time of flight (CyTOF), whereas DNAme-based ISCH prediction is considered indirect. Depending on the tissue of interest, DNAme-based prediction may be more or less accurate or granular than alternative sources of cell count information. In the case of blood, for example, the most advanced DNAme-based prediction gives higher resolution of cell type specificity than CBC because it can resolve populations of adaptive immune cells.³⁰^,³¹^,⁵⁵ On the other hand, flow cytometry and CyTOF provide greater granularity in cell counts and are thus considered the gold standards for cell count information.⁷^,²⁶ The reported correlations between flow counts and predicted counts range from 0.15 to 0.99, depending on the cell types and prediction methods.²⁶^,⁵⁶ When the direct cell count measurement provides higher resolution than the bioinformatic prediction, the user can either bypass the prediction or compare the predicted ISCH with the cell count measurement as a sanity check (refer to Table 1 for available cell types for each reference dataset).
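When both direct cell counts and DNAme-based predictions are available, the sanity check described above can be as simple as a per-cell-type correlation. The sketch below uses simulated data. `compare_counts()` is a hypothetical helper that assumes both inputs have one row per sample, matching row names, and cell types as column names; it compares only the cell types present in both.

```r
# Hypothetical helper: Pearson correlation between measured (e.g., flow cytometry)
# and DNAme-predicted proportions for each cell type present in both tables.
compare_counts <- function(measured, predicted) {
  shared <- intersect(colnames(measured), colnames(predicted))
  stopifnot(length(shared) > 0,
            identical(rownames(measured), rownames(predicted)))
  vapply(shared, function(ct) cor(measured[, ct], predicted[, ct]), numeric(1))
}

set.seed(1)
n <- 30
measured <- data.frame(Neu = runif(n, 0.4, 0.7), Mono = runif(n, 0.05, 0.12),
                       row.names = paste0("s", 1:n))
# Simulated predictions: measured values plus noise (illustrative only).
predicted <- data.frame(Neu = measured$Neu + rnorm(n, sd = 0.02),
                        Mono = measured$Mono + rnorm(n, sd = 0.03),
                        Bcell = runif(n, 0.02, 0.08),
                        row.names = rownames(measured))

cors <- compare_counts(measured, predicted)
round(cors, 2)
```

Cell types that exist only in the prediction (here Bcell) are skipped. Because the simulated monocyte estimates carry noise that is large relative to their spread, they are expected to correlate less well with the measured values than the neutrophil estimates.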

Reference-based ISCH prediction

As studies in human populations are typically performed in more readily accessible tissues, various types of blood preparations (e.g., cord blood, whole blood, dried blood spots, buffy coat, or peripheral blood mononuclear cells [PBMCs]), buccal swabs, and saliva samples tend to be the tissues of choice for such DNAme studies. DNAme reference profiles of sorted cells have been created for these commonly investigated tissues, as well as other biologically relevant tissues like the brain (Table 1). Reference-based prediction methods can be applied to these tissues, while reference-free methods are commonly adopted for tissues lacking reference profiles.

Reference-based methods are recommended for tissues where a reference DNAme dataset of sorted cell populations is available, because the deconvolution process is based on biologically measured DNAme profiles instead of statistically inferred components, and they tend to yield more accurate estimates than reference-free methods.⁵⁷^,⁵⁸^,⁵⁹ An exception is when the epigenetic profile of the reference dataset and/or the set of cell types available in the reference can reasonably be assumed to differ markedly from those of the population sampled. For example, the cord blood reference dataset was created because the presence of nucleated red blood cells and other distinct immune cell DNAme signatures in the fetal and neonatal circulation rendered the adult reference dataset inadequate for studying DNAme differences associated with systemic exposures at early developmental stages.²⁶ User discretion is therefore strongly advised in selecting an appropriate reference dataset. The steps for reference-based cell type prediction are outlined in Figure 2.

Flowchart of reference-based cell type (CT) proportion prediction for estimating ISCH in heterogeneous tissues

The top row shows the general steps, and the bottom row shows an example of the algorithm. The pre-selected probe sets are curated either based on supervised learning optimized on cell type classification (e.g., IDOL probes) or based on the biological relevance of the DNAme sites to cell type identity (e.g., DHS probes). The DNAme level is represented with the notation β, and the proportion of a given CT is represented with p. CP, constraint projection; RPC, robust partial correlation; SVR, support vector regression.

Example: Estimating cell type proportions in blood

The three main reference-based regression methods available for prediction are constraint projection (CP),⁶⁰ robust partial correlation (RPC),³² and support vector regression (SVR).⁶¹ CP, also known as the Houseman method, uses quadratic programming to enforce constraints on the coefficients during regression such that the estimated cell type proportions are non-negative and sum to 1. RPC and SVR impose the same set of constraints in a post hoc manner, i.e., by adjusting predicted proportions above 1 and below 0 to 1 and 0, respectively, after deconvolution.⁶² One of the most commonly used functions for estimating cell type proportions in human blood using a reference is estimateCellCounts2 in the FlowSorted.Blood.EPIC package, which uses CP to estimate cell type proportions. Box 3 shows an example of running this method to estimate cell type composition in whole blood from reference data.
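The difference between enforcing constraints during the fit (CP) and afterward (RPC, SVR) can be sketched on a toy mixture in which each sample's beta values are a weighted sum of reference profiles. This illustration is not the Houseman, minfi, or EpiDISH implementation. The CP-style fit keeps proportions non-negative and summing to 1 through a softmax reparameterization solved with optim(), which stands in for quadratic programming. The post hoc fit uses ordinary least squares, not the robust or support vector regression of RPC or SVR, followed by clipping to [0, 1]. The reference matrix, proportions, and helper names are hypothetical.

```r
# Toy mixture model: sample beta = reference %*% proportions + noise.
set.seed(42)
n_cpg <- 300
ref <- matrix(runif(n_cpg * 3), ncol = 3,
              dimnames = list(paste0("cg", seq_len(n_cpg)), c("CT1", "CT2", "CT3")))
true_p <- c(CT1 = 0.6, CT2 = 0.3, CT3 = 0.1)
y <- drop(ref %*% true_p) + rnorm(n_cpg, sd = 0.02)

# Constraints enforced during the fit: softmax maps any real vector to
# non-negative proportions that sum to 1 (a stand-in for quadratic programming).
cp_sketch <- function(y, ref) {
  to_p <- function(theta) exp(theta) / sum(exp(theta))
  rss <- function(theta) sum((y - ref %*% to_p(theta))^2)
  fit <- optim(rep(0, ncol(ref)), rss, method = "BFGS")
  setNames(to_p(fit$par), colnames(ref))
}

# Constraints enforced post hoc: unconstrained least squares, then clip to [0, 1].
posthoc_sketch <- function(y, ref) {
  est <- qr.solve(ref, y)
  setNames(pmin(pmax(est, 0), 1), colnames(ref))
}

p_cp <- cp_sketch(y, ref)
p_posthoc <- posthoc_sketch(y, ref)
round(rbind(truth = true_p, cp = p_cp, posthoc = p_posthoc), 3)
```

Both fits recover the simulated proportions here because the unconstrained solution already lies inside [0, 1]. The approaches diverge when it does not, in which case clipping produces exact 0s or 1s.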

Box 3. Pseudocode for estimating blood cell type proportions.

library(FlowSorted.Blood.EPIC) # Bioconductor package

library(ExperimentHub) # Bioconductor package

hub <- ExperimentHub()

FlowSorted.Blood.EPIC <- hub[["EH1136"]] # Loading reference dataset

ecc_output <- estimateCellCounts2(yourRGChannelSet,

processMethod = "preprocessNoob", #^a

probeSelect = "auto", #^b

cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu"),

referencePlatform = "IlluminaHumanMethylationEPIC",

referenceset = "FlowSorted.Blood.EPIC", # Specify the reference dataset.

IDOLOptimizedCpGs = NULL, #^c

returnAll = TRUE)

blood_ct <- ecc_output$prop # Output the estimated proportions

^aNormalization of reference and sample datasets minimizes probe type bias and batch effects. The estimateCellCounts2 function vignette in the FlowSorted.Blood.EPIC R package mentions two normalization methods: preprocessNoob normalization, which corrects for background and dye bias, is recommended for cord blood; and preprocessQuantile, which adjusts the distribution of methylated and unmethylated intensity levels, is recommended for blood and brain.⁶³ Refer to the minfi package vignette for other preprocessing methods supported by the function (https://bioconductor.org/packages/devel/bioc/vignettes/minfi/inst/doc/minfi.html).

^bThe probes that are most differentially methylated for each given cell type (e.g., CD4T versus not CD4T) in the reference data are selected by the largest magnitude of their t statistics. For estimateCellCounts2, the default is 100 probes for each cell type.

^cAs alternatives to automatic probe selection, as described in the previous note, probe sets optimized for cell type prediction have been curated by multiple research groups through literature searches or machine learning, such as the DHS (DNase hypersensitive sites)⁶² and IDOL (identifying optimal libraries) probes.⁵⁶ We have found that some probes in the curated probe sets are occasionally removed in the quality control steps, and the general applicability of these probe sets has not been demonstrated by independent studies. Therefore, we recommend automatic probe selection for the prediction step. To use the IDOL probes, the probeSelect argument should be changed to “IDOL” and the probe set should be specified in the IDOLOptimizedCpGs argument.

The estimateCellCounts2 function only performs CP-based regression. To use RPC or SVR, the coefficient table (i.e., mean DNAme levels for predictive loci per cell type) should be exported and the prediction step can be performed with EpiDISH (Boxes 4 and 5).
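To show what a coefficient table contains, the sketch below builds one from a simulated sorted-cell reference. For each cell type it computes one-versus-rest Welch t statistics, keeps the CpGs with the largest positive and negative statistics (in the spirit of the automatic probe selection described in the Box 3 notes), and averages each cell type's beta values at those CpGs. The cell type labels, the number of CpGs kept per direction (`k`), and the helper `one_vs_rest_t()` are illustrative assumptions; estimateCellCounts2 may use different statistics and thresholds. The result, a CpG-by-cell-type matrix of mean beta values, has the shape of the reference matrix that EpiDISH takes.

```r
set.seed(7)
cell_types <- c("CD4T", "Bcell", "Mono")   # illustrative labels only
n_per <- 4                                  # sorted samples per cell type
labels <- rep(cell_types, each = n_per)
n_cpg <- 400
ref_betas <- matrix(runif(n_cpg * length(labels), 0.4, 0.6), nrow = n_cpg,
                    dimnames = list(paste0("cg", seq_len(n_cpg)),
                                    paste0("sorted", seq_along(labels))))
# Plant 20 CpGs hypermethylated in each cell type (cg1-cg20 in CD4T, and so on).
for (j in seq_along(cell_types)) {
  rows <- ((j - 1) * 20 + 1):(j * 20)
  ref_betas[rows, labels == cell_types[j]] <- runif(20 * n_per, 0.88, 0.92)
}

# One-versus-rest Welch t statistic for every CpG.
one_vs_rest_t <- function(betas, labels, ct) {
  in_ct <- labels == ct
  tstat <- vapply(seq_len(nrow(betas)), function(i) {
    unname(t.test(betas[i, in_ct], betas[i, !in_ct])$statistic)
  }, numeric(1))
  setNames(tstat, rownames(betas))
}

# Keep the k most hyper- and k most hypomethylated CpGs per cell type.
k <- 10
selected <- lapply(cell_types, function(ct) {
  tstat <- sort(one_vs_rest_t(ref_betas, labels, ct))
  list(hyper = rev(names(tstat))[seq_len(k)], hypo = names(tstat)[seq_len(k)])
})
names(selected) <- cell_types
probes <- unique(unlist(selected))

# Coefficient table: mean beta of each cell type at the selected CpGs.
coef_table <- sapply(cell_types, function(ct) {
  rowMeans(ref_betas[probes, labels == ct, drop = FALSE])
})
dim(coef_table)
```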

Box 4. Pseudocode for obtaining coefficient table of mean cell-type-specific DNAme levels from estimateCellCounts2.

blood_coef <- as.matrix(ecc_output$compTable)

blood_ct <- ecc_output$prop

Box 5. Pseudocode for estimating buccal swabs or saliva cell type proportions.

library(EpiDISH) # Bioconductor package

data(centEpiFibIC.m) # Reference for fibroblasts, epithelial cells, and immune cells.

data(centBloodSub.m) # Reference for immune cell subtypes.

buccal_ct <- hepidish(beta.m = yourBetaMatrix,

ref1.m = centEpiFibIC.m,

ref2.m = centBloodSub.m,

h.CT.idx = 3, # Specify the column in ref1.m to which ref2.m applies.

method = "RPC")

Example: Estimating cell type proportion in buccal swabs or saliva

The EpiDISH R package is commonly used to estimate cell type proportions in buccal swabs or saliva samples because it provides epithelial and fibroblast references.³² Its hepidish function uses a hierarchical structure to make predictions at increasingly fine levels of resolution. For example, in buccal swab samples it first estimates fibroblast, epithelial, and immune cell proportions, and then deconvolves the proportion of each immune cell subtype (Box 5). While EpiDISH does not support normalization or probe selection, it allows specification of the regression method, i.e., CP, RPC, or CBS (CIBERSORT/SVR). In our experience, due to the post hoc nature of RPC and SVR, these methods generate more predictions at the boundary conditions (i.e., more 0s and 1s). In tissues composed mostly of a single cell type, such as buccal swabs, this can translate to more accurate predictions.
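The hierarchical idea can be illustrated with simple arithmetic. If the first level estimates the immune fraction of a sample, and the second level estimates how that immune fraction splits among subtypes, multiplying the two gives subtype proportions of the whole sample. The sketch below shows one way to combine the levels; the numbers are made up, `combine_levels()` is hypothetical, and hepidish's internal combination step may differ.

```r
# Hypothetical helper: replace the parent compartment (here immune cells, "IC")
# with its subtypes, scaling the within-compartment fractions by the parent fraction.
combine_levels <- function(coarse, fine, parent = "IC") {
  stopifnot(parent %in% names(coarse), abs(sum(fine) - 1) < 1e-8)
  c(coarse[names(coarse) != parent], coarse[[parent]] * fine)
}

coarse <- c(Epi = 0.80, Fib = 0.05, IC = 0.15)               # level 1, made-up values
fine <- c(Neu = 0.60, Mono = 0.10, CD4T = 0.20, CD8T = 0.10) # fractions within IC
combined <- combine_levels(coarse, fine)
combined
```

Here an immune fraction of 0.15 with 60% neutrophils yields a whole-sample neutrophil proportion of 0.09, and the combined proportions still sum to 1.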

Example: Estimating cell type proportion in the brain

The novel brain reference dataset and ISCH prediction package HiBED (hierarchical brain extended deconvolution) predicts brain cell populations hierarchically, first grouping them into endothelial and stromal, glial, and neuronal proportions, followed by finer predictions of seven major cell types (i.e., GABAergic neurons, glutamatergic neurons, astrocytes, microglial cells, oligodendrocytes, endothelial cells, and stromal cells).³⁴ The package selects probes based on empirical Bayes moderated t statistics and performs CP-based regression. In the example code below (Box 6), we have included a beta-mixture quantile (BMIQ) normalization step, as implemented in the original publication. Alternatively, one can skip BMIQ or substitute another normalization method.

Box 6. Pseudocode for estimating brain cell type proportions.

library(HiBED) # install with devtools::install_github("SalasLab/HiBED")

library(minfi) # Bioconductor package

library(wateRmelon) # Bioconductor package

# Normalization

brain_Mset <- preprocessNoob(yourRGChannelSet) # noob normalization for color correction

brain_betas <- BMIQ(brain_Mset) # alternatively, do getBeta(brain_Mset) to extract beta matrix without BMIQ

# Brain ISCH prediction

HiBED_result <- HiBED_deconvolution(brain_betas,

h = 2) # 2 layers of deconvolution yields 7 brain cell types

Multi-omics ISCH predictors

Recently, some reference-based ISCH algorithms have implemented integration of data beyond DNAme to improve prediction accuracy or the range of estimated cell types. EpiSCORE (epigenetic cell type deconvolution using single cell omic references), for example, utilizes single-cell RNA sequencing (scRNA-seq) information for tissues without appropriate DNAme references.⁶⁴ EpiSCORE identifies tissue-specific transcriptomic markers based on scRNA-seq atlases, followed by imputation of corresponding DNAme signatures using matched expression and DNAme datasets, which can then be applied in ISCH estimation. A detailed protocol has been developed, which provides step-by-step instructions regarding the process.⁶⁵

Similarly, the scDeconv package enables users to predict ISCH for DNAme data that lack an appropriate reference by leveraging corresponding scRNA-seq or other RNA data.³⁹ The algorithm requires a matched RNA-seq and DNAme dataset in the tissue of interest; the cell type proportions estimated from the matched RNA-seq data are used to train the DNAme-based cell type predictor. A detailed guide for scDeconv can be found on its GitHub page (https://github.com/yuabrahamliu/scDeconv).

Reference-free ISCH prediction

Reference-free prediction provides a solution to address ISCH if no applicable reference DNAme dataset is available for the tissue of choice. In general, reference-free prediction assumes that cell types produce a strong and consistent signal in DNAme across samples, which the algorithm is designed to estimate.⁵⁹ The reference-free methods have yet to be reviewed systematically. However, studies comparing a selection of reference-free methods—surrogate variable analysis (SVA),⁶⁶ FaST-LMM-EWASher,⁶⁷ ReFACTor,⁶⁸ RefFreeEWAS,⁶⁹ and RefFreeCellMix⁷⁰—showed that SVA consistently yielded high sensitivity and specificity across study designs.⁵⁸^,⁷¹^,⁷²

SVA requires a linear model that specifies the variables of interest. It calculates the residuals of this linear model and then estimates the major sources of residual variation through singular value decomposition (SVD). These sources of variation are designated “surrogate variables” (SVs), and their corresponding DNAme profiles are then calculated. Each estimated SV does not necessarily correspond to a single cell type, as highly correlated cell type populations, or any other major source of biological or technical variation, can be captured by a single SV.
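The residual-then-SVD idea can be sketched in base R on simulated data. This sketch covers only that first step: the sva package additionally estimates the number of SVs (num.sv) and iteratively reweights probes, which the sketch omits. The simulated variable of interest (`group`) and the unmeasured cell proportion (`hidden_ct`) are assumptions.

```r
set.seed(3)
n <- 40
p <- 1000
group <- rep(c(0, 1), each = n / 2)   # hypothetical variable of interest
hidden_ct <- runif(n, 0.3, 0.7)       # unmeasured cell proportion
Y <- matrix(0.5 + rnorm(p * n, sd = 0.02), nrow = p)                   # CpGs x samples
Y[1:50, ] <- Y[1:50, ] + outer(rep(0.05, 50), group)                   # group effect
Y[201:500, ] <- Y[201:500, ] + outer(rep(0.5, 300), hidden_ct - 0.5)   # cell-type effect

mod <- model.matrix(~ factor(group))
# Residualize every CpG on the model of interest (result: samples x CpGs).
resid_mat <- qr.resid(qr(mod), t(Y))
# Left singular vectors of the residual matrix are sample-level factors.
svd_res <- svd(resid_mat, nu = 2, nv = 0)
sv_sketch <- svd_res$u
abs(cor(sv_sketch[, 1], hidden_ct))
```

Because the hidden cell proportion affects hundreds of CpGs and is not part of the model, it dominates the residual variation, so the first singular vector correlates strongly with it.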

Note: The sva function does not allow missing values, so imputation is essential and can be done in a few different ways.⁷³^,⁷⁴^,⁷⁵ We show one example in Box 7 with the impute.knn function from the impute package, which, for each probe with missing values, identifies the k nearest probes by Euclidean distance across samples and replaces each missing value with the average DNAme of those neighbors.
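The idea behind k-nearest-neighbor imputation can be sketched in a few lines of base R. This simplified version, the hypothetical helper `knn_impute_rows()`, uses only probes without missing values as neighbors and computes distances over the samples observed for the probe being imputed. impute.knn handles large matrices and edge cases (e.g., rows with many missing values) more carefully and should be preferred for real data.

```r
# Hypothetical, simplified kNN imputation over rows (probes) of a beta matrix.
knn_impute_rows <- function(mat, k = 3) {
  out <- mat
  complete_rows <- which(rowSums(is.na(mat)) == 0)
  for (i in which(rowSums(is.na(mat)) > 0)) {
    obs <- !is.na(mat[i, ])
    # Euclidean distance from probe i to each complete probe, over observed samples.
    donors <- mat[complete_rows, obs, drop = FALSE]
    d <- sqrt(rowSums(sweep(donors, 2, mat[i, obs])^2))
    nn <- complete_rows[order(d)[seq_len(min(k, length(d)))]]
    # Fill missing samples with the neighbors' mean DNAme.
    out[i, !obs] <- colMeans(mat[nn, !obs, drop = FALSE])
  }
  out
}

mat <- rbind(cgA = c(0.10, 0.20, NA),
             cgB = c(0.11, 0.21, 0.30),
             cgC = c(0.09, 0.19, 0.32),
             cgD = c(0.12, 0.22, 0.28),
             cgE = c(0.90, 0.80, 0.95))
imputed <- knn_impute_rows(mat, k = 3)
imputed["cgA", ]
```

In the toy matrix, the missing value of cgA is filled with the mean of its three nearest probes (cgB, cgC, and cgD), 0.30, while the dissimilar cgE is ignored.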

Box 7. Pseudocode for reference-free ISCH prediction with sva.

library(sva) # Bioconductor package

library(impute) # Bioconductor package

yourBetaMatrix <- impute.knn(yourBetaMatrix)$data # sva() does not accept missing values; see the Note above.

mod <- model.matrix(~as.factor(mainVar), data = yourPhenoData) # For mainVar, enter the column name in yourPhenoData.

mod0 <- model.matrix(~1, data = yourPhenoData[rownames(mod), , drop = FALSE]) # Null model on the same samples as mod.

yourBetaMatrix <- yourBetaMatrix[, rownames(mod)] # model.matrix() removes samples with missing values, so keep the same samples, in the same order, in the beta matrix (assumes colnames(yourBetaMatrix) match rownames(yourPhenoData)).

n.sv <- num.sv(yourBetaMatrix, mod, method = "leek") # Estimate the number of SVs

svaobj <- sva(yourBetaMatrix, mod, mod0, n.sv = n.sv)

sv <- svaobj$sv

We recommend starting with SVA for reference-free ISCH prediction. The SVs can be interpreted by examining their associations with known variables. Assume, for example, that samples come from a tissue expected to contain five major cell types, and SVA identifies eight SVs. One can start by generating a correlation matrix to examine the relationships between the SVs and available metadata variables that may contribute to DNAme variability, including sex, age, genetic ancestry, and batch variables. If, in this example, five SVs are correlated with these major contributors and three are not, the variability of the five cell types is plausibly captured in the remaining three SVs. On the other hand, if none of the SVs is correlated with any of the major contributors to DNAme variability, then observing eight SVs when there are only five cell types suggests that SVA may be capturing variability attributable to unmeasured factors, in addition to ISCH arising from the expected major cell types. In these cases, users are encouraged to apply a few alternative algorithms and compare their prediction outputs. In addition to the number of predicted fractions, another way to roughly assess prediction performance is to compare the average adjusted R² obtained when regressing DNAme on the output of each reference-free prediction method (Box 8). A higher adjusted R² indicates that the model explains a higher proportion of the variability in DNAme, after adjusting for the number of predictors included. If the number of major cell types in the tissue is not known, algorithms such as RefFreeCellMix and MeDeCom, which optimize the number of potential cell types, may be appropriate. A list of other currently available reference-free ISCH estimation algorithms is presented in Table 2.
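The correlation matrix between SVs and metadata described above can be sketched as follows. All data are simulated, the binary variables (sex, batch) are treated as numeric (point-biserial correlation), and the |r| > 0.3 cut-off used to flag SVs is arbitrary and hypothetical. In practice, formal association tests and inspection of the full matrix are advisable.

```r
set.seed(9)
n <- 50
meta <- data.frame(age = runif(n, 20, 70),
                   sex = rbinom(n, 1, 0.5),
                   batch = rep(1:2, length.out = n))
# Simulated SVs: SV1 tracks age, SV2 tracks batch, SV3 is unrelated noise.
sv <- cbind(SV1 = scale(meta$age)[, 1] + rnorm(n, sd = 0.3),
            SV2 = meta$batch + rnorm(n, sd = 0.3),
            SV3 = rnorm(n))

cor_mat <- cor(sv, meta)   # SVs in rows, metadata variables in columns
round(cor_mat, 2)

# Flag SVs correlated with any known contributor (arbitrary cut-off).
flagged <- rownames(cor_mat)[apply(abs(cor_mat) > 0.3, 1, any)]
flagged
```

SVs that remain unflagged are candidates for capturing ISCH or other unmeasured sources of variation.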

Box 8. Pseudocode for comparing mean variance explained, as measured by adjusted R², by different ISCH estimation methods.

library(dplyr) # CRAN package

# ISCH1, ISCH2: data frames of predicted ISCH from methods 1 and 2, containing only cell type columns, with one row per sample in the same order as the columns of yourBetaMatrix.

summaryR2 <- apply(yourBetaMatrix, 1, function(x) {

mod1 <- lm(x ~ ., data = ISCH1) # predicted ISCH from method 1; "." uses every column of ISCH1 as a predictor

mod2 <- lm(x ~ ., data = ISCH2) # predicted ISCH from method 2

c(mod1_R2 = summary(mod1)$adj.r.squared, mod2_R2 = summary(mod2)$adj.r.squared)

}) %>% t() %>% as.data.frame()

mod1_R2_avg <- mean(summaryR2$mod1_R2)

mod2_R2_avg <- mean(summaryR2$mod2_R2)

Table 2.

Reference-free ISCH prediction algorithms

Algorithm	Description
SVA⁶⁶	estimates major sources of variation (i.e., surrogate variables) through singular value decomposition (SVD)
RefFreeEWAS⁶⁹	RefFreeEWAS adapts SVA to estimate ISCH specifically; it decomposes the total variance of the DNAme in a linear additive model to represent the cell mixture conditions^a
FaST-LMM-EWASher⁶⁷	FaST-LMM-EWASher combines intersample correlation and principal component analysis (PCA) to approximate ISCH^b
ReFACTor⁶⁸	ReFACTor uses sparse PCA to select for a subset of cell-type-associated differentially methylated probes
RefFreeCellMix⁷⁰	RefFreeCellMix utilizes non-negative matrix factorization (NMF) on a beta matrix to iteratively find the number of cell types and estimate their DNAme profiles, called “latent DNAme components” (LMC)
MeDeCom⁷⁶	MeDeCom also applies NMF but regularizes the LMCs so that they are more likely to have beta values closer to 0 or 1, as would be expected in a purified cell population.^c
CONFINED⁷⁷	CONFINED attempts to differentiate biological from technical variance by applying canonical correlation analysis (CCA) across multiple datasets; CONFINED is not applicable to single-cohort studies, but may be effective in multi-cohort analysis
EDec⁷⁸	the EDec package combines both reference-based CP and reference-free NMF
BayesCCE⁷⁹	BayesCCE expands on ReFACTor to predict relative counts of cell types using a Bayesian prior based on the known cell count ratios in the tissue of interest
ARIC⁸⁰	ARIC aims to improve prediction accuracy for rare cell populations; in the feature selection step, ARIC removes features that lead to high collinearity across cell types; it also uses weighted SVR to minimize the relative error of predicted components
Tsisal⁸¹	Tsisal treats the DNAme data geometrically, with samples lying inside a simplex whose vertices represent pure cell types; the user must specify the number of cell types present in the sample; Tsisal is also partially reference based, as the user can input the reference DNAme of some cell types and allow Tsisal to estimate the proportions of cell types with and without references
debCAM⁸²	The debCAM package uses convex analysis of mixtures (CAM), which is also based on simplex identification


^aThe RefFreeEWAS package has been removed from the CRAN R package repository, so we are unable to run either the RefFreeEWAS or RefFreeCellMix functions directly. Instead, alternative implementations of these two functions are available in the TOAST package, with a novel feature selection step that iteratively identifies the cell-type-associated differential DNAme signature.⁸³ Please refer to the TOAST package vignette (https://github.com/ziyili20/TOAST) for detailed usage instructions.

^bA guide to implementing FaST-LMM-EWASher was presented previously.⁸⁴

^cThe MeDeCom user guide describes ways to speed up the prediction step and provides useful tools, such as comparison of the estimated cell types to reference profiles. The authors also provide a helpful preprocessing pipeline and downstream analysis and visualization steps that are compatible with MeDeCom, RefFreeCellMix, and EDec.⁸⁵
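To illustrate the NMF idea underlying RefFreeCellMix and MeDeCom, the sketch below factorizes a simulated beta matrix into non-negative latent profiles (W, CpGs by components) and non-negative sample weights (H, components by samples) using the classic Lee and Seung multiplicative updates. It is a generic NMF, not the algorithm of either package. It neither constrains W to [0, 1] nor regularizes it toward 0 or 1 as MeDeCom does, and it requires the number of components to be given. The per-sample normalization of H is only a rough proxy for proportions, because NMF factors are identifiable only up to scaling. The function name `nmf_sketch()` and all data are hypothetical.

```r
# Generic NMF, Y (CpGs x samples) ~ W %*% H, via Lee and Seung multiplicative updates.
nmf_sketch <- function(Y, k, n_iter = 500, eps = 1e-9, seed = 123) {
  set.seed(seed)
  W <- matrix(runif(nrow(Y) * k), nrow = nrow(Y))   # latent profiles: CpGs x k
  H <- matrix(runif(k * ncol(Y)), nrow = k)          # sample weights: k x samples
  err_start <- sum((Y - W %*% H)^2)
  for (it in seq_len(n_iter)) {
    H <- H * (crossprod(W, Y) / (crossprod(W) %*% H + eps))
    W <- W * (tcrossprod(Y, H) / (W %*% tcrossprod(H) + eps))
  }
  list(W = W, H = H, err_start = err_start,
       err_end = sum((Y - W %*% H)^2),
       props = sweep(H, 2, colSums(H), "/"))   # rough per-sample normalization
}

set.seed(21)
profiles <- matrix(rbeta(500 * 3, 0.5, 0.5), ncol = 3)   # 3 toy "pure" profiles
mix <- matrix(rgamma(3 * 30, shape = 2), nrow = 3)
mix <- sweep(mix, 2, colSums(mix), "/")                  # columns sum to 1
Y <- profiles %*% mix + matrix(rnorm(500 * 30, sd = 0.01), nrow = 500)
Y <- pmin(pmax(Y, 0), 1)                                 # keep betas in [0, 1]

fit <- nmf_sketch(Y, k = 3)
c(start = fit$err_start, end = fit$err_end)
```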

In addition to the reference-free ISCH estimation methods listed in Table 2, previous studies have applied PCA directly to DNAme data and used the first few PCs for ISCH estimation.⁸⁶ However, ISCH may not be captured in the first few PCs (for example, in disease conditions), or the first few PCs may also include variance associated with the variables of interest. As with SVA output, we recommend evaluating the PCs and the variance they explain before taking this approach.
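Evaluating PCs before using them as ISCH proxies can be sketched with prcomp(). The sketch computes the proportion of variance explained by each PC and checks which PCs track a known variable. The simulated data contain a single cell-proportion signal (`ct_prop`, an assumption) on a subset of CpGs.

```r
set.seed(4)
n <- 40
p <- 800
ct_prop <- runif(n, 0.2, 0.8)                             # simulated cell proportion
betas <- matrix(0.5 + rnorm(p * n, sd = 0.03), nrow = p)  # CpGs x samples
betas[1:200, ] <- betas[1:200, ] + outer(rep(0.3, 200), ct_prop - 0.5)

# PCA with samples as rows.
pca <- prcomp(t(betas), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(head(var_explained, 5), 3)

# Correlate the leading PCs with the known variable before using them as covariates.
pc_cor <- apply(pca$x[, 1:5], 2, function(pc) cor(pc, ct_prop))
round(pc_cor, 2)
```

In real data, the same checks would be run against the variables of interest as well, so that PCs carrying their signal are not adjusted away.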

Accounting for intersample differences in cellular composition

Once ISCH has been measured directly or estimated bioinformatically, the associations between it and the variables of interest can be tested. Depending on whether the ISCH measure is compositional or not, the methods for adjustment differ. It has been proposed that the compositional and interdependent nature of proportional data can lead to multicollinearity and violate linear regression assumptions.⁸⁷ We wish to clarify this conundrum here.

Multicollinearity describes the linear dependency of predictor variables in a regression model; it inflates the standard errors of the affected coefficients and leads to unstable estimates.⁸⁸ The extent to which multicollinearity affects each independent variable can be assessed with the variance inflation factor (VIF), computed for a predictor as 1/(1 − R²), where R² comes from regressing that predictor on all the other predictors. A VIF above 10 generally indicates a high level of multicollinearity and an unreliable estimate for that variable.⁸⁸^,⁸⁹ However, a high VIF for some covariates in the model is not necessarily problematic, as long as the coefficients of those covariates are not interpreted. For example, even if the ISCH variables' VIFs are high, the coefficient estimates for the variables of interest are unaffected as long as their own VIFs remain low (i.e., as long as the variables of interest are not highly correlated with the ISCH variables).
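To make the VIF definition concrete, the sketch below computes VIFs by hand in base R on simulated proportions. It assumes four hypothetical blood cell types (labels chosen only for illustration), drops one of them, and includes a variable of interest simulated independently of ISCH, so the ISCH variables should show elevated VIFs while the variable of interest stays near 1. The helper vif_manual() is hypothetical, not part of any package.

```r
# Sketch: variance inflation factors computed by hand, VIF = 1 / (1 - R^2),
# where R^2 comes from regressing one predictor on all the others.
# Simulated data; cell type labels are hypothetical.
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- X[setdiff(names(X), v)]
    r2 <- summary(lm(X[[v]] ~ ., data = others))$r.squared
    1 / (1 - r2)
  })
}

set.seed(2)
n <- 300
# Gamma draws normalized per row give proportions that sum to 1 (Dirichlet-like)
raw <- matrix(rgamma(n * 4, shape = c(30, 10, 5, 1)), ncol = 4, byrow = TRUE)
prop <- raw / rowSums(raw)
colnames(prop) <- c("Neu", "Lymph", "Mono", "Eos")

mainVar <- rnorm(n)  # variable of interest, simulated independently of ISCH
# One cell type (Eos) is dropped; the remaining three still nearly sum to a constant
X <- data.frame(mainVar = mainVar, prop[, c("Neu", "Lymph", "Mono")])
vifs <- vif_manual(X)
round(vifs, 1)
```

In practice, the vif() function from the car package (CRAN) is a commonly used alternative to a hand-rolled helper like this one.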

It is possible for the main variable of interest to be linked directly to changes in cellular composition.¹⁵^,¹⁶ In such cases, elevated VIF can undermine the interpretability of the regression results, as it can be impossible to determine whether DNAme differences are attributable to the variable of interest or to ISCH. Thus, we recommend that researchers thoroughly explore the relationships between all covariates prior to multivariable analyses. Including the estimated proportions of all cell types in EWAS models can sometimes lead to perfect multicollinearity (i.e., when the proportions sum to 1, the proportion of one cell type is fully determined by the sum of all the other cell types), in which case the effect of one cell type cannot be estimated.⁸⁹ We therefore recommend intentionally removing one cell type to bypass the issue. Users can, for example, remove the cell type with the lowest intersample variability or, alternatively, the one least likely to be biologically relevant.

After deciding which ISCH variables to account for, the adjustment can be implemented in two ways: ISCH can be included as covariates in the statistical tests, where applicable, or an ISCH-residualized DNAme profile can be calculated and used in the statistical tests instead (Figure 1). For example, ISCH can be included as covariates in EWAS models. In cases such as a permutation test, where ISCH cannot be included as covariates, residualization offers a viable way to still adjust for ISCH-associated variability.
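The perfect multicollinearity described above, and the effect of intentionally dropping one cell type, can be reproduced with base R lm() on simulated proportions that sum to 1. The cell types CT1 to CT3 and all simulation settings are hypothetical.

```r
# Sketch: perfect multicollinearity when all cell type proportions are included.
# Simulated data; cell types CT1 to CT3 are hypothetical.
set.seed(3)
n <- 100
raw <- matrix(rgamma(n * 3, shape = c(6, 3, 1)), ncol = 3, byrow = TRUE)
prop <- raw / rowSums(raw)                 # each row sums to 1
colnames(prop) <- c("CT1", "CT2", "CT3")
mainVar <- rbinom(n, 1, 0.5)               # hypothetical binary variable of interest
y <- 0.3 + 0.05 * mainVar + 0.4 * prop[, "CT1"] + rnorm(n, 0, 0.02)  # one CpG
d <- data.frame(y, mainVar, prop)

# All three proportions: CT3 equals 1 - CT1 - CT2, so lm() reports NA for it
fit_all <- lm(y ~ mainVar + CT1 + CT2 + CT3, data = d)
coef(fit_all)

# Intentionally dropping one cell type removes the linear dependency
fit_drop <- lm(y ~ mainVar + CT1 + CT2, data = d)
coef(fit_drop)

# The estimate for the variable of interest is the same in both fits
all.equal(coef(fit_all)[["mainVar"]], coef(fit_drop)[["mainVar"]])
```

Note that lm() silently drops the aliased column (reporting NA) rather than failing, so inspecting the coefficient table, or calling alias() on the fitted model, is a quick way to spot this situation.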

Use robust linear regression and include ISCH information as covariates

The ISCH variables can be accounted for in EWAS models either directly or after conversion into principal components of the ISCH variables (ISCH PCs). Mathematically, including all the PCs generates the same model as including the original ISCH variables and leads to the same coefficient and p value estimates for the variables of interest. The advantage of direct incorporation (e.g., of the reference-estimated monocyte proportion) is ease of interpretation. The association between DNAme and specific cell types can be investigated with this approach, but again, due to multicollinearity, we caution readers against over-interpreting cell-type-associated differential DNAme. On the other hand, including only a subset of the top ISCH PCs captures most of the ISCH variability with fewer variables in the model, thereby increasing detection power and reducing the risk of overfitting.⁹⁰ Additionally, PCA removes multicollinearity among the ISCH variables themselves (the PCs are mutually uncorrelated) and allows users to model a general ISCH-associated DNAme signal (e.g., a general blood-cell-type-associated DNAme signature), provided that the variables of interest are not highly correlated with the ISCH PCs. We recommend including all the ISCH PCs as covariates, or at least enough PCs to account for 90% of the variance. Box 9 shows an example of PCA on the ISCH estimates.

Box 9. Pseudocode for principal component calculations of ISCH variables.

pca <- princomp(yourEstimatedISCH) # replace with your samples x cell types matrix of ISCH estimates
pca_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2) # cumulative proportion of variance explained by the PCs
names(pca_var)[pca_var > 0.9][1] # the first PC at which the cumulative variance exceeds 90%
yourISCHpcs <- as.data.frame(pca$scores) # the ISCH PCs, one column per PC
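The claim that including all ISCH PCs yields the same estimates for the variable of interest as including the ISCH variables directly can be checked on simulated data. This sketch follows Box 9 in using princomp() but, for brevity, fits lm() rather than the rlm() used in Box 10; all names and settings are hypothetical.

```r
# Sketch: including all ISCH PCs gives the same estimate for the variable of
# interest as including the ISCH variables themselves. Simulated data; lm() is
# used instead of rlm() for brevity.
set.seed(4)
n <- 120
raw <- matrix(rgamma(n * 4, shape = c(20, 8, 4, 2)), ncol = 4, byrow = TRUE)
prop <- raw / rowSums(raw)
colnames(prop) <- paste0("CT", 1:4)        # hypothetical cell types
ischVars <- prop[, 1:3]                    # one cell type dropped, as recommended
mainVar <- rnorm(n)
y <- 0.5 + 0.01 * mainVar + 0.3 * prop[, 1] - 0.2 * prop[, 2] + rnorm(n, 0, 0.02)

pca <- princomp(ischVars)
pcs <- pca$scores                          # all PCs: as many as ISCH variables

fit_raw <- lm(y ~ mainVar + ischVars)
fit_pcs <- lm(y ~ mainVar + pcs)

est_raw <- coef(summary(fit_raw))["mainVar", ]
est_pcs <- coef(summary(fit_pcs))["mainVar", ]
rbind(est_raw, est_pcs)                    # identical rows, up to floating point
```

Keeping only the PCs needed to reach 90% cumulative variance changes the covariate space, so estimates will then be close to, but not exactly equal to, those from the full set.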

Either the ISCH estimates or the ISCH PCs can then be accounted for in EWAS models as covariates. In Box 10, we give an example of robust linear regression using the rlm function from the MASS package, which dampens the influence of outlier observations.⁹¹

Box 10. Pseudocode for accounting for ISCH variables in EWAS with robust regression.

library(MASS) # CRAN package; provides rlm()
library(sfsmisc) # CRAN package; provides f.robftest()
library(dplyr) # CRAN package; provides left_join() and %>%
pd <- left_join(yourPhenoData, yourEstimatedISCH, by = "sampleID") # or yourISCHpcs; both tables need a sampleID column, and rows must follow the column order of yourBetaMatrix
ewas_result <- apply(yourBetaMatrix, 1, function(y) {
  mod <- rlm(formula = y ~ mainVar + celltypesvariables, data = pd) # include all ISCH variables and all other relevant covariates here
  out <- f.robftest(mod, var = 2) # var = 2 refers to the second coefficient, mainVar (the first is the intercept)
  return(c(mod$coefficients[2], out$p.value))
}) %>% t()
colnames(ewas_result) <- c("coefficient", "pvalue")
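To illustrate why Box 10 uses robust regression, the sketch below fits the same model for one simulated CpG with lm() and with rlm() from MASS after making three control samples extreme. It compares point estimates only (it does not use f.robftest() from sfsmisc), and all names and values are hypothetical.

```r
# Sketch: robust (rlm) vs ordinary (lm) regression for one CpG with a few
# outlier samples. Simulated data; the ISCH covariate is hypothetical.
library(MASS)  # recommended package shipped with R; provides rlm()

set.seed(5)
n <- 80
mainVar <- rep(c(0, 1), each = n / 2)     # first 40 samples are controls
cellProp <- runif(n, 0.3, 0.7)
y <- 0.4 + 0.02 * mainVar + 0.2 * cellProp + rnorm(n, 0, 0.01)
y[1:3] <- 0.95                            # three extreme control samples
d <- data.frame(y, mainVar, cellProp)

fit_ols <- lm(y ~ mainVar + cellProp, data = d)
fit_rob <- rlm(y ~ mainVar + cellProp, data = d, maxit = 100)  # extra iterations to help convergence

est <- c(true = 0.02,
         lm = unname(coef(fit_ols)["mainVar"]),
         rlm = unname(coef(fit_rob)["mainVar"]))
round(est, 4)
```

With these settings, the outliers should pull the lm() estimate for mainVar well away from the simulated value of 0.02, while the rlm() estimate should stay close to it, because the Huber weights used by default in rlm() downweight the extreme samples.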

Adjusting for ISCH by residualization

If ISCH cannot be included as a covariate in model fitting, perhaps because the user is not fitting a traditional EWAS, an alternative approach to accounting for cell type proportions is to “residualize” ISCH-associated variability (i.e., fit a linear model for each site that includes only the ISCH variables, and then use the residuals of this model in downstream analyses). An example is shown in Box 11. As discussed above, cell type proportions can be included directly or transformed with PCA before being included in the linear models. We recommend residualizing only when the cell type proportions or PCs cannot be included as covariates (described in the previous section), because the proportions can be partially correlated with variables of interest (e.g., age and disease status). In these cases, regressing out the cell type PCs may overfit the model, removing more variation than is attributable to ISCH alone.⁶⁶^,⁹²

Box 11. Pseudocode for residualizing ISCH variability on DNAme levels.

library(dplyr) # CRAN package; used here for %>%
residual_mat <- apply(yourBetaMatrix, 1, function(x) {
  x <- as.numeric(x)
  mod <- lm(x ~ celltypes, data = yourEstimatedISCH) # or replace with yourISCHpcs; replace celltypes with your ISCH variables
  return(residuals(mod)) # the residuals represent the variance not associated with ISCH
}) %>% t()
# Optional: add each methylation site's mean DNAme level back to the residuals to obtain an adjusted "imitation beta matrix" on the same scale as typical DNAme data (i.e., [0, 1]); rowMeans() is recycled across the sample columns
adj.residuals <- residual_mat + rowMeans(yourBetaMatrix)
adj.residuals[adj.residuals <= 0] <- 0.0001 # bound the adjusted "beta values" within (0, 1)
adj.residuals[adj.residuals > 1] <- 0.9999
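The per-site loop in Box 11 can also be written as a single lm() call with a matrix response, which fits every CpG against the same ISCH variables at once. A minimal sketch on simulated data follows; the object names are hypothetical, and the add-back and bounding steps mirror Box 11.

```r
# Sketch: vectorized residualization of ISCH with a matrix-response lm().
# Simulated data; object names are hypothetical.
set.seed(6)
n_samples <- 50
n_cpgs <- 200
ischVars <- data.frame(CT1 = runif(n_samples, 0.4, 0.8),
                       CT2 = runif(n_samples, 0.05, 0.3))   # hypothetical ISCH
# CpGs x samples: per-CpG baseline + CT1-driven signal + noise
beta <- matrix(runif(n_cpgs, 0.2, 0.8), n_cpgs, n_samples) +
  outer(rnorm(n_cpgs, 0, 0.1), ischVars$CT1) +
  matrix(rnorm(n_cpgs * n_samples, 0, 0.02), n_cpgs, n_samples)

# lm() needs samples in rows, so the response is the transposed beta matrix
fit <- lm(t(beta) ~ CT1 + CT2, data = ischVars)
resid_mat <- t(residuals(fit))               # back to CpGs x samples

# Add back each CpG's mean and bound to (0, 1), as in Box 11
adj <- resid_mat + rowMeans(beta)
adj[adj <= 0] <- 1e-4
adj[adj > 1] <- 0.9999
```

Because the design matrix is shared across CpGs, the matrix-response fit gives the same residuals as fitting each CpG separately, and it is usually much faster.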

Accounting for noncompositional estimates

In cases where the estimated cell type effect is not compositional but is expressed as counts or as continuous variables (such as SVA output), the variables can be incorporated as covariates in linear-regression-based EWAS models without PCA. Again, we do not suggest regressing out the variance associated with the estimated cell type effect before EWAS, as this can remove more variation than is attributable to ISCH alone. It can also lead to underestimated standard errors of coefficients (i.e., anti-conservative tests), because the adjusted levels are treated as observed data, without accounting for the variability in the estimated ISCH effects.
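The concern about removing more than ISCH-associated variation can be illustrated with a small simulation in which ISCH is correlated with the variable of interest. The sketch compares covariate adjustment with first residualizing ISCH and then testing the variable of interest on the residuals. All values are hypothetical; the point is the direction of the difference, not its size.

```r
# Sketch: residualizing ISCH first vs including it as a covariate, when ISCH
# is correlated with the variable of interest. Simulated data, one CpG.
set.seed(7)
n <- 200
mainVar <- rnorm(n)                                  # variable of interest
cellProp <- 0.5 + 0.05 * mainVar + rnorm(n, 0, 0.05) # ISCH correlated with mainVar
y <- 0.4 + 0.02 * mainVar + 0.3 * cellProp + rnorm(n, 0, 0.01)

# (1) Covariate adjustment: fit mainVar and ISCH together
b_cov <- coef(lm(y ~ mainVar + cellProp))[["mainVar"]]

# (2) Residualize ISCH first, then test mainVar on the residuals
y_resid <- residuals(lm(y ~ cellProp))
b_res <- coef(lm(y_resid ~ mainVar))[["mainVar"]]

round(c(true = 0.02, covariate = b_cov, residualized = b_res), 4)
```

With these settings, mainVar and cellProp are correlated at about 0.7, so regressing out cellProp also removes the part of mainVar's effect that is collinear with it: the residualized estimate should be roughly halved, whereas the covariate-adjusted estimate should stay close to the simulated 0.02.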

Investigating cell-type-specific associations of variables of interest and DNAme

While accounting for ISCH addresses the problem of cellular composition confounding the association with the variables of interest, cell-type-specific DNAme (csDNAme) associations can still be masked. That is, the DNAme profile of each cell type can change differently upon exposure to stimuli,⁹³ and it can be challenging to determine which cell types are driving an association detected at the bulk level. For example, epithelial cells in buccal swabs exhibit csDNAme changes associated with smoking behavior.⁹⁴ This csDNAme signal may go undetected when immune cells are highly abundant in the original sample.

Three methods have been developed to address this issue by deconvoluting the cell-type-specific DNAme associations with variables of interest: cell-type-specific differential methylation calling (CellDMC), cell-type-specific differential analysis using regression (CeDAR), and tensor composition analysis (TCA).

CellDMC fits a model with interaction terms between the variable of interest and the estimated cell type proportions.⁹⁴ Based on the statistical significance of these terms, the CellDMC algorithm identifies the sites that are differentially methylated in each cell type (Box 12).

Box 12. Pseudocode for identifying cell-type-specific associations with CellDMC.

library(EpiDISH) # Bioconductor package
mod <- model.matrix(~ covariates, data = yourPhenoData) # model matrix of relevant covariates, excluding the estimated ISCH and the main variable of interest
dmc_output <- CellDMC(yourBetaMatrix, mainVar, yourEstimatedISCH, cov.mod = mod) # mainVar: vector of the variable of interest, in the same sample order as the columns of yourBetaMatrix
dmc_csDM <- dmc_output$coe # per-cell-type estimates and p values
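The interaction idea behind CellDMC can be sketched for a single simulated CpG with base R lm(). Bulk DNAme is simulated as the proportion-weighted sum of cell-type-specific DNAme, with the variable of interest affecting only one hypothetical cell type (CT2). This illustrates the modeling idea under these assumptions; it is not the EpiDISH implementation, which may parameterize the model differently.

```r
# Sketch: cell-type-specific effects through interaction terms, for one CpG.
# Simulated data; this illustrates the modeling idea, not the EpiDISH code.
set.seed(8)
n <- 300
raw <- matrix(rgamma(n * 3, shape = c(6, 3, 2)), ncol = 3, byrow = TRUE)
W <- raw / rowSums(raw)                    # cell type proportions
colnames(W) <- c("CT1", "CT2", "CT3")      # hypothetical cell types
x <- rbinom(n, 1, 0.5)                     # variable of interest

baseline <- c(0.7, 0.3, 0.5)               # DNAme in each cell type when x = 0
effect <- c(0, 0.15, 0)                    # x shifts DNAme in CT2 only
# Bulk DNAme = proportion-weighted sum of cell-type-specific DNAme
y <- as.vector(W %*% baseline + (W * x) %*% effect) + rnorm(n, 0, 0.01)

Wx <- W * x                                # proportion-by-x interaction columns
colnames(Wx) <- paste0(colnames(W), "_x")
d <- data.frame(y, W, Wx)

# No intercept: the proportions sum to 1, so the CT columns act as
# cell-type-specific intercepts and the *_x columns as cell-type-specific effects
fit <- lm(y ~ 0 + ., data = d)
round(coef(summary(fit))[, c("Estimate", "Pr(>|t|)")], 4)
```

With these settings, the CT2_x estimate should be near 0.15 with a very small p value, and the other *_x estimates should be near zero.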

CeDAR, another recent method, also uses an interaction model to detect csDNAme. However, it uniquely adopts a Bayesian hierarchical model based on the relationships among cell types.⁹⁵ The model assumes that differential DNAme patterns are likely to be correlated between closely related cell types. After csDNAme is estimated for each cell type separately, the differential methylation state of each site is inferred with the cell type hierarchy serving as the prior. By default, the CeDAR algorithm estimates the cell type hierarchy from the correlation structure of the top differentially methylated sites across cell types (Box 13).

Box 13. Pseudocode for identifying cell-type-specific associations with CeDAR.

library(TOAST) # Bioconductor package
cedar_output <- cedar(Y_raw = yourBetaMatrix,
                      prop = yourEstimatedISCH,
                      design.1 = yourPhenoData,
                      factor.to.test = 'mainVarName',
                      cutoff.tree = c('pval', 0.01),
                      cutoff.prior.prob = c('pval', 0.01))
cedar_csDM <- cedar_output$toast_res
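The default hierarchy estimation described above can be illustrated in base R: correlate per-cell-type statistics across top sites, convert the correlation to a distance, and cluster. The statistics are simulated so that two hypothetical T cell types share a signal; CeDAR's actual choice of statistics, site selection, and clustering may differ from this sketch.

```r
# Sketch: build a cell type hierarchy from the correlation of per-cell-type
# statistics at top sites. Simulated statistics; an illustration of the idea,
# not CeDAR's internal procedure.
set.seed(9)
n_sites <- 1000
shared <- rnorm(n_sites)                    # signal shared by the two T cell types
stats <- cbind(
  CD4T = shared + rnorm(n_sites, 0, 0.5),
  CD8T = shared + rnorm(n_sites, 0, 0.5),
  Mono = rnorm(n_sites)
)

# "Top" sites: largest absolute statistic in any cell type
top <- order(apply(abs(stats), 1, max), decreasing = TRUE)[1:200]
sim <- cor(stats[top, ])

# Turn similarity into distance and cluster; closely related cell types merge first
tree <- hclust(as.dist(1 - sim), method = "average")
cl <- cutree(tree, k = 2)
cl
```

Here CD4T and CD8T should fall into the same branch, which is the kind of structure that lets related cell types borrow strength from each other in a hierarchical prior.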

TCA uses matrix factorization and assumes that the csDNAme signal of a particular variable of interest is shared across samples. The significance of csDNAme effect sizes for variables of interest is then tested with a generalized likelihood ratio test (Box 14). Although this has yet to be independently confirmed, the original publication suggested that TCA might have higher detection power, especially with an increasing number of unique cell types and when csDNAme is present in multiple cell types.⁹³

Box 14. Pseudocode for identifying cell-type-specific associations with TCA.

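library(TCA) # CRAN package
tca_output <- tca(X = yourBetaMatrix, # sites x samples; sample names should match across X, W, C1, and C2
                  W = yourEstimatedISCH, # samples x cell types
                  C1 = covariatesassociatedwithISCH, # covariates whose effects may differ by cell type; these are the effects tested in gammas_hat_pvals
                  C2 = covariatesnotassociatedwithISCH) # covariates expected to act at the bulk level (e.g., batch)
tca_csDM_pvals <- tca_output$gammas_hat_pvals # p values for the cell-type-specific effects of the C1 covariates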
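For reference, the TCA model can be written as follows, as we understand it from the original publication⁹³ (notation adapted; consult the paper for estimation details). Here i indexes samples, j sites, and h = 1, …, k cell types; Z is the unobserved cell-type-specific DNAme, W the cell type proportions (the W argument in Box 14), c⁽¹⁾ the C1 covariates, and c⁽²⁾ the C2 covariates.

```latex
Z_{hij} = \mu_{hj} + \boldsymbol{\gamma}_{hj}^{\top}\mathbf{c}^{(1)}_{i} + \varepsilon_{hij},
\qquad \varepsilon_{hij} \sim \mathcal{N}\!\left(0, \sigma_{hj}^{2}\right)

X_{ij} = \sum_{h=1}^{k} W_{ih}\, Z_{hij} + \boldsymbol{\delta}_{j}^{\top}\mathbf{c}^{(2)}_{i} + e_{ij},
\qquad e_{ij} \sim \mathcal{N}\!\left(0, \tau^{2}\right)
```

Under this formulation, the gammas_hat_pvals output in Box 14 corresponds to tests of the γ effects, one per cell type and C1 covariate, which is why a variable whose cell-type-specific association is of interest would be supplied through C1.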

CellDMC runs the fastest and its model has a straightforward interpretation, but the interaction component of the model reduces its power to detect true positives when multiple cell types are present. CeDAR uses the same interaction model but improves the confidence of its predictions with the hierarchical model assumption; the tradeoff is a longer run time, which depends on the sample size and the model complexity. Similarly, TCA requires more computational power but may perform better with three or more cell types.⁹³ Using a test dataset of 3,000 CpGs, 100 samples, 6 cell types, and 3 other covariates, we clocked CellDMC at 6.4 s, CeDAR at 15.7 s, and TCA at 30.1 s. Run time scales linearly with the number of DNAme sites tested. Users can choose a method based on the cohort characteristics and the number of cell types involved.
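Because run time scales linearly with the number of sites, the timings above can be extrapolated to a full array. The sketch assumes roughly 850,000 sites (an EPIC-scale dataset; adjust to your own data) and otherwise the same sample size, number of cell types, covariates, and hardware, so the results are rough planning figures only.

```r
# Sketch: extrapolate the reported run times to more sites, assuming linear
# scaling and otherwise identical settings and hardware.
timings_3k <- c(CellDMC = 6.4, CeDAR = 15.7, TCA = 30.1)  # seconds for 3,000 CpGs
n_sites_target <- 850000  # assumption: roughly the size of an EPIC array dataset
est_minutes <- timings_3k * (n_sites_target / 3000) / 60
round(est_minutes, 1)  # about 30.2, 74.1, and 142.1 minutes
```

Larger cohorts or more cell types and covariates would increase these figures, as noted above for CeDAR and TCA.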

Conclusion

In this primer, we presented a practical guide to estimating and accounting for ISCH in DNAme studies, with references to additional resources and the authors' recommendations for best practices. Given the contribution of ISCH to DNAme variability in general, examining measured or predicted ISCH variables can inform downstream analysis, and accounting for ISCH in a principled manner will lead to more accurate and interpretable findings. There is no single optimal procedure for studying ISCH in DNAme research. Users must decide on the appropriate pipeline based on factors such as the characteristics of the cohort in question, sample size, availability of cell count information, whether the variables of interest are associated with ISCH, and the intended analysis. As new methods leveraging ISCH information are continually being developed, future research can borrow ideas from other omics methodologies to expand the analytical repertoire. For example, deconvolution of cell-type-specific signatures was first performed in gene expression studies,⁹⁶ and now the development of novel and robust methods for csDNAme signal detection can complement findings from single-cell DNAme research. Similarly, trajectory-based analyses, such as pseudotime, have been commonly used in transcriptomic studies to explore the transition of cell populations over time in longitudinal or interventional cohorts.⁹⁷^,⁹⁸^,⁹⁹^,¹⁰⁰ These analyses can be applied in DNAme-based studies to examine immune system development or cellular responses to external stimuli such as vaccines or smoking, or to track tumor progression over time. These analytical tools, together with constantly advancing machine learning methodologies, enhance our ability to understand how DNAme responds to the environment at a cellular level.

Resource availability

A vignette of example code for readers’ reference is available at https://rpubs.com/mfu/1217877. Most of the packages mentioned above have curated vignettes that detail the analysis pipelines and explain the parameters of their functions. Refer to the references and Tables 1 and 2 for additional resources.

Acknowledgments

We would like to acknowledge the contributions of Dr. Chaini Konwar and Lea Separovic, who provided insightful comments during the review process that improved the flow and presentation of the paper. We are also grateful to Dr. Nicole Gladish, whose past work in the lab built a foundation for accounting for ISCH, as laid out in the primer. Furthermore, Alan Kerr contributed to the infrastructure underlying the R environment and provided suggestions on troubleshooting and version control in previous drafts of the paper. Finally, thanks are due to our colleague, Hannah-Ruth Engelbrecht, who provided insightful suggestions on comparing reference-free prediction methods. M.S.K. is supported by a grant from the Canadian Institutes of Health Research (PJT-173230) and receives research support from the Edwin S.H. Leong Centre for Healthy Aging.

Author contributions

M.P.-Y.F., S.M.M., and M.S.K.: conceptualization. M.P.-Y.F.: writing – original draft preparation. M.P.-Y.F., S.M.M., K.K., and M.S.K.: writing – review and editing. All authors contributed to the article and approved the submitted version.

Declaration of interests

M.S.K. is the Edwin S.H. Leong UBC Chair in Healthy Aging.

References

1. Carter B., Zhao K. The epigenetic basis of cellular heterogeneity. Nat. Rev. Genet. 2021;22:235–250. doi: 10.1038/s41576-020-00300-0.
2. Eckersley-Maslin M.A., Alda-Catalinas C., Reik W. Dynamics of the epigenetic landscape during the maternal-to-zygotic transition. Nat. Rev. Mol. Cell Biol. 2018;19:436–450. doi: 10.1038/s41580-018-0008-z.
3. Jullien J., Vodnala M., Pasque V., Oikawa M., Miyamoto K., Allen G., David S.A., Brochard V., Wang S., Bradshaw C., et al. Gene resistance to transcriptional reprogramming following nuclear transfer is directly mediated by multiple chromatin-repressive pathways. Mol. Cell. 2017;65:873–884.e8. doi: 10.1016/j.molcel.2017.01.030.
4. Izzo F., Lee S.C., Poran A., Chaligne R., Gaiti F., Gross B., Murali R.R., Deochand S.D., Ang C., Jones P.W., et al. DNA methylation disruption reshapes the hematopoietic differentiation landscape. Nat. Genet. 2020;52:378–387. doi: 10.1038/s41588-020-0595-4.
5. Henning A.N., Roychoudhuri R., Restifo N.P. Epigenetic control of CD8+ T cell differentiation. Nat. Rev. Immunol. 2018;18:340–356. doi: 10.1038/nri.2017.146.
6. Jaffe A.E., Irizarry R.A. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014;15:R31. doi: 10.1186/gb-2014-15-2-r31.
7. Bergstedt J., Azzou S.A.K., Tsuo K., Jaquaniello A., Urrutia A., Rotival M., Lin D.T.S., MacIsaac J.L., Kobor M.S., Albert M.L., et al. The immune factors driving DNA methylation variation in human blood. Nat. Commun. 2022;13:5895. doi: 10.1038/s41467-022-33511-6.
8. Mattei A.L., Bailly N., Meissner A. DNA methylation: a historical perspective. Trends Genet. 2022;38:676–707. doi: 10.1016/j.tig.2022.03.010.
9. Li M., Zou D., Li Z., Gao R., Sang J., Zhang Y., Li R., Xia L., Zhang T., Niu G., et al. EWAS Atlas: a curated knowledgebase of epigenome-wide association studies. Nucleic Acids Res. 2019;47:D983–D988. doi: 10.1093/nar/gky1027.
10. Wahl S., Drong A., Lehne B., Loh M., Scott W.R., Kunze S., Tsai P.C., Ried J.S., Zhang W., Yang Y., et al. Epigenome-wide association study of body mass index and the adverse outcomes of adiposity. Nature. 2017;541:81–86. doi: 10.1038/nature20784.
11. Li S., Wong E.M., Bui M., Nguyen T.L., Joo J.H.E., Stone J., Dite G.S., Giles G.G., Saffery R., Southey M.C., Hopper J.L. Causal effect of smoking on DNA methylation in peripheral blood: A twin and family study. Clin. Epigenetics. 2018;10. doi: 10.1186/s13148-018-0452-9.
12. Gatev E., Inkster A.M., Negri G.L., Konwar C., Lussier A.A., Skakkebaek A., Sokolowski M.B., Gravholt C.H., Dunn E.C., Kobor M.S., Aristizabal M.J. Autosomal sex-associated co-methylated regions predict biological sex from DNA methylation. Nucleic Acids Res. 2021;49:9097–9116. doi: 10.1093/nar/gkab682.
13. Lappalainen T., Greally J.M. Associating cellular epigenetic models with human phenotypes. Nat. Rev. Genet. 2017;18:441–451. doi: 10.1038/nrg.2017.32.
14. Do C., Shearer A., Suzuki M., Terry M.B., Gelernter J., Greally J.M., Tycko B. Genetic–epigenetic interactions in cis: a major focus in the post-GWAS era. Genome Biol. 2017;18. doi: 10.1186/s13059-017-1250-y.
15. Merrill S.M., Gladish N., Fu M.P., Moore S.R., Konwar C., Giesbrecht G.F., MacIsaac J.L., Kobor M.S., Letourneau N.L. Associations of peripheral blood DNA methylation and estimated monocyte proportion differences during infancy with toddler attachment style. Attach. Hum. Dev. 2023;25:132–161. doi: 10.1080/14616734.2021.1938872.
16. McEwen L.M., Morin A.M., Edgar R.D., MacIsaac J.L., Jones M.J., Dow W.H., Rosero-Bixby L., Kobor M.S., Rehkopf D.H. Differential DNA methylation and lymphocyte proportions in a Costa Rican high longevity region. Epigenet. Chromatin. 2017;10:21. doi: 10.1186/s13072-017-0128-2.
17. Li J., Li L., Wang Y., Huang G., Li X., Xie Z., Zhou Z. Insights Into the Role of DNA Methylation in Immune Cell Development and Autoimmune Disease. Front. Cell Dev. Biol. 2021;9. doi: 10.3389/fcell.2021.757318.
18. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2021. https://www.R-project.org.
19. RStudio Team. RStudio: Integrated Development Environment for R. RStudio; 2022. http://www.rstudio.com.
20. Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., Ellis B., Gautier L., Ge Y., Gentry J., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80. doi: 10.1186/gb-2004-5-10-r80.
21. Bibikova M., Barnes B., Tsan C., Ho V., Klotzle B., Le J.M., Delano D., Zhang L., Schroth G.P., Gunderson K.L., et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98:288–295. doi: 10.1016/j.ygeno.2011.07.007.
22. Pidsley R., Zotenko E., Peters T.J., Lawrence M.G., Risbridger G.P., Molloy P., van Dijk S., Muhlhausler B., Stirzaker C., Clark S.J. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 2016;17. doi: 10.1186/s13059-016-1066-1.
23. Zhuang B.C., Jude M.S., Konwar C., Ryan C.P., Whitehead J., Engelbrecht H.-R., MacIsaac J.L., Dever K., Toan T.K., Korinek K., et al. Comparison of Infinium MethylationEPIC v2.0 to v1.0 for human population epigenetics: considerations for addressing EPIC version differences in DNA methylation-based tools. bioRxiv. 2024. doi: 10.1101/2024.07.02.600461. Preprint.
24. Titus A.J., Houseman E.A., Johnson K.C., Christensen B.C. methyLiftover: cross-platform DNA methylation data integration. Bioinformatics. 2016;32:2517–2519. doi: 10.1093/bioinformatics/btw180.
25. Vanderlinden L.A., Johnson R.K., Carry P.M., Dong F., DeMeo D.L., Yang I.V., Norris J.M., Kechris K. An effective processing pipeline for harmonizing DNA methylation data from Illumina’s 450K and EPIC platforms for epidemiological studies. BMC Res. Notes. 2021;14:352. doi: 10.1186/s13104-021-05741-2.
26. Gervin K., Salas L.A., Bakulski K.M., van Zelm M.C., Koestler D.C., Wiencke J.K., Duijts L., Moll H.A., Kelsey K.T., Kobor M.S., et al. Systematic evaluation and validation of reference and library selection methods for deconvolution of cord blood DNA methylation data. Clin. Epigenetics. 2019;11. doi: 10.1186/s13148-019-0717-y.
27. Aryee M.J., Jaffe A.E., Corrada-Bravo H., Ladd-Acosta C., Feinberg A.P., Hansen K.D., Irizarry R.A. Minfi: A flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014;30:1363–1369. doi: 10.1093/bioinformatics/btu049.
28. Montaño C.M., Irizarry R.A., Kaufmann W.E., Talbot K., Gur R.E., Feinberg A.P., Taub M.A. Measuring cell-type specific differential methylation in human brain tissue. Genome Biol. 2013;14:R94. doi: 10.1186/gb-2013-14-8-r94.
29. Murat K., Grüning B., Poterlowicz P.W., Westgate G., Tobin D.J., Poterlowicz K. Ewastools: Infinium Human Methylation BeadChip pipeline for population epigenetics integrated into Galaxy. GigaScience. 2020;9. doi: 10.1093/gigascience/giaa049.
30. Koestler D.C., Jones M.J., Usset J., Christensen B.C., Butler R.A., Kobor M.S., Wiencke J.K., Kelsey K.T. Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL). BMC Bioinf. 2016;17. doi: 10.1186/s12859-016-0943-7.
31. Salas L.A., Zhang Z., Koestler D.C., Butler R.A., Hansen H.M., Molinaro A.M., Wiencke J.K., Kelsey K.T., Christensen B.C. Enhanced cell deconvolution of peripheral blood using DNA methylation for high-resolution immune profiling. Nat. Commun. 2022;13:761. doi: 10.1038/s41467-021-27864-7.
32. Zheng S.C., Webster A.P., Dong D., Feber A., Graham D.G., Sullivan R., Jevons S., Lovat L.B., Beck S., Widschwendter M., Teschendorff A.E. A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix. Epigenomics. 2018;10:925–940. doi: 10.2217/epi-2018-0037.
33. Hicks S.C., Irizarry R.A. MethylCC: Technology-independent estimation of cell type composition using differentially methylated regions. Genome Biol. 2019;20. doi: 10.1186/s13059-019-1827-8.
34. Zhang Z., Wiencke J.K., Kelsey K.T., Koestler D.C., Molinaro A.M., Pike S.C., Karra P., Christensen B.C., Salas L.A. Hierarchical deconvolution for extensive cell type resolution in the human brain using DNA methylation. Front. Neurosci. 2023;17. doi: 10.3389/fnins.2023.1198243.
35. Cai M., Zhou J., McKennan C., Wang J. scMD facilitates cell type deconvolution using single-cell DNA methylation references. Commun. Biol. 2024;7:1. doi: 10.1038/s42003-023-05690-5.
36. Guintivano J., Aryee M.J., Kaminsky Z.A. A cell epigenotype specific model for the correction of brain cellular heterogeneity bias and its application to age, brain region and major depression. Epigenetics. 2013;8:290–302. doi: 10.4161/epi.23924.
37. Yuan V., Hui D., Yin Y., Peñaherrera M.S., Beristain A.G., Robinson W.P. Cell-specific characterization of the placental methylome. BMC Genom. 2021;22. doi: 10.1186/s12864-020-07186-6.
38. Maié T., Schmidt M., Erz M., Wagner W., Costa I.G. CimpleG: finding simple CpG methylation signatures. Genome Biol. 2023;24:161. doi: 10.1186/s13059-023-03000-0.
39. Liu Y. scDeconv: an R package to deconvolve bulk DNA methylation data with scRNA-seq data and paired bulk RNA–DNA methylation data. Brief. Bioinform. 2022;23. doi: 10.1093/bib/bbac150.
40. Li S., Zeng W., Ni X., Liu Q., Li W., Stackpole M.L., Zhou Y., Gower A., Krysan K., Ahuja P., et al. Comprehensive tissue deconvolution of cell-free DNA by deep learning for disease diagnosis and monitoring. Proc. Natl. Acad. Sci. USA. 2023;120. doi: 10.1073/pnas.2305236120.
41. Caggiano C., Celona B., Garton F., Mefford J., Black B.L., Henderson R., Lomen-Hoerth C., Dahl A., Zaitlen N. Comprehensive cell type decomposition of circulating cell-free DNA with CelFiE. Nat. Commun. 2021;12:2717. doi: 10.1038/s41467-021-22901-x.
42. Jeong Y., Rohr K., Lutsik P. MethylBERT: A Transformer-based model for read-level DNA methylation pattern identification and tumour deconvolution. bioRxiv. 2023. doi: 10.1101/2023.10.29.564590. Preprint.
43. Arneson D., Yang X., Wang K. MethylResolver—a method for deconvoluting bulk DNA methylation profiles into known and unknown cell contents. Commun. Biol. 2020;3:422. doi: 10.1038/s42003-020-01146-2.
44. Zhang Z., Wiencke J.K., Kelsey K.T., Koestler D.C., Christensen B.C., Salas L.A. HiTIMED: hierarchical tumor immune microenvironment epigenetic deconvolution for accurate cell type resolution in the tumor microenvironment using tumor-type-specific DNA methylation data. J. Transl. Med. 2022;20:516. doi: 10.1186/s12967-022-03736-6.
45.Chakravarthy A., Furness A., Joshi K., Ghorani E., Ford K., Ward M.J., King E.V., Lechner M., Marafioti T., Quezada S.A., et al. Pan-cancer deconvolution of tumour composition using DNA methylation. Nat. Commun. 2018;9:3220. doi: 10.1038/s41467-018-05570-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Zhang H., Cai R., Dai J., Sun W. EMeth: An EM algorithm for cell type decomposition based on DNA methylation data. Sci. Rep. 2021;11:5717. doi: 10.1038/s41598-021-84864-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
47.He D., Chen M., Wang W., Song C., Qin Y. Deconvolution of tumor composition using partially available DNA methylation data. BMC Bioinf. 2022;23:355. doi: 10.1186/s12859-022-04893-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Middleton L.Y.M., Dou J., Fisher J., Heiss J.A., Nguyen V.K., Just A.C., Faul J., Ware E.B., Mitchell C., Colacino J.A., M Bakulski K. Saliva cell type DNA methylation reference panel for epidemiological studies in children. Epigenetics. 2022;17:161–177. doi: 10.1080/15592294.2021.1890874. [DOI] [PMC free article] [PubMed] [Google Scholar]
49.Zhu T., Liu J., Beck S., Pan S., Capper D., Lechner M., Thirlwell C., Breeze C.E., Teschendorff A.E. A pan-tissue DNA methylation atlas enables in silico decomposition of human tissue methylomes at cell-type resolution. Nat. Methods. 2022;19:296–306. doi: 10.1038/s41592-022-01412-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Muse M.E., Carroll C.D., Salas L.A., Karagas M.R., Christensen B.C. Application of Novel Breast Biospecimen Cell-Type Adjustment Identifies Shared DNA Methylation Alterations in Breast Tissue and Milk with Breast Cancer-Risk Factors. Biomarkers Prevention. 2023;32:550–560. doi: 10.1158/1055-9965.EPI-22-0405. [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Consortium R.E., A K., W M. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–329. doi: 10.1038/nature14248. [DOI] [PMC free article] [PubMed] [Google Scholar]
52.Moss J., Magenheim J., Neiman D., Zemmour H., Loyfer N., Korach A., Samet Y., Maoz M., Druid H., Arner P., et al. Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nat. Commun. 2018;9:5068. doi: 10.1038/s41467-018-07466-6.
53.Cai M., Yue M., Chen T., Liu J., Forno E., Lu X., Billiar T., Celedón J., McKennan C., Chen W., Wang J. Robust and accurate estimation of cellular fraction from tissue omics data via ensemble deconvolution. Bioinformatics. 2022;38:3004–3010. doi: 10.1093/bioinformatics/btac279.
54.Cheng A.P., Burnham P., Lee J.R., Cheng M.P., Suthanthiran M., Dadhania D., De Vlaminck I. A cell-free DNA metagenomic sequencing assay that integrates the host injury response to infection. Proc. Natl. Acad. Sci. USA. 2019;116:18738–18744. doi: 10.1073/pnas.1906320116.
55.George-Gay B., Parker K. Understanding the complete blood count with differential. J. Perianesth. Nurs. 2003;18:96–117. doi: 10.1053/jpan.2003.50013.
56.Salas L.A., Koestler D.C., Butler R.A., Hansen H.M., Wiencke J.K., Kelsey K.T., Christensen B.C. An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray. Genome Biol. 2018;19:64. doi: 10.1186/s13059-018-1448-7.
57.Dieckmann L., Cruceanu C., Lahti-Pulkkinen M., Lahti J., Kvist T., Laivuori H., Sammallahti S., Villa P.M., Suomalainen-König S., Rancourt R.C., et al. Reliability of a novel approach for reference-based cell type estimation in human placental DNA methylation studies. Cell. Mol. Life Sci. 2022;79:115. doi: 10.1007/s00018-021-04091-3.
58.Kaushal A., Zhang H., Karmaus W.J., Wang J.S. Which methods to choose to correct cell types in genome-scale blood-derived DNA methylation data? BMC Bioinf. 2015;16:P7. doi: 10.1186/1471-2105-16-S15-P7.
59.Teschendorff A.E., Relton C.L. Statistical and integrative system-level analysis of DNA methylation data. Nat. Rev. Genet. 2018;19:129–147. doi: 10.1038/nrg.2017.86.
60.Houseman E.A., Accomando W.P., Koestler D.C., Christensen B.C., Marsit C.J., Nelson H.H., Wiencke J.K., Kelsey K.T. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinf. 2012;13:86. doi: 10.1186/1471-2105-13-86.
61.Newman A.M., Liu C.L., Green M.R., Gentles A.J., Feng W., Xu Y., Hoang C.D., Diehn M., Alizadeh A.A. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337.
62.Teschendorff A.E., Breeze C.E., Zheng S.C., Beck S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinf. 2017;18:105. doi: 10.1186/s12859-017-1511-5.
63.Fortin J.-P., Triche T.J., Hansen K.D. Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array with minfi. Bioinformatics. 2017;33:558–560. doi: 10.1093/bioinformatics/btw691.
64.Teschendorff A.E., Zhu T., Breeze C.E., Beck S. EPISCORE: cell type deconvolution of bulk tissue DNA methylomes from single-cell RNA-Seq data. Genome Biol. 2020;21:221. doi: 10.1186/s13059-020-02126-9.
65.Fridley B., Wang X., editors. Statistical Genomics. Springer US; 2023.
66.Leek J.T., Johnson W.E., Parker H.S., Jaffe A.E., Storey J.D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883. doi: 10.1093/bioinformatics/bts034.
67.Zou J., Lippert C., Heckerman D., Aryee M., Listgarten J. Epigenome-wide association studies without the need for cell-type composition. Nat. Methods. 2014;11:309–311. doi: 10.1038/nmeth.2815.
68.Rahmani E., Zaitlen N., Baran Y., Eng C., Hu D., Galanter J., Oh S., Burchard E.G., Eskin E., Zou J., Halperin E. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies. Nat. Methods. 2016;13:443–445. doi: 10.1038/nmeth.3809.
69.Houseman E.A., Molitor J., Marsit C.J. Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014;30:1431–1439. doi: 10.1093/bioinformatics/btu029.
70.Houseman E.A., Kile M.L., Christiani D.C., Ince T.A., Kelsey K.T., Marsit C.J. Reference-free deconvolution of DNA methylation data and mediation by cell composition effects. BMC Bioinf. 2016;17:259. doi: 10.1186/s12859-016-1140-4.
71.Zheng S.C., Beck S., Jaffe A.E., Koestler D.C., Hansen K.D., Houseman A.E., Irizarry R.A., Teschendorff A.E. Correcting for cell-type heterogeneity in epigenome-wide association studies: revisiting previous analyses. Nat. Methods. 2017;14:216–217. doi: 10.1038/nmeth.4187.
72.Kaushal A., Zhang H., Karmaus W.J.J., Ray M., Torres M.A., Smith A.K., Wang S.L. Comparison of different cell type correction methods for genome-scale epigenetics studies. BMC Bioinf. 2017;18 doi: 10.1186/s12859-017-1611-2.
73.Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani R., Botstein D., Altman R.B. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17:520–525. doi: 10.1093/bioinformatics/17.6.520.
74.Di Lena P., Sala C., Prodi A., Nardini C. Missing value estimation methods for DNA methylation data. Bioinformatics. 2019;35:3786–3793. doi: 10.1093/bioinformatics/btz134.
75.Di Lena P., Sala C., Prodi A., Nardini C. Methylation data imputation performances under different representations and missingness patterns. BMC Bioinf. 2020;21:268. doi: 10.1186/s12859-020-03592-5.
76.Lutsik P., Slawski M., Gasparoni G., Vedeneev N., Hein M., Walter J. MeDeCom: Discovery and quantification of latent components of heterogeneous methylomes. Genome Biol. 2017;18 doi: 10.1186/s13059-017-1182-6.
77.Thompson M., Chen Z.J., Rahmani E., Halperin E. CONFINED: Distinguishing biological from technical sources of variation by leveraging multiple methylation datasets. Genome Biol. 2019;20 doi: 10.1186/s13059-019-1743-y.
78.Onuchic V., Hartmaier R.J., Boone D.N., Samuels M.L., Patel R.Y., White W.M., Garovic V.D., Oesterreich S., Roth M.E., Lee A.V., Milosavljevic A. Epigenomic deconvolution of breast tumors reveals metabolic coupling between constituent cell types. Cell Rep. 2016;17:2075–2086. doi: 10.1016/j.celrep.2016.10.057.
79.Rahmani E., Schweiger R., Shenhav L., Wingert T., Hofer I., Gabel E., Eskin E., Halperin E. BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference. Genome Biol. 2018;19:141. doi: 10.1186/s13059-018-1513-2.
80.Zhang W., Xu H., Qiao R., Zhong B., Zhang X., Gu J., Zhang X., Wei L., Wang X. ARIC: accurate and robust inference of cell type proportions from bulk gene expression or DNA methylation data. Brief. Bioinform. 2022;23 doi: 10.1093/bib/bbab362.
81.Zhang W., Wu H., Li Z. Complete deconvolution of DNA methylation signals from complex tissues: a geometric approach. Bioinformatics. 2021;37:1052–1059. doi: 10.1093/bioinformatics/btaa930.
82.Chen L., Wu C.-T., Wang N., Herrington D.M., Clarke R., Wang Y. debCAM: a bioconductor R package for fully unsupervised deconvolution of complex tissues. Bioinformatics. 2020;36:3927–3929. doi: 10.1093/bioinformatics/btaa205.
83.Li Z., Wu H. TOAST: Improving reference-free cell composition estimation by cross-cell type differential analysis. Genome Biol. 2019;20:190. doi: 10.1186/s13059-019-1778-0.
84.Zou J.Y. Correcting for Sample Heterogeneity in Methylome-Wide Association Studies. In: Haggarty P., Harrison K., editors. Population Epigenetics: Methods and Protocols. Methods in Molecular Biology. Springer; 2017. pp. 107–114.
85.Scherer M., Nazarov P.V., Toth R., Sahay S., Kaoma T., Maurer V., Vedeneev N., Plass C., Lengauer T., Walter J., Lutsik P. Reference-free deconvolution, visualization and interpretation of complex DNA methylation data using DecompPipeline, MeDeCom and FactorViz. Nat. Protoc. 2020;15:3240–3263. doi: 10.1038/s41596-020-0369-6.
86.Ng B., White C.C., Klein H.-U., Sieberts S.K., McCabe C., Patrick E., Xu J., Yu L., Gaiteri C., Bennett D.A., et al. An xQTL map integrates the genetic architecture of the human brain’s transcriptome and epigenome. Nat. Neurosci. 2017;20:1418–1426. doi: 10.1038/nn.4632.
87.Su X., Yan X., Tsai C.-L. Linear regression. WIREs Computational Stats. 2012;4:275–294. doi: 10.1002/wics.1198.
88.Alin A. Multicollinearity. WIREs Computational Stats. 2010;2:370–374. doi: 10.1002/wics.84.
89.Shrestha N. Detecting Multicollinearity in Regression Analysis. Am. J. Appl. Math. Stat. 2020;8:39–42. doi: 10.12691/ajams-8-2-1.
90.Green S.B. How Many Subjects Does It Take To Do A Regression Analysis. Multivariate Behav. Res. 1991;26:499–510. doi: 10.1207/s15327906mbr2603_7.
91.Abonazel M.R., Rabie A.R. The impact of using robust estimations in regression models: an application on the Egyptian economy. J. Adv. Res. Appl. Math. Stat. 2019;4:8–16.
92.Leek J.T., Storey J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:1724–1735. doi: 10.1371/journal.pgen.0030161.
93.Rahmani E., Schweiger R., Rhead B., Criswell L.A., Barcellos L.F., Eskin E., Rosset S., Sankararaman S., Halperin E. Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology. Nat. Commun. 2019;10:3417. doi: 10.1038/s41467-019-11052-9.
94.Zheng S.C., Breeze C.E., Beck S., Teschendorff A.E. Identification of differentially methylated cell types in epigenome-wide association studies. Nat. Methods. 2018;15:1059–1066. doi: 10.1038/s41592-018-0213-x.
95.Chen L., Li Z., Wu H. CeDAR: incorporating cell type hierarchy improves cell type-specific differential analyses in bulk omics data. Genome Biol. 2023;24:37. doi: 10.1186/s13059-023-02857-5.
96.Newman A.M., Steen C.B., Liu C.L., Gentles A.J., Chaudhuri A.A., Scherer F., Khodadoust M.S., Esfahani M.S., Luca B.A., Steiner D., et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 2019;37:773–782. doi: 10.1038/s41587-019-0114-2.
97.Trapnell C., Cacchiarelli D., Grimsby J., Pokharel P., Li S., Morse M., Lennon N.J., Livak K.J., Mikkelsen T.S., Rinn J.L. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 2014;32:381–386. doi: 10.1038/nbt.2859.
98.Ji Z., Ji H. TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 2016;44:e117. doi: 10.1093/nar/gkw430.
99.Van den Berge K., Roux de Bézieux H., Street K., Saelens W., Cannoodt R., Saeys Y., Dudoit S., Clement L. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 2020;11:1201. doi: 10.1038/s41467-020-14766-3.
100.Fischer D.S., Theis F.J., Yosef N. Impulse model-based differential expression analysis of time course sequencing data. Nucleic Acids Res. 2018;46:e119. doi: 10.1093/nar/gky675.

[bib1] 1.Carter B., Zhao K. The epigenetic basis of cellular heterogeneity. Nat. Rev. Genet. 2021;22:235–250. doi: 10.1038/s41576-020-00300-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] 2.Eckersley-Maslin M.A., Alda-Catalinas C., Reik W. Dynamics of the epigenetic landscape during the maternal-to-zygotic transition. Nat. Rev. Mol. Cell Biol. 2018;19:436–450. doi: 10.1038/s41580-018-0008-z. [DOI] [PubMed] [Google Scholar]

[bib3] 3.Jullien J., Vodnala M., Pasque V., Oikawa M., Miyamoto K., Allen G., David S.A., Brochard V., Wang S., Bradshaw C., et al. Gene resistance to transcriptional reprogramming following nuclear transfer is directly mediated by multiple chromatin-repressive pathways. Mol. Cell. 2017;65:873–884.e8. doi: 10.1016/j.molcel.2017.01.030. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] 4.Izzo F., Lee S.C., Poran A., Chaligne R., Gaiti F., Gross B., Murali R.R., Deochand S.D., Ang C., Jones P.W., et al. DNA methylation disruption reshapes the hematopoietic differentiation landscape. Nat. Genet. 2020;52:378–387. doi: 10.1038/s41588-020-0595-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] 5.Henning A.N., Roychoudhuri R., Restifo N.P. Epigenetic control of CD8+ T cell differentiation. Nat. Rev. Immunol. 2018;18:340–356. doi: 10.1038/nri.2017.146. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] 6.Jaffe A.E., Irizarry R.A. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014;15:R31. doi: 10.1186/gb-2014-15-2-r31. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] 7.Bergstedt J., Azzou S.A.K., Tsuo K., Jaquaniello A., Urrutia A., Rotival M., Lin D.T.S., MacIsaac J.L., Kobor M.S., Albert M.L., et al. The immune factors driving DNA methylation variation in human blood. Nat. Commun. 2022;13:5895. doi: 10.1038/s41467-022-33511-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] 8.Mattei A.L., Bailly N., Meissner A. DNA methylation: a historical perspective. Trends Genet. 2022;38:676–707. doi: 10.1016/j.tig.2022.03.010. [DOI] [PubMed] [Google Scholar]

[bib9] 9.Li M., Zou D., Li Z., Gao R., Sang J., Zhang Y., Li R., Xia L., Zhang T., Niu G., et al. EWAS Atlas: a curated knowledgebase of epigenome-wide association studies. Nucleic Acids Res. 2019;47:D983–D988. doi: 10.1093/nar/gky1027. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] 10.Wahl S., Drong A., Lehne B., Loh M., Scott W.R., Kunze S., Tsai P.C., Ried J.S., Zhang W., Yang Y., et al. Epigenome-wide association study of body mass index and the adverse outcomes of adiposity. Nature. 2017;541:81–86. doi: 10.1038/nature20784. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] 11.Li S., Wong E.M., Bui M., Nguyen T.L., Joo J.H.E., Stone J., Dite G.S., Giles G.G., Saffery R., Southey M.C., Hopper J.L. Causal effect of smoking on DNA methylation in peripheral blood: A twin and family study. Clin. Epigenet. 2018;10 doi: 10.1186/s13148-018-0452-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] 12.Gatev E., Inkster A.M., Negri G.L., Konwar C., Lussier A.A., Skakkebaek A., Sokolowski M.B., Gravholt C.H., Dunn E.C., Kobor M.S., Aristizabal M.J. Autosomal sex-associated co-methylated regions predict biological sex from DNA methylation. Nucleic Acids Res. 2021;49:9097–9116. doi: 10.1093/nar/gkab682. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] 13.Lappalainen T., Greally J.M. Associating cellular epigenetic models with human phenotypes. Nat. Rev. Genet. 2017;18:441–451. doi: 10.1038/nrg.2017.32. [DOI] [PubMed] [Google Scholar]

[bib14] 14.Do C., Shearer A., Suzuki M., Terry M.B., Gelernter J., Greally J.M., Tycko B. Genetic–epigenetic interactions in cis: a major focus in the post-GWAS era. Genome Biol. 2017;18 doi: 10.1186/S13059-017-1250-Y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] 15.Merrill S.M., Gladish N., Fu M.P., Moore S.R., Konwar C., Giesbrecht G.F., MacIssac J.L., Kobor M.S., Letourneau N.L. Associations of peripheral blood DNA methylation and estimated monocyte proportion differences during infancy with toddler attachment style. Attach. Hum. Dev. 2023;25:132–161. doi: 10.1080/14616734.2021.1938872. [DOI] [PubMed] [Google Scholar]

[bib16] 16.McEwen L.M., Morin A.M., Edgar R.D., MacIsaac J.L., Jones M.J., Dow W.H., Rosero-Bixby L., Kobor M.S., Rehkopf D.H. Differential DNA methylation and lymphocyte proportions in a Costa Rican high longevity region. Epigenet. Chromatin. 2017;10:21. doi: 10.1186/s13072-017-0128-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib17] 17.Li J., Li L., Wang Y., Huang G., Li X., Xie Z., Zhou Z. Insights Into the Role of DNA Methylation in Immune Cell Development and Autoimmune Disease. Front. Cell Dev. Biol. 2021;9 doi: 10.3389/fcell.2021.757318. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] 18.R Core Team R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. 2021 https://www.R-project.org [Google Scholar]

[bib19] 19.RStudio Team . 2022. RStudio: Integrated Development Environment for R. (RStudio). http://www.rstudio.com. [Google Scholar]

[bib20] 20.Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., Ellis B., Gautier L., Ge Y., Gentry J., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80. doi: 10.1186/gb-2004-5-10-r80. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] 21.Bibikova M., Barnes B., Tsan C., Ho V., Klotzle B., Le J.M., Delano D., Zhang L., Schroth G.P., Gunderson K.L., et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98:288–295. doi: 10.1016/j.ygeno.2011.07.007. [DOI] [PubMed] [Google Scholar]

[bib22] 22.Pidsley R., Zotenko E., Peters T.J., Lawrence M.G., Risbridger G.P., Molloy P., Van Djik S., Muhlhausler B., Stirzaker C., Clark S.J. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 2016;17 doi: 10.1186/s13059-016-1066-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] 23.Zhuang B.C., Jude M.S., Konwar C., Ryan C.P., Whitehead J., Engelbrecht H.-R., MacIsaac J.L., Dever K., Toan T.K., Korinek K., et al. Comparison of Infinium MethylationEPIC v2.0 to v1.0 for human population epigenetics: considerations for addressing EPIC version differences in DNA methylation-based tools. bioRxiv. 2024 doi: 10.1101/2024.07.02.600461. Preprint at. [DOI] [Google Scholar]

[bib24] 24.Titus A.J., Houseman E.A., Johnson K.C., Christensen B.C. methyLiftover: cross-platform DNA methylation data integration. Bioinformatics. 2016;32:2517–2519. doi: 10.1093/bioinformatics/btw180. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib25] 25.Vanderlinden L.A., Johnson R.K., Carry P.M., Dong F., DeMeo D.L., Yang I.V., Norris J.M., Kechris K. An effective processing pipeline for harmonizing DNA methylation data from Illumina’s 450K and EPIC platforms for epidemiological studies. BMC Res. Notes. 2021;14:352. doi: 10.1186/s13104-021-05741-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] 26.Gervin K., Salas L.A., Bakulski K.M., van Zelm M.C., Koestler D.C., Wiencke J.K., Duijts L., Moll H.A., Kelsey K.T., Kobor M.S., et al. Systematic evaluation and validation of reference and library selection methods for deconvolution of cord blood DNA methylation data. Clin. Epigenetics. 2019;11 doi: 10.1186/s13148-019-0717-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib45] 27.Aryee M.J., Jaffe A.E., Corrada-Bravo H., Ladd-Acosta C., Feinberg A.P., Hansen K.D., Irizarry R.A. Minfi: A flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014;30:1363–1369. doi: 10.1093/bioinformatics/btu049. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib46] 28.Montaño C.M., Irizarry R.A., Kaufmann W.E., Talbot K., Gur R.E., Feinberg A.P., Taub M.A. Measuring cell-type specific differential methylation in human brain tissue. Genome Biol. 2013;14:R94. doi: 10.1186/gb-2013-14-8-r94. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib47] 29.Murat K., Grüning B., Poterlowicz P.W., Westgate G., Tobin D.J., Poterlowicz K. Ewastools: Infinium Human Methylation BeadChip pipeline for population epigenetics integrated into Galaxy. GigaScience. 2020;9 doi: 10.1093/gigascience/giaa049. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] 30.Koestler D.C., Jones M.J., Usset J., Christensen B.C., Butler R.A., Kobor M.S., Wiencke J.K., Kelsey K.T. Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL. BMC Bioinf. 2016;17 doi: 10.1186/s12859-016-0943-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] 31.Salas L.A., Zhang Z., Koestler D.C., Butler R.A., Hansen H.M., Molinaro A.M., Wiencke J.K., Kelsey K.T., Christensen B.C. Enhanced cell deconvolution of peripheral blood using DNA methylation for high-resolution immune profiling. Nat. Commun. 2022;13:761. doi: 10.1038/s41467-021-27864-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib35] 32.Zheng S.C., Webster A.P., Dong D., Feber A., Graham D.G., Sullivan R., Jevons S., Lovat L.B., Beck S., Widschwendter M., Teschendorff A.E. A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix. Epigenomics. 2018;10:925–940. doi: 10.2217/epi-2018-0037. [DOI] [PubMed] [Google Scholar]

[bib48] 33.Hicks S.C., Irizarry R.A. MethylCC: Technology-independent estimation of cell type composition using differentially methylated regions. Genome Biol. 2019;20 doi: 10.1186/s13059-019-1827-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib40] 34.Zhang Z., Wiencke J.K., Kelsey K.T., Koestler D.C., Molinaro A.M., Pike S.C., Karra P., Christensen B.C., Salas L.A. Hierarchical deconvolution for extensive cell type resolution in the human brain using DNA methylation. Front. Neurosci. 2023;17 doi: 10.3389/fnins.2023.1198243. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib50] 35.Cai M., Zhou J., McKennan C., Wang J. scMD facilitates cell type deconvolution using single-cell DNA methylation references. Commun. Biol. 2024;7:1. doi: 10.1038/s42003-023-05690-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib51] 36.Guintivano J., Aryee M.J., Kaminsky Z.A. A cell epigenotype specific model for the correction of brain cellular heterogeneity bias and its application to age, brain region and major depression. Epigenetics. 2013;8:290–302. doi: 10.4161/epi.23924. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib52] 37.Yuan V., Hui D., Yin Y., Peñaherrera M.S., Beristain A.G., Robinson W.P. Cell-specific characterization of the placental methylome. BMC Genom. 2021;22 doi: 10.1186/s12864-020-07186-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib53] 38.Maié T., Schmidt M., Erz M., Wagner W., G. Costa I. CimpleG: finding simple CpG methylation signatures. Genome Biol. 2023;24:161. doi: 10.1186/s13059-023-03000-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib43] 39.Liu Y. scDeconv: an R package to deconvolve bulk DNA methylation data with scRNA-seq data and paired bulk RNA–DNA methylation data. Brief. Bioinform. 2022;23 doi: 10.1093/bib/bbac150. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib54] 40.Li S., Zeng W., Ni X., Liu Q., Li W., Stackpole M.L., Zhou Y., Gower A., Krysan K., Ahuja P., et al. Comprehensive tissue deconvolution of cell-free DNA by deep learning for disease diagnosis and monitoring. Proc. Natl. Acad. Sci. USA. 2023;120 doi: 10.1073/pnas.2305236120. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib55] 41.Caggiano C., Celona B., Garton F., Mefford J., Black B.L., Henderson R., Lomen-Hoerth C., Dahl A., Zaitlen N. Comprehensive cell type decomposition of circulating cell-free DNA with CelFiE. Nat. Commun. 2021;12:2717. doi: 10.1038/s41467-021-22901-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib56] 42.Jeong Y., Rohr K., Lutsik P. MethylBERT: A Transformer-based model for read-level DNA methylation pattern identification and tumour deconvolution. bioRxiv. 2023 doi: 10.1101/2023.10.29.564590. Preprint at. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib57] 43.Arneson D., Yang X., Wang K. MethylResolver—a method for deconvoluting bulk DNA methylation profiles into known and unknown cell contents. Commun. Biol. 2020;3:422. doi: 10.1038/s42003-020-01146-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib58] 44.Zhang Z., Wiencke J.K., Kelsey K.T., Koestler D.C., Christensen B.C., Salas L.A. HiTIMED: hierarchical tumor immune microenvironment epigenetic deconvolution for accurate cell type resolution in the tumor microenvironment using tumor-type-specific DNA methylation data. J. Transl. Med. 2022;20:516. doi: 10.1186/s12967-022-03736-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib59] 45.Chakravarthy A., Furness A., Joshi K., Ghorani E., Ford K., Ward M.J., King E.V., Lechner M., Marafioti T., Quezada S.A., et al. Pan-cancer deconvolution of tumour composition using DNA methylation. Nat. Commun. 2018;9:3220. doi: 10.1038/s41467-018-05570-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib60] 46.Zhang H., Cai R., Dai J., Sun W. EMeth: An EM algorithm for cell type decomposition based on DNA methylation data. Sci. Rep. 2021;11:5717. doi: 10.1038/s41598-021-84864-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib61] 47.He D., Chen M., Wang W., Song C., Qin Y. Deconvolution of tumor composition using partially available DNA methylation data. BMC Bioinf. 2022;23:355. doi: 10.1186/s12859-022-04893-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib62] 48.Middleton L.Y.M., Dou J., Fisher J., Heiss J.A., Nguyen V.K., Just A.C., Faul J., Ware E.B., Mitchell C., Colacino J.A., M Bakulski K. Saliva cell type DNA methylation reference panel for epidemiological studies in children. Epigenetics. 2022;17:161–177. doi: 10.1080/15592294.2021.1890874. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib63] 49.Zhu T., Liu J., Beck S., Pan S., Capper D., Lechner M., Thirlwell C., Breeze C.E., Teschendorff A.E. A pan-tissue DNA methylation atlas enables in silico decomposition of human tissue methylomes at cell-type resolution. Nat. Methods. 2022;19:296–306. doi: 10.1038/s41592-022-01412-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib64] 50.Muse M.E., Carroll C.D., Salas L.A., Karagas M.R., Christensen B.C. Application of Novel Breast Biospecimen Cell-Type Adjustment Identifies Shared DNA Methylation Alterations in Breast Tissue and Milk with Breast Cancer-Risk Factors. Biomarkers Prevention. 2023;32:550–560. doi: 10.1158/1055-9965.EPI-22-0405. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib65] 51.Consortium R.E., A K., W M. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–329. doi: 10.1038/nature14248. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib66] 52.Moss J., Magenheim J., Neiman D., Zemmour H., Loyfer N., Korach A., Samet Y., Maoz M., Druid H., Arner P., et al. Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nat. Commun. 2018;9:5068. doi: 10.1038/s41467-018-07466-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib67] 53.Cai M., Yue M., Chen T., Liu J., Forno E., Lu X., Billiar T., Celedón J., McKennan C., Chen W., Wang J. Robust and accurate estimation of cellular fraction from tissue omics data via ensemble deconvolution. Bioinformatics. 2022;38:3004–3010. doi: 10.1093/bioinformatics/btac279. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib68] 54.Cheng A.P., Burnham P., Lee J.R., Cheng M.P., Suthanthiran M., Dadhania D., De Vlaminck I. A cell-free DNA metagenomic sequencing assay that integrates the host injury response to infection. Proc. Natl. Acad. Sci. USA. 2019;116:18738–18744. doi: 10.1073/pnas.1906320116. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] 55.George-Gay B., Parker K. Understanding the complete blood count with differential. J. Perianesth. Nurs. 2003;18:96–117. doi: 10.1053/jpan.2003.50013. [DOI] [PubMed] [Google Scholar]

[bib30] 56.Salas L.A., Koestler D.C., Butler R.A., Hansen H.M., Wiencke J.K., Kelsey K.T., Christensen B.C. An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray. Genome Biol. 2018;19:64. doi: 10.1186/s13059-018-1448-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] 57.Dieckmann L., Cruceanu C., Lahti-Pulkkinen M., Lahti J., Kvist T., Laivuori H., Sammallahti S., Villa P.M., Suomalainen-König S., Rancourt R.C., et al. Reliability of a novel approach for reference-based cell type estimation in human placental DNA methylation studies. Cell. Mol. Life Sci. 2022;79:115. doi: 10.1007/s00018-021-04091-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] 58.Kaushal A., Zhang H., Karmaus W.J., Wang J.S. Which methods to choose to correct cell types in genome-scale blood-derived DNA methylation data? BMC Bioinf. 2015;16:P7. doi: 10.1186/1471-2105-16-S15-P7. [DOI] [Google Scholar]

[bib33] 59.Teschendorff A.E., Relton C.L. Statistical and integrative system-level analysis of DNA methylation data. Nat. Rev. Genet. 2018;19:129–147. doi: 10.1038/nrg.2017.86. [DOI] [PubMed] [Google Scholar]

[bib34] 60.Houseman E.A., Accomando W.P., Koestler D.C., Christensen B.C., Marsit C.J., Nelson H.H., Wiencke J.K., Kelsey K.T. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinf. 2012;13:86. doi: 10.1186/1471-2105-13-86. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib36] 61.Newman A.M., Liu C.L., Green M.R., Gentles A.J., Feng W., Xu Y., Hoang C.D., Diehn M., Alizadeh A.A. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods. 2015;12:453–457. doi: 10.1038/nmeth.3337. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] 62.Teschendorff A.E., Breeze C.E., Zheng S.C., Beck S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinf. 2017;18:105. doi: 10.1186/s12859-017-1511-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] 63.Fortin J.-P., Triche T.J., Hansen K.D. Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array with minfi. Bioinformatics. 2017;33:558–560. doi: 10.1093/bioinformatics/btw691. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib41] 64.Teschendorff A.E., Zhu T., Breeze C.E., Beck S. EPISCORE: cell type deconvolution of bulk tissue DNA methylomes from single-cell RNA-Seq data. Genome Biol. 2020;21:221. doi: 10.1186/s13059-020-02126-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib42] 65.Fridley B., Wang X., editors. Statistical Genomics. Springer US; 2023. [DOI] [Google Scholar]

[bib69] 66.Leek J.T., Johnson W.E., Parker H.S., Jaffe A.E., Storey J.D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883. doi: 10.1093/bioinformatics/bts034. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib70] 67.Zou J., Lippert C., Heckerman D., Aryee M., Listgarten J. Epigenome-wide association studies without the need for cell-type composition. Nat. Methods. 2014;11:309–311. doi: 10.1038/nmeth.2815. [DOI] [PubMed] [Google Scholar]

[bib71] 68.Rahmani E., Zaitlen N., Baran Y., Eng C., Hu D., Galanter J., Oh S., Burchard E.G., Eskin E., Zou J., Halperin E. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies. Nat. Methods. 2016;13:443–445. doi: 10.1038/nmeth.3809. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib72] 69.Houseman E.A., Molitor J., Marsit C.J. Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014;30:1431–1439. doi: 10.1093/bioinformatics/btu029. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib73] 70.Houseman E.A., Kile M.L., Christiani D.C., Ince T.A., Kelsey K.T., Marsit C.J. Reference-free deconvolution of DNA methylation data and mediation by cell composition effects. BMC Bioinf. 2016;17:259. doi: 10.1186/s12859-016-1140-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib74] 71.Zheng S.C., Beck S., Jaffe A.E., Koestler D.C., Hansen K.D., Houseman A.E., Irizarry R.A., Teschendorff A.E. Correcting for cell-type heterogeneity in epigenome-wide association studies: revisiting previous analyses. Nat. Methods. 2017;14:216–217. doi: 10.1038/nmeth.4187. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib75] 72.Kaushal A., Zhang H., Karmaus W.J.J., Ray M., Torres M.A., Smith A.K., Wang S.L. Comparison of different cell type correction methods for genome-scale epigenetics studies. BMC Bioinf. 2017;18 doi: 10.1186/s12859-017-1611-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

73. Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani R., Botstein D., Altman R.B. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17:520–525. doi: 10.1093/bioinformatics/17.6.520.

74. Di Lena P., Sala C., Prodi A., Nardini C. Missing value estimation methods for DNA methylation data. Bioinformatics. 2019;35:3786–3793. doi: 10.1093/bioinformatics/btz134.

75. Lena P.D., Sala C., Prodi A., Nardini C. Methylation data imputation performances under different representations and missingness patterns. BMC Bioinf. 2020;21:268. doi: 10.1186/s12859-020-03592-5.

76. Lutsik P., Slawski M., Gasparoni G., Vedeneev N., Hein M., Walter J. MeDeCom: Discovery and quantification of latent components of heterogeneous methylomes. Genome Biol. 2017;18. doi: 10.1186/s13059-017-1182-6.

77. Thompson M., Chen Z.J., Rahmani E., Halperin E. CONFINED: Distinguishing biological from technical sources of variation by leveraging multiple methylation datasets. Genome Biol. 2019;20. doi: 10.1186/s13059-019-1743-y.

78. Onuchic V., Hartmaier R.J., Boone D.N., Samuels M.L., Patel R.Y., White W.M., Garovic V.D., Oesterreich S., Roth M.E., Lee A.V., Milosavljevic A. Epigenomic deconvolution of breast tumors reveals metabolic coupling between constituent cell types. Cell Rep. 2016;17:2075–2086. doi: 10.1016/j.celrep.2016.10.057.

79. Rahmani E., Schweiger R., Shenhav L., Wingert T., Hofer I., Gabel E., Eskin E., Halperin E. BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference. Genome Biol. 2018;19:141. doi: 10.1186/s13059-018-1513-2.

80. Zhang W., Xu H., Qiao R., Zhong B., Zhang X., Gu J., Zhang X., Wei L., Wang X. ARIC: accurate and robust inference of cell type proportions from bulk gene expression or DNA methylation data. Brief. Bioinform. 2022;23. doi: 10.1093/bib/bbab362.

81. Zhang W., Wu H., Li Z. Complete deconvolution of DNA methylation signals from complex tissues: a geometric approach. Bioinformatics. 2021;37:1052–1059. doi: 10.1093/bioinformatics/btaa930.

82. Chen L., Wu C.-T., Wang N., Herrington D.M., Clarke R., Wang Y. debCAM: a bioconductor R package for fully unsupervised deconvolution of complex tissues. Bioinformatics. 2020;36:3927–3929. doi: 10.1093/bioinformatics/btaa205.

83. Li Z., Wu H. TOAST: Improving reference-free cell composition estimation by cross-cell type differential analysis. Genome Biol. 2019;20:190. doi: 10.1186/s13059-019-1778-0.

84. Zou J.Y. Correcting for Sample Heterogeneity in Methylome-Wide Association Studies. In: Haggarty P., Harrison K., editors. Population Epigenetics: Methods and Protocols. Methods in Molecular Biology. Springer; 2017. pp. 107–114.

85. Scherer M., Nazarov P.V., Toth R., Sahay S., Kaoma T., Maurer V., Vedeneev N., Plass C., Lengauer T., Walter J., Lutsik P. Reference-free deconvolution, visualization and interpretation of complex DNA methylation data using DecompPipeline, MeDeCom and FactorViz. Nat. Protoc. 2020;15:3240–3263. doi: 10.1038/s41596-020-0369-6.

86. Ng B., White C.C., Klein H.-U., Sieberts S.K., McCabe C., Patrick E., Xu J., Yu L., Gaiteri C., Bennett D.A., et al. An xQTL map integrates the genetic architecture of the human brain’s transcriptome and epigenome. Nat. Neurosci. 2017;20:1418–1426. doi: 10.1038/nn.4632.

87. Su X., Yan X., Tsai C.-L. Linear regression. WIREs Computational Stats. 2012;4:275–294. doi: 10.1002/wics.1198.

88. Alin A. Multicollinearity. WIREs Computational Stats. 2010;2:370–374. doi: 10.1002/wics.84.

89. Shrestha N. Detecting Multicollinearity in Regression Analysis. Am. J. Appl. Math. Stat. 2020;8:39–42. doi: 10.12691/ajams-8-2-1.

90. Green S.B. How Many Subjects Does It Take To Do A Regression Analysis. Multivariate Behav. Res. 1991;26:499–510. doi: 10.1207/s15327906mbr2603_7.

91. Abonazel M.R., Rabie A.R. The impact of using robust estimations in regression models: an application on the Egyptian economy. J. Adv. Res. Appl. Math. Stat. 2019;4:8–16.

92. Leek J.T., Storey J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:1724–1735. doi: 10.1371/journal.pgen.0030161.

93. Rahmani E., Schweiger R., Rhead B., Criswell L.A., Barcellos L.F., Eskin E., Rosset S., Sankararaman S., Halperin E. Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology. Nat. Commun. 2019;10:3417. doi: 10.1038/s41467-019-11052-9.

94. Zheng S.C., Breeze C.E., Beck S., Teschendorff A.E. Identification of differentially methylated cell types in epigenome-wide association studies. Nat. Methods. 2018;15:1059–1066. doi: 10.1038/s41592-018-0213-x.

95. Chen L., Li Z., Wu H. CeDAR: incorporating cell type hierarchy improves cell type-specific differential analyses in bulk omics data. Genome Biol. 2023;24:37. doi: 10.1186/s13059-023-02857-5.

96. Newman A.M., Steen C.B., Liu C.L., Gentles A.J., Chaudhuri A.A., Scherer F., Khodadoust M.S., Esfahani M.S., Luca B.A., Steiner D., et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 2019;37:773–782. doi: 10.1038/s41587-019-0114-2.

97. Trapnell C., Cacchiarelli D., Grimsby J., Pokharel P., Li S., Morse M., Lennon N.J., Livak K.J., Mikkelsen T.S., Rinn J.L. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 2014;32:381–386. doi: 10.1038/nbt.2859.

98. Ji Z., Ji H. TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 2016;44:e117. doi: 10.1093/nar/gkw430.

99. Van den Berge K., Roux de Bézieux H., Street K., Saelens W., Cannoodt R., Saeys Y., Dudoit S., Clement L. Trajectory-based differential expression analysis for single-cell sequencing data. Nat. Commun. 2020;11:1201. doi: 10.1038/s41467-020-14766-3.

100. Fischer D.S., Theis F.J., Yosef N. Impulse model-based differential expression analysis of time course sequencing data. Nucleic Acids Res. 2018;46:e119. doi: 10.1093/nar/gky675.
