STAR Protocols. 2023 May 15;4(2):102307. doi: 10.1016/j.xpro.2023.102307

Integrating single-cell transcriptomes, chromatin accessibility, and multiomics analysis of mesoderm-induced embryonic stem cells

Kyung Dae Ko 1,4,5,∗, Kan Jiang 2,4,5,∗∗, Stefania Dell’Orso 3, Vittorio Sartorelli 1,6,∗∗∗
PMCID: PMC10199178  PMID: 37192048

Summary

Here, we present workflows for integrating independent transcriptomic and chromatin accessibility datasets and analyzing multiomics. First, we describe steps for integrating independent transcriptomic and chromatin accessibility measurements. Next, we detail multimodal analysis of transcriptomes and chromatin accessibility performed in the same sample. We demonstrate their use by analyzing datasets obtained from mouse embryonic stem cells induced to differentiate toward mesoderm-like, myogenic, or neurogenic phenotypes.

For complete details on the use and execution of this protocol, please refer to Khateb et al.1

Subject areas: Bioinformatics, Computer sciences

Highlights

  • Integration of scRNA-seq and scATAC-seq from independent datasets

  • Multiomics of snRNA-seq and snATAC-seq from the same sample

  • Inference of cell states from multiomics pseudotime


Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.


Before you begin

Hardware preparation

A computer running macOS or Windows with a network connection is required. The RAM requirement depends on the number of cells to be analyzed; 16 GB of RAM should be sufficient for an initial analysis. If more than 10,000 cells are analyzed, a computer cluster running Linux with more than 32 GB of RAM is required.

Software preparation

Timing: 1 h (for step 1)

The applications described in this section are required for single-cell (sc)RNA-seq analysis, single-cell (sc)ATAC-seq analysis, integration of scRNA-seq and scATAC-seq datasets, and multiomics analysis.

  • 1.

    Prepare the Docker environment (see troubleshooting 1).

    scRNA-seq, scATAC-seq, multiomics analysis, and data integration require single-cell analysis tools on the R platform. To avoid conflicts between R library installations, a Docker development environment is used.
    • a.
      Go to the Docker website (https://www.docker.com/) and install the latest version of Docker Desktop.
    • b.
      Pull the Docker image from Docker Hub.
      > docker pull holyone70/mesoderm_pipeline:mesoderm_pipeline
    • c.
      Run the Docker image to start the R development environment.
      > docker run -e PASSWORD=rstudio -p 8787:8787 --name mesoderm_pipeline holyone70/mesoderm_pipeline:mesoderm_pipeline
    • d.
      Open a web browser and enter the local address of the RStudio server (http://localhost:8787).
    • e.
      Enter the username (rstudio) and password (rstudio).
    • f.
      Check the availability of the R packages (see troubleshooting 2).
    • g.
      Check the availability of the data files (see troubleshooting 3).
    • h.
      Run the scripts in the following order.
      • i.
        scRNA_analysis.R.
      • ii.
        scATAC_analysis.R.
      • iii.
        int_scRNA.R.
      • iv.
        int_scRNA_scATAC.R.
      • v.
        multiomics_anal.R.
    • i.
      How to run the R scripts.
      • i.
        Clean the environment by clicking the broom symbol in the upper-right corner.
      • ii.
        Go to File → open the star_protocol project → select the R script to run in the File, Packages, Help panel.
        Note: To pull and run the Docker image, use the terminal on macOS and Linux, and PowerShell on Windows. The command lines for downloading the data are at the top of each R script, commented out.
        CRITICAL: To run multiomics_anal.R successfully, Docker requires at least 32 GB memory, 8 CPUs, and 4 GB swap.
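
The package check in step 1f can be scripted from the RStudio console. The following is a minimal sketch, not part of the protocol scripts: the helper name check_packages is hypothetical, and the package list is taken from the key resources table rather than from the Docker image itself.

```r
# Hypothetical helper (not part of the protocol scripts): report which of the
# given packages can be loaded in the current R session.
check_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Package names assumed from the key resources table
required <- c("Seurat", "Signac", "harmony", "monocle3", "SeuratWrappers",
              "JASPAR2020", "TFBSTools")

status <- check_packages(required)
print(status)

# List anything that could not be loaded
missing_pkgs <- names(status)[!status]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

If any package is reported as missing, the image may not have been pulled or started correctly.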

Data collection

Timing: 30 min (for step 2)

The single-cell datasets analyzed in this protocol were deposited in the GEO repository (GSE198730). scRNA-seq datasets consist of barcodes, features, and matrix files, and scATAC-seq datasets contain barcodes, fragments, matrix, and peaks files. The time points for each dataset are described in Figure 1.

Figure 1. Scheme illustrating ESC differentiation time points

Key resources table

REAGENT or RESOURCE SOURCE IDENTIFIER
Deposited data

Single cell RNA-seq datasets Khateb et al.1 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198730 (GSE198730_aPSM_scRNA_rep1_barcodes.tsv.gz, GSE198730_aPSM_scRNA_rep1_features.tsv.gz, GSE198730_aPSM_scRNA_rep1_matrix.mtx.gz)
Single cell ATAC-seq datasets Khateb et al.1 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198730 (aPSM_scATAC_rep1_filtered_peak_bc_matrix.h5)
Single cell omics datasets Khateb et al.1 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198730 (GSE198730_HIFLR_snRNA_barcodes.tsv.gz, GSE198730_HIFLR_snRNA_features.tsv.gz, GSE198730_HIFLR_snRNA_matrix.mtx.gz, GSE198730_HIFLR_snATAC_fragments.tsv.gz, GSE198730_HIFLR_snATAC_fragments.tsv.gz.tbi.gz)
Github repository Single cell RNA-seq, Single cell ATAC-seq, Multiome single nuclei ATAC and gene expression https://github.com/LMSCGR/mesoderm_induced_ESCs_pipeline (HIFLR_snATAC_fragments.tsv.gz.tbi, aPSM_fragments.tsv.gz.tbi, cell_cycle.txt, naive_instructed_esc.csv, aPSM_f.txt)

Software and algorithms

BioRender https://biorender.com/
R v4.2.2 The R Project for Statistical Computing https://www.r-project.org/
RStudio server (v 2022.12.0+353) RStudio Team2 https://posit.co/
Seurat v4.3.0 Stuart et al.3 https://cran.r-project.org/web/packages/Seurat/index.html
Signac v1.9.0 Stuart et al.4 https://satijalab.org/signac
Harmony v0.1.1 Korsunsky et al.5 https://github.com/immunogenomics/harmony
Monocle3 v1.3.1 Cao et al.6 https://cole-trapnell-lab.github.io/monocle3/
JASPAR 2020 v 0.99.10 Fornes et al.7 https://bioconductor.org/packages/release/data/annotation/html/JASPAR2020.html
TFBSTools v 1.36.0 Tan and Lenhard8 https://bioconductor.org/packages/release/bioc/html/TFBSTools.html
SeuratWrappers v0.3.1 https://github.com/satijalab/seurat-wrappers

Other

Local computer – memory: 16GB required, 32GB recommended; processors: 4 required, 8 recommended N/A N/A

Step-by-step method details

Part 1: Single cell RNA seq analysis

Timing: 1 h (for steps 1 to 9)

In this section, we describe essential steps to analyze scRNA-seq datasets.

  • 1.

    Load the R packages and the dataset.

library(dplyr)

library(Seurat)

library(monocle3)

library(SeuratWrappers)

# for plotting

library(ggplot2)

library(patchwork)

set.seed(1234)

aPSM.matrix <- Read10X(data.dir ="./aPSM_scRNA/filtered_feature_bc_matrix/")

  • 2.

    Create Seurat object.

aPSM <- CreateSeuratObject(counts = aPSM.matrix, min.cells = 3, min.features = 200, project = "aPSM")

Note: Options for min.cells and min.features are selected as default values from the Seurat tutorial (https://satijalab.org/seurat/articles/pbmc3k_tutorial.html). File names should be barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz. On Windows, ".\\filtered_feature_bc_matrix\\" can be used as the path.

  • 3.

    Select cells for analysis through QC (Figure 2A).

aPSM[["percent.mt"]] <- PercentageFeatureSet(aPSM, pattern = "^mt-")

VlnPlot(aPSM, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

plot1 <- FeatureScatter(aPSM, feature1 = "nCount_RNA", feature2 = "percent.mt")

plot2 <- FeatureScatter(aPSM, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

plot1+plot2

aPSM <- subset(aPSM, subset = nFeature_RNA > 0 & nFeature_RNA < 8000 & percent.mt < 20)
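
To make the filter applied to aPSM above easier to follow, the sketch below reproduces the same three criteria in base R on a toy count matrix. The matrix values and cell names are invented for illustration; only the thresholds and the "^mt-" pattern come from the protocol. It is not a substitute for PercentageFeatureSet or subset on a Seurat object.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(5, 0, 3, 2,    # Actb
    0, 4, 1, 0,    # Gapdh
    1, 1, 6, 0,    # mt-Co1
    0, 0, 2, 0),   # mt-Nd1
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Actb", "Gapdh", "mt-Co1", "mt-Nd1"),
                  paste0("cell", 1:4))
)

# Mouse mitochondrial genes start with "mt-"
is_mt <- grepl("^mt-", rownames(counts))

n_count    <- colSums(counts)        # total counts per cell
n_feature  <- colSums(counts > 0)    # detected genes per cell
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / n_count

# Same thresholds as the subset() call in the protocol
keep <- n_feature > 0 & n_feature < 8000 & percent_mt < 20

print(round(percent_mt, 1))
print(names(keep)[keep])
```

Because the comparison is strict, a cell with exactly 20% mitochondrial counts is removed.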

  • 4.

    Preprocess data and select features for the analysis.

aPSM <- NormalizeData(object = aPSM, normalization.method = "LogNormalize", scale.factor = 1e4)

aPSM <- FindVariableFeatures(aPSM, selection.method = "vst", nfeatures = 2000)

aPSM_top10 <- head(VariableFeatures(aPSM), 10)

plot1 <- VariableFeaturePlot(aPSM)

plot2 <- LabelPoints(plot = plot1, points = aPSM_top10, repel = TRUE)

plot1+plot2

aPSM.all.genes <- rownames(aPSM)

aPSM <- ScaleData(aPSM, features = aPSM.all.genes)
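
As a rough illustration of what normalization.method = "LogNormalize" with scale.factor = 1e4 computes, the base R sketch below divides each cell's counts by the cell total, multiplies by the scale factor, and applies log(1 + x). The function name log_normalize and the toy matrix are hypothetical; the Seurat implementation works on sparse matrices and should be used for real data.

```r
# Hypothetical base R re-statement of log-normalization for a dense matrix
# (genes in rows, cells in columns).
log_normalize <- function(counts, scale_factor = 1e4) {
  # Divide each column (cell) by its total, scale, then take log(1 + x)
  log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
}

# Toy data: two genes, two cells (values are made up)
counts <- matrix(c(1, 3, 0, 10), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))

norm <- log_normalize(counts)
print(norm)
```

Zero counts stay at zero after the transformation, which preserves sparsity.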

  • 5.

    Filter cell cycle genes.

convertHumanGeneList <- function(x){

 require("biomaRt")

 human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl" , host = "https://dec2021.archive.ensembl.org/")

 mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl" ,host = "https://dec2021.archive.ensembl.org/")

 tmp <- getLDS(attributes = c("hgnc_symbol"), filters = "hgnc_symbol", values = x , mart = human, attributesL = c("mgi_symbol"), martL = mouse, uniqueRows=TRUE)

 mousex <- unique(tmp[,2])

 return(mousex)}

s.genes <- convertHumanGeneList(cc.genes.updated.2019$s.genes)

g2m.genes <- convertHumanGeneList(cc.genes.updated.2019$g2m.genes)

cell_cycle <- t(read.csv(file="aPSM_scRNA/cell_cycle.txt",header=F))[,1]

filtered_genes <- c(s.genes,cell_cycle)

  • 6.

    Filter cell cycle genes, reduce dimensions and establish dataset dimensionality (Figure 2B).

aPSM <- RunPCA(object = aPSM, features = VariableFeatures(object = aPSM), verbose = FALSE)

aPSM <- CellCycleScoring(aPSM, s.features = filtered_genes, g2m.features = g2m.genes, set.ident = TRUE)

aPSM <- ScaleData(aPSM, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(aPSM))

aPSM <- JackStraw(aPSM, num.replicate = 100)

aPSM <- ScoreJackStraw(aPSM, dims = 1:20)

ElbowPlot(object = aPSM,ndims =50)

  • 7.

    Cluster and visualize cells (Figure 3A).

aPSM <- FindNeighbors(object = aPSM, dims = 1:30)

aPSM <- FindClusters(object = aPSM, resolution = 0.25)

aPSM <- RunTSNE(object = aPSM, dims = 1:30)

aPSM <- RunUMAP(object = aPSM, dims = 1:30)

DimPlot(object=aPSM,reduction='umap',label=T)+labs(title = " aPSM")

save(aPSM,file="aPSM_scRNA.RData")

  • 8.

    Analyze unique features of each cluster.

aPSM.markers <- FindAllMarkers(aPSM, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

aPSM.markers_table <- aPSM.markers %>%group_by(cluster) %>% slice_max(n = 20, order_by = avg_log2FC)

Note: Options for min.pct and logfc.threshold are selected as default values from Seurat tutorials (https://satijalab.org/seurat/articles/pbmc3k_tutorial.html).

  • 9.

    Visualize clusters pseudotime (Figure 3B).

DefaultAssay(aPSM) <- "RNA"

aPSM_cds <- as.cell_data_set(aPSM)

aPSM_cds <- cluster_cells(aPSM_cds,reduction="UMAP",k = 30,resolution = 0.00012)

aPSM_cds <- learn_graph(aPSM_cds, close_loop = F,use_partition = T,learn_graph_control =list(minimal_branch_len=5))

plot_cells(aPSM_cds, label_groups_by_cluster = T, label_leaves = F, label_branch_points = T,graph_label_size = 3)

aPSM.min.umap <- which.min(unlist(FetchData(aPSM, "UMAP_2")))

aPSM.min.umap <- colnames(aPSM)[aPSM.min.umap]

aPSM_cds <- order_cells(aPSM_cds, root_cells = aPSM.min.umap)

plot_cells(aPSM_cds, color_cells_by = "pseudotime", label_cell_groups =T, label_leaves = F, label_branch_points = F,show_trajectory_graph = T,graph_label_size = 3,label_groups_by_cluster = T)

Figure 2. scRNA-seq data quality control

(A) Violin plots of aPSM scRNA-seq data: mRNA counts (nCount_RNA), number of detected genes (nFeature_RNA), and mitochondrial gene percentage (percent.mt).

(B) Elbow plot of aPSM scRNA-seq data showing the standard deviations of the principal components (PCs).

Figure 3. Visualization of aPSM scRNA-seq data

(A) UMAP plot of aPSM scRNA-seq data.

(B) Pseudotime plot of aPSM scRNA-seq data. The heatmap represents units of progress, with 1 located at the root of the trajectory.

Part 1: Single cell ATAC seq analysis

Timing: 1 h (for steps 10 to 17)

In this section, we describe steps to evaluate chromatin accessibility using scATAC-seq.

  • 10.

    Load datasets using Signac package.

library(Signac)

library(Seurat)

library(GenomeInfoDb)

library(EnsDb.Mmusculus.v79)

library(patchwork)

set.seed(1234)

aPSM.counts <- Read10X_h5("./aPSM_scATAC/filtered_peak_bc_matrix.h5")

aPSM_meta <- read.table("./aPSM_scATAC/singlecell.csv.gz", sep = ",", header = TRUE, row.names = 1)

aPSM_chrom_assay <- CreateChromatinAssay(

 counts = aPSM.counts,

 sep = c(":","-"),

 genome = 'mm10',

 fragments = './aPSM_scATAC/filtered_feature_bc_matrix/fragments.tsv.gz', min.cells = 3, min.features = 100)

Note: Options for min.cells and min.features are selected as default values from the Signac tutorial (https://stuartlab.org/signac/articles/pbmc_vignette.html).

  • 11.

    Create Seurat object.

aPSM_atac <- CreateSeuratObject(

counts = aPSM_chrom_assay,

assay = 'aPSM_peaks',

project = 'aPSM_atac',

meta.data = aPSM_meta[colnames(aPSM_chrom_assay),])

  • 12.

    Add annotation information.

annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)

# change to UCSC style since the data was mapped to mm10

seqlevelsStyle(annotations) <- 'UCSC'

genome(annotations) <- "mm10"

# add the gene information to the object

Annotation(aPSM_atac) <- annotations

  • 13.

    Select cells for analysis through QC (Figure 4A).

aPSM_atac <- NucleosomeSignal(object = aPSM_atac)

# compute TSS enrichment score per cell

aPSM_atac <- TSSEnrichment(object = aPSM_atac, fast = FALSE)

aPSM_atac$pct_reads_in_peaks <- aPSM_atac$peak_region_fragments / aPSM_atac$passed_filters * 100

aPSM_atac$blacklist_ratio <- aPSM_atac$blacklist_region_fragments / aPSM_atac$peak_region_fragments

VlnPlot(

 object = aPSM_atac,

 features = c('pct_reads_in_peaks', 'peak_region_fragments',

 'TSS.enrichment', 'blacklist_ratio', 'nucleosome_signal'),pt.size = 0.1, ncol = 5)

  • 14.

    Preprocess data and select features for the analysis.

FeatureScatter(aPSM_atac, feature1 = "peak_region_fragments", feature2 = "nCount_aPSM_peaks")

aPSM_atac <- subset(

 x = aPSM_atac,

 subset = peak_region_fragments > 2586 &

  peak_region_fragments < 20000 & pct_reads_in_peaks > 15 & blacklist_ratio < 0.05)

ncol(aPSM_atac)

VlnPlot(

 object = aPSM_atac,

features = c('nucleosome_signal','peak_region_fragments'),pt.size = 0.1) + NoLegend()

FeatureScatter(aPSM_atac, feature1 = "peak_region_fragments", feature2 = "nCount_aPSM_peaks")
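
The two derived scATAC-seq QC metrics and the filter above can be checked by hand on a small metadata table. The sketch below uses a made-up data frame with the same column names as the per-cell metadata used in the protocol; the numbers are invented, and only the formulas and thresholds are taken from steps 13 and 14.

```r
# Toy per-cell metadata (values are made up; column names follow the protocol)
meta <- data.frame(
  peak_region_fragments      = c(3000, 1500, 25000, 8000),
  passed_filters             = c(10000, 12000, 40000, 60000),
  blacklist_region_fragments = c(30, 10, 100, 800),
  row.names = paste0("cell", 1:4)
)

# Derived metrics, as computed in step 13
meta$pct_reads_in_peaks <- meta$peak_region_fragments / meta$passed_filters * 100
meta$blacklist_ratio    <- meta$blacklist_region_fragments / meta$peak_region_fragments

# Same thresholds as the subset() call in step 14
keep <- with(meta,
             peak_region_fragments > 2586 &
             peak_region_fragments < 20000 &
             pct_reads_in_peaks > 15 &
             blacklist_ratio < 0.05)

print(meta)
print(rownames(meta)[keep])
```

In this toy table only the first cell passes: the second has too few fragments in peaks, the third too many, and the fourth has a low fraction of reads in peaks and a high blacklist ratio.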

  • 15.

    Reduce dimensions.

aPSM_atac <- RunTFIDF(aPSM_atac)

aPSM_atac <- FindTopFeatures(aPSM_atac, min.cutoff = 'q0')

aPSM_atac <- RunSVD(

 object = aPSM_atac, assay = 'aPSM_peaks',

 reduction.key = 'LSI_', reduction.name = 'lsi')
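
The TF-IDF and SVD steps above together form latent semantic indexing (LSI). The base R sketch below shows one common TF-IDF formulation, log(1 + TF × IDF × 10,000), followed by a plain SVD on a toy peak-by-cell matrix. It is an assumption that this matches the settings used here; RunTFIDF and RunSVD offer several methods, operate on sparse matrices, and apply additional scaling, so the sketch is only meant to convey the idea. The function name tfidf and the data are hypothetical.

```r
# Hypothetical dense TF-IDF (peaks in rows, cells in columns)
tfidf <- function(counts, scale_factor = 1e4) {
  tf  <- sweep(counts, 2, colSums(counts), "/")  # term frequency per cell
  idf <- ncol(counts) / rowSums(counts)          # inverse document frequency per peak
  log1p(sweep(tf, 1, idf, "*") * scale_factor)
}

# Toy peak-by-cell counts (values are made up)
counts <- matrix(
  c(1, 0, 2,
    0, 3, 1,
    2, 2, 0,
    1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("peak", 1:4), paste0("cell", 1:3))
)

lsi_input <- tfidf(counts)

# Plain SVD; cell embeddings are the right singular vectors scaled by the
# singular values
s <- svd(lsi_input)
embeddings <- s$v %*% diag(s$d)
rownames(embeddings) <- colnames(counts)
print(embeddings)
```

On real data the first component often tracks sequencing depth, which is why later steps that use the LSI reduction may start from the second dimension.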

  • 16.

    Cluster and visualize cells (Figure 4B).

library(ggplot2)

aPSM_atac <- RunUMAP(object = aPSM_atac, reduction = 'lsi', dims = 1:40)

aPSM_atac <- RunTSNE(object = aPSM_atac, reduction = 'lsi', dims = 1:40)

aPSM_atac <- FindNeighbors(object = aPSM_atac, reduction = 'lsi', dims = 1:40)

aPSM_atac <- FindClusters(object = aPSM_atac, verbose = FALSE,resolution=0.25)

DimPlot(object = aPSM_atac, label = F,reduction = 'umap') +labs(title = " aPSM scATAC")

  • 17.

    Calculate gene activities and add them to Seurat object.

aPSM_gene.activities <- GeneActivity(aPSM_atac)

save(aPSM_gene.activities,file="aPSM_atac_gene.activities.RData")

# add the gene activity matrix to the Seurat object as a new assay and normalize it

aPSM_atac[['RNA']] <- CreateAssayObject(counts = aPSM_gene.activities)

aPSM_atac <- NormalizeData(

 object = aPSM_atac, assay = 'RNA', normalization.method = 'LogNormalize',scale.factor = median(aPSM_atac$nCount_RNA) )

save(aPSM_atac,file="aPSM_scATAC.RData")

Figure 4. Quality control and visualization of integration between scATAC and scRNA-seq data

(A) Features distribution of aPSM scATAC-seq data.

(B) UMAP plot of aPSM scATAC-seq datasets.

(C) Features distribution between naïve and instructed scRNA-seq datasets.

(D) UMAP plots before and after Harmony integration.

Part 1: Integrated data analysis

Timing: 1 h (for steps 18 to 19)

In this section, we describe steps to integrate and analyze data from different platforms. Users can infer changes in gene expression across time points, or relationships between gene expression and chromatin accessibility during cellular differentiation.

  • 18.
    Integrate single cell RNA seq datasets.
    • a.
      Prepare R library for the integration.
      library(dplyr)
      library(Seurat)
      library(harmony)
      library(data.table)
      library(parallel)
      set.seed(1234)
      # Set number of cores to use
      NCORES = 1
      meta <- fread("naive_instructed_esc.csv")
    • b.
      Load datasets.
      data_dir <- list("./naive_scRNA/","./instructed_scRNA/")
      mat.list <- list()
      soupx.used <- list()
      for(i in 1:length(data_dir)){
       mat.list[[i]] <- Read10X(data.dir = paste0(data_dir[i], 'filtered_feature_bc_matrix'))
       soupx.used[[i]] <- F}
      cat(sum(unlist(lapply(mat.list, ncol))),"cells (total) loaded...\n")
      sample_num<-min(ncol(mat.list[[1]]),ncol(mat.list[[2]]))
      sel.id<-sample(colnames(mat.list[[2]]), size=sample_num, replace=FALSE)
      mat.list[[2]]<-mat.list[[2]][,sel.id]
      Note: Files should be in filtered_feature_bc_matrix folder, and file names should be barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz.
    • c.
      Create Seurat objects.
      seu.list <- list()
      seu.list <- mclapply(
       mat.list,
       FUN = function(mat){
        return(CreateSeuratObject(
         counts = mat, min.features = 200, min.cells = 3,project = 'naive_instructed_data'))
       }, mc.cores = NCORES)
      for(i in 1:length(seu.list)){
       cat(' ------------------------------------\n',
        '--- Processing dataset number ', i, '-\n',
        '------------------------------------\n')
       # Add meta data
       for(md in colnames(meta)){
        seu.list[[i]][[md]] <- meta[[md]][i]
       }
       # add %MT
       seu.list[[i]][["percent.mt"]] <- PercentageFeatureSet(seu.list[[i]], pattern = "^mt-")
       # Filter out low quality cells according to the metrics defined above
       seu.list[[i]] <- subset(seu.list[[i]],
            subset = nFeature_RNA > 1600 & nFeature_RNA < 8000 & percent.mt < 20)
       # Only mito and floor filtering; trying to find doublets
      }
      cat((sum(unlist(lapply(mat.list, ncol)))-sum(unlist(lapply(seu.list, ncol)))),"cells (total) removed...\n")
    • d.
      Preprocess Seurat objects.
      seuPreProcess <- function(seu, assay='RNA', n.pcs=30, res=0.25){
       pca.name = paste0('pca_', assay)
       pca.key = paste0(pca.name,'_')
       umap.name = paste0('umap_', assay)
       seu = NormalizeData(
        seu
       ) %>% FindVariableFeatures(
        assay = assay,
        selection.method = "vst",
        nfeatures = 2000,
        verbose = F
       ) %>% ScaleData(
        assay = assay
       ) %>% RunPCA(
        assay = assay,
        reduction.name = pca.name,
        reduction.key = pca.key,
        verbose = F,
        npcs = n.pcs
      )
      n.pcs.use =n.pcs
       # FindNeighbors %>% RunUMAP, FindClusters
       seu <- FindNeighbors(
        seu,
        reduction = pca.name,
        dims = 1:n.pcs.use,
        force.recalc = TRUE,
        verbose = FALSE
       ) %>% RunUMAP(
        reduction = pca.name,
        dims = 1:n.pcs.use,
        reduction.name=umap.name
       )
       seu@reductions[[umap.name]]@misc$n.pcs.used <- n.pcs.use
       seu <- FindClusters(object = seu,resolution = res)
       seu[[paste0('RNA_res.',res)]] <- as.numeric(seu@active.ident)
       return(seu)
      }
      # preprocess each dataset individually
      seu.list <- lapply(seu.list, seuPreProcess)
    • e.
      Merge datasets (Figure 4C).
      tmp.list <- list()
      for(i in 1:length(seu.list)){
       DefaultAssay(seu.list[[i]]) <- "RNA"
       tmp.list[[i]] <- DietSeurat(seu.list[[i]], assays = "RNA")
      }
      # merge tmp count matrices
       scMuscle.pref.seurat <- merge(
        tmp.list[[1]],
        y = tmp.list[[2]]
      )
      VlnPlot(
       scMuscle.pref.seurat,
       features = c(
        'nCount_RNA',
        'nFeature_RNA',
        'percent.mt'
       ),
       group.by = 'source',
       pt.size = 0
      )
    • f.
      Preprocess merged data.
      # Seurat preprocessing on merged data ----
      DefaultAssay(scMuscle.pref.seurat) <- 'RNA'
      scMuscle.pref.seurat <-
       NormalizeData(
        scMuscle.pref.seurat, assay = 'RNA'
       ) %>% FindVariableFeatures(
        selection.method = 'vst',
        nfeatures = 2000,verbose = TRUE
       ) %>% ScaleData(
        assay = 'RNA',
        verbose = TRUE
       ) %>% RunPCA(
        assay = 'RNA',
        reduction.name = 'pca_RNA',
        reduction.key = 'pca_RNA_',
        verbose = TRUE,
        npcs = 50
       )
      ElbowPlot(scMuscle.pref.seurat, reduction = 'pca_RNA', ndims = 50)
    • g.
      Find clusters for individual datasets.
      n.pcs = 30
      scMuscle.pref.seurat <-
       RunUMAP(
        scMuscle.pref.seurat, reduction = 'pca_RNA',
        dims = 1:n.pcs, reduction.name='umap_RNA'
       ) %>% FindNeighbors(
        reduction = 'pca_RNA',
        dims = 1:n.pcs,
        force.recalc = TRUE,
        verbose = F
       )
      scMuscle.pref.seurat <- FindClusters(object = scMuscle.pref.seurat, resolution = 0.25)
      scMuscle.pref.seurat[['RNA_res.0.25']] <- as.numeric(scMuscle.pref.seurat@active.ident)
    • h.
      Integrate datasets using Harmony package.
      scMuscle.pref.seurat <-
       scMuscle.pref.seurat %>% RunHarmony(
        group.by.vars=c('sample'), reduction='pca_RNA',
        assay='RNA',plot_convergence = TRUE,verbose=TRUE)
      scMuscle.pref.seurat <-
       scMuscle.pref.seurat %>% RunUMAP(
        reduction = 'harmony', dims = 1:n.pcs,
        reduction.name='umap_harmony')
      scMuscle.pref.seurat@reductions$umap_harmony@misc$n.pcs.used <- n.pcs
      scMuscle.pref.seurat <-
       scMuscle.pref.seurat %>% FindNeighbors(
        reduction = 'harmony',dims = 1:n.pcs,
        graph.name = 'harmony_snn',force.recalc = TRUE,
        verbose = FALSE)
      scMuscle.pref.seurat <- FindClusters(
       object = scMuscle.pref.seurat,resolution = 1.0,
       graph.name='harmony_snn')
      scMuscle.pref.seurat[['harmony_res.1.0']] <- as.numeric(scMuscle.pref.seurat@active.ident)
      scMuscle.pref.seurat <- FindClusters(
       object = scMuscle.pref.seurat,
       resolution = 2.0, graph.name='harmony_snn')
      scMuscle.pref.seurat[['harmony_res.2.0']] <- as.numeric(scMuscle.pref.seurat@active.ident)
    • i.
      Validate integrated results (Figure 4D).
      library(cowplot)
      library(ggplot2)
      p1<-DimPlot(object = scMuscle.pref.seurat, reduction = "umap_RNA", pt.size = .1, group.by = "sample")+labs(title = "Merged by Seurat")
      p2<-DimPlot(object = scMuscle.pref.seurat, reduction = "umap_harmony", pt.size = .1, group.by = "sample")+labs(title = "Merged by Seurat with Harmony")
      p1+p2
      save(scMuscle.pref.seurat,file="naive_instructed_scRNA_ESCs.RData")
  • 19.
    Integrate scATAC-seq dataset with scRNA-seq dataset.
    • a.
      Prepare R library for integration.
      library(Signac)
      library(Seurat)
      library(GenomeInfoDb)
      library(EnsDb.Mmusculus.v79)
      library(patchwork)
      library(ggplot2)
      set.seed(1234)
    • b.
      Load datasets.
      load("aPSM_scRNA.RData")
      load("aPSM_scATAC.RData")
    • c.
      Infer relations between scRNA-seq and scATAC-seq.
      DefaultAssay(aPSM_atac) <- 'RNA'
      ncol(aPSM_atac)
      transfer.anchors <- FindTransferAnchors(
       reference = aPSM, query = aPSM_atac, k.anchor = 20,
       k.filter = 200, reduction = 'cca', dims = 1:30)
      predicted.labels <- TransferData(
       anchorset = transfer.anchors,
       refdata = aPSM$seurat_clusters,
       weight.reduction = aPSM_atac[['lsi']],
       dims = 2:30)
      save(transfer.anchors,file="transfer.anchors_aPSM_atac.RData")
      aPSM_atac <- AddMetaData(object =aPSM_atac, metadata = predicted.labels)
      save(aPSM_atac,file="aPSM_atac_meta.RData")
    • d.
      Visualize the clusters of the integrated datasets.
      DimPlot(object = aPSM_atac, label = F,reduction = 'umap',group.by ='predicted.id' ) +labs(title = " aPSM scATAC")
      DimPlot(object = aPSM, label = F,reduction = 'umap') +labs(title = " aPSM")
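
In step 18b, the second dataset is randomly subsampled to the smaller of the two cell counts, which balances the datasets only when the second one is the larger. The sketch below is a symmetric variant in base R that downsamples every matrix in a list to the smallest cell count. The helper name downsample_to_smallest and the toy matrices are hypothetical, and this is a possible alternative, not what the protocol scripts do.

```r
# Hypothetical helper: downsample each matrix (cells in columns) to the
# smallest number of cells found in the list.
downsample_to_smallest <- function(mat_list, seed = 1234) {
  set.seed(seed)  # make the random selection reproducible
  n_min <- min(vapply(mat_list, ncol, integer(1)))
  lapply(mat_list, function(m) {
    m[, sample(colnames(m), size = n_min, replace = FALSE), drop = FALSE]
  })
}

# Toy matrices with 10 and 4 cells (values are made up)
a <- matrix(1:20, nrow = 2, dimnames = list(c("g1", "g2"), paste0("a", 1:10)))
b <- matrix(1:8,  nrow = 2, dimnames = list(c("g1", "g2"), paste0("b", 1:4)))

balanced <- downsample_to_smallest(list(a, b))
print(vapply(balanced, ncol, integer(1)))
```

Balancing cell numbers before merging helps prevent the larger dataset from dominating variable feature selection and PCA, at the cost of discarding cells.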

Part 2: Multiomics analysis

Timing: 1 h (for steps 20 and 21)

In this section, we describe the major steps of multimodal analysis.

  • 20.
    Data preprocessing.
    • a.
      Load the libraries and setup working directory.
      library(Seurat)
      library(Signac)
      library(patchwork)
      library(monocle3)
      library(SeuratWrappers)
      library(EnsDb.Mmusculus.v79)
      library(GenomeInfoDb)
      library(ggplot2)
      library(dplyr)
      set.seed(1234)
      setwd(getwd())
    • b.
      Load snRNA and snATAC data and create Seurat object.
      Star.data <- Read10X(data.dir = "./multiomics/filtered_feature_bc_matrix/")
      # Extract RNA and ATAC data
      rna_counts <- Star.data$`Gene Expression`
      atac_counts <- Star.data$Peaks
      # Create Seurat object containing snRNA data
      Star <- CreateSeuratObject(counts = rna_counts, project = "Star", min.cells=5, min.features = 100, assay = "RNA")
      CRITICAL: HIFLR_snRNA_barcodes.tsv.gz, HIFLR_snRNA_features.tsv.gz, and HIFLR_snRNA_matrix.mtx.gz are the files generated by CellRanger-arc v2.0.0. Keep these files in the same folder, named filtered_feature_bc_matrix.
    • c.
      Load snATAC-seq fragments files.
      grange.counts <- StringToGRanges(rownames(atac_counts), sep = c(":", "-"))
      grange.use <- seqnames(grange.counts) %in% standardChromosomes(grange.counts)
      atac_counts <- atac_counts[as.vector(grange.use), ]
    • d.
      Add annotation.
      annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
      seqlevelsStyle(annotation) <- "UCSC"
      genome(annotation) <- "mm10"
    • e.
      Create snATAC assay.
      fragpath <- "./multiomics/filtered_feature_bc_matrix/fragments.tsv.gz"
      Star[["ATAC"]] <- CreateChromatinAssay(counts = atac_counts, sep = c(":", "-"), genome = 'mm10', fragments = fragpath, min.cells = 10, annotation = annotation)
    • f.
      Downsize the dataset.
      set.seed(111)
      Star <- subset(x = Star, downsample = 6000)
      save(Star, file="Star_ds6k.RData")
      load("./Star_ds6k.RData")
      CRITICAL: To load the snATAC-seq fragments file properly, the fragments.tsv.gz.tbi index file must be in the same folder.
      CRITICAL: Use only peaks in standard chromosomes and set the sequence level style to UCSC.
    • g.
      Quality control:
      • i.
        Calculate percentage of mitochondrial genes in snRNA-seq.
      • ii.
        Compute both TSS enrichment score and nucleosome signal metrics in Signac for snATAC-seq (Figure 5A).
        DefaultAssay(Star) <- "RNA"
        Star[["percent.mt"]] <- PercentageFeatureSet(Star, pattern = "^mt-")
        VlnPlot(Star, features = c("nCount_RNA", "nFeature_RNA", "percent.mt"), ncol = 3, log = TRUE, pt.size = 0) + NoLegend()
        DefaultAssay(Star) <- "ATAC"
        Star <- NucleosomeSignal(Star)
        Star <- TSSEnrichment(object=Star, fast=FALSE)
        VlnPlot(Star, features = c("nCount_ATAC", "nFeature_ATAC", "TSS.enrichment", "nucleosome_signal"), ncol = 4, log = TRUE, pt.size = 0) + NoLegend()
        Note: Low-quality cells refer to potential damaged cells, empty droplets, cell doublets, or multiplets.
      • iii.
        Remove low quality cells (Figure 5B).
        Star <- subset(x = Star,
         subset = nCount_RNA < 100000 &
         nCount_RNA > 1200 &
         nCount_ATAC < 1e5 &
         nCount_ATAC > 1e2 &
         nucleosome_signal < 2.5 &
         TSS.enrichment > 3 &
         percent.mt < 10)
        saveRDS(Star, file="Star.RData")
        VlnPlot(Star, features = c("nCount_RNA", "nFeature_RNA", "percent.mt"), ncol = 3, log = TRUE, pt.size = 0) + NoLegend()
        VlnPlot(Star, features = c("nCount_ATAC", "nFeature_ATAC", "TSS.enrichment", "nucleosome_signal"), ncol = 4, log = TRUE, pt.size = 0) + NoLegend()
        CRITICAL: The filtering criteria are dataset specific. Choose cutoffs that avoid both losing unique cell populations and including noisy cells.
  • 21.
    WNN analysis.
    • a.
      Perform normalization and dimensional reduction of snRNA-seq and snATAC-seq assays independently and individually.
      # snRNA analysis
      DefaultAssay(Star) <- "RNA"
      Star <- SCTransform(Star, verbose = FALSE) %>% RunPCA() %>% RunUMAP(dims = 1:30, reduction.name = 'umap', reduction.key = 'UMAP_')
      # snATAC analysis
      DefaultAssay(Star) <- "ATAC"
      Star <- RunTFIDF(Star)
      Star <- FindTopFeatures(Star, min.cutoff = 'q0')
      Star <- RunSVD(Star)
      Star <- RunUMAP(Star, reduction = 'lsi', dims = 2:30,
      reduction.name = "umap.atac", reduction.key = "atacUMAP_")
      Note: In the snATAC-seq assay, the first dimension is typically correlated with sequencing depth rather than biological variation. It is therefore excluded from the UMAP computation.
    • b.
      Learn cell-specific modality weights and construct a WNN graph.
      Star <- FindMultiModalNeighbors(Star, reduction.list = list("pca", "lsi"),dims.list = list(1:30, 2:30))
      Star <- RunUMAP(Star, nn.name = "weighted.nn",
       reduction.name = "umap.wnn", reduction.key ="wnnUMAP_")
      Star <- FindClusters(Star, graph.name = "wsnn",
       resolution = 0.8, algorithm = 3, verbose = FALSE)
    • c.
      Visualize the clusters. (Figure 6A).
      p1 <- DimPlot(Star, reduction = "umap", group.by = "seurat_clusters", label = TRUE, label.size = 8, repel = TRUE) + ggtitle("RNA")
      p2 <- DimPlot(Star, reduction = "umap.atac",group.by = "seurat_clusters", label = TRUE, label.size = 8, repel = TRUE) + ggtitle("ATAC")
      p3 <- DimPlot(Star, reduction = "umap.wnn", group.by = "seurat_clusters", label = TRUE, label.size = 8, repel = TRUE) + ggtitle("WNN")
      p1+p2+p3 &NoLegend()
    • d.
      snRNA-seq analysis: Characterization and annotation of cell states are achieved by identifying marker genes via differential expression analysis in both pseudotemporal ordering identified clusters and WNN clusters. Cell types are defined using known gene markers. For example, Myod1, Myog, and Myf5 are myogenic markers and Ascl1, Neurod4, and Nhlh1 are neurogenic markers. Pax7 drives both myogenesis and neurogenesis. Meis1 and Pbx1 are anterior presomitic mesoderm (aPSM) markers. As an example, here we analyze myogenic genes Myod1 and Myog.
      • i.
        Find markers.
        DefaultAssay(Star) <- "RNA"
        Star.rna.markers <- FindAllMarkers(Star, assay = "RNA", only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
        Star.rna.markers %>%
         group_by(cluster) %>%
         top_n(n = 2, wt = avg_log2FC)
      • ii.
        Add cell states annotations.
        Star <- RenameIdents(Star, '10' = 'cell_5','11' = 'cell_2')
        Star <- RenameIdents(Star, '5' = 'cell_4','6' = 'cell_1','7' = 'cell_1','8' = 'cell_3','9' = 'cell_5')
        Star <- RenameIdents(Star, '0' = 'cell_1','1' = 'cell_2','2' = 'cell_1','3' = 'cell_1','4' = 'cell_3')
        Star$celltype <- Idents(Star)
        CRITICAL: Cell states can be assigned using known markers. We recommend writing Star.rna.markers to a file and studying the markers that could be used to annotate the clusters.
      • iii.
        Visualize the cell states (Figure 6D).
        p1 <- DimPlot(Star, reduction = "umap", group.by = "celltype", label = FALSE, label.size = 8, repel = TRUE) + ggtitle("RNA")
        p2 <- DimPlot(Star, reduction = "umap.atac", group.by = "celltype", label = FALSE, label.size = 8, repel = TRUE) + ggtitle("ATAC")
        p3 <- DimPlot(Star, reduction = "umap.wnn", group.by = "celltype", label = FALSE, label.size = 8, repel = TRUE) + ggtitle("WNN")
        p1+p2+p3
    • e.
      snATAC-seq analysis.
      • i.
        Load libraries.
        library(chromVAR)
        library(motifmatchr)
        library(JASPAR2020)
        library(TFBSTools)
        library(BSgenome.Mmusculus.UCSC.mm10)
      • ii.
        Find snATAC markers.
        DefaultAssay(Star) <- "ATAC"
        Star.atac.markers <- FindAllMarkers(Star, assay = "ATAC", only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
        Star.atac.markers %>%
         group_by(cluster) %>%
         top_n(n = 2, wt = avg_log2FC)
      • iii.
        Add motif information.
        pwm_set <- getMatrixSet(x = JASPAR2020, opts = list(collection = "CORE", tax_group = 'vertebrates', all_versions = FALSE))
        Star <- AddMotifs(
         object = Star,
         genome = BSgenome.Mmusculus.UCSC.mm10,
         pfm = pwm_set,
         assay="ATAC")
      • iv.
        Compute motif activities.
        Star <- RunChromVAR(
         object = Star,
         genome = BSgenome.Mmusculus.UCSC.mm10)
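The marker tables produced in steps 21d and 21e are easier to use for annotation once they are reduced to the top genes per cluster and saved to disk. The following is a minimal base R sketch, assuming only that the table has the `cluster`, `gene`, and `avg_log2FC` columns returned by FindAllMarkers. The toy values, the helper name `top_markers_per_cluster`, and the output file name are hypothetical and are not part of the original protocol.

```r
# Toy stand-in for Star.rna.markers; values are invented for illustration only.
markers <- data.frame(
  cluster    = c("0", "0", "0", "1", "1", "1"),
  gene       = c("Meis1", "Pbx1", "Pax7", "Myod1", "Myog", "Myf5"),
  avg_log2FC = c(1.8, 1.2, 0.4, 2.5, 2.1, 0.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n genes with the largest avg_log2FC per cluster.
top_markers_per_cluster <- function(markers, n = 2) {
  by_cluster <- split(markers, markers$cluster)
  top <- lapply(by_cluster, function(df) {
    df <- df[order(df$avg_log2FC, decreasing = TRUE), , drop = FALSE]
    head(df, n)
  })
  out <- do.call(rbind, top)
  rownames(out) <- NULL
  out
}

top2 <- top_markers_per_cluster(markers, n = 2)
print(top2)

# Write the full table to disk so that it can be inspected during annotation.
out_file <- file.path(tempdir(), "Star_rna_markers.csv")
write.csv(markers, out_file, row.names = FALSE)
```

Unlike top_n, this helper does not keep ties, so it returns exactly n rows per cluster when at least n are available. With a real object, `markers` would be replaced by Star.rna.markers or Star.atac.markers.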

Figure 5.

Multiomics data quality control

(A) snRNA and snATAC QC plot before removing low quality cells.

(B) snRNA and snATAC QC plot after removing low quality cells.

Figure 6.

Characterization and annotation of cell states

(A) UMAP visualization of the clustering based on snRNA-seq, snATAC-seq, and WNN analysis before cell state annotation.

(B) Pseudotime single cell trajectory plot. The heatmap represents units of progress, with 1 located at the root of the trajectory.

(C) Cell states derived from pseudotime trajectory inference. State 2 and state 4 are marked NA (not assigned), since they may represent transitioning states and could not be unambiguously assigned to a specific developmental stage.

(D) UMAP visualization of cell states after annotated clustering.

Part 2: Data visualization

Timing: 1 h (for step 22)

In this section, we describe the steps for data visualization.

  • 22.
    Pseudotime analysis:
    • a.
      Convert Seurat object to Monocle object.
      DefaultAssay(Star) <- "RNA"
      set.seed(22)
      cds <- SeuratWrappers::as.cell_data_set(Star, assay = "RNA", reduction = "umap", group.by = "celltype")
      cds@rowRanges@elementMetadata@listData[["gene_short_name"]] <- rownames(Star[["RNA"]])
    • b.
      Create CDS object.
      cds <- preprocess_cds(cds, method = "PCA")
      cds <- reduce_dimension(cds, preprocess_method = "PCA",umap.n_neighbors= 14L, reduction_method = "UMAP")
      cds <- cluster_cells(cds, reduction_method = "UMAP")
      cds <- learn_graph(cds, use_partition = FALSE, close_loop = FALSE)
    • c.
      Set the root using Seurat cluster 0 and order the cells.
      cell_ids <- colnames(cds)[Star$seurat_clusters == "0"]
      closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
      closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
      closest_vertex <- closest_vertex[cell_ids, ]
      closest_vertex <- as.numeric(names(which.max(table(closest_vertex))))
      mst <- principal_graph(cds)$UMAP
      root_pr_nodes <- igraph::V(mst)$name[closest_vertex]
      rowData(cds)$gene_name <- rownames(cds)
      rowData(cds)$gene_short_name <- rowData(cds)$gene_name
      cds <- order_cells(cds, root_pr_nodes = root_pr_nodes)
    • d.
      Visualize trajectory plot (Figure 6B).
      plot_cells(cds, color_cells_by = "pseudotime",
         label_cell_groups = TRUE, label_leaves = FALSE,
         label_branch_points = FALSE, show_trajectory_graph = TRUE,
         graph_label_size = 3, label_groups_by_cluster = TRUE)
    • e.
      Visualize cell states derived from trajectory inference (Figure 6C).
      plot_cells(cds, color_cells_by = "cluster", cell_size = 1,
         label_cell_groups = TRUE, group_label_size = 4,
         show_trajectory_graph = FALSE,
         label_branch_points = FALSE,
         label_roots = FALSE,
         label_leaves = FALSE)
    • f.
      Visualize paired-plots expression of Myod1 and Myog (Figure 7A).
      Star.seur <- as.Seurat(cds, assay = NULL, clusters = "UMAP")
      Star.seur <- AddMetaData(Star.seur,metadata= cds@principal_graph_aux$UMAP$pseudotime,
      col.name = "monocle3_pseudotime")
      FeaturePlot(Star.seur,features = c("Myod1","Myog"),
         reduction ="UMAP",combine = T,
         blend = TRUE, blend.threshold = 0.0,
         min.cutoff = 0,max.cutoff = 6)
    • g.
      Visualize footprinting plots (Figure 7B).
      Star_135 <- subset(x = Star, idents = c("cell_1", "cell_3", "cell_5"), invert = FALSE)
      DefaultAssay(Star_135) <- "ATAC"
      Star_135 <- Footprint(
       object = Star_135,
       motif.name = c("MYOG", "MYOD1"),
       genome = BSgenome.Mmusculus.UCSC.mm10)
      PlotFootprint(Star_135, features = c("MYOD1")) + patchwork::plot_layout(ncol = 1)
      PlotFootprint(Star_135, features = c("MYOG")) + patchwork::plot_layout(ncol = 1)
      Note: cell_1 corresponds to aPSM cells, cell_3 to a neurogenic cluster, and cell_5 to a myogenic cluster.
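The root selection in step 22c can be read as a majority vote: among the cells of Seurat cluster 0, find the principal-graph vertex that is closest to the largest number of cells and use it as the root. The base R sketch below reproduces that logic on invented toy data. The helper name `pick_root_node`, the cell names, and the "Y_" vertex labels are assumptions made for illustration; with real data the inputs come from the cds object as shown in step 22c.

```r
# Toy data: six cells, their Seurat cluster, and the index of the closest
# principal-graph vertex (all values invented for illustration).
cell_cluster <- c(c1 = "0", c2 = "0", c3 = "0", c4 = "1", c5 = "1", c6 = "2")
closest_vertex <- matrix(
  c(3, 3, 7, 5, 5, 9),
  ncol = 1,
  dimnames = list(names(cell_cluster), NULL)
)
# Assumed vertex labels of the form "Y_<index>".
vertex_names <- paste0("Y_", 1:10)

# Hypothetical helper that mirrors the logic of step 22c.
pick_root_node <- function(closest_vertex, cell_cluster, root_cluster, vertex_names) {
  # Cells that belong to the cluster chosen as the start of the trajectory.
  cell_ids <- names(cell_cluster)[cell_cluster == root_cluster]
  # Count how many of those cells are closest to each vertex.
  votes <- table(closest_vertex[cell_ids, 1])
  # The most frequent vertex index becomes the root.
  winner <- as.numeric(names(which.max(votes)))
  vertex_names[winner]
}

root <- pick_root_node(closest_vertex, cell_cluster, "0", vertex_names)
print(root)
```

Choosing the most frequent vertex rather than the vertex of a single cell makes the root less sensitive to a few cells of cluster 0 that project onto distant parts of the graph.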

Figure 7.

Visualization of myogenic cells

(A) Individual and paired-plots expression of Myod1 and Myog in cell states derived from pseudotime trajectory inference.

(B) Myod1 and Myog footprinting profiles in aPSM, neurogenic, and myogenic clusters.

Expected outcomes

This protocol provides a resource for profiling the transcriptional and chromatin accessibility features of pluripotent, mesoderm-induced ESCs and ESC-derived cell lineages. Expression profiles and chromatin accessibility are determined for each developmental time point. Transcriptomic changes across differentiation time points are revealed by the pipeline that integrates individual scRNA-seq datasets (protocol 1: integrated data analysis, step 1), and the correlation between gene expression and chromatin accessibility is revealed by the pipeline that integrates scRNA-seq and scATAC-seq datasets (protocol 1: integrated data analysis, step 2). In addition, multiomics datasets can be visualized and cell states inferred through the multiomics analysis pipeline (protocol 2: omics analysis).

Limitations

The protocols are based on the R package Seurat and are run in R with RStudio. To run the protocols in high-performance computing environments, users need R batch modules such as Swarm. Furthermore, the data integration parameters were chosen by heuristic hyperparameter tuning for datasets from specific time points; an automatic tuning module would therefore be needed to explore optimal hyperparameters for new datasets. In addition, users could compare the outputs of these protocols with results from other single-cell packages such as SCANPY if a module for converting data between the R and Python schemas were developed.

Troubleshooting

Problem 1

Unable to run the Docker image with Docker Desktop.

Potential solution

In the software preparation step, follow the steps in Docker_manual_mac.docx or Docker_manual_windowOS.docx and set up the Docker Desktop environment properly.

Problem 2

R packages cannot be loaded with the library() command.

Potential solution

Run the following code in the R environment to list the installed packages:

p <- installed.packages()

rownames(p)

If a package is not listed after running the code, visit the Bioconductor website (https://www.bioconductor.org/), search for the package, and follow the installation guidelines. If the package cannot be found in Bioconductor, run install.packages("package_name") in the R environment. More details and examples can be found in Software_preparation.R.
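Scanning the full list of installed packages by eye is error-prone, so the check can be narrowed to the packages a script actually needs. The following is a minimal base R sketch; the helper name `missing_packages` and the package names passed to it are hypothetical examples, not part of Software_preparation.R.

```r
# Hypothetical helper: report which of the requested packages are not installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# "stats" ships with R; the second name is a deliberately nonexistent package.
missing <- missing_packages(c("stats", "notARealPackage12345"))
print(missing)
```

An empty result means that every requested package is installed; any names returned are the ones to install from Bioconductor or CRAN.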

Problem 3

Data files cannot be loaded.

Potential solution

Check whether the files are in the expected folder. If they are, check that their names match the names used in the code.
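This check can also be done from within R before any loading function is called. The sketch below is a minimal base R example that runs in a temporary folder; the helper name `missing_files` and the file names are placeholders and should be replaced with the folder and files used in the protocol.

```r
# Hypothetical helper: return the expected files that are absent from a folder.
missing_files <- function(folder, expected) {
  expected[!file.exists(file.path(folder, expected))]
}

# Demonstration in a temporary folder; the file names are placeholders.
demo_dir <- file.path(tempdir(), "star_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "matrix.mtx"))

absent <- missing_files(demo_dir, c("matrix.mtx", "barcodes.tsv"))
print(absent)

# Listing the folder shows the exact spelling of the files that are present.
print(list.files(demo_dir))
```

Comparing the output of list.files with the names used in the code usually reveals misspellings or a wrong working directory.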

Problem 4

Monocle3 failed to be installed.

Potential solution

  • Install monocle3: monocle3 runs in the R statistical computing environment. R version 4.2.2 or higher is required.

  • Install a few Bioconductor dependencies that aren’t automatically installed.

BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats', 'limma', 'lme4', 'S4Vectors', 'SingleCellExperiment', 'SummarizedExperiment', 'batchelor', 'Matrix.utils', 'HDF5Array', 'terra', 'ggrastr'))

  • Install monocle3 from the cole-trapnell-lab GitHub repository. To confirm that monocle3 was installed correctly, start a new R session and run library(monocle3).

install.packages('devtools')

devtools::install_github('cole-trapnell-lab/monocle3')

library(monocle3)

CRITICAL: Installing monocle3 can be difficult. Troubleshooting advice is available on the cole-trapnell-lab GitHub site (https://cole-trapnell-lab.github.io/monocle3/docs/installation).
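Because the potential solution above states a minimum R version, it may help to confirm the running version before attempting the installation. The following is a minimal base R sketch; the helper name `meets_requirement` is hypothetical, and the threshold is taken from the text above.

```r
# Minimum R version stated in the troubleshooting text above.
required <- "4.2.2"

# Hypothetical helper: TRUE when a version meets the requirement.
meets_requirement <- function(current, required) {
  numeric_version(as.character(current)) >= numeric_version(required)
}

# Check the R session that is currently running.
print(meets_requirement(getRversion(), required))

# Two fixed examples that show the comparison in both directions.
ok_new <- meets_requirement("4.3.1", required)
ok_old <- meets_requirement("4.1.0", required)
```

Comparing with numeric_version avoids the pitfalls of comparing version strings as text, where "4.10.0" would sort before "4.2.2".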

Problem 5

Plots cannot be drawn.

Potential solution

Run the following code in the R environment to check whether ggplot2 is installed:

gg2 <- try(find.package("ggplot2"), silent = TRUE)

gg2

If the package cannot be found after running the code, run install.packages("ggplot2") in the R environment.

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Vittorio Sartorelli (sartorev@mail.nih.gov).

Materials availability

This study did not generate unique reagents.

Data and code availability

Original data and codes have been deposited to Zenodo: https://doi.org/10.5281/zenodo.7224723.

Acknowledgments

We thank the NIAMS Genomic Technology, Biodata Mining and Discovery, Flow Cytometry, and Light Imaging Sections. Dr. Hong-Wei Sun and Dr. Stephen Brooks (Biodata Mining and Discovery Section) provided useful suggestions for data analysis. This study utilized the high-performance computational capabilities of the Helix Systems at the NIH, Bethesda, MD, USA (https://helix.nih.gov/). This work was supported in part by the Intramural Research Program of the NIAMS at the NIH (grants AR041126 and AR041164 to V.S.).

Author contributions

K.D.K. and K.J. analyzed and interpreted data and drafted the manuscript. S.D.O. and V.S. edited the manuscript and supervised the project.

Declaration of interests

The authors declare no competing interests.

Contributor Information

Kyung Dae Ko, Email: kyungdae.ko@nih.gov.

Kan Jiang, Email: kan.jiang@nih.gov.

Vittorio Sartorelli, Email: vittorio.sartorelli@nih.gov.

References

  • 1. Khateb M., Perovanovic J., Ko K.D., Jiang K., Feng X., Acevedo-Luna N., Chal J., Ciuffoli V., Genzor P., Simone J., et al. Transcriptomics, regulatory syntax, and enhancer identification in mesoderm-induced ESCs at single-cell resolution. Cell Rep. 2022;40:111219. doi: 10.1016/j.celrep.2022.111219.
  • 2. RStudio Team. RStudio: Integrated Development for R; 2022.
  • 3. Stuart T., Butler A., Hoffman P., Hafemeister C., Papalexi E., Mauck W.M., 3rd, Hao Y., Stoeckius M., Smibert P., Satija R. Comprehensive integration of single-cell data. Cell. 2019;177:1888–1902.e21. doi: 10.1016/j.cell.2019.05.031.
  • 4. Stuart T., Srivastava A., Madad S., Lareau C.A., Satija R. Single-cell chromatin state analysis with Signac. Nat. Methods. 2021;18:1333–1341. doi: 10.1038/s41592-021-01282-5.
  • 5. Korsunsky I., Millard N., Fan J., Slowikowski K., Zhang F., Wei K., Baglaenko Y., Brenner M., Loh P.R., Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods. 2019;16:1289–1296. doi: 10.1038/s41592-019-0619-0.
  • 6. Cao J., Spielmann M., Qiu X., Huang X., Ibrahim D.M., Hill A.J., Zhang F., Mundlos S., Christiansen L., Steemers F.J., et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566:496–502. doi: 10.1038/s41586-019-0969-x.
  • 7. Fornes O., Castro-Mondragon J.A., Khan A., van der Lee R., Zhang X., Richmond P.A., Modi B.P., Correard S., Gheorghe M., Baranašić D., et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2020;48:D87–D92. doi: 10.1093/nar/gkz1001.
  • 8. Tan G., Lenhard B. TFBSTools: an R/Bioconductor package for transcription factor binding site analysis. Bioinformatics. 2016;32:1555–1556. doi: 10.1093/bioinformatics/btw024.


Articles from STAR Protocols are provided here courtesy of Elsevier
