Integrating single-cell transcriptomes, chromatin accessibility, and multiomics analysis of mesoderm-induced embryonic stem cells

Kyung Dae Ko; Kan Jiang; Stefania Dell’Orso; Vittorio Sartorelli

doi:10.1016/j.xpro.2023.102307

2023 May 15;4(2):102307. doi: 10.1016/j.xpro.2023.102307

PMCID: PMC10199178 PMID: 37192048

Summary

Here, we present workflows for integrating independent transcriptomic and chromatin accessibility datasets and analyzing multiomics. First, we describe steps for integrating independent transcriptomic and chromatin accessibility measurements. Next, we detail multimodal analysis of transcriptomes and chromatin accessibility performed in the same sample. We demonstrate their use by analyzing datasets obtained from mouse embryonic stem cells induced to differentiate toward mesoderm-like, myogenic, or neurogenic phenotypes.

For complete details on the use and execution of this protocol, please refer to Khateb et al.¹

Subject areas: Bioinformatics, Computer sciences

Highlights

• Integration of scRNA-seq and scATAC-seq from independent datasets
• Multiomics of snRNA-seq and snATAC-seq from the same sample
• Inference of cell states from multiomics pseudotime

Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.

Before you begin

Hardware preparation

A computer running macOS or Windows with a network connection is required. The RAM requirement depends on the number of cells to be analyzed; 16 GB of RAM should be sufficient for an initial analysis. If more than 10,000 cells are analyzed, a computer cluster running Linux with more than 32 GB of RAM is required.

Software preparation

Timing: 1 h (for step 1)

The applications described in this section are required for single-cell (sc)RNA-seq analysis, single-cell (sc)ATAC-seq analysis, integration of scRNA-seq and scATAC-seq datasets, and multiomics analysis.

1.
Prepare the Docker environment (troubleshooting 1).
For scRNA-seq, scATAC-seq, multiomics analysis, and data integration, single-cell analysis tools on the R platform are required. To avoid conflicts between R library installations, a Docker development environment is used.
- a.
  Go to the Docker website (https://www.docker.com/) and install the latest version of Docker Desktop.
- b.
  Pull the Docker image from Docker Hub.
  > docker pull holyone70/mesoderm_pipeline:mesoderm_pipeline
- c.
  Run the Docker image to prepare the R development environment.
  > docker run -e PASSWORD=rstudio -p 8787:8787 --name mesoderm_pipeline holyone70/mesoderm_pipeline:mesoderm_pipeline
- d.
  Open a web browser and enter the local address of the R server (http://localhost:8787).
- e.
  Enter the username (rstudio) and password (rstudio).
- f.
  Check the availability of R packages (troubleshooting 2).
- g.
  Check the availability of data files (troubleshooting 3).
- h.
  Order in which to run the scripts.
  - i.
    scRNA_analysis.R.
  - ii.
    scATAC_analysis.R.
  - iii.
    int_scRNA.R.
  - iv.
    int_scRNA_scATAC.R.
  - v.
    multiomics_anal.R.
- i.
  How to run the R scripts.
  - i.
    Clean the environment by clicking the broom symbol located in the upper-right corner.
  - ii.
    Go to File → open the star_protocol project → select the R script of interest in the File, Packages, Help panel and run it.
    Note: To pull and run the Docker image, use the terminal on macOS and Linux, and PowerShell on Windows. The command lines for downloading the data are located at the top of each R script and are commented out.
    
    CRITICAL: To run Multiomics_anal.R successfully, the minimum Docker resource requirements are 32 GB memory, 8 CPUs, and 4 GB swap.

Data collection

Timing: 30 min (for step 2)

Single cell datasets analyzed in this protocol were deposited in the GEO repository (GSE198730). scRNA-seq datasets consist of barcodes, features, and matrix files, and scATAC-seq datasets contain barcodes, fragments, matrix, and peaks files. The time points for each dataset are described in Figure 1.

Figure 1. Scheme illustrating ESC differentiation time points

Key resources table

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Deposited data

Single cell RNA-seq datasets	Khateb et al.¹	https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198730 (GSE198730_aPSM_scRNA_rep1_barcodes.tsv.gz, GSE198730_aPSM_scRNA_rep1_features.tsv.gz, GSE198730_aPSM_scRNA_rep1_matrix.mtx.gz)
Single cell ATAC-seq datasets	Khateb et al.¹	https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198730 (aPSM_scATAC_rep1_filtered_peak_bc_matrix.h5)
Single cell omics datasets	Khateb et al.¹	https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198730 (GSE198730_HIFLR_snRNA_barcodes.tsv.gz, GSE198730_HIFLR_snRNA_features.tsv.gz, GSE198730_HIFLR_snRNA_matrix.mtx.gz, GSE198730_HIFLR_snATAC_fragments.tsv.gz, GSE198730_HIFLR_snATAC_fragments.tsv.gz.tbi.gz)
Github repository	Single cell RNA-seq, Single cell ATAC-seq, Multiome single nuclei ATAC and gene expression	https://github.com/LMSCGR/mesoderm_induced_ESCs_pipeline (HIFLR_snATAC_fragments.tsv.gz.tbi, aPSM_fragments.tsv.gz.tbi, cell_cycle.txt, naive_instructed_esc.csv, aPSM_f.txt)

Software and algorithms

BioRender		https://biorender.com/
R v4.2.2	The R Project for Statistical Computing	https://www.r-project.org/
RStudio server (v 2022.12.0+353)	RStudio Team²	https://posit.co/
Seurat v4.3.0	Stuart et al.³	https://cran.r-project.org/web/packages/Seurat/index.html
Signac v1.9.0	Stuart et al.⁴	https://satijalab.org/signac
Harmony v0.1.1	Korsunsky et al.⁵	https://github.com/immunogenomics/harmony
Monocle3 v1.3.1	Cao et al.⁶	https://cole-trapnell-lab.github.io/monocle3/
JASPAR 2020 v 0.99.10	Fornes et al.⁷	https://bioconductor.org/packages/release/data/annotation/html/JASPAR2020.html
TFBSTools v 1.36.0	Tan and Lenhard⁸	https://bioconductor.org/packages/release/bioc/html/TFBSTools.html
SeuratWrappers v0.3.1		https://github.com/satijalab/seurat-wrappers

Other

Local computer – memory: 16GB required, 32GB recommended; processors: 4 required, 8 recommended	N/A	N/A

Step-by-step method details

Part 1: Single cell RNA seq analysis

Timing: 1 h (for step 1 to step 9)

In this section, we describe essential steps to analyze scRNA-seq datasets.

1.
Load datasets using the Seurat package (troubleshooting 4).

library(dplyr)

library(Seurat)

library(monocle3)

library(SeuratWrappers)

# for plotting

library(ggplot2)

library(patchwork)

set.seed(1234)

aPSM.matrix <- Read10X(data.dir ="./aPSM_scRNA/filtered_feature_bc_matrix/")

2.
Create Seurat object.

aPSM <- CreateSeuratObject(counts = aPSM.matrix, min.cells = 3, min.features = 200, project = "aPSM")

Note: Options for min.cells and min.features are set to the default values used in the Seurat tutorial (https://satijalab.org/seurat/articles/pbmc3k_tutorial.html). File names should be barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz. On Windows, a path such as ".\\filtered_feature_bc_matrix\\" can be used.

3.
Select cells for the analysis through quality control (QC) (Figure 2A). Troubleshooting 5.

aPSM[["percent.mt"]] <- PercentageFeatureSet(aPSM, pattern = "^mt-")

VlnPlot(aPSM, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

plot1 <- FeatureScatter(aPSM, feature1 = "nCount_RNA", feature2 = "percent.mt")

plot2 <- FeatureScatter(aPSM, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

plot1+plot2

aPSM <- subset(aPSM, subset = nFeature_RNA > 0 & nFeature_RNA < 8000 & percent.mt < 20)
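
The QC logic of this step can be illustrated without Seurat. The following is a minimal base R sketch, assuming a toy count matrix with invented gene names and values; it is not the Seurat implementation, only an illustration of how per-cell counts, detected genes, and mitochondrial percentage lead to the filtering decision.

```r
# Toy count matrix: genes in rows, cells in columns (all values are invented).
counts <- matrix(
  c(6,  0, 8,
    3,  1, 1,
    1,  9, 1,
    0, 10, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Actb", "Gapdh", "mt-Co1", "mt-Nd1"),
                  c("cell_a", "cell_b", "cell_c"))
)

# Per-cell QC metrics, analogous to nCount_RNA, nFeature_RNA, and percent.mt.
n_count    <- colSums(counts)
n_feature  <- colSums(counts > 0)
is_mito    <- grepl("^mt-", rownames(counts))   # mouse mitochondrial genes
percent_mt <- 100 * colSums(counts[is_mito, , drop = FALSE]) / n_count

# Keep cells passing every threshold (same thresholds as the subset() call above).
keep     <- n_feature > 0 & n_feature < 8000 & percent_mt < 20
filtered <- counts[, keep, drop = FALSE]

print(data.frame(n_count, n_feature, percent_mt, keep))
```

In this toy example, cell_b is dominated by mitochondrial reads and is removed, while the other two cells are retained.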

4.
Preprocess data and select features for the analysis.

aPSM <- NormalizeData(object = aPSM, normalization.method = "LogNormalize", scale.factor = 1e4)

aPSM <- FindVariableFeatures(aPSM, selection.method = "vst", nfeatures = 2000)

aPSM_top10 <- head(VariableFeatures(aPSM), 10)

plot1 <- VariableFeaturePlot(aPSM)

plot2 <- LabelPoints(plot = plot1, points = aPSM_top10, repel = TRUE)

plot1+plot2

aPSM.all.genes <- rownames(aPSM)

aPSM <- ScaleData(aPSM, features = aPSM.all.genes)
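
For orientation, the "LogNormalize" method divides each count by the total counts of its cell, multiplies by the scale factor, and applies log1p. The following is a minimal base R sketch on an invented toy matrix; it only illustrates the arithmetic and does not replace NormalizeData.

```r
# Toy count matrix (invented values): genes in rows, cells in columns.
counts <- matrix(
  c(1, 3, 0,      # cell_a, total = 4
    10, 0, 10),   # cell_b, total = 20
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("cell_a", "cell_b"))
)

scale_factor <- 1e4

# Divide each column by its total, scale, then take log(1 + x).
relative   <- sweep(counts, 2, colSums(counts), "/")
normalized <- log1p(relative * scale_factor)

print(round(normalized, 3))
```

Zero counts remain zero after normalization, and cells with different sequencing depths become comparable because each column is scaled by its own total.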

5.
Prepare the cell cycle gene lists.

convertHumanGeneList <- function(x){

require("biomaRt")

human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl" , host = "https://dec2021.archive.ensembl.org/")

mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl" ,host = "https://dec2021.archive.ensembl.org/")

tmp <- getLDS(attributes = c("hgnc_symbol"), filters = "hgnc_symbol", values = x , mart = human, attributesL = c("mgi_symbol"), martL = mouse, uniqueRows=TRUE)

mousex <- unique(tmp[,2])

return(mousex)}

s.genes <- convertHumanGeneList(cc.genes.updated.2019$s.genes)

g2m.genes <- convertHumanGeneList(cc.genes.updated.2019$g2m.genes)

cell_cycle <- t(read.csv(file="aPSM_scRNA/cell_cycle.txt",header=F))[,1]

filtered_genes <- c(s.genes,cell_cycle)

6.
Score and regress out cell cycle effects, reduce dimensions, and establish dataset dimensionality (Figure 2B).

aPSM <- RunPCA(object = aPSM, features = VariableFeatures(object = aPSM), verbose = FALSE)

aPSM <- CellCycleScoring(aPSM, s.features = filtered_genes, g2m.features = g2m.genes, set.ident = TRUE)

aPSM <- ScaleData(aPSM, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(aPSM))

aPSM <- JackStraw(aPSM, num.replicate = 100)

aPSM <- ScoreJackStraw(aPSM, dims = 1:20)

ElbowPlot(object = aPSM,ndims =50)

7.
Cluster and visualize cells (Figure 3A).

aPSM <- FindNeighbors(object = aPSM, dims = 1:30)

aPSM <- FindClusters(object = aPSM, resolution = 0.25)

aPSM <- RunTSNE(object = aPSM, dims = 1:30)

aPSM <- RunUMAP(object = aPSM, dims = 1:30)

DimPlot(object=aPSM,reduction='umap',label=T)+labs(title = " aPSM")

save(aPSM,file="aPSM_scRNA.RData")

8.
Analyze unique features of each cluster.

aPSM.markers <- FindAllMarkers(aPSM, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

aPSM.markers_table <- aPSM.markers %>%group_by(cluster) %>% slice_max(n = 20, order_by = avg_log2FC)

Note: Options for min.pct and logfc.threshold are selected as default values from Seurat tutorials (https://satijalab.org/seurat/articles/pbmc3k_tutorial.html).

9.
Visualize cluster pseudotime (Figure 3B).

DefaultAssay(aPSM) <- "RNA"

aPSM_cds <- as.cell_data_set(aPSM)

aPSM_cds <- cluster_cells(aPSM_cds,reduction="UMAP",k = 30,resolution = 0.00012)

aPSM_cds <- learn_graph(aPSM_cds, close_loop = F,use_partition = T,learn_graph_control =list(minimal_branch_len=5))

plot_cells(aPSM_cds, label_groups_by_cluster = T, label_leaves = F, label_branch_points = T,graph_label_size = 3)

aPSM.min.umap <- which.min(unlist(FetchData(aPSM, "UMAP_2")))

aPSM.min.umap <- colnames(aPSM)[aPSM.min.umap]

aPSM_cds <- order_cells(aPSM_cds, root_cells = aPSM.min.umap)

plot_cells(aPSM_cds, color_cells_by = "pseudotime", label_cell_groups =T, label_leaves = F, label_branch_points = F,show_trajectory_graph = T,graph_label_size = 3,label_groups_by_cluster = T)

Figure 2. scRNA-seq data quality control

(A) Violin plots of aPSM scRNA-seq data: mRNA counts (nCount_RNA), number of detected genes (nFeature_RNA), and percentage of mitochondrial genes (percent.mt).

(B) Elbow plot of aPSM scRNA-seq data showing the standard deviations of the principal components (PCs).

Figure 3. Visualization of aPSM scRNA-seq data

(A) UMAP plot of aPSM scRNA-seq data.

(B) Pseudotime plot of aPSM scRNA-seq data. The heatmap represents units of progress, with 1 located at the root of the trajectory.

Part 1: Single cell ATAC seq analysis

Timing: 1 h (for step 10 to step 17)

In this section, we describe steps to evaluate chromatin accessibility using scATAC-seq.

10.
Load datasets using Signac package.

library(Signac)

library(Seurat)

library(GenomeInfoDb)

library(EnsDb.Mmusculus.v79)

library(patchwork)

set.seed(1234)

aPSM.counts <- Read10X_h5("./aPSM_scATAC/filtered_peak_bc_matrix.h5")

aPSM_meta <- read.table("./aPSM_scATAC/singlecell.csv.gz", sep = ",", header = TRUE, row.names = 1)

aPSM_chrom_assay <- CreateChromatinAssay(

counts = aPSM.counts,

sep = c(":","-"),

genome = 'mm10',

fragments = './aPSM_scATAC/filtered_feature_bc_matrix/fragments.tsv.gz', min.cells = 3, min.features = 100)

Note: Options for min.cells and min.features are set to the default values used in the Signac tutorial (https://stuartlab.org/signac/articles/pbmc_vignette.html).

11.
Create Seurat object.

aPSM_atac <- CreateSeuratObject(

counts = aPSM_chrom_assay,

assay = 'aPSM_peaks',

project = 'aPSM_atac',

meta.data = aPSM_meta[colnames(aPSM_chrom_assay),])

12.
Add annotation information.

annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)

# change to UCSC style since the data was mapped to mm10

seqlevelsStyle(annotations) <- 'UCSC'

genome(annotations) <- "mm10"

# add the gene information to the object

Annotation(aPSM_atac) <- annotations

13.
Select cells for analysis through QC (Figure 4A).

aPSM_atac <- NucleosomeSignal(object = aPSM_atac)

# compute TSS enrichment score per cell

aPSM_atac <- TSSEnrichment(object = aPSM_atac, fast = FALSE)

aPSM_atac$pct_reads_in_peaks <- aPSM_atac$peak_region_fragments / aPSM_atac$passed_filters * 100

aPSM_atac$blacklist_ratio <- aPSM_atac$blacklist_region_fragments / aPSM_atac$peak_region_fragments

VlnPlot(

object = aPSM_atac,

features = c('pct_reads_in_peaks', 'peak_region_fragments',

'TSS.enrichment', 'blacklist_ratio', 'nucleosome_signal'),pt.size = 0.1, ncol = 5)

14.
Preprocess data and select features for the analysis.

FeatureScatter(aPSM_atac, feature1 = "peak_region_fragments", feature2 = "nCount_aPSM_peaks")

aPSM_atac <- subset(

x = aPSM_atac,

subset = peak_region_fragments > 2586 &

peak_region_fragments < 20000 & pct_reads_in_peaks > 15 & blacklist_ratio < 0.05)

ncol(aPSM_atac)

VlnPlot(

object = aPSM_atac,

features = c('nucleosome_signal','peak_region_fragments'),pt.size = 0.1) + NoLegend()

FeatureScatter(aPSM_atac, feature1 = "peak_region_fragments", feature2 = "nCount_aPSM_peaks")

15.
Reduce dimensions.

aPSM_atac <- RunTFIDF(aPSM_atac)

aPSM_atac <- FindTopFeatures(aPSM_atac, min.cutoff = 'q0')

aPSM_atac <- RunSVD(

object = aPSM_atac, assay = 'aPSM_peaks',

reduction.key = 'LSI_', reduction.name = 'lsi')
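
The TF-IDF and SVD steps together form latent semantic indexing (LSI). The following is a minimal base R sketch on an invented binary peak-by-cell matrix, assuming the log(1 + TF × IDF × 10^4) weighting; the Signac functions may differ in details such as scaling of the embeddings, so this is only an illustration of the idea.

```r
# Toy binary accessibility matrix (invented): peaks in rows, cells in columns.
peaks <- matrix(
  c(1, 0, 1, 1,    # cell_a
    0, 1, 1, 0,    # cell_b
    1, 1, 1, 0),   # cell_c
  nrow = 4,
  dimnames = list(paste0("peak_", 1:4), c("cell_a", "cell_b", "cell_c"))
)

# Term frequency: each cell's counts divided by that cell's total.
tf <- sweep(peaks, 2, colSums(peaks), "/")

# Inverse document frequency: number of cells divided by the peak's total count.
idf <- ncol(peaks) / rowSums(peaks)

# Row i of tf is multiplied by idf[i] (the vector is recycled down each column).
tfidf <- log1p(tf * idf * 1e4)

# SVD of the weighted matrix; cell embeddings are V scaled by singular values.
svd_res    <- svd(tfidf)
embeddings <- svd_res$v %*% diag(svd_res$d)
rownames(embeddings) <- colnames(peaks)

print(round(embeddings, 2))
```

Peaks open in every cell (peak_3 here) receive the lowest IDF weight, so they contribute less to separating cells than rarer peaks.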

16.
Cluster and visualize cells (Figure 4B).

library(ggplot2)

aPSM_atac <- RunUMAP(object = aPSM_atac, reduction = 'lsi', dims = 1:40)

aPSM_atac <- RunTSNE(object = aPSM_atac, reduction = 'lsi', dims = 1:40)

aPSM_atac <- FindNeighbors(object = aPSM_atac, reduction = 'lsi', dims = 1:40)

aPSM_atac <- FindClusters(object = aPSM_atac, verbose = FALSE,resolution=0.25)

DimPlot(object = aPSM_atac, label = F,reduction = 'umap') +labs(title = " aPSM scATAC")

17.
Calculate gene activities and add them to Seurat object.

aPSM_gene.activities <- GeneActivity(aPSM_atac)

save(aPSM_gene.activities,file="aPSM_atac_gene.activities.RData")

# add the gene activity matrix to the Seurat object as a new assay and normalize it

aPSM_atac[['RNA']] <- CreateAssayObject(counts = aPSM_gene.activities)

aPSM_atac <- NormalizeData(

object = aPSM_atac, assay = 'RNA', normalization.method = 'LogNormalize',scale.factor = median(aPSM_atac$nCount_RNA) )

save(aPSM_atac,file="aPSM_scATAC.RData")

Figure 4. Quality control and integration of scATAC-seq and scRNA-seq data

(A) Feature distributions of aPSM scATAC-seq data.

(B) UMAP plot of aPSM scATAC-seq datasets.

(C) Feature distributions of naïve and instructed scRNA-seq datasets.

(D) UMAP plots before and after Harmony integration.

Part 1: Integrated data analysis

Timing: 1 h (for step 18 to step 19)

In this section, we describe steps to integrate and analyze data from different platforms. Users can infer changes in gene expression across time points, or relationships between gene expression and chromatin accessibility during cellular differentiation.

18.
Integrate single cell RNA seq datasets.
- a.
  Prepare R library for the integration.
  library(dplyr)
  
  library(Seurat)
  
  library(harmony)
  
  library(data.table)
  
  library(parallel)
  
  set.seed(1234)
  
  # Set number of cores to use
  
  NCORES = 1
  
  meta <- fread("naive_instructed_esc.csv")
- b.
  Load datasets.
  data_dir <- list("./naive_scRNA/","./instructed_scRNA/")
  
  mat.list <- list()
  
  soupx.used <- list()
  
  for(i in 1:length(data_dir)){
  
  mat.list[[i]] <- Read10X(data.dir = paste0(data_dir[i], 'filtered_feature_bc_matrix'))
  
  soupx.used[[i]] <- F}
  
  cat(sum(unlist(lapply(mat.list, ncol))),"cells (total) loaded...\n")
  
  sample_num<-min(ncol(mat.list[[1]]),ncol(mat.list[[2]]))
  
  sel.id<-sample(colnames(mat.list[[2]]), size=sample_num, replace=FALSE)
  
  mat.list[[2]]<-mat.list[[2]][,sel.id]
  Note: Files should be in filtered_feature_bc_matrix folder, and file names should be barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz.
- c.
  Create Seurat objects.
  seu.list <- list()
  
  seu.list <- mclapply(
  
  mat.list,
  
  FUN = function(mat){
  
    return(CreateSeuratObject(
  
     counts = mat, min.features = 200, min.cells = 3,project = 'naive_instructed_data'))
  
  }, mc.cores = NCORES)
  
  for(i in 1:length(seu.list)){
  
  cat(' ------------------------------------\n',
  
    '--- Processing dataset number ', i, '-\n',
  
    '------------------------------------\n')
  
  # Add meta data
  
  for(md in colnames(meta)){
  
    seu.list[[i]][[md]] <- meta[[md]][i]
  
  }
  
  # add %MT
  
  seu.list[[i]][["percent.mt"]] <- PercentageFeatureSet(seu.list[[i]], pattern = "^mt-")
  
  # Filter out low quality cells according to the metrics defined above
  
  seu.list[[i]] <- subset(seu.list[[i]],
  
        subset = nFeature_RNA > 1600 & nFeature_RNA < 8000 & percent.mt < 20)
  
  # Only mito and floor filtering; trying to find doublets
  
  }
  
  cat((sum(unlist(lapply(mat.list, ncol)))-sum(unlist(lapply(seu.list, ncol)))),"cells (total) removed...\n")
- d.
  Preprocess Seurat objects.
  seuPreProcess <- function(seu, assay='RNA', n.pcs=30, res=0.25){
  
  pca.name = paste0('pca_', assay)
  
  pca.key = paste0(pca.name,'_')
  
  umap.name = paste0('umap_', assay)
  
  seu = NormalizeData(
  
    seu
  
  ) %>% FindVariableFeatures(
  
    assay = assay,
  
    selection.method = "vst",
  
    nfeatures = 2000,
  
    verbose = F
  
  ) %>% ScaleData(
  
    assay = assay
  
  ) %>% RunPCA(
  
    assay = assay,
  
    reduction.name = pca.name,
  
    reduction.key = pca.key,
  
    verbose = F,
  
    npcs = n.pcs
  
  )
  
  n.pcs.use =n.pcs
  
  # FindNeighbors %>% RunUMAP, FindClusters
  
  seu <- FindNeighbors(
  
    seu,
  
    reduction = pca.name,
  
    dims = 1:n.pcs.use,
  
    force.recalc = TRUE,
  
    verbose = FALSE
  
  ) %>% RunUMAP(
  
    reduction = pca.name,
  
    dims = 1:n.pcs.use,
  
    reduction.name=umap.name
  
  )
  
  seu@reductions[[umap.name]]@misc$n.pcs.used <- n.pcs.use
  
  seu <- FindClusters(object = seu,resolution = res)
  
  seu[[paste0('RNA_res.',res)]] <- as.numeric(seu@active.ident)
  
  return(seu)
  
  }
  
  # preprocess each dataset individually
  
  seu.list <- lapply(seu.list, seuPreProcess)
- e.
  Merge datasets (Figure 4C).
  tmp.list <- list()
  
  for(i in 1:length(seu.list)){
  
  DefaultAssay(seu.list[[i]]) <- "RNA"
  
  tmp.list[[i]] <- DietSeurat(seu.list[[i]], assays = "RNA")
  
  }
  
  # merge tmp count matrices
  
  scMuscle.pref.seurat <- merge(
  
    tmp.list[[1]],
  
    y = tmp.list[[2]]
  
  )
  
  VlnPlot(
  
  scMuscle.pref.seurat,
  
  features = c(
  
    'nCount_RNA',
  
    'nFeature_RNA',
  
    'percent.mt'
  
  ),
  
  group.by = 'source',
  
  pt.size = 0
  
  )
- f.
  Preprocess merged data.
  # Seurat preprocessing on merged data ----
  
  DefaultAssay(scMuscle.pref.seurat) <- 'RNA'
  
  scMuscle.pref.seurat <-
  
  NormalizeData(
  
    scMuscle.pref.seurat, assay = 'RNA'
  
  ) %>% FindVariableFeatures(
  
    selection.method = 'vst',
  
    nfeatures = 2000,verbose = TRUE
  
  ) %>% ScaleData(
  
    assay = 'RNA',
  
    verbose = TRUE
  
  ) %>% RunPCA(
  
    assay = 'RNA',
  
    reduction.name = 'pca_RNA',
  
    reduction.key = 'pca_RNA_',
  
    verbose = TRUE,
  
    npcs = 50
  
  )
  
  ElbowPlot(scMuscle.pref.seurat, reduction = 'pca_RNA', ndims = 50)
- g.
  Find clusters for individual datasets.
  n.pcs = 30
  
  scMuscle.pref.seurat <-
  
  RunUMAP(
  
    scMuscle.pref.seurat, reduction = 'pca_RNA',
  
    dims = 1:n.pcs, reduction.name='umap_RNA'
  
  ) %>% FindNeighbors(
  
    reduction = 'pca_RNA',
  
    dims = 1:n.pcs,
  
    force.recalc = TRUE,
  
    verbose = F
  
  )
  
  scMuscle.pref.seurat <- FindClusters(object = scMuscle.pref.seurat, resolution = 0.25)
  
  scMuscle.pref.seurat[['RNA_res.0.25']] <- as.numeric(scMuscle.pref.seurat@active.ident)
- h.
  Integrate datasets using Harmony package.
  scMuscle.pref.seurat <-
  
  scMuscle.pref.seurat %>% RunHarmony(
  
    group.by.vars=c('sample'), reduction='pca_RNA',
  
    assay='RNA',plot_convergence = TRUE,verbose=TRUE)
  
  scMuscle.pref.seurat <-
  
  scMuscle.pref.seurat %>% RunUMAP(
  
    reduction = 'harmony', dims = 1:n.pcs,
  
    reduction.name='umap_harmony')
  
  scMuscle.pref.seurat@reductions$umap_harmony@misc$n.pcs.used <- n.pcs
  
  scMuscle.pref.seurat <-
  
  scMuscle.pref.seurat %>% FindNeighbors(
  
    reduction = 'harmony',dims = 1:n.pcs,
  
    graph.name = 'harmony_snn',force.recalc = TRUE,
  
    verbose = FALSE)
  
  scMuscle.pref.seurat <- FindClusters(
  
  object = scMuscle.pref.seurat,resolution = 1.0,
  
  graph.name='harmony_snn')
  
  scMuscle.pref.seurat[['harmony_res.1.0']] <- as.numeric(scMuscle.pref.seurat@active.ident)
  
  scMuscle.pref.seurat <- FindClusters(
  
  object = scMuscle.pref.seurat,
  
  resolution = 2.0, graph.name='harmony_snn')
  
  scMuscle.pref.seurat[['harmony_res.2.0']] <- as.numeric(scMuscle.pref.seurat@active.ident)
- i.
  Validate integrated results (Figure 4D).
  library(cowplot)
  
  library(ggplot2)
  
  p1<-DimPlot(object = scMuscle.pref.seurat, reduction = "umap_RNA", pt.size = .1, group.by = "sample")+labs(title = "Merged by Seurat")
  
  p2<-DimPlot(object = scMuscle.pref.seurat, reduction = "umap_harmony", pt.size = .1, group.by = "sample")+labs(title = "Merged by Seurat with Harmony")
  
  p1+p2
  
  save(scMuscle.pref.seurat,file="naive_instructed_scRNA_ESCs.RData")
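
Step 18b downsamples the second dataset to the size of the smaller one, so that neither dataset dominates the merged analysis. The following is a minimal base R sketch of that balancing step, generalized to any number of datasets; the barcodes and dataset sizes are invented for illustration.

```r
set.seed(1234)

# Invented cell barcodes for two datasets of unequal size.
cell_lists <- list(
  naive      = paste0("naive_", 1:5),
  instructed = paste0("instructed_", 1:8)
)

# Target size: the number of cells in the smallest dataset.
n_keep <- min(lengths(cell_lists))

# Sample without replacement so that every dataset contributes n_keep cells.
balanced <- lapply(cell_lists, function(ids) {
  sample(ids, size = n_keep, replace = FALSE)
})

print(lengths(balanced))
```

Sampling every dataset to the minimum size, rather than only the second one, keeps the procedure correct regardless of which dataset happens to be larger.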

19.
Integrate scATAC-seq dataset with scRNA-seq dataset.
- a.
  Prepare R library for integration.
  library(Signac)
  
  library(Seurat)
  
  library(GenomeInfoDb)
  
  library(EnsDb.Mmusculus.v79)
  
  library(patchwork)
  
  library(ggplot2)
  
  set.seed(1234)
- b.
  Load datasets.
  load("aPSM_scRNA.RData")
  
  load("aPSM_scATAC.RData")
- c.
  Infer relations between scRNA-seq and scATAC-seq.
  DefaultAssay(aPSM_atac) <- 'RNA'
  
  ncol(aPSM_atac)
  
  transfer.anchors <- FindTransferAnchors(
  
  reference = aPSM, query = aPSM_atac, k.anchor = 20,
  
  k.filter = 200, reduction = 'cca', dims = 1:30)
  
  predicted.labels <- TransferData(
  
  anchorset = transfer.anchors,
  
  refdata = aPSM$seurat_clusters,
  
  weight.reduction = aPSM_atac[['lsi']],
  
  dims = 2:30)
  
  save(transfer.anchors,file="transfer.anchors_aPSM_atac.RData")
  
  aPSM_atac <- AddMetaData(object =aPSM_atac, metadata = predicted.labels)
  
  save(aPSM_atac,file="aPSM_atac_meta.RData")
- d.
  Visualize the clusters of the integrated datasets.
  DimPlot(object = aPSM_atac, label = F,reduction = 'umap',group.by ='predicted.id' ) +labs(title = " aPSM scATAC")
  
  DimPlot(object = aPSM, label = F,reduction = 'umap') +labs(title = " aPSM")
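
Label transfer yields, for each scATAC-seq cell, a score per scRNA-seq cluster; the predicted label is the cluster with the highest score. The following is a minimal base R sketch of that final assignment on an invented score matrix. The 0.5 confidence cutoff is an illustrative assumption and is not part of the protocol.

```r
# Invented prediction scores: cells in rows, reference clusters in columns.
scores <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.4, 0.3, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell_a", "cell_b", "cell_c"), c("0", "1", "2"))
)

# Predicted label = cluster with the highest score for each cell.
predicted_id <- colnames(scores)[max.col(scores, ties.method = "first")]

# Maximum score per cell, usable as a confidence measure.
score_max <- apply(scores, 1, max)

# Hypothetical cutoff for flagging low-confidence assignments.
confident <- score_max >= 0.5

print(data.frame(predicted_id, score_max, confident))
```

Inspecting the distribution of the maximum score before interpreting the transferred labels can help identify cells that are not well represented in the reference.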

Part 2: Multiomics analysis

Timing: 1 h (for step 20 and step 21)

In this section, we describe the major steps for performing multimodal analysis.

20.
Data preprocessing.
- a.
  Load the libraries and setup working directory.
  library(Seurat)
  
  library(Signac)
  
  library(patchwork)
  
  library(monocle3)
  
  library(SeuratWrappers)
  
  library(EnsDb.Mmusculus.v79)
  
  library(GenomeInfoDb)
  
  library(ggplot2)
  
  library(dplyr)
  
  set.seed(1234)
  
  setwd(getwd())
- b.
  Load snRNA and snATAC data and create Seurat object.
  Star.data <- Read10X(data.dir = "./multiomics/filtered_feature_bc_matrix/")
  
  # Extract RNA and ATAC data
  
  rna_counts <- Star.data$`Gene Expression`
  
  atac_counts <- Star.data$Peaks
  
  # Create Seurat object containing snRNA data
  
  Star <- CreateSeuratObject(counts = rna_counts, project = "Star", min.cells=5, min.features = 100, assay = "RNA")
  CRITICAL: HIFLR_snRNA_barcodes.tsv.gz, HIFLR_snRNA_features.tsv.gz, and HIFLR_snRNA_matrix.mtx.gz are the files generated by CellRanger-arc v2.0.0. Files should be kept in the same folder, named as filtered_feature_bc_matrix.
- c.
  Keep only snATAC-seq peaks located on standard chromosomes.
  grange.counts <- StringToGRanges(rownames(atac_counts), sep = c(":", "-"))
  
  grange.use <- seqnames(grange.counts) %in% standardChromosomes(grange.counts)
  
  atac_counts <- atac_counts[as.vector(grange.use), ]
- d.
  Add annotation.
  annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
  
  seqlevelsStyle(annotation) <- "UCSC"
  
  genome(annotation) <- "mm10"
- e.
  Create snATAC assay.
  fragpath <- "./multiomics/filtered_feature_bc_matrix/fragments.tsv.gz"
  
  Star[["ATAC"]] <- CreateChromatinAssay(counts = atac_counts, sep = c(":", "-"), genome = 'mm10', fragments = fragpath, min.cells = 10, annotation = annotation)
- f.
  Downsize the dataset.
  set.seed(111)
  
  Star <- subset(x = Star, downsample = 6000)
  
  save(Star, file="Star_ds6k.RData")
  
  load("./Star_ds6k.RData")
  CRITICAL: To load the snATAC-seq fragments file properly, fragments.tsv.gz.tbi file is required to be in the same folder.
  
  CRITICAL: Use only peaks in standard chromosomes and set sequence level style as UCSC.
- g.
  Quality control:
  - i.
    Calculate percentage of mitochondrial genes in snRNA-seq.
  - ii.
    Compute both TSS enrichment score and nucleosome signal metrics in Signac for snATAC-seq (Figure 5A).
    DefaultAssay(Star) <- "RNA"
    
    Star[["percent.mt"]] <- PercentageFeatureSet(Star, pattern = "^mt-")
    
    VlnPlot(Star, features = c("nCount_RNA", "nFeature_RNA", "percent.mt"), ncol = 3, log = TRUE, pt.size = 0) + NoLegend()
    
    DefaultAssay(Star) <- "ATAC"
    
    Star <- NucleosomeSignal(Star)
    
    Star <- TSSEnrichment(object=Star, fast=FALSE)
    
    VlnPlot(Star, features = c("nCount_ATAC", "nFeature_ATAC", "TSS.enrichment", "nucleosome_signal"), ncol = 4, log = TRUE, pt.size = 0) + NoLegend()
    Note: Low-quality cells refer to potential damaged cells, empty droplets, cell doublets, or multiplets.
  - iii.
    Remove low quality cells (Figure 5B).
    Star <- subset(x = Star,
    
    subset = nCount_RNA < 100000 &
    
    nCount_RNA > 1200 &
    
    nCount_ATAC < 1e5 &
    
    nCount_ATAC > 1e2 &
    
    nucleosome_signal < 2.5 &
    
    TSS.enrichment > 3 &
    
    percent.mt < 10)
    
    saveRDS(Star, file="Star.RData")
    
    VlnPlot(Star, features = c("nCount_RNA", "nFeature_RNA", "percent.mt"), ncol = 3, log = TRUE, pt.size = 0) + NoLegend()
    
    VlnPlot(Star, features = c("nCount_ATAC", "nFeature_ATAC", "TSS.enrichment", "nucleosome_signal"), ncol = 4, log = TRUE, pt.size = 0) + NoLegend()
    CRITICAL: The filtering criteria are dataset specific. Choose cutoffs that avoid losing unique cell populations while excluding noisy cells.

21.
WNN analysis.
- a.
  Perform normalization and dimensional reduction of snRNA-seq and snATAC-seq assays independently and individually.
  # snRNA analysis
  
  DefaultAssay(Star) <- "RNA"
  
  Star <- SCTransform(Star, verbose = FALSE) %>% RunPCA() %>% RunUMAP(dims = 1:30, reduction.name = 'umap', reduction.key = 'UMAP_')
  
  # snATAC analysis
  
  DefaultAssay(Star) <- "ATAC"
  
  Star <- RunTFIDF(Star)
  
  Star <- FindTopFeatures(Star, min.cutoff = 'q0')
  
  Star <- RunSVD(Star)
  
  Star <- RunUMAP(Star, reduction = 'lsi', dims = 2:30,
  
  reduction.name = "umap.atac", reduction.key = "atacUMAP_")
  Note: In snATAC-seq assay, the first dimension is typically correlated with sequencing depth rather than biological variation. It is thus excluded in the UMAP computing.
- b.
  Learn cell-specific modality weights and construct a WNN graph.
  Star <- FindMultiModalNeighbors(Star, reduction.list = list("pca", "lsi"),dims.list = list(1:30, 2:30))
  
  Star <- RunUMAP(Star, nn.name = "weighted.nn",
  
  reduction.name = "umap.wnn", reduction.key ="wnnUMAP_")
  
  Star <- FindClusters(Star, graph.name = "wsnn",
  
  resolution = 0.8, algorithm = 3, verbose = FALSE)
- c.
  Visualize the clusters (Figure 6A).
  p1 <- DimPlot(Star, reduction = "umap", group.by = "seurat_clusters", label = TRUE, label.size = 8, repel = TRUE) + ggtitle("RNA")
  
  p2 <- DimPlot(Star, reduction = "umap.atac",group.by = "seurat_clusters", label = TRUE, label.size = 8, repel = TRUE) + ggtitle("ATAC")
  
  p3 <- DimPlot(Star, reduction = "umap.wnn", group.by = "seurat_clusters", label = TRUE, label.size = 8, repel = TRUE) + ggtitle("WNN")
  
  p1+p2+p3 &NoLegend()
- d.
  snRNA-seq analysis: Characterization and annotation of cell states are achieved by identifying marker genes via differential expression analysis in both pseudotemporal ordering identified clusters and WNN clusters. Cell types are defined using known gene markers. For example, Myod1, Myog, and Myf5 are myogenic markers and Ascl1, Neurod4, and Nhlh1 are neurogenic markers. Pax7 drives both myogenesis and neurogenesis. Meis1 and Pbx1 are anterior presomitic mesoderm (aPSM) markers. As an example, here we analyze myogenic genes Myod1 and Myog.
  - i.
    Find markers.
    DefaultAssay(Star) <- "RNA"
    
    Star.rna.markers <- FindAllMarkers(Star, assay = "RNA", only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
    
    Star.rna.markers %>%
    
    group_by(cluster) %>%
    
    top_n(n = 2, wt = avg_log2FC)
  - ii.
    Add cell states annotations.
    Star <- RenameIdents(Star, '10' = 'cell_5','11' = 'cell_2')
    
    Star <- RenameIdents(Star, '5' = 'cell_4','6' = 'cell_1','7' = 'cell_1','8' = 'cell_3','9' = 'cell_5')
    
    Star <- RenameIdents(Star, '0' = 'cell_1','1' = 'cell_2','2' = 'cell_1','3' = 'cell_1','4' = 'cell_3')
    
    Star$celltype <- Idents(Star)
    CRITICAL: Cell states can be assigned using known markers. We recommend writing Star.rna.markers to a file and examining which markers can be used to annotate the clusters.
  - iii.
    Visualize the cell states (Figure 6D).
    p1 <- DimPlot(Star, reduction = "umap", group.by = "celltype", label = FALSE, label.size = 8, repel = TRUE) + ggtitle("RNA")
    
    p2 <- DimPlot(Star, reduction = "umap.atac", group.by = "celltype", label = FALSE, label.size = 8, repel = TRUE) + ggtitle("ATAC")
    
    p3 <- DimPlot(Star, reduction = "umap.wnn", group.by = "celltype", label = FALSE, label.size = 8, repel = TRUE) + ggtitle("WNN")
    
    p1+p2+p3
- e.
  snATAC-seq analysis.
  - i.
    Load libraries.
    library(chromVAR)
    
    library(motifmatchr)
    
    library(JASPAR2020)
    
    library(TFBSTools)
    
    library(BSgenome.Mmusculus.UCSC.mm10)
  - ii.
    Find snATAC markers.
    DefaultAssay(Star) <- "ATAC"
    
    Star.atac.markers <- FindAllMarkers(Star, assay = "ATAC", only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
    
    Star.atac.markers %>%
    
    group_by(cluster) %>%
    
    top_n(n = 2, wt = avg_log2FC)
  - iii.
    Add motif information.
    pwm_set <- getMatrixSet(x = JASPAR2020, opts = list(collection = "CORE", tax_group = 'vertebrates', all_versions = FALSE))
    
    Star <- AddMotifs(
    
    object = Star,
    
    genome = BSgenome.Mmusculus.UCSC.mm10,
    
    pfm = pwm_set,
    
    assay="ATAC")
  - iv.
    Compute motif activities.
    Star <- RunChromVAR(
    
    object = Star,
    
    genome = BSgenome.Mmusculus.UCSC.mm10)
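
The cluster-to-state assignment in step 21d can also be written as a single lookup table, which makes it easy to verify that every cluster has been annotated. The following is a minimal base R sketch using the same mapping as the RenameIdents calls above; the vector of example cluster labels is invented.

```r
# Mapping from WNN cluster to cell state, as in the RenameIdents calls.
cluster_to_state <- c(
  "0" = "cell_1", "1" = "cell_2", "2" = "cell_1", "3"  = "cell_1",
  "4" = "cell_3", "5" = "cell_4", "6" = "cell_1", "7"  = "cell_1",
  "8" = "cell_3", "9" = "cell_5", "10" = "cell_5", "11" = "cell_2"
)

# Invented cluster labels for five cells, with all twelve clusters as levels.
clusters <- factor(c("0", "5", "11", "9", "4"), levels = as.character(0:11))

# Sanity check: every cluster level must have an annotation.
unmapped <- setdiff(levels(clusters), names(cluster_to_state))
stopifnot(length(unmapped) == 0)

# Look up the state of each cell by its cluster label.
cell_state <- unname(cluster_to_state[as.character(clusters)])

print(table(cell_state))
```

Keeping the mapping in one named vector makes it straightforward to revise the annotation when the clustering resolution changes.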

Figure 5. Multiomics data quality control

(A) snRNA and snATAC QC plot before removing low quality cells.

(B) snRNA and snATAC QC plot after removing low quality cells.

Figure 6. Characterization and annotation of cell states

(A) UMAP visualization of the clustering based on snRNA-seq, snATAC-seq, and WNN analysis before cell state annotation.

(B) Pseudotime single cell trajectory plot. The heatmap represents units of progress, with 1 located at the root of the trajectory.

(C) Cell states derived from pseudotime trajectory inference. State 2 and state 4 are marked NA (not assigned), since they may represent transitioning states and could not be unambiguously assigned to a specific developmental stage.

(D) UMAP visualization of cell states after annotated clustering.

Part 2: Data visualization

Timing: 1 h (for step 22)

This section describes the steps for visualizing the multiomics data.

22.
Pseudotime analysis:
- a.
  Convert Seurat object to Monocle object.
  DefaultAssay(Star) <- "RNA"
  
  set.seed(22)
  
  cds <- SeuratWrappers::as.cell_data_set(Star, assay = "RNA", reduction = "umap", group.by = "celltype")
  
  cds@rowRanges@elementMetadata@listData[["gene_short_name"]] <- rownames(Star[["RNA"]])
- b.
  Preprocess the CDS object, reduce dimensions, cluster cells, and learn the trajectory graph.
  cds <- preprocess_cds(cds, method = "PCA")
  
  cds <- reduce_dimension(cds, preprocess_method = "PCA",umap.n_neighbors= 14L, reduction_method = "UMAP")
  
  cds <- cluster_cells(cds, reduction_method = "UMAP")
  
  cds <- learn_graph(cds, use_partition = FALSE, close_loop = FALSE)
- c.
  Set the root using Seurat cluster 0 and order the cells.
  cell_ids <- colnames(cds)[Star$seurat_clusters == "0"]
  
  closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
  
  closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
  
  closest_vertex <- closest_vertex[cell_ids, ]
  
  closest_vertex <- as.numeric(names(which.max(table(closest_vertex))))
  
  mst <- principal_graph(cds)$UMAP
  
  root_pr_nodes <- igraph::V(mst)$name[closest_vertex]
  
  rowData(cds)$gene_name <- rownames(cds)
  
  rowData(cds)$gene_short_name <- rowData(cds)$gene_name
  
  cds <- order_cells(cds, root_pr_nodes = root_pr_nodes)
- d.
  Visualize trajectory plot (Figure 6B).
  plot_cells(cds, color_cells_by = "pseudotime",
  
     label_cell_groups = TRUE, label_leaves = FALSE,
  
     label_branch_points = FALSE, show_trajectory_graph = TRUE,
  
     graph_label_size = 3, label_groups_by_cluster = TRUE)
- e.
  Visualize cell states derived from trajectory inference (Figure 6C).
  plot_cells(cds, color_cells_by = "cluster", cell_size = 1,
  
     label_cell_groups = TRUE, group_label_size = 4,
  
     show_trajectory_graph = FALSE,
  
     label_branch_points = FALSE,
  
     label_roots = FALSE,
  
     label_leaves = FALSE)
- f.
  Visualize the individual and paired (blended) expression plots of Myod1 and Myog (Figure 7A).
  Star.seur <- as.Seurat(cds, assay = NULL, clusters = "UMAP")
  
  Star.seur <- AddMetaData(Star.seur,metadata= cds@principal_graph_aux$UMAP$pseudotime,
  
  col.name = "monocle3_pseudotime")
  
  FeaturePlot(Star.seur, features = c("Myod1", "Myog"),
  
     reduction = "UMAP", combine = TRUE,
  
     blend = TRUE, blend.threshold = 0.0,
  
     min.cutoff = 0, max.cutoff = 6)
- g.
  Visualize footprinting plots (Figure 7B).
  Star_135 <- subset(x = Star, idents = c("cell_1", "cell_3", "cell_5"), invert = FALSE)
  
  DefaultAssay(Star_135) <- "ATAC"
  
  Star_135 <- Footprint(
  
  object = Star_135,
  
  motif.name = c("MYOG", "MYOD1"),
  
  genome = BSgenome.Mmusculus.UCSC.mm10)
  
  PlotFootprint(Star_135, features = c("MYOD1")) + patchwork::plot_layout(ncol = 1)
  
  PlotFootprint(Star_135, features = c("MYOG")) + patchwork::plot_layout(ncol = 1)
  Note: cell_1 corresponds to aPSM cells, cell_3 to the neurogenic cluster, and cell_5 to the myogenic cluster.
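
The root selection in step 22c picks the principal-graph node that is closest to the largest number of cells in Seurat cluster 0. The logic can be sketched with toy vectors in base R. Here `closest_vertex` and `clusters` are hypothetical values standing in for the Monocle3 cell-to-node projection and `Star$seurat_clusters`, `pick_root_node` is an illustrative helper, and the `Y_` node-name prefix is an assumption about how the graph nodes are named.

```r
# Toy stand-ins: nearest principal-graph node index per cell, and each cell's cluster
closest_vertex <- c(cell1 = 3, cell2 = 3, cell3 = 7, cell4 = 3, cell5 = 7, cell6 = 1)
clusters       <- c(cell1 = "0", cell2 = "0", cell3 = "0",
                    cell4 = "1", cell5 = "1", cell6 = "2")

pick_root_node <- function(closest_vertex, clusters, root_cluster = "0") {
  # Cells belonging to the cluster chosen as the starting state
  root_cells <- names(clusters)[clusters == root_cluster]
  # Count how often each node is the nearest node for those cells
  node_counts <- table(closest_vertex[root_cells])
  # Index of the most frequent node (which.max keeps the first one on ties)
  as.numeric(names(which.max(node_counts)))
}

root_index <- pick_root_node(closest_vertex, clusters)
root_name  <- paste0("Y_", root_index)  # assumed node naming
print(root_name)
```

Choosing a different starting population only requires changing `root_cluster`; the resulting node name is what would be passed as `root_pr_nodes` when ordering the cells.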

Figure 7. Visualization of myogenic cells

(A) Individual and paired expression plots of Myod1 and Myog in cell states derived from pseudotime trajectory inference.

(B) Myod1 and Myog footprinting profiles in the aPSM, neurogenic, and myogenic clusters.

Expected outcomes

This protocol provides a resource for profiling the transcriptional and chromatin accessibility features of pluripotent, mesoderm-induced ESCs and ESC-derived cell lineages. Expression profiles and chromatin accessibility are determined for each developmental time point. Transcriptomic changes across differentiation time points are revealed by the pipeline that integrates individual scRNA-seq datasets (protocol 1: integrated data analysis, step 1), and the correlation between gene expression and chromatin accessibility is revealed by the pipeline that integrates scRNA-seq and scATAC-seq datasets (protocol 1: integrated data analysis, step 2). In addition, multiomics datasets can be visualized and cell states inferred with the multiomics analysis pipeline (protocol 2: multiomics analysis).

Limitations

The protocols are built on the R package Seurat and are designed to run in R with RStudio. Running them in high-performance computing environments requires batch submission tools for R jobs, such as Swarm. Furthermore, the data integration parameters were chosen by heuristic hyperparameter tuning for datasets from specific time points, so an automatic tuning module would be needed to find optimal hyperparameters for new datasets. In addition, comparing the outputs of these protocols with results from other single-cell packages such as SCANPY would require a module that converts data between R and Python formats.

Troubleshooting

Problem 1

Unable to run the Docker image with Docker Desktop.

Potential solution

In the software preparation step, follow the instructions in Docker_manual_mac.docx or Docker_manual_windowOS.docx to set up the Docker Desktop environment properly.

Problem 2

R packages cannot be loaded with the “library” command.

Potential solution

Run the following code in the R environment to list the installed packages:

p <- installed.packages()

rownames(p)

If a package is not listed, visit the Bioconductor website (https://www.bioconductor.org/), search for the package, and follow its installation guidelines. If the package is not available in Bioconductor, run install.packages("package_name") in the R environment. More details and examples can be found in Software_preparation.R.
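
The check above can be narrowed to the packages a session actually needs. The following is a minimal sketch in base R; the `required` vector is an illustrative, non-exhaustive list and should be adjusted to the libraries loaded in each step.

```r
# Illustrative list of packages used in this protocol; extend as needed
required <- c("Seurat", "Signac", "monocle3", "chromVAR", "motifmatchr",
              "JASPAR2020", "TFBSTools", "ggplot2")

# Installed package names are the row names of installed.packages()
installed <- rownames(installed.packages())

# Packages that still need to be installed
missing_pkgs <- setdiff(required, installed)

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Only the names reported as missing then need to be looked up in Bioconductor or installed with install.packages().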

Problem 3

Data files cannot be loaded.

Potential solution

Check whether the data files are present in the working folder. If they are, check that their names match the file names used in the code exactly.
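
A short sketch of this check in base R is given below. The helper `check_input_files` and the file names `sample_A.h5` and `sample_B.h5` are hypothetical and serve only to demonstrate the approach in a temporary folder; in practice, the data directory and the file names from the protocol would be supplied instead.

```r
# Report, for each expected file, whether it exists in the data directory
check_input_files <- function(data_dir, expected_files) {
  present <- file.exists(file.path(data_dir, expected_files))
  setNames(present, expected_files)
}

# Demonstration in a temporary folder with hypothetical file names
demo_dir <- file.path(tempdir(), "demo_data")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "sample_A.h5")))

status <- check_input_files(demo_dir, c("sample_A.h5", "sample_B.h5"))
print(status)

# Listing the folder shows the actual names, which helps spot typos
print(list.files(demo_dir))
```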

Problem 4

Monocle3 fails to install.

Potential solution

•
Check the R version: Monocle3 runs in the R statistical computing environment, and R version 4.2.2 or higher is needed.
•
Install the Bioconductor dependencies that are not installed automatically.

BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats', 'limma', 'lme4', 'S4Vectors', 'SingleCellExperiment', 'SummarizedExperiment', 'batchelor', 'Matrix.utils', 'HDF5Array', 'terra', 'ggrastr'))

•
Install monocle3 from the cole-trapnell-lab GitHub repository. To confirm that monocle3 was installed correctly, start a new R session and load the package with library(monocle3).

install.packages('devtools')

devtools::install_github('cole-trapnell-lab/monocle3')

library(monocle3)

CRITICAL: Installing monocle3 can be difficult. Further troubleshooting advice is available on the cole-trapnell-lab GitHub site (https://cole-trapnell-lab.github.io/monocle3/docs/installation).
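
Before attempting the installation, the R version requirement stated above can be verified from within R. This is a minimal sketch using base R functions only; the variable names are illustrative.

```r
# Minimum R version stated in the potential solution above
min_r_version <- "4.2.2"

# getRversion() returns a version object that can be compared with a string
r_ok <- getRversion() >= min_r_version
message("R ", as.character(getRversion()),
        if (r_ok) " meets" else " does not meet",
        " the minimum version ", min_r_version)

# TRUE when the monocle3 namespace can be loaded (the package is not attached)
monocle_ok <- requireNamespace("monocle3", quietly = TRUE)
message("monocle3 installed: ", monocle_ok)
```

If `r_ok` is FALSE, updating R before installing the dependencies avoids version-related installation failures.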

Problem 5

Plots cannot be drawn.

Potential solution

Run the following code in the R environment to check whether ggplot2 is installed:

gg2 <- try(find.package("ggplot2"), silent = TRUE)

gg2

If the package cannot be found, run install.packages("ggplot2") in the R environment.

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Vittorio Sartorelli (sartorev@mail.nih.gov).

Materials availability

This study did not generate unique reagents.

Data and code availability

Original data and code have been deposited at Zenodo: https://doi.org/10.5281/zenodo.7224723.

Acknowledgments

We thank the NIAMS Genomic Technology, Biodata Mining and Discovery, Flow Cytometry, and Light Imaging Sections. Dr. Hong-Wei Sun and Dr. Stephen Brooks (Biodata Mining and Discovery Section) provided useful suggestions for data analysis. This study utilized the high-performance computational capabilities of the Helix Systems at the NIH, Bethesda, MD, USA (https://helix.nih.gov/). This work was supported in part by the Intramural Research Program of the NIAMS at the NIH (grants AR041126 and AR041164 to V.S.).

Author contributions

K.D.K. and K.J. analyzed and interpreted data and drafted the manuscript. S.D.O. and V.S. edited the manuscript and supervised the project.

Declaration of interests

The authors declare no competing interests.

Contributor Information

Kyung Dae Ko, Email: kyungdae.ko@nih.gov.

Kan Jiang, Email: kan.jiang@nih.gov.

Vittorio Sartorelli, Email: vittorio.sartorelli@nih.gov.

References

1. Khateb M., Perovanovic J., Ko K.D., Jiang K., Feng X., Acevedo-Luna N., Chal J., Ciuffoli V., Genzor P., Simone J., et al. Transcriptomics, regulatory syntax, and enhancer identification in mesoderm-induced ESCs at single-cell resolution. Cell Rep. 2022;40:111219. doi: 10.1016/j.celrep.2022.111219.
2. RStudio Team. RStudio. Integrated Development for R; 2022.
3. Stuart T., Butler A., Hoffman P., Hafemeister C., Papalexi E., Mauck W.M., 3rd, Hao Y., Stoeckius M., Smibert P., Satija R. Comprehensive integration of single-cell data. Cell. 2019;177:1888–1902.e21. doi: 10.1016/j.cell.2019.05.031.
4. Stuart T., Srivastava A., Madad S., Lareau C.A., Satija R. Single-cell chromatin state analysis with Signac. Nat. Methods. 2021;18:1333–1341. doi: 10.1038/s41592-021-01282-5.
5. Korsunsky I., Millard N., Fan J., Slowikowski K., Zhang F., Wei K., Baglaenko Y., Brenner M., Loh P.R., Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with harmony. Nat. Methods. 2019;16:1289–1296. doi: 10.1038/s41592-019-0619-0.
6. Cao J., Spielmann M., Qiu X., Huang X., Ibrahim D.M., Hill A.J., Zhang F., Mundlos S., Christiansen L., Steemers F.J., et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566:496–502. doi: 10.1038/s41586-019-0969-x.
7. Fornes O., Castro-Mondragon J.A., Khan A., van der Lee R., Zhang X., Richmond P.A., Modi B.P., Correard S., Gheorghe M., Baranašić D., et al. Jaspar 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2020;48:D87–D92. doi: 10.1093/nar/gkz1001.
8. Tan G., Lenhard B. TFBSTools: an R/bioconductor package for transcription factor binding site analysis. Bioinformatics. 2016;32:1555–1556. doi: 10.1093/bioinformatics/btw024.
