The example analysis is implemented by Signac.
The example data consist in ~9000 cells of human peripheral blood mononuclear cells (PBMCs) provided by 10x Genomics. The following files are used in this pipeline, all available through the 10x Genomics website: Raw data, Metadata, Fragments file.
First import Signac, Seurat, GenomeInfoDb, and some other packages we will be using for analyzing PBMCs.
options(warn=-1)
library(Signac)
library(leiden)
library(Seurat)
library(GenomeInfoDb)
library(EnsDb.Hsapiens.v75)
library(ggplot2)
library(patchwork)
set.seed(1234)
The first step of analysis is to load the cell-by-feature matrix and the fragment file as a Seurat object.
The matrix is analogous to the gene expression count matrix used to analyze single-cell RNA-seq. However, instead of genes, each row of the matrix represents a region of the genome (a peak/windows), that is predicted to represent a region of open chromatin. ScEpiTools will provide a task-specific ID after uploading the file.
The fragment file represents a full list of all unique fragments across all single cells. It is a substantially larger file, is slower to work with, and is stored on-disk (instead of in memory). However, the advantage of retaining this file is that it contains all fragments associated with each single cell, as opposed to only fragments that map to peaks.
Internal parameters:
task_id - The only identifier of each task.
task_id <- 'Example'
inputpath <- paste0("./", task_id, "/")
outputpath <- paste0("./", task_id, "/")
We start by creating a Seurat object. Signac uses a matrix and a fragment file as input.
User-defined parameters:
Matrix_file - The name of the matrix file you want to load. Default: ‘atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5’.
Fragment_file - The name of the fragment file you want to load. Default: ‘atac_v1_pbmc_10k_fragments.tsv.gz’.
Metadata_file - The name of the metadata file you want to load. Default: ‘atac_v1_pbmc_10k_singlecell.csv’.
Reference_genome - ‘hg19’, ‘hg38’ or ‘mm10’ are the 3 currently recognised genomic version in this pipeline. Default: ‘hg19’.
Minimum_peaks - Minimum number of peaks required for a cell to pass filtering. Default: 200.
Minimum_cells - Minimum number of cells required for a peak to pass filtering. Default: 10.
Matrix_file <- "atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5"
counts <- Read10X_h5(filename = paste0(inputpath, Matrix_file))
Metadata_file <- "atac_v1_pbmc_10k_singlecell.csv"
metadata <- read.csv(
file = paste0(inputpath, Metadata_file),
header = TRUE,
row.names = 1
)
Fragment_file <- "atac_v1_pbmc_10k_fragments.tsv.gz"
Reference_genome <- 'hg19'
Minimum_peaks <- 200
Minimum_cells <- 10
chrom_assay <- CreateChromatinAssay(
counts = counts,
sep = c(":", "-"),
genome = Reference_genome,
fragments = paste0(inputpath, Fragment_file),
min.cells = Minimum_cells,
min.features = Minimum_peaks
)
## Computing hash
example <- CreateSeuratObject(
counts = chrom_assay,
assay = "peaks",
meta.data = metadata
)
example
## An object of class Seurat
## 87561 features across 8728 samples within 1 assay
## Active assay: peaks (87561 features, 0 variable features)
We can see some of the additional information that can be contained in the ChromatinAssay, including motif information, gene annotations, and genome information.
example[['peaks']]
## ChromatinAssay data with 87561 features for 8728 cells
## Variable features: 0
## Genome: hg19
## Annotation present: FALSE
## Motifs present: FALSE
## Fragment files: 1
We can call granges on a Seurat object with a ChromatinAssay set as the active assay (or on a ChromatinAssay) to see the genomic ranges associated with each feature in the object.
granges(example)
## GRanges object with 87561 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## [1] chr1 565107-565550 *
## [2] chr1 569174-569639 *
## [3] chr1 713460-714823 *
## [4] chr1 752422-753038 *
## [5] chr1 762106-763359 *
## ... ... ... ...
## [87557] chrY 58993392-58993760 *
## [87558] chrY 58994571-58994823 *
## [87559] chrY 58996352-58997331 *
## [87560] chrY 59001782-59002175 *
## [87561] chrY 59017143-59017246 *
## -------
## seqinfo: 24 sequences from an unspecified genome; no seqlengths
We can also add gene annotations to the example object for the human genome. This will allow downstream functions to pull the gene annotation information directly from the object.
# extract gene annotations from EnsDb
if (Reference_genome == 'mm10') {
library(EnsDb.Mmusculus.v79)
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
} else if (Reference_genome == 'hg38') {
library(EnsDb.Mmusculus.v86)
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
} else {
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v75)
}
# change to UCSC style since the data was mapped to hg19
seqlevelsStyle(annotations) <- 'UCSC'
# add the gene information to the object
Annotation(example) <- annotations
Next we can now compute some QC metrics for the scATAC-seq experiment.
User-defined parameters:
Minimum_fragments_in_peaks - Minimum number of fragments in peaks required for a cell to pass filtering. Default: 3000.
Maximum_fragments_in_peaks - Maximum number of fragments in peaks required for a cell to pass filtering. Default: 20000.
Minimum_fraction_in_peaks - Minimum fraction of fragments in peaks required for a cell to pass filtering. Default: 0.15.
Maximum_ratio_in_blacklist - Maximum ratio of reads in genomic blacklist regions required for a cell to pass filtering. Default: 0.05.
Maximum_nucleosome_signal - Maximum approximate ratio of mononucleosomal to nucleosome-free fragments required for a cell to pass filtering. Default: 0.04.
Minimum_TSS_enrichment - Minimum TSS enrichment scores required for a cell to pass filtering. Default: 2.
Nucleosome banding pattern: The histogram of DNA fragment sizes (determined from the paired-end sequencing reads) should exhibit a strong nucleosome banding pattern corresponding to the length of DNA wrapped around a single nucleosome. We calculate this per single cell, and quantify the approximate ratio of mononucleosomal to nucleosome-free fragments (stored as nucleosome_signal)
# compute nucleosome signal score per cell
example <- NucleosomeSignal(object = example)
Transcriptional start site (TSS) enrichment score. The ENCODE project has defined an ATAC-seq targeting score based on the ratio of fragments centered at the TSS to fragments in TSS-flanking regions (see https://www.encodeproject.org/data-standards/terms/). Poor ATAC-seq experiments typically will have a low TSS enrichment score. We can compute this metric for each cell with the TSSEnrichment() function, and the results are stored in metadata under the column name TSS.enrichment.
# compute TSS enrichment score per cell
example <- TSSEnrichment(object = example, fast = FALSE)
## Extracting TSS positions
## Finding + strand cut sites
## Finding - strand cut sites
## Computing mean insertion frequency in flanking regions
## Normalizing TSS score
Note that the following three metrics can be obtained from the output of CellRanger (which is stored in the object metadata), but can also be calculated for non-10x datasets using Signac.
Total number of fragments in peaks: A measure of cellular sequencing depth / complexity. Cells with very few reads may need to be excluded due to low sequencing depth. Cells with extremely high levels may represent doublets, nuclei clumps, or other artefacts.
Fraction of fragments in peaks: Represents the fraction of all fragments that fall within ATAC-seq peaks. Cells with low values (i.e. <15-20%) often represent low-quality cells or technical artifacts that should be removed. Note that this value can be sensitive to the set of peaks used.
Ratio reads in genomic blacklist regions. The ENCODE project has provided a list of blacklist regions, representing reads which are often associated with artefactual signal. Cells with a high proportion of reads mapping to these areas (compared to reads mapping to peaks) often represent technical artifacts and should be removed.
# add blacklist ratio and fraction of reads in peaks
example$pct_reads_in_peaks <- example$peak_region_fragments / example$passed_filters * 100
example$blacklist_ratio <- example$blacklist_region_fragments / example$peak_region_fragments
We can inspect the TSS enrichment scores by grouping the cells based on the score and plotting the accessibility signal over all TSS sites. Setting the fast=TRUE option in TSSEnrichment() will only compute the TSS enrichment score without storing the entire cell by position matrix of Tn5 insertion frequency for each cell, and can save memory. However, setting fast=TRUE will not allow downstream plotting of the TSS enrichment signal for different groups of cells using the TSSPlot() function, shown here:
options(repr.plot.width = 12, repr.plot.height = 6)
example$high.tss <- ifelse(example$TSS.enrichment > 2, 'High', 'Low')
p <- TSSPlot(example, group.by = 'high.tss') + NoLegend()
print(p)
ggsave(p, filename = paste0(outputpath, "results/figures/TSS_enrichment.png"), width = 12, height = 6)
We can also look at the fragment length periodicity for all the cells, and group by cells with high or low nucleosomal signal strength. You can see that cells that are outliers for the mononucleosomal / nucleosome-free ratio (based on the plots above) have different nucleosomal banding patterns. The remaining cells exhibit a pattern that is typical for a successful ATAC-seq experiment.
example$nucleosome_group <- ifelse(example$nucleosome_signal > 4, 'NS > 4', 'NS < 4')
p <- FragmentHistogram(object = example, group.by = 'nucleosome_group')
print(p)
ggsave(p, filename = paste0(outputpath, "results/figures/Fragment_Histogram.png"), width = 12, height = 6)
options(repr.plot.width = 15, repr.plot.height = 6)
p <- VlnPlot(
object = example,
features = c('pct_reads_in_peaks', 'peak_region_fragments',
'TSS.enrichment', 'blacklist_ratio', 'nucleosome_signal'),
pt.size = 0.5,
ncol = 5
)
print(p)
ggsave(p, filename = paste0(outputpath, "results/figures/ViolinPlot.png"), width = 15, height = 6)
Finally we remove cells that are outliers for these QC metrics.
Minimum_fragments_in_peaks <- 3000
Maximum_fragments_in_peaks <- 20000
Minimum_fraction_in_peaks <- 0.15
Maximum_ratio_in_blacklist <- 0.05
Maximum_nucleosome_signal <- 0.04
Minimum_TSS_enrichment <- 2
example <- subset(
x = example,
subset = peak_region_fragments > Minimum_fragments_in_peaks &
peak_region_fragments < Maximum_fragments_in_peaks &
pct_reads_in_peaks > Minimum_fraction_in_peaks*100 &
blacklist_ratio < Maximum_ratio_in_blacklist &
nucleosome_signal < Maximum_nucleosome_signal*100 &
TSS.enrichment > Minimum_TSS_enrichment
)
example
## An object of class Seurat
## 87561 features across 7060 samples within 1 assay
## Active assay: peaks (87561 features, 0 variable features)
Next we perform the dimensionality reduction and clustering after normalization and feature selection.
Signac performs term frequency-inverse document frequency (TF-IDF) normalization. This is a two-step normalization procedure, that both normalizes across cells to correct for differences in cellular sequencing depth, and across peaks to give higher values to more rare peaks.
example <- RunTFIDF(example)
## Performing TF-IDF normalization
The low dynamic range of scATAC-seq data makes it challenging to perform variable feature selection. Instead, we can choose to use only the top n% of features (peaks) for dimensional reduction, or remove features present in less than n cells with the FindTopFeatures() function.
Here, we will all features, though we note that we see very similar results when using only a subset of features (try setting min.cutoff to ‘q75’ to use the top 25% all peaks), with faster runtimes. Features used for dimensional reduction are automatically set as VariableFeatures() for the Seurat object by this function.
example <- FindTopFeatures(example, min.cutoff = 'q0')
We next run singular value decomposition (SVD) on the TD-IDF matrix, using the features (peaks) selected above. The combined steps of TF-IDF followed by SVD are known as latent semantic indexing (LSI).
example <- RunSVD(example)
## Running SVD
## Scaling cell embeddings
The first LSI component often captures sequencing depth (technical variation) rather than biological variation. If this is the case, the component should be removed from downstream analysis. We can assess the correlation between each LSI component and sequencing depth using the DepthCor() function.
Here we see there is a very strong correlation between the first LSI component and the total number of counts for the cell, so we will perform downstream steps without this component.
options(repr.plot.width = 12, repr.plot.height = 6)
p <- DepthCor(example)
print(p)
ggsave(p, filename = paste0(outputpath, "results/figures/Correlation_between_depth_and_reduced_dimension_components.png"), width = 12, height = 6)
Now that the cells are embedded in a low-dimensional space, we can perform graph-based clustering and non-linear dimension reduction for visualization. The functions RunUMAP(), FindNeighbors(), and FindClusters() all come from the Seurat package.
User-defined parameters:
Select_dimensions - Which dimensions to use as input features. Default: 30.
Visualization_method - The visualization method for displaying dimensionality reduction and clustering results, scEpiTools provides two choices, ‘umap’ and ‘tsne’. Default: ‘umap’.
Clustering_algorithm - Clustering algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Default: 3.
Select_dimensions <- 30
Visualization_method <- c('umap')
for (Visualization_method_ in Visualization_method) {
if (Visualization_method_ == 'tsne') {
example <- RunTSNE(object = example, reduction = 'lsi', dims = 2:Select_dimensions)
} else {
example <- RunUMAP(object = example, reduction = 'lsi', dims = 2:Select_dimensions)
}
}
## 22:10:57 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:10:57 Read 7060 rows and found 29 numeric columns
## 22:10:57 Using Annoy for neighbor search, n_neighbors = 30
## 22:10:57 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:10:57 Writing NN index file to temp file /tmp/Rtmp9iuYEz/file57e515f77f1a
## 22:10:57 Searching Annoy index using 1 thread, search_k = 3000
## 22:10:59 Annoy recall = 100%
## 22:11:00 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:11:02 Initializing from normalized Laplacian + noise (using irlba)
## 22:11:02 Commencing optimization for 500 epochs, with 274384 positive edges
## 22:11:10 Optimization finished
example <- FindNeighbors(object = example, reduction = 'lsi', dims = 2:Select_dimensions)
## Computing nearest neighbor graph
## Computing SNN
Clustering_algorithm <- 3
example <- FindClusters(object = example, verbose = FALSE, algorithm = Clustering_algorithm)
for (Visualization_method_ in Visualization_method) {
options(repr.plot.width = 8, repr.plot.height = 6)
p <- DimPlot(object = example, label = TRUE, reduction = Visualization_method_) + NoLegend()
print(p)
ggsave(p, filename = paste0(outputpath, "results/figures/", Visualization_method_, ".png"), width = 8, height = 6)
}
To create a gene activity matrix, we extract gene coordinates and extend them to include the 2 kb upstream region. We then count the number of fragments for each cell that map to each of these regions, using the using the FeatureMatrix() function. These steps are automatically performed by the GeneActivity() function.
User-defined parameters:
Marker_genes - Marker genes to help interpret our ATAC-seq clusters. It is a string separated by ‘,’. Default: NULL.
gene.activities <- GeneActivity(example)
## Extracting gene coordinates
## Extracting reads overlapping genomic regions
Add the gene activity matrix to the Seurat object as a new assay and normalize it
example[['RNA']] <- CreateAssayObject(counts = gene.activities)
example <- NormalizeData(
object = example,
assay = 'RNA',
normalization.method = 'LogNormalize',
scale.factor = median(example$nCount_RNA)
)
Now we can visualize the activities of canonical marker genes to help interpret our ATAC-seq clusters.
In this example, we can begin to discern populations of monocytes, B, T, and NK cells based on these gene activity profiles.
DefaultAssay(example) <- 'RNA'
Marker_genes <- 'MS4A1,CD3D,LEF1,NKG7,TREM1,LYZ'
if (Marker_genes != "NULL"){
Marker_genes <- strsplit(Marker_genes, ",")[[1]]
for (Visualization_method_ in Visualization_method) {
options(repr.plot.width = 15, repr.plot.height = 8)
p <- FeaturePlot(
object = example,
reduction = Visualization_method_,
features = Marker_genes,
pt.size = 0.1,
max.cutoff = 'q95',
ncol = 3
)
print(p)
ggsave(p, filename = paste0(outputpath, "results/figures/Marker_gene_activity_", Visualization_method_, ".png"), width = 15, height = 8)
}
} else {print("There is no marker genes for this step!")}
To help interpret the scATAC-seq data, we can classify cells based on an scRNA-seq experiment from the same biological system (human PBMC).
User-defined parameters:
RNA_file - The name of the pre-processed scRNA-seq data you want to load. Default: NULL.
RNA_file <- paste0(inputpath, 'pbmc_10k_v3.rds')
if (file.exists(RNA_file)){
example_rna <- readRDS(RNA_file)
} else {print("There is no scRNA-seq data for this step!")}
if (file.exists(RNA_file)) {
transfer.anchors <- FindTransferAnchors(
reference = example_rna,
query = example,
reduction = 'cca'
)}
## Running CCA
## Merging objects
## Finding neighborhoods
## Finding anchors
## Found 17340 anchors
## Filtering anchors
## Retained 4050 anchors
if (file.exists(RNA_file)) {
predicted.labels <- TransferData(
anchorset = transfer.anchors,
refdata = example_rna$celltype,
weight.reduction = example[['lsi']],
dims = 2:30
)}
## Finding integration vectors
## Finding integration vector weights
## Predicting cell labels
if (file.exists(RNA_file)) {
example <- AddMetaData(object = example, metadata = predicted.labels)
for (Visualization_method_ in Visualization_method) {
options(repr.plot.width = 12, repr.plot.height = 6)
plot1 <- DimPlot(
object = example_rna,
reduction = Visualization_method_,
group.by = 'celltype',
label = TRUE,
repel = TRUE) + NoLegend() + ggtitle('scRNA-seq')
plot2 <- DimPlot(
object = example,
reduction = Visualization_method_,
group.by = 'predicted.id',
label = TRUE,
repel = TRUE) + NoLegend() + ggtitle('scATAC-seq')
p <- plot1 + plot2
print(p)
ggsave(p, filename = paste0(outputpath, "results/figures/Integration_scRNA-seq_", Visualization_method_, ".png"), width = 12, height = 6)
}
}
Annotate the scATAC-seq-derived clusters from scRNA-seq.
# example <- subset(example, idents = 14, invert = TRUE)
# example <- RenameIdents(
# object = example,
# '0' = 'CD14 Mono',
# '1' = 'CD4 Memory',
# '2' = 'CD8 Effector',
# '3' = 'CD4 Naive',
# '4' = 'CD14 Mono',
# '5' = 'DN T',
# '6' = 'CD8 Naive',
# '7' = 'NK CD56Dim',
# '8' = 'pre-B',
# '9' = 'CD16 Mono',
# '10' = 'pro-B',
# '11' = 'DC',
# '12' = 'NK CD56bright',
# '13' = 'pDC'
# )
To find differentially accessible regions between clusters of cells, we can perform a differential accessibility (DA) test.
User-defined parameters:
First_cluster_number - The first cluster number for comparison in the differential feature analysis. Default: 1.
Second_cluster_number - The second cluster number for comparison in the differential feature analysis. Default: 2.
First_cluster_number <- 3
Second_cluster_number <- 0
DefaultAssay(example) <- 'peaks'
da_peaks <- FindMarkers(
object = example,
ident.1 = First_cluster_number,
ident.2 = Second_cluster_number,
test.use = 'LR',
latent.vars = 'peak_region_fragments'
)
head(da_peaks)
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## chr14-99721608-99741934 3.296097e-288 5.924366 0.879 0.017 2.886096e-283
## chr14-99695477-99720910 5.598969e-219 5.105227 0.825 0.021 4.902514e-214
## chr7-142501666-142511108 4.728614e-204 4.680409 0.767 0.032 4.140422e-199
## chr17-80084198-80086094 1.865399e-200 6.607317 0.695 0.006 1.633362e-195
## chr2-113581628-113594911 5.454551e-198 -4.698955 0.055 0.701 4.776059e-193
## chr6-44025105-44028184 3.963592e-192 -4.145363 0.063 0.650 3.470561e-187
for (Visualization_method_ in Visualization_method) {
options(repr.plot.width = 12, repr.plot.height = 6)
plot1 <- VlnPlot(
object = example,
features = rownames(da_peaks)[1],
pt.size = 0.1,
idents = c(First_cluster_number,Second_cluster_number)
)
plot2 <- FeaturePlot(
object = example,
reduction = Visualization_method_,
features = rownames(da_peaks)[1],
pt.size = 0.1
)
p <- plot1 | plot2
print(p)
ggsave(p, filename = paste0(outputpath, "results/figures/Differential_feature_analysis_", Visualization_method_, ".png"), width = 12, height = 6)
}
Look at the fold change accessibility between two groups of cells to find DA regions.
fc <- FoldChange(example, ident.1 = First_cluster_number, ident.2 = Second_cluster_number)
head(fc)
## avg_log2FC pct.1 pct.2
## chr1-565107-565550 1.3801269 0.012 0.004
## chr1-569174-569639 1.7622203 0.032 0.008
## chr1-713460-714823 0.6846989 0.421 0.207
## chr1-752422-753038 -3.7546046 0.010 0.096
## chr1-762106-763359 0.4754834 0.290 0.165
## chr1-779589-780271 4.2817365 0.051 0.002
Analysis of closest gene to each of these peaks.
open_1 <- rownames(da_peaks[da_peaks$avg_log2FC > 3, ])
open_2 <- rownames(da_peaks[da_peaks$avg_log2FC < -3, ])
closest_genes_1 <- ClosestFeature(example, regions = open_1)
closest_genes_2 <- ClosestFeature(example, regions = open_2)
head(closest_genes_1)
## tx_id gene_name gene_id gene_biotype type
## ENST00000443726 ENST00000443726 BCL11B ENSG00000127152 protein_coding cds
## ENST00000357195 ENST00000357195 BCL11B ENSG00000127152 protein_coding cds
## ENSE00002456092 ENST00000463701 PRSS1 ENSG00000204983 protein_coding exon
## ENST00000583593 ENST00000583593 CCDC57 ENSG00000176155 protein_coding cds
## ENST00000557595 ENST00000557595 AE000662.92 ENSG00000259003 protein_coding cds
## ENST00000455990 ENST00000455990 HOOK1 ENSG00000134709 protein_coding cds
## closest_region query_region distance
## ENST00000443726 chr14-99737498-99737555 chr14-99721608-99741934 0
## ENST00000357195 chr14-99697682-99697894 chr14-99695477-99720910 0
## ENSE00002456092 chr7-142460719-142460923 chr7-142501666-142511108 40742
## ENST00000583593 chr17-80085568-80085694 chr17-80084198-80086094 0
## ENST00000557595 chr14-23027629-23027664 chr14-23012859-23028219 0
## ENST00000455990 chr1-60280790-60280852 chr1-60279767-60281364 0
head(closest_genes_2)
## tx_id gene_name gene_id gene_biotype
## ENST00000432018 ENST00000432018 IL1B ENSG00000125538 protein_coding
## ENSE00001638912 ENST00000455005 RP5-1120P11.3 ENSG00000231881 lincRNA
## ENST00000568649 ENST00000568649 PPCDC ENSG00000138621 protein_coding
## ENST00000409245 ENST00000409245 TTC7A ENSG00000068724 protein_coding
## ENST00000445003 ENST00000445003 RP11-290F20.3 ENSG00000224397 lincRNA
## ENST00000560869 ENST00000560869 TNFAIP2 ENSG00000185215 protein_coding
## type closest_region query_region
## ENST00000432018 cds chr2-113593760-113593806 chr2-113581628-113594911
## ENSE00001638912 exon chr6-44041650-44042535 chr6-44025105-44028184
## ENST00000568649 cds chr15-75335782-75335877 chr15-75334903-75336779
## ENST00000409245 cds chr2-47300841-47301062 chr2-47297968-47301173
## ENST00000445003 gap chr20-48884201-48894027 chr20-48889794-48893313
## ENST00000560869 utr chr14-103589798-103590288 chr14-103587259-103591113
## distance
## ENST00000432018 0
## ENSE00001638912 13465
## ENST00000568649 0
## ENST00000409245 0
## ENST00000445003 0
## ENST00000560869 0
We can plot the frequency of Tn5 integration across regions of the genome for cells grouped by cluster, cell type, or any other metadata stored in the object for any genomic region using the CoveragePlot() function.
# set plotting order
# levels(example) <- c("CD4 Naive","CD4 Memory","CD8 Naive","CD8 Effector","DN T","NK CD56bright",
# "NK CD56Dim","pre-B",'pro-B',"pDC","DC","CD14 Mono","CD16 Mono")
options(repr.plot.width = 12, repr.plot.height = 12)
p <- CoveragePlot(
object = example,
region = rownames(da_peaks)[1],
extend.upstream = 40000,
extend.downstream = 20000
)
print(p)
ggsave(p, filename = paste0(outputpath, "results/figures/Genomic_regions_visualization.png"), width = 12, height = 12)
Write .rds-formatted file. You can get the results (a html file, a rds file and figure files) via your taskID.
saveRDS(example, file=paste0(outputpath, "results/", task_id, "_result.rds"))