The example analysis is implemented by Signac.
The example data consist in ~9000 cells of human peripheral blood mononuclear cells (PBMCs) provided by 10x Genomics. The following files are used in this pipeline, all available through the 10x Genomics website: Raw data, Metadata, Fragments file.

1 Initialization

First import Signac, Seurat, GenomeInfoDb, and some other packages we will be using for analyzing PBMCs.

options(warn=-1)
library(Signac)
library(leiden)
library(Seurat)
library(GenomeInfoDb)
library(EnsDb.Hsapiens.v75)
library(ggplot2)
library(patchwork)
set.seed(1234)

2 Load the data

The first step of analysis is to load the cell-by-feature matrix and the fragment file as a Seurat object.
The matrix is analogous to the gene expression count matrix used to analyze single-cell RNA-seq. However, instead of genes, each row of the matrix represents a region of the genome (a peak/windows), that is predicted to represent a region of open chromatin. ScEpiTools will provide a task-specific ID after uploading the file.
The fragment file represents a full list of all unique fragments across all single cells. It is a substantially larger file, is slower to work with, and is stored on-disk (instead of in memory). However, the advantage of retaining this file is that it contains all fragments associated with each single cell, as opposed to only fragments that map to peaks.

Internal parameters:
task_id - The only identifier of each task.

task_id <- 'Example'
inputpath <- paste0("./", task_id, "/")
outputpath <- paste0("./", task_id, "/")

2.1 Load matrix and fragment

We start by creating a Seurat object. Signac uses a matrix and a fragment file as input.

User-defined parameters:
Matrix_file - The name of the matrix file you want to load. Default: ‘atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5’.
Fragment_file - The name of the fragment file you want to load. Default: ‘atac_v1_pbmc_10k_fragments.tsv.gz’.
Metadata_file - The name of the metadata file you want to load. Default: ‘atac_v1_pbmc_10k_singlecell.csv’.
Reference_genome - ‘hg19’, ‘hg38’ or ‘mm10’ are the 3 currently recognised genomic version in this pipeline. Default: ‘hg19’.
Minimum_peaks - Minimum number of peaks required for a cell to pass filtering. Default: 200.
Minimum_cells - Minimum number of cells required for a peak to pass filtering. Default: 10.

Matrix_file <- "atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5"
counts <- Read10X_h5(filename = paste0(inputpath, Matrix_file))

Metadata_file <- "atac_v1_pbmc_10k_singlecell.csv"
metadata <- read.csv(
  file = paste0(inputpath, Metadata_file),
  header = TRUE,
  row.names = 1
)

Fragment_file <- "atac_v1_pbmc_10k_fragments.tsv.gz"
Reference_genome <- 'hg19'
Minimum_peaks <- 200
Minimum_cells <- 10

chrom_assay <- CreateChromatinAssay(
  counts = counts,
  sep = c(":", "-"),
  genome = Reference_genome,
  fragments = paste0(inputpath, Fragment_file),
  min.cells = Minimum_cells,
  min.features = Minimum_peaks
)
## Computing hash

example <- CreateSeuratObject(
  counts = chrom_assay,
  assay = "peaks",
  meta.data = metadata
)

example

## An object of class Seurat 
## 87561 features across 8728 samples within 1 assay 
## Active assay: peaks (87561 features, 0 variable features)

We can see some of the additional information that can be contained in the ChromatinAssay, including motif information, gene annotations, and genome information.

example[['peaks']]

## ChromatinAssay data with 87561 features for 8728 cells
## Variable features: 0 
## Genome: hg19 
## Annotation present: FALSE 
## Motifs present: FALSE 
## Fragment files: 1

We can call granges on a Seurat object with a ChromatinAssay set as the active assay (or on a ChromatinAssay) to see the genomic ranges associated with each feature in the object.

granges(example)

## GRanges object with 87561 ranges and 0 metadata columns:
##           seqnames            ranges strand
##              <Rle>         <IRanges>  <Rle>
##       [1]     chr1     565107-565550      *
##       [2]     chr1     569174-569639      *
##       [3]     chr1     713460-714823      *
##       [4]     chr1     752422-753038      *
##       [5]     chr1     762106-763359      *
##       ...      ...               ...    ...
##   [87557]     chrY 58993392-58993760      *
##   [87558]     chrY 58994571-58994823      *
##   [87559]     chrY 58996352-58997331      *
##   [87560]     chrY 59001782-59002175      *
##   [87561]     chrY 59017143-59017246      *
##   -------
##   seqinfo: 24 sequences from an unspecified genome; no seqlengths

2.2 Add gene annotation

We can also add gene annotations to the example object for the human genome. This will allow downstream functions to pull the gene annotation information directly from the object.

# extract gene annotations from EnsDb
if (Reference_genome == 'mm10') {
    library(EnsDb.Mmusculus.v79)
    annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
} else if (Reference_genome == 'hg38') {
    library(EnsDb.Mmusculus.v86)
    annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
} else {
    annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v75)
}

# change to UCSC style since the data was mapped to hg19
seqlevelsStyle(annotations) <- 'UCSC'

# add the gene information to the object
Annotation(example) <- annotations

3 Quality control metrics

Next we can now compute some QC metrics for the scATAC-seq experiment.

User-defined parameters:
Minimum_fragments_in_peaks - Minimum number of fragments in peaks required for a cell to pass filtering. Default: 3000.
Maximum_fragments_in_peaks - Maximum number of fragments in peaks required for a cell to pass filtering. Default: 20000.
Minimum_fraction_in_peaks - Minimum fraction of fragments in peaks required for a cell to pass filtering. Default: 0.15.
Maximum_ratio_in_blacklist - Maximum ratio of reads in genomic blacklist regions required for a cell to pass filtering. Default: 0.05.
Maximum_nucleosome_signal - Maximum approximate ratio of mononucleosomal to nucleosome-free fragments required for a cell to pass filtering. Default: 0.04.
Minimum_TSS_enrichment - Minimum TSS enrichment scores required for a cell to pass filtering. Default: 2.

Nucleosome banding pattern: The histogram of DNA fragment sizes (determined from the paired-end sequencing reads) should exhibit a strong nucleosome banding pattern corresponding to the length of DNA wrapped around a single nucleosome. We calculate this per single cell, and quantify the approximate ratio of mononucleosomal to nucleosome-free fragments (stored as nucleosome_signal)

# compute nucleosome signal score per cell
example <- NucleosomeSignal(object = example)

Transcriptional start site (TSS) enrichment score. The ENCODE project has defined an ATAC-seq targeting score based on the ratio of fragments centered at the TSS to fragments in TSS-flanking regions (see https://www.encodeproject.org/data-standards/terms/). Poor ATAC-seq experiments typically will have a low TSS enrichment score. We can compute this metric for each cell with the TSSEnrichment() function, and the results are stored in metadata under the column name TSS.enrichment.

# compute TSS enrichment score per cell
example <- TSSEnrichment(object = example, fast = FALSE)
## Extracting TSS positions
## Finding + strand cut sites
## Finding - strand cut sites
## Computing mean insertion frequency in flanking regions
## Normalizing TSS score

Note that the following three metrics can be obtained from the output of CellRanger (which is stored in the object metadata), but can also be calculated for non-10x datasets using Signac.
Total number of fragments in peaks: A measure of cellular sequencing depth / complexity. Cells with very few reads may need to be excluded due to low sequencing depth. Cells with extremely high levels may represent doublets, nuclei clumps, or other artefacts.
Fraction of fragments in peaks: Represents the fraction of all fragments that fall within ATAC-seq peaks. Cells with low values (i.e. <15-20%) often represent low-quality cells or technical artifacts that should be removed. Note that this value can be sensitive to the set of peaks used.
Ratio reads in genomic blacklist regions. The ENCODE project has provided a list of blacklist regions, representing reads which are often associated with artefactual signal. Cells with a high proportion of reads mapping to these areas (compared to reads mapping to peaks) often represent technical artifacts and should be removed.

# add blacklist ratio and fraction of reads in peaks
example$pct_reads_in_peaks <- example$peak_region_fragments / example$passed_filters * 100
example$blacklist_ratio <- example$blacklist_region_fragments / example$peak_region_fragments

We can inspect the TSS enrichment scores by grouping the cells based on the score and plotting the accessibility signal over all TSS sites. Setting the fast=TRUE option in TSSEnrichment() will only compute the TSS enrichment score without storing the entire cell by position matrix of Tn5 insertion frequency for each cell, and can save memory. However, setting fast=TRUE will not allow downstream plotting of the TSS enrichment signal for different groups of cells using the TSSPlot() function, shown here:

options(repr.plot.width = 12, repr.plot.height = 6)
example$high.tss <- ifelse(example$TSS.enrichment > 2, 'High', 'Low')
p <- TSSPlot(example, group.by = 'high.tss') + NoLegend()
print(p)

plot of chunk unnamed-chunk-14

ggsave(p, filename = paste0(outputpath, "results/figures/TSS_enrichment.png"), width = 12, height = 6)

We can also look at the fragment length periodicity for all the cells, and group by cells with high or low nucleosomal signal strength. You can see that cells that are outliers for the mononucleosomal / nucleosome-free ratio (based on the plots above) have different nucleosomal banding patterns. The remaining cells exhibit a pattern that is typical for a successful ATAC-seq experiment.

example$nucleosome_group <- ifelse(example$nucleosome_signal > 4, 'NS > 4', 'NS < 4')
p <- FragmentHistogram(object = example, group.by = 'nucleosome_group')
print(p)

plot of chunk unnamed-chunk-15

ggsave(p, filename = paste0(outputpath, "results/figures/Fragment_Histogram.png"), width = 12, height = 6)

options(repr.plot.width = 15, repr.plot.height = 6)
p <- VlnPlot(
  object = example,
  features = c('pct_reads_in_peaks', 'peak_region_fragments',
               'TSS.enrichment', 'blacklist_ratio', 'nucleosome_signal'),
  pt.size = 0.5,
  ncol = 5
)
print(p)

plot of chunk unnamed-chunk-16

ggsave(p, filename = paste0(outputpath, "results/figures/ViolinPlot.png"), width = 15, height = 6)

Finally we remove cells that are outliers for these QC metrics.

Minimum_fragments_in_peaks <- 3000
Maximum_fragments_in_peaks <- 20000
Minimum_fraction_in_peaks <- 0.15
Maximum_ratio_in_blacklist <- 0.05
Maximum_nucleosome_signal <- 0.04
Minimum_TSS_enrichment <- 2

example <- subset(
  x = example,
  subset = peak_region_fragments > Minimum_fragments_in_peaks &
    peak_region_fragments < Maximum_fragments_in_peaks &
    pct_reads_in_peaks > Minimum_fraction_in_peaks*100 &
    blacklist_ratio < Maximum_ratio_in_blacklist &
    nucleosome_signal < Maximum_nucleosome_signal*100 &
    TSS.enrichment > Minimum_TSS_enrichment
)
example

## An object of class Seurat 
## 87561 features across 7060 samples within 1 assay 
## Active assay: peaks (87561 features, 0 variable features)

4 Dimensionality reduction and clustering

Next we perform the dimensionality reduction and clustering after normalization and feature selection.

4.1 Normalization

Signac performs term frequency-inverse document frequency (TF-IDF) normalization. This is a two-step normalization procedure, that both normalizes across cells to correct for differences in cellular sequencing depth, and across peaks to give higher values to more rare peaks.

example <- RunTFIDF(example)
## Performing TF-IDF normalization

4.2 Feature selection

The low dynamic range of scATAC-seq data makes it challenging to perform variable feature selection. Instead, we can choose to use only the top n% of features (peaks) for dimensional reduction, or remove features present in less than n cells with the FindTopFeatures() function.
Here, we will all features, though we note that we see very similar results when using only a subset of features (try setting min.cutoff to ‘q75’ to use the top 25% all peaks), with faster runtimes. Features used for dimensional reduction are automatically set as VariableFeatures() for the Seurat object by this function.

example <- FindTopFeatures(example, min.cutoff = 'q0')

4.3 Latent semantic indexing

We next run singular value decomposition (SVD) on the TD-IDF matrix, using the features (peaks) selected above. The combined steps of TF-IDF followed by SVD are known as latent semantic indexing (LSI).

example <- RunSVD(example)
## Running SVD
## Scaling cell embeddings

The first LSI component often captures sequencing depth (technical variation) rather than biological variation. If this is the case, the component should be removed from downstream analysis. We can assess the correlation between each LSI component and sequencing depth using the DepthCor() function.
Here we see there is a very strong correlation between the first LSI component and the total number of counts for the cell, so we will perform downstream steps without this component.

options(repr.plot.width = 12, repr.plot.height = 6)
p <- DepthCor(example)
print(p)

plot of chunk unnamed-chunk-21

ggsave(p, filename = paste0(outputpath, "results/figures/Correlation_between_depth_and_reduced_dimension_components.png"), width = 12, height = 6)

4.4 Non-linear dimensionality reduction

Now that the cells are embedded in a low-dimensional space, we can perform graph-based clustering and non-linear dimension reduction for visualization. The functions RunUMAP(), FindNeighbors(), and FindClusters() all come from the Seurat package.

User-defined parameters:
Select_dimensions - Which dimensions to use as input features. Default: 30.
Visualization_method - The visualization method for displaying dimensionality reduction and clustering results, scEpiTools provides two choices, ‘umap’ and ‘tsne’. Default: ‘umap’.
Clustering_algorithm - Clustering algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Default: 3.

Select_dimensions <- 30
Visualization_method <- c('umap')
for (Visualization_method_ in Visualization_method) {
    if (Visualization_method_ == 'tsne') {
        example <- RunTSNE(object = example, reduction = 'lsi', dims = 2:Select_dimensions)
    } else {
        example <- RunUMAP(object = example, reduction = 'lsi', dims = 2:Select_dimensions)
    }
}
## 22:10:57 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:10:57 Read 7060 rows and found 29 numeric columns
## 22:10:57 Using Annoy for neighbor search, n_neighbors = 30
## 22:10:57 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:10:57 Writing NN index file to temp file /tmp/Rtmp9iuYEz/file57e515f77f1a
## 22:10:57 Searching Annoy index using 1 thread, search_k = 3000
## 22:10:59 Annoy recall = 100%
## 22:11:00 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:11:02 Initializing from normalized Laplacian + noise (using irlba)
## 22:11:02 Commencing optimization for 500 epochs, with 274384 positive edges
## 22:11:10 Optimization finished
example <- FindNeighbors(object = example, reduction = 'lsi', dims = 2:Select_dimensions)
## Computing nearest neighbor graph
## Computing SNN

Clustering_algorithm <- 3
example <- FindClusters(object = example, verbose = FALSE, algorithm = Clustering_algorithm)

for (Visualization_method_ in Visualization_method) {
    options(repr.plot.width = 8, repr.plot.height = 6)
    p <- DimPlot(object = example, label = TRUE, reduction = Visualization_method_) + NoLegend()
    print(p)
    ggsave(p, filename = paste0(outputpath, "results/figures/", Visualization_method_, ".png"), width = 8, height = 6)
}

plot of chunk unnamed-chunk-24

5 Create gene activity matrix

To create a gene activity matrix, we extract gene coordinates and extend them to include the 2 kb upstream region. We then count the number of fragments for each cell that map to each of these regions, using the using the FeatureMatrix() function. These steps are automatically performed by the GeneActivity() function.

User-defined parameters:
Marker_genes - Marker genes to help interpret our ATAC-seq clusters. It is a string separated by ‘,’. Default: NULL.

gene.activities <- GeneActivity(example)
## Extracting gene coordinates
## Extracting reads overlapping genomic regions

Add the gene activity matrix to the Seurat object as a new assay and normalize it

example[['RNA']] <- CreateAssayObject(counts = gene.activities)
example <- NormalizeData(
  object = example,
  assay = 'RNA',
  normalization.method = 'LogNormalize',
  scale.factor = median(example$nCount_RNA)
)

Now we can visualize the activities of canonical marker genes to help interpret our ATAC-seq clusters.
In this example, we can begin to discern populations of monocytes, B, T, and NK cells based on these gene activity profiles.

DefaultAssay(example) <- 'RNA'

Marker_genes <- 'MS4A1,CD3D,LEF1,NKG7,TREM1,LYZ'
if (Marker_genes != "NULL"){
    Marker_genes <- strsplit(Marker_genes, ",")[[1]]
    for (Visualization_method_ in Visualization_method) {
        options(repr.plot.width = 15, repr.plot.height = 8)
        p <- FeaturePlot(
          object = example,
          reduction = Visualization_method_,
          features = Marker_genes,
          pt.size = 0.1,
          max.cutoff = 'q95',
          ncol = 3
        )
        print(p)
        ggsave(p, filename = paste0(outputpath, "results/figures/Marker_gene_activity_", Visualization_method_, ".png"), width = 15, height = 8)
    }
} else {print("There is no marker genes for this step!")}

plot of chunk unnamed-chunk-27

6 Integrating with scRNA-seq data

To help interpret the scATAC-seq data, we can classify cells based on an scRNA-seq experiment from the same biological system (human PBMC).

User-defined parameters:
RNA_file - The name of the pre-processed scRNA-seq data you want to load. Default: NULL.

RNA_file <- paste0(inputpath, 'pbmc_10k_v3.rds')
if (file.exists(RNA_file)){
    example_rna <- readRDS(RNA_file)
} else {print("There is no scRNA-seq data for this step!")}

if (file.exists(RNA_file)) {
    transfer.anchors <- FindTransferAnchors(
        reference = example_rna,
        query = example,
        reduction = 'cca'
)}
## Running CCA
## Merging objects
## Finding neighborhoods
## Finding anchors
## 	Found 17340 anchors
## Filtering anchors
## 	Retained 4050 anchors

if (file.exists(RNA_file)) {    
    predicted.labels <- TransferData(
      anchorset = transfer.anchors,
      refdata = example_rna$celltype,
      weight.reduction = example[['lsi']],
      dims = 2:30
    )}
## Finding integration vectors
## Finding integration vector weights
## Predicting cell labels

if (file.exists(RNA_file)) {
    example <- AddMetaData(object = example, metadata = predicted.labels)
    
    for (Visualization_method_ in Visualization_method) {
        options(repr.plot.width = 12, repr.plot.height = 6)
        plot1 <- DimPlot(
          object = example_rna,
          reduction = Visualization_method_,
          group.by = 'celltype',
          label = TRUE,
          repel = TRUE) + NoLegend() + ggtitle('scRNA-seq')

        plot2 <- DimPlot(
          object = example,
          reduction = Visualization_method_,
          group.by = 'predicted.id',
          label = TRUE,
          repel = TRUE) + NoLegend() + ggtitle('scATAC-seq')

        p <- plot1 + plot2
        print(p)
        ggsave(p, filename = paste0(outputpath, "results/figures/Integration_scRNA-seq_", Visualization_method_, ".png"), width = 12, height = 6)
    }
}

plot of chunk unnamed-chunk-31

Annotate the scATAC-seq-derived clusters from scRNA-seq.

# example <- subset(example, idents = 14, invert = TRUE)
# example <- RenameIdents(
#   object = example,
#   '0' = 'CD14 Mono',
#   '1' = 'CD4 Memory',
#   '2' = 'CD8 Effector',
#   '3' = 'CD4 Naive',
#   '4' = 'CD14 Mono',
#   '5' = 'DN T',
#   '6' = 'CD8 Naive',
#   '7' = 'NK CD56Dim',
#   '8' = 'pre-B',
#   '9' = 'CD16 Mono',
#   '10' = 'pro-B',
#   '11' = 'DC',
#   '12' = 'NK CD56bright',
#   '13' = 'pDC'
# )

7 Differential feature analysis

To find differentially accessible regions between clusters of cells, we can perform a differential accessibility (DA) test.

User-defined parameters:
First_cluster_number - The first cluster number for comparison in the differential feature analysis. Default: 1.
Second_cluster_number - The second cluster number for comparison in the differential feature analysis. Default: 2.

First_cluster_number <- 3
Second_cluster_number <- 0

DefaultAssay(example) <- 'peaks'

da_peaks <- FindMarkers(
  object = example,
  ident.1 = First_cluster_number,
  ident.2 = Second_cluster_number,
  test.use = 'LR',
  latent.vars = 'peak_region_fragments'
)

head(da_peaks)

##                                  p_val avg_log2FC pct.1 pct.2     p_val_adj
## chr14-99721608-99741934  3.296097e-288   5.924366 0.879 0.017 2.886096e-283
## chr14-99695477-99720910  5.598969e-219   5.105227 0.825 0.021 4.902514e-214
## chr7-142501666-142511108 4.728614e-204   4.680409 0.767 0.032 4.140422e-199
## chr17-80084198-80086094  1.865399e-200   6.607317 0.695 0.006 1.633362e-195
## chr2-113581628-113594911 5.454551e-198  -4.698955 0.055 0.701 4.776059e-193
## chr6-44025105-44028184   3.963592e-192  -4.145363 0.063 0.650 3.470561e-187

for (Visualization_method_ in Visualization_method) {
    options(repr.plot.width = 12, repr.plot.height = 6)
    plot1 <- VlnPlot(
      object = example,
      features = rownames(da_peaks)[1],
      pt.size = 0.1,
      idents = c(First_cluster_number,Second_cluster_number)
    )

    plot2 <- FeaturePlot(
      object = example,
      reduction = Visualization_method_,
      features = rownames(da_peaks)[1],
      pt.size = 0.1
    )

    p <- plot1 | plot2
    print(p)
    ggsave(p, filename = paste0(outputpath, "results/figures/Differential_feature_analysis_", Visualization_method_, ".png"), width = 12, height = 6)
}

plot of chunk unnamed-chunk-34

Look at the fold change accessibility between two groups of cells to find DA regions.

fc <- FoldChange(example, ident.1 = First_cluster_number, ident.2 = Second_cluster_number)
head(fc)

##                    avg_log2FC pct.1 pct.2
## chr1-565107-565550  1.3801269 0.012 0.004
## chr1-569174-569639  1.7622203 0.032 0.008
## chr1-713460-714823  0.6846989 0.421 0.207
## chr1-752422-753038 -3.7546046 0.010 0.096
## chr1-762106-763359  0.4754834 0.290 0.165
## chr1-779589-780271  4.2817365 0.051 0.002

Analysis of closest gene to each of these peaks.

open_1 <- rownames(da_peaks[da_peaks$avg_log2FC > 3, ])
open_2 <- rownames(da_peaks[da_peaks$avg_log2FC < -3, ])

closest_genes_1 <- ClosestFeature(example, regions = open_1)
closest_genes_2 <- ClosestFeature(example, regions = open_2)

head(closest_genes_1)

##                           tx_id   gene_name         gene_id   gene_biotype type
## ENST00000443726 ENST00000443726      BCL11B ENSG00000127152 protein_coding  cds
## ENST00000357195 ENST00000357195      BCL11B ENSG00000127152 protein_coding  cds
## ENSE00002456092 ENST00000463701       PRSS1 ENSG00000204983 protein_coding exon
## ENST00000583593 ENST00000583593      CCDC57 ENSG00000176155 protein_coding  cds
## ENST00000557595 ENST00000557595 AE000662.92 ENSG00000259003 protein_coding  cds
## ENST00000455990 ENST00000455990       HOOK1 ENSG00000134709 protein_coding  cds
##                           closest_region             query_region distance
## ENST00000443726  chr14-99737498-99737555  chr14-99721608-99741934        0
## ENST00000357195  chr14-99697682-99697894  chr14-99695477-99720910        0
## ENSE00002456092 chr7-142460719-142460923 chr7-142501666-142511108    40742
## ENST00000583593  chr17-80085568-80085694  chr17-80084198-80086094        0
## ENST00000557595  chr14-23027629-23027664  chr14-23012859-23028219        0
## ENST00000455990   chr1-60280790-60280852   chr1-60279767-60281364        0

head(closest_genes_2)

##                           tx_id     gene_name         gene_id   gene_biotype
## ENST00000432018 ENST00000432018          IL1B ENSG00000125538 protein_coding
## ENSE00001638912 ENST00000455005 RP5-1120P11.3 ENSG00000231881        lincRNA
## ENST00000568649 ENST00000568649         PPCDC ENSG00000138621 protein_coding
## ENST00000409245 ENST00000409245         TTC7A ENSG00000068724 protein_coding
## ENST00000445003 ENST00000445003 RP11-290F20.3 ENSG00000224397        lincRNA
## ENST00000560869 ENST00000560869       TNFAIP2 ENSG00000185215 protein_coding
##                 type            closest_region              query_region
## ENST00000432018  cds  chr2-113593760-113593806  chr2-113581628-113594911
## ENSE00001638912 exon    chr6-44041650-44042535    chr6-44025105-44028184
## ENST00000568649  cds   chr15-75335782-75335877   chr15-75334903-75336779
## ENST00000409245  cds    chr2-47300841-47301062    chr2-47297968-47301173
## ENST00000445003  gap   chr20-48884201-48894027   chr20-48889794-48893313
## ENST00000560869  utr chr14-103589798-103590288 chr14-103587259-103591113
##                 distance
## ENST00000432018        0
## ENSE00001638912    13465
## ENST00000568649        0
## ENST00000409245        0
## ENST00000445003        0
## ENST00000560869        0

8 Genomic regions visualization

We can plot the frequency of Tn5 integration across regions of the genome for cells grouped by cluster, cell type, or any other metadata stored in the object for any genomic region using the CoveragePlot() function.

# set plotting order
# levels(example) <- c("CD4 Naive","CD4 Memory","CD8 Naive","CD8 Effector","DN T","NK CD56bright",
#                      "NK CD56Dim","pre-B",'pro-B',"pDC","DC","CD14 Mono","CD16 Mono")

options(repr.plot.width = 12, repr.plot.height = 12)
p <- CoveragePlot(
  object = example,
  region = rownames(da_peaks)[1],
  extend.upstream = 40000,
  extend.downstream = 20000
)
print(p)

plot of chunk unnamed-chunk-39

ggsave(p, filename = paste0(outputpath, "results/figures/Genomic_regions_visualization.png"), width = 12, height = 12)

9 Save data

Write .rds-formatted file. You can get the results (a html file, a rds file and figure files) via your taskID.

saveRDS(example, file=paste0(outputpath, "results/", task_id, "_result.rds"))

ScEpiTools tutorial