Overview

Human-scATAC-Corpus (v1.0.0) is a large-scale database of human scATAC-seq data designed to advance research in single-cell epigenomics. It currently comprises over 5.4 million cells, more than three times the size of any existing counterpart, and aggregates and harmonizes data from 35 datasets spanning 37 tissues or cell lines, selected through a manual review of over 200 published studies. Stringent quality control and standardization protocols were applied so that the data are consistent and usable for algorithm development and benchmarking.

Human-scATAC-Corpus features versatile data formats to accommodate diverse analytic needs. All datasets are uniformly represented in a cell-by-candidate cis-regulatory element (cCRE) matrix, facilitating cross-dataset analyses and matrix-based computational methods. The database also provides processed, standardized fragment files for fragment-level analyses and includes the original cell-by-peak matrices to support comparisons across feature definitions. This harmonization involved manual processing of more than 700 files to ensure consistency and ease of access.
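To make the relationship between fragment files and the cell-by-cCRE matrix concrete, the following is a minimal sketch of how fragment records could be counted against a cCRE list. It is not the pipeline used to build Human-scATAC-Corpus: the toy fragments, the cCRE coordinates, and the helper name build_cell_by_ccre are all hypothetical, and the sketch assumes the common (chromosome, start, end, barcode) fragment layout with 0-based, half-open coordinates.

```python
from collections import defaultdict

# Hypothetical toy inputs, not taken from the corpus.
# Coordinates are assumed to be 0-based and half-open, as in BED files.
fragments = [
    ("chr1", 100, 250, "cellA"),
    ("chr1", 240, 400, "cellA"),
    ("chr1", 900, 1100, "cellB"),
    ("chr2", 50, 120, "cellB"),
]
ccres = [
    ("chr1", 200, 300),
    ("chr1", 1000, 1200),
    ("chr2", 0, 100),
]


def build_cell_by_ccre(fragments, ccres):
    """Count, for each cell barcode, the fragments overlapping each cCRE."""
    # Group cCREs by chromosome, remembering each one's column index.
    by_chrom = defaultdict(list)
    for idx, (chrom, start, end) in enumerate(ccres):
        by_chrom[chrom].append((start, end, idx))

    # One row of counts per barcode, one column per cCRE.
    counts = defaultdict(lambda: [0] * len(ccres))
    for chrom, f_start, f_end, barcode in fragments:
        for c_start, c_end, idx in by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if f_start < c_end and c_start < f_end:
                counts[barcode][idx] += 1
    return dict(counts)


matrix = build_cell_by_ccre(fragments, ccres)
for barcode in sorted(matrix):
    print(barcode, matrix[barcode])
```

The sketch counts whole-fragment overlaps and scans every cCRE on a chromosome for each fragment, which is only practical for toy data. A real workflow would typically use an interval index and a sparse matrix, and might count Tn5 insertion sites rather than fragments.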

A hallmark of Human-scATAC-Corpus is its rich metadata, which supports a range of analysis scenarios, such as cell type annotation, batch effect correction, out-of-sample stimulation analysis, and CRISPR perturbation studies. The database is also tightly integrated with EpiAgent, the first foundation model for single-cell epigenomics, and offers tools and tutorials for mapping new datasets onto the reference, supporting applications such as cancer cell tracing and developmental trajectory analysis.

Once online, Human-scATAC-Corpus will provide data browsing, search, download, and online analysis functionalities. We anticipate that Human-scATAC-Corpus will become a foundational resource for the single-cell epigenomics community, accelerating the development and benchmarking of novel analytic methods.

Tissue categories (interactive human body map): Brain Tissue (human_Brain), Cardiac Tissue (human_Heart), Hepatic Tissue (human_Liver), Pulmonary Tissue (human_Lung), Renal Tissue (human_Kidney), Intestinal Tissue (human_Intestine), Gastric Tissue (human_Stomach), Skin Tissue (human_Skin), Muscle Tissue (human_Muscle), Pancreatic Tissue (human_Pancreas), Ocular Tissue (human_Eye), PBMC (human_Blood), Ovarian Tissue (human_Ovary), Others (human_Others).

Summary charts: number of cells for each file format, for each tissue, and for each task.