Hierarchical cell type annotation based on scANVI

Sheng Liu; Zhichao Miao; Yin Huang


Preprint


Last updated: Feb 24, 2026

An abbreviated version of this protocol was published in Nat Med, in the article "A brain cell atlas integrating single-cell transcriptomes across human brain regions".



Overview

To provide a standardized method for annotating cell types at different resolutions, this protocol describes a hierarchical cell type annotation workflow based on scANVI (single-cell ANnotation using Variational Inference), a semi-supervised extension of the scVI variational autoencoder. The method enables multi-resolution annotation of single-cell RNA-seq datasets by:

- First identifying broad cell classes (primary cell types).
- Then refining annotations to more specific cell types within each class.
- Optionally extending to additional hierarchical levels.
- Integrating datasets in a shared latent space for downstream analysis and visualization.

Materials and Software

Software Requirements:
- scvi-tools
- Python (≥3.8 recommended)
- Required Python libraries (e.g., NumPy, pandas, scikit-learn, PyTorch)
Data:
- Raw gene expression counts from single-cell RNA sequencing studies.
- Annotated reference datasets for model training (cell type labels are required).
Computational Resources:
- A machine capable of model training and inference, preferably with a GPU supported by PyTorch.

Steps

Step 1: Data Preparation

Gene Selection:
- Perform differential expression analysis across the annotated cell types in the reference dataset to identify marker genes that distinguish them.
- In the original study, this selection yielded 1,841 representative genes.
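The protocol does not name the differential expression test used for gene selection. As a minimal, illustrative stand-in, the sketch below ranks genes one-vs-rest for each cell type by the log2 fold change of mean library-size-normalized expression and keeps the union of the top `n_per_type` genes per type. The helper name `select_marker_genes`, the scaling to 10,000 counts per cell, and the toy data are assumptions; a dedicated tool such as Scanpy's `sc.tl.rank_genes_groups` provides proper statistical tests.

```python
import numpy as np


def select_marker_genes(counts, labels, gene_names, n_per_type=50, pseudocount=1.0):
    """Hypothetical helper: union of the top one-vs-rest genes for each cell type.

    counts: cells x genes matrix of raw counts; labels: one cell type per cell;
    gene_names: list of gene names matching the columns of counts.
    Requires at least two cell types so that "rest" is never empty.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    # Scale every cell to 10,000 total counts so library size does not drive the ranking.
    lib_size = counts.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0
    norm = counts / lib_size * 1e4
    selected = set()
    for cell_type in np.unique(labels):
        in_type = labels == cell_type
        mean_in = norm[in_type].mean(axis=0)
        mean_rest = norm[~in_type].mean(axis=0)
        # Per-gene log2 fold change of this type versus all other cells.
        log_fc = np.log2(mean_in + pseudocount) - np.log2(mean_rest + pseudocount)
        top = np.argsort(-log_fc)[:n_per_type]
        selected.update(gene_names[i] for i in top)
    # Return the selected genes in their original column order.
    return [g for g in gene_names if g in selected]


# Toy data: g0 is a marker of type A, g5 a marker of type B.
rng = np.random.default_rng(0)
genes = [f"g{i}" for i in range(6)]
labels = np.array(["A"] * 20 + ["B"] * 20)
counts = rng.poisson(1.0, size=(40, 6)).astype(float)
counts[labels == "A", 0] += 20
counts[labels == "B", 5] += 20
print(select_marker_genes(counts, labels, genes, n_per_type=1))  # ['g0', 'g5']
```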
Data Splitting:
- Split the reference dataset into training and validation sets at a 5:1 ratio (about 83% and 17% of cells, respectively).
- Use the raw UMI counts of the selected genes (not normalized or log-transformed values) as model input.
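A 5:1 split can be drawn separately within each cell type so that rare types appear in both sets. Stratifying by label is an assumption here (the protocol only gives the ratio), and `stratified_split` is a hypothetical helper. The returned integer indices can subset an AnnData object, for example `adata[train_idx]`, that holds the raw counts of the selected genes.

```python
import numpy as np


def stratified_split(labels, ratio=5, seed=0):
    """Split cell indices ratio:1 into training and validation sets within each label."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cell_type in np.unique(labels):
        idx = np.flatnonzero(labels == cell_type)
        rng.shuffle(idx)
        # Hold out about 1/(ratio + 1) of each type; keep at least one validation
        # cell for any type with two or more cells.
        n_val = int(round(len(idx) / (ratio + 1)))
        if len(idx) >= 2:
            n_val = max(n_val, 1)
        val_idx.extend(idx[:n_val].tolist())
        train_idx.extend(idx[n_val:].tolist())
    return np.array(sorted(train_idx), dtype=int), np.array(sorted(val_idx), dtype=int)


labels = ["Neuron"] * 60 + ["Microglia"] * 12
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 60 12
```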

Step 2: Model Training

Initial Model Training:
- Train an scVI model on the training data for 5 epochs.
Transfer Learning:
- Initialize an scANVI model from the pre-trained scVI model's parameters and fine-tune it to annotate cell classes.
Hyperparameter Exploration:
- Explore hyperparameters, including:
  - Latent space dimension: between 10 and 100
  - Number of network layers: between 1 and 10
  - Random initialization: up to 10 different seeds
- Select the model that performs best on the validation set.
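The search over latent dimension, number of layers, and random seed can be sketched as an exhaustive loop over candidate settings. In scvi-tools, the transfer-learning step typically corresponds to training `scvi.model.SCVI` and then creating the classifier with `scvi.model.SCANVI.from_scvi_model`, with the sizes passed as `n_latent` and `n_layers`; the callback `score_fn` that would wrap those calls is hypothetical and is replaced here by a toy function. The coarse grids are assumed points within the stated ranges; even this grid means 160 training runs.

```python
import itertools
from typing import Callable, Dict, Iterable, Tuple

Config = Tuple[int, int, int]  # (n_latent, n_layers, seed)


def search_hyperparameters(
    score_fn: Callable[[int, int, int], float],
    latent_dims: Iterable[int] = (10, 30, 50, 100),  # assumed points in 10-100
    layer_counts: Iterable[int] = (1, 2, 5, 10),     # assumed points in 1-10
    seeds: Iterable[int] = range(10),
) -> Tuple[Config, Dict[Config, float]]:
    """Score every (n_latent, n_layers, seed) combination and return the best one.

    score_fn is a hypothetical callback: it should train scVI, transfer the weights
    to scANVI, and return a validation metric where higher is better.
    """
    scores: Dict[Config, float] = {}
    for config in itertools.product(latent_dims, layer_counts, seeds):
        scores[config] = score_fn(*config)
    best = max(scores, key=scores.get)
    return best, scores


# Toy scorer standing in for real training: peaks at 50 latent dims and 2 layers.
def toy_score(n_latent: int, n_layers: int, seed: int) -> float:
    return -abs(n_latent - 50) - abs(n_layers - 2) + 0.001 * seed


best, scores = search_hyperparameters(toy_score)
print(best, len(scores))  # (50, 2, 9) 160
```

Picking the single best seed rewards favorable initializations; averaging each configuration's score over seeds before taking the maximum is a more conservative variant.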

Step 3: Hierarchical Annotation

Cell Class Models:
- Train a total of 31 cell class models, one for each first-level cell class, to annotate specific cell types at the second level.
Third-Level Annotation (if necessary):
- Repeat the same training procedure within second-level cell types when finer annotation is required.
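Hierarchical annotation amounts to routing cells through a tree of models: the first-level model assigns a cell class, and each class that has a refinement model re-annotates only the cells predicted for that class. The sketch below assumes each trained model can be wrapped as a function from cell IDs to labels; because it recurses, a third level is simply another child node. `AnnotationNode`, `annotate`, and the toy classifiers and labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

# A classifier maps a list of cell IDs to one predicted label per cell.
Classifier = Callable[[Sequence[int]], List[str]]


@dataclass
class AnnotationNode:
    """One trained model plus optional child models keyed by the label they refine."""
    classifier: Classifier
    children: Dict[str, "AnnotationNode"] = field(default_factory=dict)


def annotate(cell_ids: Sequence[int], node: AnnotationNode,
             path: Tuple[str, ...] = ()) -> Dict[int, Tuple[str, ...]]:
    """Return, for each cell, its label path from the broadest level downward."""
    predictions = node.classifier(cell_ids)
    # Group cells by predicted label so each child model sees only its own class.
    groups: Dict[str, List[int]] = {}
    for cell_id, label in zip(cell_ids, predictions):
        groups.setdefault(label, []).append(cell_id)
    result: Dict[int, Tuple[str, ...]] = {}
    for label, members in groups.items():
        child = node.children.get(label)
        if child is None:
            # No finer model for this label: the annotation stops at this level.
            for cell_id in members:
                result[cell_id] = path + (label,)
        else:
            result.update(annotate(members, child, path + (label,)))
    return result


# Toy stand-ins for trained models (hypothetical labels and rules).
def level1_model(ids):
    return ["Neuron" if i < 3 else "Glia" for i in ids]


def neuron_model(ids):
    return ["Excitatory" if i % 2 == 0 else "Inhibitory" for i in ids]


tree = AnnotationNode(level1_model, {"Neuron": AnnotationNode(neuron_model)})
print(annotate([0, 1, 2, 3, 4], tree))
# {0: ('Neuron', 'Excitatory'), 1: ('Neuron', 'Inhibitory'), 2: ('Neuron', 'Excitatory'), 3: ('Glia',), 4: ('Glia',)}
```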

Step 4: Data Integration Using Latent Space

Integration of Datasets:
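- Use the trained scANVI model to infer a shared latent representation of all cells; this latent space integrates the datasets.
Visualize Data:
- Compute a UMAP (Uniform Manifold Approximation and Projection) embedding from the scANVI latent representation to visualize the integrated datasets.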

Step 5: Model Training Details

scANVI Model Settings:
- Configure the model with two hidden layers and a 50-dimensional latent space.
- Use a negative binomial likelihood for gene expression modeling.
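With a negative binomial likelihood, scVI-family models treat each count x as drawn with mean μ and inverse dispersion θ, so the variance is μ + μ²/θ. The function below evaluates that log-probability with the standard library to illustrate the likelihood; it is not scvi-tools' internal implementation. In scvi-tools, the settings above correspond to the model arguments `n_layers=2`, `n_latent=50`, and `gene_likelihood="nb"`.

```python
import math


def nb_log_prob(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial with mean mu > 0
    and inverse dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    log_theta_mu = math.log(theta + mu)
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * (math.log(theta) - log_theta_mu)
        + x * (math.log(mu) - log_theta_mu)
    )


# Sanity check: probabilities sum to 1, with the expected mean and variance.
mu, theta = 5.0, 2.0
probs = [math.exp(nb_log_prob(x, mu, theta)) for x in range(2000)]
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
print(round(sum(probs), 6), round(mean, 3), round(var, 3))  # 1.0 5.0 17.5
```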
Early-Stopping Strategy:
- Use early stopping based on the evidence lower bound (ELBO):
  - Stop training when the ELBO has not improved for five consecutive epochs (improvement threshold: 0).
Learning Rate Adjustment:
- Reduce the learning rate when the loss plateaus:
  - Patience: 8 epochs
  - Reduction factor: 0.1 (each reduction multiplies the learning rate by 0.1)
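The early-stopping and learning-rate rules can be replayed on a sequence of per-epoch validation losses (negative ELBO, so lower is better). In this simplified sketch both counters watch the same loss, so with a stopping patience of 5 and a learning-rate patience of 8 the stopper fires first; the protocol does not state how the original training code coupled the two. `PlateauTracker`, `replay_training_schedule`, and the starting learning rate are hypothetical. Recent scvi-tools releases expose comparable controls as `train(early_stopping=True, early_stopping_patience=5, early_stopping_min_delta=0.0)` together with `plan_kwargs={"reduce_lr_on_plateau": True, "lr_patience": 8, "lr_factor": 0.1}`; check the documentation of the installed version.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PlateauTracker:
    """Counts consecutive epochs in which a loss (lower is better) fails to improve."""
    patience: int
    min_delta: float = 0.0
    best: float = float("inf")
    wait: int = 0

    def update(self, loss: float) -> bool:
        """Record one epoch; return True once `patience` epochs pass without improvement."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


def replay_training_schedule(
    losses: Sequence[float],
    lr: float = 1e-3,  # assumed starting value; the protocol does not give one
    stop_patience: int = 5,
    lr_patience: int = 8,
    lr_factor: float = 0.1,
) -> Tuple[int, List[float]]:
    """Return (last epoch trained, learning rate in effect after each epoch)."""
    stopper = PlateauTracker(patience=stop_patience, min_delta=0.0)
    scheduler = PlateauTracker(patience=lr_patience)
    lr_history: List[float] = []
    for epoch, loss in enumerate(losses):
        if scheduler.update(loss):
            lr *= lr_factor       # plateau reached: shrink the learning rate
            scheduler.wait = 0    # start counting a fresh plateau
        lr_history.append(lr)
        if stopper.update(loss):
            return epoch, lr_history
    return len(losses) - 1, lr_history


# The loss stops improving after epoch 2, so training stops at epoch 7.
stop_epoch, lrs = replay_training_schedule([10, 9, 8, 8, 8, 8, 8, 8, 8])
print(stop_epoch, lrs[-1])  # 7 0.001
```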

Step 6: Semi-Supervised Training

Training on the Whole Dataset:
- Run semi-supervised scANVI training with early stopping based on classification accuracy:
  - Patience: 5 epochs
  - Threshold: 0.001 (minimum accuracy gain that counts as an improvement)
- Apply the same learning-rate reduction on plateau as in Step 5 (patience 8 epochs, factor 0.1).
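For the semi-supervised stage, the monitored quantity is classification accuracy, which should increase. A minimal reading of "patience 5, threshold 0.001" is that an epoch counts as an improvement only if accuracy beats the best value so far by more than 0.001, and training stops after five epochs without one. Both helper functions and the example labels are hypothetical, and the protocol does not specify exactly how accuracy was computed over the whole dataset.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of cells whose predicted label equals the reference label."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_stop_epoch(accuracies: Sequence[float], patience: int = 5,
                        min_delta: float = 0.001) -> int:
    """Epoch at which training stops when accuracy (higher is better) plateaus."""
    best, wait = float("-inf"), 0
    for epoch, acc in enumerate(accuracies):
        if acc > best + min_delta:   # a real improvement resets the counter
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(accuracies) - 1


print(accuracy(["Astro", "Oligo", "Oligo"], ["Astro", "Oligo", "Astro"]))  # 0.6666666666666666
print(accuracy_stop_epoch([0.80, 0.85, 0.90, 0.9005, 0.9008, 0.9009, 0.9009, 0.9009]))  # 7
```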

Conclusion

This protocol outlines a structured approach to annotate cell types hierarchically using scANVI, facilitating both class-level and specific cell-type annotation in single-cell RNA sequencing data.

How to cite:

Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:

Liu, S., Miao, Z. and Huang, Y. (2026). Hierarchical cell type annotation based on scANVI. Bio-protocol Preprint. bio-protocol.org/prep2907.
Chen, X., Huang, Y., Huang, L., Huang, Z., Hao, Z., Xu, L., Xu, N., Li, Z., Mou, Y., Ye, M., You, R., Zhang, X., Liu, S. and Miao, Z. (2024). A brain cell atlas integrating single-cell transcriptomes across human brain regions. Nat Med 30(9). DOI: 10.1038/s41591-024-03150-z


This protocol preprint was submitted via the "Request a Protocol" track.
