R/caesar_annotation.R
CAESAR.annotation.Rd
This function annotates cells in a Seurat object using marker gene frequencies and a distance matrix. It calculates average distances between cells and cell types, confidence scores, and mixing proportions. Optionally, it can add the annotations and related metrics to the Seurat object metadata.
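The average-distance step can be pictured with a small base R sketch. It is an illustration under stated assumptions, not the package's implementation: the matrices `dist_mat` and `marker_freq` are made-up toy inputs, and the weighted-mean formula and the smallest-distance assignment rule are assumed here for clarity.

```r
set.seed(1)

# Hypothetical cell-by-gene distance matrix (4 cells, 5 genes).
dist_mat <- matrix(runif(20, min = 10, max = 40), nrow = 4,
                   dimnames = list(paste0("cell", 1:4), paste0("gene", 1:5)))

# Hypothetical marker frequency matrix: rows are cell types, columns are genes.
marker_freq <- matrix(c(1, 0.5, 0, 0, 0,
                        0, 0,   1, 1, 0.5),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("typeA", "typeB"), paste0("gene", 1:5)))

# Normalise each cell type's marker weights so that they sum to 1.
weights <- marker_freq / rowSums(marker_freq)

# Weighted average distance from every cell to every cell type (cells x types).
ave_dist <- dist_mat %*% t(weights)

# Assumed rule: assign each cell to the type with the smallest average distance.
pred <- colnames(ave_dist)[apply(ave_dist, 1, which.min)]

print(round(ave_dist, 2))
print(pred)
```

In this sketch a gene with zero weight for a cell type contributes nothing to that type's average, so only marker genes drive the assignment.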
CAESAR.annotation(
seu,
marker.freq,
reduction.name = "caesar",
assay.dist = "distce",
gene.use = NULL,
cal.confidence = TRUE,
cal.proportions = TRUE,
parallel = TRUE,
ncores = 10,
n_fake = 1001,
seed = 1,
threshold = 0.95,
unassign = "unassigned",
add.to.meta = FALSE
)
A Seurat object containing cell expression data.
A matrix where rows represent cell types and columns represent marker genes. Each value is the frequency or weight of a marker gene for a cell type. Typically, this is the output of markerList2mat.
A character string specifying the name of the dimensional reduction to use when calculating distances. Default is "caesar".
A character string specifying the name of the assay that stores the distance matrix. If the assay is not present in the Seurat object, the function calculates the distances using ProFAST::pdistance. Default is "distce".
A character vector specifying which genes to use for the annotation. If NULL, all genes in the distance matrix will be used. Default is NULL.
Logical, indicating whether to calculate the confidence of the predictions. Default is TRUE.
Logical, indicating whether to calculate the mixing proportions of cell types for each cell. Default is TRUE.
Logical, indicating whether to run the confidence calculation in parallel. Default is TRUE.
The number of cores to use for parallel computation. Default is 10.
The number of fake (randomized) distance matrices to simulate for confidence calculation. Default is 1001.
The random seed for reproducibility. Default is 1.
A numeric value specifying the confidence threshold below which a cell is labeled as unassigned. Default is 0.95.
A character string representing the label to assign to cells below the confidence threshold. Default is "unassigned".
Logical, indicating whether to add the annotation results to the Seurat object metadata or return them directly. If TRUE, the results are added to the metadata and the Seurat object is returned. If FALSE, the results are returned directly as a list. Default is FALSE.
If add.to.meta = TRUE, the Seurat object with added metadata for predicted cell types (CAESAR), predictions with unassigned cells (CAESARunasg), confidence scores (CAESARconf), average distances, and mixing proportions. If add.to.meta = FALSE, a list containing the above annotation results.
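The relationship between predictions, confidence scores, and the unassigned label can be sketched in base R. This is a minimal illustration with made-up values, assuming a strict "below the threshold" comparison; the variable names mirror the argument names and list fields shown on this page, but the code is not taken from the package.

```r
# Hypothetical predicted labels and confidence scores for five cells.
pred <- c("CAFs", "Myeloid", "Endothelial", "CAFs", "Myeloid")
confidence <- c(0.99, 0.80, 0.95, 0.97, 0.40)

# Same defaults as the threshold and unassign arguments.
threshold <- 0.95
unassign <- "unassigned"

# Assumption: cells with confidence strictly below the threshold are relabeled.
pred_unassign <- ifelse(confidence < threshold, unassign, pred)

print(pred_unassign)
```

Under the strict comparison assumed here, a cell whose confidence equals the threshold keeps its predicted label.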
marker.select for selecting markers.
find.sig.genes for the signature gene list.
markerList2mat for the marker frequency matrix.
pdistance for obtaining the cell-gene distance matrix using co-embedding.
annotation_mat for the annotation procedure.
data(toydata)
seu <- toydata$seu
markers <- toydata$markers
marker.freq <- markerList2mat(list(markers))
anno_res <- CAESAR.annotation(seu, marker.freq, cal.confidence = FALSE, cal.proportions = FALSE)
#> Calculate co-embedding distance...
str(anno_res)
#> List of 5
#> $ ave.dist : num [1:3000, 1:8] 28.4 24.9 27 25 34.2 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:3000] "24387" "4049" "11570" "25172" ...
#> .. ..$ : chr [1:8] "CAFs" "Cancer Epithelial" "Endothelial" "Myeloid" ...
#> $ confidence : NULL
#> $ pred : chr [1:3000] "Cancer Epithelial" "Cancer Epithelial" "Cancer Epithelial" "Cancer Epithelial" ...
#> $ pred_unassign : NULL
#> $ cell_mixing_proportions: NULL