This function annotates cells in a Seurat object using marker gene frequencies and a distance matrix. It calculates average distances between cells and cell types, confidence scores, and mixing proportions. Optionally, it can add the annotations and related metrics to the Seurat object metadata.

CAESAR.annotation(
  seu,
  marker.freq,
  reduction.name = "caesar",
  assay.dist = "distce",
  gene.use = NULL,
  cal.confidence = TRUE,
  cal.proportions = TRUE,
  parallel = TRUE,
  ncores = 10,
  n_fake = 1001,
  seed = 1,
  threshold = 0.95,
  unassign = "unassigned",
  add.to.meta = FALSE
)

Arguments

seu

A Seurat object containing cell expression data.

marker.freq

A matrix where rows represent cell types and columns represent marker genes. The values in the matrix represent the frequency or weight of each marker gene for each cell type. Typically, it is the output of the function markerList2mat.

reduction.name

A character string specifying the name of the dimensional reduction to use when calculating distances. Default is "caesar".

assay.dist

A character string specifying the name of the assay that stores the distance matrix. If this assay is not present in the Seurat object, the function calculates the distances using ProFAST::pdistance. Default is "distce".

gene.use

A character vector specifying which genes to use for the annotation. If NULL, all genes in the distance matrix will be used. Default is NULL.

cal.confidence

Logical, indicating whether to calculate the confidence of the predictions. Default is TRUE.

cal.proportions

Logical, indicating whether to calculate the mixing proportions of cell types for each cell. Default is TRUE.

parallel

Logical, indicating whether to run the confidence calculation in parallel. Default is TRUE.

ncores

The number of cores to use for parallel computation. Default is 10.

n_fake

The number of fake (randomized) distance matrices to simulate for confidence calculation. Default is 1001.

seed

The random seed for reproducibility. Default is 1.

threshold

A numeric value specifying the confidence threshold below which a cell is labeled as unassigned. Default is 0.95.

unassign

A character string representing the label to assign to cells below the confidence threshold. Default is "unassigned".

add.to.meta

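Logical, indicating whether to add the annotation results to the Seurat object metadata instead of returning them directly. If TRUE, the function returns the Seurat object with the results added to its metadata. If FALSE, the function returns the results directly as a list. Default is FALSE.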
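
The interplay of the threshold and unassign arguments can be illustrated with a minimal base R sketch. It is not the package's internal code: the vectors pred and confidence are made-up toy values, and the strict "less than" comparison at the boundary is an assumption based on the phrase "below which" in the description of threshold.

```r
# Hypothetical toy inputs; these values are made up for illustration only.
pred <- c("CAFs", "Myeloid", "Endothelial", "CAFs")
confidence <- c(0.99, 0.80, 0.95, 0.40)

threshold <- 0.95
unassign <- "unassigned"

# Assumed rule: cells whose confidence is strictly below the threshold
# receive the unassign label; all other cells keep their predicted type.
pred_unassign <- ifelse(confidence < threshold, unassign, pred)
print(pred_unassign)
```

Under this assumption, a cell whose confidence equals the threshold exactly keeps its predicted label.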

Value

If add.to.meta = TRUE, the Seurat object with added metadata for predicted cell types (CAESAR), predictions with unassigned cells (CAESARunasg), confidence scores (CAESARconf), average distances, and mixing proportions. If add.to.meta = FALSE, a list containing the above annotation results is returned.

See also

marker.select for selecting markers. find.sig.genes for the signature gene list. markerList2mat for the marker frequency matrix. pdistance for obtaining the cell-gene distance matrix using co-embedding. annotation_mat for the annotation procedure.

Examples

data(toydata)

seu <- toydata$seu
markers <- toydata$markers

marker.freq <- markerList2mat(list(markers))
anno_res <- CAESAR.annotation(seu, marker.freq, cal.confidence = FALSE, cal.proportions = FALSE)
#> Calculate co-embedding distance...
str(anno_res)
#> List of 5
#>  $ ave.dist               : num [1:3000, 1:8] 28.4 24.9 27 25 34.2 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:3000] "24387" "4049" "11570" "25172" ...
#>   .. ..$ : chr [1:8] "CAFs" "Cancer Epithelial" "Endothelial" "Myeloid" ...
#>  $ confidence             : NULL
#>  $ pred                   : chr [1:3000] "Cancer Epithelial" "Cancer Epithelial" "Cancer Epithelial" "Cancer Epithelial" ...
#>  $ pred_unassign          : NULL
#>  $ cell_mixing_proportions: NULL
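
To relate the ave.dist and pred elements shown above, here is a small base R sketch on toy data. It rests on two assumptions that this page does not confirm: that the average distance is a marker-frequency-weighted mean of the cell-gene distances, and that each cell is labeled with the cell type at the smallest average distance. The objects dist_mat and marker_freq and their values are hypothetical and do not come from toydata.

```r
# Toy cell-by-gene distance matrix (3 cells, 4 genes); values are made up.
dist_mat <- matrix(
  c(1, 2, 9, 8,
    9, 8, 1, 2,
    5, 5, 5, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell1", "cell2", "cell3"), c("g1", "g2", "g3", "g4"))
)

# Toy marker frequency matrix: rows are cell types, columns are marker genes.
marker_freq <- matrix(
  c(1, 1, 0, 0,
    0, 0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("typeA", "typeB"), c("g1", "g2", "g3", "g4"))
)

# Assumed definition: frequency-weighted mean distance from each cell
# to the marker genes of each cell type (cells in rows, cell types in columns).
ave_dist <- dist_mat %*% t(marker_freq)
ave_dist <- sweep(ave_dist, 2, rowSums(marker_freq), "/")

# Assumed rule: label each cell with the cell type at the smallest average distance.
pred <- colnames(ave_dist)[apply(ave_dist, 1, which.min)]

print(ave_dist)
print(pred)
```

The resulting ave_dist has the same layout as the ave.dist element in the str output: one row per cell and one column per cell type.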