This function annotates cells based on a cell-gene distance matrix and marker gene frequencies. It computes the average distance between each cell and each cell type and, optionally, a confidence level for each prediction and the cell-type mixing proportions.

annotation_mat(
  distce,
  marker.freq,
  gene.use = NULL,
  cal.confidence = TRUE,
  cal.proportions = TRUE,
  parallel = TRUE,
  ncores = 10,
  n_fake = 1001,
  seed = 1,
  threshold = 0.95,
  unassign = "unassigned"
)

Arguments

distce

A matrix of distances between cells (spots) and genes. Rows represent genes, and columns represent cells. Typically, it is obtained from the output of ProFAST::pdistance with the CAESAR co-embedding as input.

marker.freq

A matrix where rows represent cell types and columns represent marker genes. Each value is the frequency or weight of a marker gene for a cell type. Typically, it is the output of markerList2mat.

gene.use

A character vector specifying which genes to use for the annotation. If `NULL`, all genes in `distce` will be used. Default is `NULL`.

cal.confidence

Logical, indicating whether to calculate the confidence of the predictions. Default is `TRUE`.

cal.proportions

Logical, indicating whether to calculate the mixing proportions of cell types for each spot. Default is `TRUE`.

parallel

Logical, indicating whether to run the confidence calculation in parallel. Default is `TRUE`.

ncores

The number of cores to use for parallel computation. Default is 10.

n_fake

The number of fake (randomized) distance matrices to simulate for confidence calculation. Default is 1001.

seed

The random seed for reproducibility. Default is 1.

threshold

A numeric value specifying the confidence threshold below which a cell receives the label given by `unassign`. Default is 0.95.

unassign

A character string representing the label to assign to cells below the confidence threshold. Default is `"unassigned"`.
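
To make the expected shapes of `distce`, `marker.freq`, and `gene.use` concrete, here is a minimal base-R sketch on toy data. All values and names (`gene1`, `TypeA`, and so on) are hypothetical, and the weighted-mean rule and nearest-type prediction are assumptions made for illustration; this is not the internal implementation of annotation_mat.

```r
# Toy inputs (hypothetical values, base R only).
set.seed(1)
genes <- paste0("gene", 1:6)
cells <- paste0("cell", 1:4)

# distce: genes in rows, cells in columns.
distce <- matrix(runif(24, 10, 40), nrow = 6, dimnames = list(genes, cells))

# marker.freq: cell types in rows, marker genes in columns.
marker.freq <- rbind(
  TypeA = c(gene1 = 1, gene2 = 0.5, gene3 = 0, gene4 = 0),
  TypeB = c(gene1 = 0, gene2 = 0,   gene3 = 1, gene4 = 1)
)

# Restrict to a gene subset, mirroring the role of gene.use.
gene.use <- c("gene1", "gene2", "gene3")
use <- intersect(intersect(colnames(marker.freq), gene.use), rownames(distce))

# Assumed rule: marker-weighted mean of cell-gene distances per cell type.
w <- marker.freq[, use, drop = FALSE]
w <- w / rowSums(w)                               # normalise weights per cell type
ave.dist <- t(w %*% distce[use, , drop = FALSE])  # cells x cell types

# Assumed rule: predict the cell type with the smallest average distance.
pred <- colnames(ave.dist)[apply(ave.dist, 1, which.min)]
```

With these toy inputs, `ave.dist` has one row per cell and one column per cell type, which matches the orientation of the `ave.dist` component described under Value.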

Value

A list with the following components:

ave.dist

A matrix of average distances between each cell and each cell type.

confidence

A numeric vector of confidence values for each cell (if `cal.confidence = TRUE`).

pred

A character vector of predicted cell types for each cell.

pred_unassign

A character vector of predicted cell types in which cells below the confidence threshold carry the `unassign` label, `"unassigned"` by default (if `cal.confidence = TRUE`).

cell_mixing_proportions

A matrix of mixing proportions for each spot across the different cell types (if `cal.proportions = TRUE`).
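
The relationship between `pred`, `confidence`, `threshold`, and `pred_unassign` can be sketched in base R as follows. The vectors are made-up stand-ins for the returned components, and the strict `<` comparison is an assumption based on the wording "below the confidence threshold"; the function itself may treat ties differently.

```r
# Hypothetical stand-ins for components of the returned list.
pred       <- c("CAFs", "Myeloid", "Endothelial", "CAFs")
confidence <- c(0.99, 0.80, 0.95, 0.40)

threshold <- 0.95          # same default as in the usage above
unassign  <- "unassigned"  # same default as in the usage above

# Assumed rule: relabel cells whose confidence is strictly below the threshold.
pred_unassign <- ifelse(confidence < threshold, unassign, pred)

# Count how many cells keep a label and how many are left unassigned.
table(pred_unassign)
```

Under this assumption, a cell whose confidence equals the threshold exactly keeps its predicted label.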

See also

marker.select for selecting markers. find.sig.genes for obtaining the signature gene list. markerList2mat for building the marker frequency matrix. pdistance for obtaining the cell-gene distance matrix from a co-embedding.

Examples

data(toydata)

seu <- toydata$seu
markers <- toydata$markers

seu <- ProFAST::pdistance(seu, reduction = "caesar")
#> Calculate co-embedding distance...
distce <- Seurat::GetAssayData(object = seu, slot = "data", assay = "distce")

marker.freq <- markerList2mat(list(markers))

anno_res <- annotation_mat(distce, marker.freq, cal.confidence = FALSE, cal.proportions = FALSE)
str(anno_res)
#> List of 5
#>  $ ave.dist               : num [1:3000, 1:8] 28.4 24.9 27 25 34.2 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:3000] "24387" "4049" "11570" "25172" ...
#>   .. ..$ : chr [1:8] "CAFs" "Cancer Epithelial" "Endothelial" "Myeloid" ...
#>  $ confidence             : NULL
#>  $ pred                   : chr [1:3000] "Cancer Epithelial" "Cancer Epithelial" "Cancer Epithelial" "Cancer Epithelial" ...
#>  $ pred_unassign          : NULL
#>  $ cell_mixing_proportions: NULL