This function integrates signature genes from one or more datasets by identifying, for each cell type, genes that are common to the datasets and meet the filtering criteria described under Arguments. It can remove mitochondrial and ribosomal genes (rm_mito_ribo), exclude genes whose expression proportion falls below expr.prop.cutoff, and weight gene selection by cell-type ratios (ct_ratio).
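As a rough illustration of how signature genes common to several datasets might be combined for one cell type, the sketch below intersects the gene tables and keeps the best-ranked ntop genes. It is not the package's implementation: the helper common_top_genes(), the columns gene and rank, and the use of mean rank as the ordering rule are all assumptions made for this example.

```r
# Hypothetical helper, NOT part of the package API.
# Each element of `tables` is one dataset's signature-gene table for a single
# cell type, with assumed columns `gene` and `rank` (1 = strongest signal).
common_top_genes <- function(tables, ntop) {
  # Genes present in every dataset (order follows the first table)
  shared <- Reduce(intersect, lapply(tables, function(df) df$gene))
  if (length(shared) == 0) return(character(0))
  # Average rank of each shared gene across datasets (assumed ordering rule)
  mean_rank <- vapply(shared, function(g) {
    mean(vapply(tables, function(df) df$rank[match(g, df$gene)], numeric(1)))
  }, numeric(1))
  # Keep at most `ntop` genes, smallest mean rank first
  head(names(sort(mean_rank)), ntop)
}

ds1 <- data.frame(gene = c("KRT5", "KRT6B", "KRT14", "ACTB"), rank = 1:4,
                  stringsAsFactors = FALSE)
ds2 <- data.frame(gene = c("KRT6B", "ACTB", "KRT5", "KRT17"), rank = 1:4,
                  stringsAsFactors = FALSE)
common_top_genes(list(ds1, ds2), ntop = 2)
#> [1] "KRT6B" "KRT5"
```

KRT14 and KRT17 appear in only one dataset, so they are dropped before ranking; ACTB is shared but ranks too low to make the top two.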

Intsg(
  sg_List,
  ntop,
  ct_ratio = NULL,
  expr.prop.cutoff = 0.1,
  species = "hm",
  rm_mito_ribo = TRUE,
  ratio_lower_bound = 0
)

Arguments

sg_List

A list of signature-gene lists, one per dataset. Each element should be a named list whose names are cell types and whose elements are data frames of gene information for that cell type, such as the output of find.sig.genes() used in the Examples.
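A toy object with this nested shape can be built in base R to check the layout before calling the function. It only illustrates the list-of-named-lists structure; the columns that Intsg() actually reads are not documented on this page, so the column names gene and expr.prop and the helper make_sg() are hypothetical.

```r
# Hypothetical helper that builds one cell type's gene table.
# Column names `gene` and `expr.prop` are illustrative assumptions only.
make_sg <- function(genes, props) {
  data.frame(gene = genes, expr.prop = props, stringsAsFactors = FALSE)
}

# One named list per dataset: names are cell types, values are data frames
dataset1 <- list(
  CAFs      = make_sg(c("FBLN1", "CCDC80", "DCN"), c(0.80, 0.65, 0.05)),
  "T-cells" = make_sg(c("CD3G", "CD3D", "MT-CO1"), c(0.70, 0.75, 0.90))
)
dataset2 <- list(
  CAFs      = make_sg(c("CCDC80", "FBLN1", "LUM"), c(0.60, 0.85, 0.40)),
  "T-cells" = make_sg(c("CD3D", "CD3G", "RPL13"), c(0.72, 0.68, 0.95))
)

# Outer list: one element per dataset
sg_List <- list(dataset1, dataset2)
str(sg_List, max.level = 2)
```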

ntop

An integer specifying the maximum number of top genes to retain for each cell type.

ct_ratio

A list of numeric vectors, one per dataset, giving the proportion of cells that belong to each cell type. These ratios are used to weight gene selection. If NULL, no weighting is applied. Default is NULL.
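One plausible way to build ct_ratio is to tabulate each dataset's cell-type labels and normalise the counts to proportions. The labels below are made up, and it is an assumption that each vector should be named by cell type (matching the names in sg_List) and that the list follows the same dataset order as sg_List; check the package source before relying on either.

```r
# Hypothetical per-cell cell-type labels for two datasets
labels1 <- c("CAFs", "CAFs", "T-cells", "T-cells", "T-cells", "Myeloid")
labels2 <- c("CAFs", "T-cells", "T-cells", "Myeloid", "Myeloid")

# Fraction of cells in each cell type, returned as a named numeric vector
to_ratio <- function(labels) {
  tab <- prop.table(table(labels))
  setNames(as.numeric(tab), names(tab))
}

# One vector per dataset, in the same order as the datasets in sg_List (assumed)
ct_ratio <- list(to_ratio(labels1), to_ratio(labels2))
ct_ratio[[1]]
```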

expr.prop.cutoff

A numeric value specifying the minimum expression proportion required for a gene to be considered. Default is 0.1.
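The cutoff acts as a row filter on each cell type's gene table. This sketch assumes a column named expr.prop holding the fraction of cells in the cell type that express the gene, which is a guess about the table layout; it also treats the cutoff as inclusive, which "minimum expression proportion" suggests but does not confirm.

```r
# Hypothetical signature-gene table for one cell type; `expr.prop` is an
# assumed column giving the proportion of cells that express each gene.
sg <- data.frame(
  gene      = c("FBLN1", "CCDC80", "DCN", "LUM"),
  expr.prop = c(0.80, 0.65, 0.05, 0.10),
  stringsAsFactors = FALSE
)

expr_prop_cutoff <- 0.1
# Keep genes at or above the cutoff (>= rather than > is an assumption)
kept <- sg[sg$expr.prop >= expr_prop_cutoff, ]
kept$gene
```

Only DCN is removed here; LUM sits exactly at the cutoff and survives under the inclusive reading.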

species

A character string specifying the species, either "hm" (human) or "ms" (mouse). Default is "hm".

rm_mito_ribo

Logical, indicating whether to remove mitochondrial and ribosomal genes from the signature gene list. Default is TRUE.
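The species argument matters for this filter because gene-symbol conventions differ: human symbols are upper case (MT-CO1, RPL13) while mouse symbols are title case (mt-Co1, Rpl13). The sketch below uses simple prefix patterns; drop_mito_ribo() is a hypothetical helper, and the exact patterns used inside Intsg() may differ. Note that a plain RPS/RPL prefix rule is crude: it would also remove non-ribosomal genes such as RPS6KA1.

```r
# Hypothetical helper; the prefix patterns are common conventions, not
# necessarily the ones Intsg() uses internally.
drop_mito_ribo <- function(genes, species = c("hm", "ms")) {
  species <- match.arg(species)
  # Human: MT-*, RPS*, RPL*; mouse: mt-*, Rps*, Rpl*
  pattern <- if (species == "hm") "^(MT-|RP[SL])" else "^(mt-|Rp[sl])"
  genes[!grepl(pattern, genes)]
}

drop_mito_ribo(c("CD3G", "MT-CO1", "RPL13", "RPS6", "CD3D"), species = "hm")
#> [1] "CD3G" "CD3D"
drop_mito_ribo(c("Cd3g", "mt-Co1", "Rpl13", "Cd3d"), species = "ms")
#> [1] "Cd3g" "Cd3d"
```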

ratio_lower_bound

A numeric value specifying the lower bound for the cell-type ratio. Only cell types whose ratio exceeds this bound are considered. Default is 0.
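Reading "above this bound" as a strict inequality, the bound simply selects which cell types take part. The ratios below are made up; under this reading, the default of 0 drops only cell types with a ratio of exactly zero, that is, those absent from a dataset.

```r
# Hypothetical cell-type ratios for one dataset
ratio <- c(CAFs = 0.30, Myeloid = 0.02, "T-cells" = 0.68)

ratio_lower_bound <- 0.05
# Strict comparison, following "ratio above this bound"
kept_types <- names(ratio)[ratio > ratio_lower_bound]
kept_types
#> [1] "CAFs"    "T-cells"
```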

Value

A named list with one element per cell type; each element is a character vector of at most ntop integrated signature genes.
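Because the result is a plain named list of character vectors, as the printed output in the Examples shows, base R list tools are enough to inspect or reshape it. The object below is a hand-built stand-in that reuses gene names from that output; it is not produced by a real Intsg() call.

```r
# Stand-in for a returned object (gene names taken from the Examples output)
top2sgs <- list(
  CAFs      = c("FBLN1", "CCDC80"),
  "T-cells" = c("CD3G", "CD3D"),
  Myeloid   = c("HAVCR2", "IGSF6")
)

# Number of genes kept per cell type (at most ntop)
lengths(top2sgs)

# Pooled marker panel across all cell types
markers <- unique(unlist(top2sgs, use.names = FALSE))

# Long-format table: one row per (cell type, gene) pair
long <- data.frame(
  cell_type = rep(names(top2sgs), lengths(top2sgs)),
  gene      = unlist(top2sgs, use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```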

Examples

data(toydata)

seu <- toydata$seu

seu <- ProFAST::pdistance(seu, reduction = "caesar")
#> Calculate co-embedding distance...
sglist <- find.sig.genes(seu = seu)
top2sgs <- Intsg(list(sglist), ntop = 2)
print(top2sgs)
#> $CAFs
#> [1] "FBLN1"  "CCDC80"
#> 
#> $`Cancer Epithelial`
#> [1] "LYPD3" "MLPH" 
#> 
#> $Endothelial
#> [1] "HOXD9" "EGFL7"
#> 
#> $Myeloid
#> [1] "HAVCR2" "IGSF6" 
#> 
#> $`Normal Epithelial`
#> [1] "KRT5"  "KRT6B"
#> 
#> $PVL
#> [1] "CAV1"   "AVPR1A"
#> 
#> $Plasmablasts
#> [1] "CD79A" "DERL3"
#> 
#> $`T-cells`
#> [1] "CD3G" "CD3D"
#> 

top2intsgs <- Intsg(list(sglist, sglist), ntop = 2)