Selects marker genes for each cluster or cell type based on expression proportion. Options allow removing mitochondrial and ribosomal genes, capping the number of top marker genes, and limiting how many clusters may share a marker.

marker.select(
  ref_sig_list,
  expr.prop.cutoff = 0.1,
  ntop.max = 200,
  overlap.max = 1,
  rm_mito_ribo = FALSE,
  species = "ms"
)

Arguments

ref_sig_list

A list where each element corresponds to a cluster or cell type. Each element should be a data frame containing at least two columns: gene (the gene names) and expr.prop (the proportion of cells expressing each gene). Typically this is the output of find.sig.genes.

expr.prop.cutoff

A numeric value specifying the minimum proportion of cells that must express a gene for it to be considered. Default is 0.1.

ntop.max

An integer specifying the maximum number of top marker genes to be selected. Default is 200.

overlap.max

An integer specifying the maximum number of clusters in which a marker gene may appear. A gene that appears in more than overlap.max clusters is excluded. Default is 1.

rm_mito_ribo

Logical, indicating whether to remove mitochondrial and ribosomal genes from the marker gene list. Default is FALSE.

species

A character string specifying the species for mitochondrial and ribosomal gene detection. Options are "ms" for mouse or "hs" for human. Default is "ms".
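The way these arguments interact can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_select is a hypothetical helper, the toy input values are made up, and the sketch assumes that each data frame is already ordered from strongest to weakest marker, that the overlap filter runs before the ntop.max cap, and that mitochondrial and ribosomal genes follow the common naming conventions (mt-/Rps/Rpl for mouse, MT-/RPS/RPL for human).

```r
# Toy input in the documented shape: one data frame per cluster with
# `gene` and `expr.prop` columns. All values are invented for illustration.
ref_sig_list <- list(
  A = data.frame(gene = c("mt-Co1", "Gene1", "Gene2", "Gene3"),
                 expr.prop = c(0.90, 0.60, 0.05, 0.40),
                 stringsAsFactors = FALSE),
  B = data.frame(gene = c("Gene1", "Gene4", "Rpl13", "Gene5"),
                 expr.prop = c(0.50, 0.30, 0.80, 0.20),
                 stringsAsFactors = FALSE)
)

# Hypothetical helper (NOT marker.select itself); argument names mirror
# the documented ones so the roles are easy to compare.
sketch_select <- function(sig_list, expr.prop.cutoff = 0.1, ntop.max = 200,
                          overlap.max = 1, rm_mito_ribo = FALSE,
                          species = "ms") {
  # Assumed naming conventions for mitochondrial and ribosomal genes.
  pattern <- if (species == "ms") "^(mt-|Rp[sl])" else "^(MT-|RP[SL])"

  # Step 1: per cluster, keep genes that pass the expression cutoff and,
  # optionally, drop mitochondrial and ribosomal genes.
  kept <- lapply(sig_list, function(df) {
    genes <- df$gene[df$expr.prop >= expr.prop.cutoff]
    if (rm_mito_ribo) genes <- genes[!grepl(pattern, genes)]
    genes
  })

  # Step 2: count in how many clusters each gene survives, and flag genes
  # present in more than overlap.max clusters.
  counts <- table(unlist(lapply(kept, unique)))
  shared <- names(counts)[counts > overlap.max]

  # Step 3: remove shared genes, then cap each cluster at ntop.max genes
  # (row order is assumed to encode the ranking).
  out <- lapply(kept, function(genes) head(setdiff(genes, shared), ntop.max))

  # Mirror the documented behaviour when nothing is left.
  if (sum(lengths(out)) == 0) {
    message("No markers found.")
    return(NULL)
  }
  out
}

res <- sketch_select(ref_sig_list, rm_mito_ribo = TRUE)
print(res)
```

In this toy input, Gene2 fails the 0.1 cutoff, mt-Co1 and Rpl13 are removed by the name patterns, and Gene1 is dropped because it appears in both clusters while overlap.max is 1.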

Value

A list where each element corresponds to a cluster and contains a character vector of the selected marker genes. If no markers are found, a message is printed and NULL is returned.
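Because the return value may be NULL, downstream code may want to guard against that case before indexing into the result. A minimal sketch follows; marker_counts is a hypothetical helper introduced here for illustration, not a function of the package.

```r
# Hypothetical helper: number of selected markers per cluster,
# returning an empty integer vector when the result is NULL.
marker_counts <- function(markers) {
  if (is.null(markers)) {
    return(integer(0))
  }
  lengths(markers)
}

# NULL input (the documented "no markers found" case).
n_none <- marker_counts(NULL)

# A small hand-written list in the documented return shape.
n_some <- marker_counts(list(CAFs = c("FBLN1", "CCDC80"), Myeloid = "AIF1"))
print(n_some)
```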

See also

find.sig.genes for generating the signature gene list.

Examples

data(toydata)

seu <- toydata$seu

seu <- ProFAST::pdistance(seu, reduction = "caesar")
#> Calculate co-embedding distance...
sglist <- find.sig.genes(seu = seu)

markers <- marker.select(sglist, expr.prop.cutoff = 0.1, overlap.max = 1)
print(markers)
#> $CAFs
#> [1] "FBLN1"    "CCDC80"   "PDGFRA"   "PTGDS"    "DPT"      "MMP2"     "LUM"     
#> [8] "CRISPLD2"
#> 
#> $`Cancer Epithelial`
#> [1] "LYPD3" "MLPH"  "AGR3"  "ESR1"  "FOXA1" "FASN"  "SCD"   "RHOH" 
#> 
#> $Endothelial
#> [1] "HOXD9" "EGFL7" "VWF"   "SOX17" "KDR"   "IL3RA" "SOX18" "BTNL9"
#> 
#> $Myeloid
#> [1] "HAVCR2" "IGSF6"  "ITGAX"  "FCER1G" "MNDA"   "C1QC"   "CD86"   "AIF1"  
#> 
#> $`Normal Epithelial`
#> [1] "KRT5"    "KRT6B"   "KRT16"   "C5orf46" "KRT15"   "KRT14"   "PIGR"   
#> [8] "MYH11"  
#> 
#> $PVL
#> [1] "CAV1"   "AVPR1A" "ZEB1"   "TCEAL7" "AQP1"   "PDGFRB" "EGFR"   "ANGPT2"
#> 
#> $Plasmablasts
#> [1] "CD79A"    "DERL3"    "MZB1"     "TNFRSF17" "CD27"     "SLAMF7"   "PRDM1"   
#> [8] "ITM2C"   
#> 
#> $`T-cells`
#> [1] "CD3G"  "CD3D"  "CD3E"  "CCL5"  "CD247" "CD69"  "KLRB1" "GZMA" 
#>
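With overlap.max = 1, no gene should appear in more than one cluster of the result. One way to check this is sketched below in base R. To keep the snippet self-contained, it uses a hand-copied subset of the printed output above (markers_subset is a name introduced here); in practice the same check could be applied to the full markers list.

```r
# Subset of the printed result above, copied by hand for illustration.
markers_subset <- list(
  CAFs = c("FBLN1", "CCDC80", "PDGFRA"),
  PVL  = c("CAV1", "AVPR1A", "PDGFRB")
)

# Count, for each gene, the number of clusters that list it.
cluster_counts <- table(unlist(lapply(markers_subset, unique)))

# TRUE when every gene is a marker for at most one cluster.
no_overlap <- all(cluster_counts <= 1)
print(no_overlap)
```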