Skip to contents

Intrduction

This vignette introduces the iSC.MEB workflow for the analysis of seqFISH datasets. The seqFISH dataset contains the high-resolution spatial map from 6 mouse embryo tissue sections generated by a modified version of the seqFISH (sequential fluorescence in situ hybridization) method which allows highly-effective cell segmentation Lohoff T. et al., 2020. The workflow consists of three steps

  • Independent preprocessing and model setting
  • Integrated clustering using iSC.MEB model
  • Downstream analysis (i.e. visualization of clusters and embeddings, combined differential expression analysis)

We demonstrate the use of iSC.MEB to a subset of the seqFISH datasets that are here, which can be downloaded to the current working path by the following command:

githubURL <- "https://github.com/XiaoZhangryy/iSC.MEB/blob/main/vignettes_data/seqFISH.rda?raw=true"
download.file(githubURL, "seqFISH.rda", mode = "wb")

The package can be loaded with the command:

Then load datasets to R

load("seqFISH.rda")

Fit an iSC.MEB model

First, we view the the spatial transcriptomics data.

seuList  ## a list including six Seurat object
#> [[1]]
#> An object of class Seurat 
#> 351 features across 2025 samples within 1 assay 
#> Active assay: RNA (351 features, 0 variable features)
#> 
#> [[2]]
#> An object of class Seurat 
#> 351 features across 1530 samples within 1 assay 
#> Active assay: RNA (351 features, 0 variable features)
#> 
#> [[3]]
#> An object of class Seurat 
#> 351 features across 2221 samples within 1 assay 
#> Active assay: RNA (351 features, 0 variable features)

Prepare the iSC.MEB Object.

Then, we can create a iSCMEBObj object to prepare for iSC.MEB models. And next, we performe create adjacency matrix, dimensionality reduction and set model setting steps in turn. Here, the adjacency matrix is built based on L2 distance.

Genelist <- row.names(seuList[[1]])
iSCMEBObj <- CreateiSCMEBObject(seuList = seuList, customGenelist = Genelist, verbose = FALSE)
## check the number of genes/features after filtering step
iSCMEBObj@seulist
#> [[1]]
#> An object of class Seurat 
#> 351 features across 2012 samples within 1 assay 
#> Active assay: RNA (351 features, 0 variable features)
#> 
#> [[2]]
#> An object of class Seurat 
#> 351 features across 1525 samples within 1 assay 
#> Active assay: RNA (351 features, 0 variable features)
#> 
#> [[3]]
#> An object of class Seurat 
#> 351 features across 2174 samples within 1 assay 
#> Active assay: RNA (351 features, 0 variable features)
## Add adjacency matrix list for a iSCMEBObj object to prepare for iSC.MEB model fitting.
iSCMEBObj <- CreateNeighbors(iSCMEBObj, radius.upper = 10)
## run PCA to get low dimensional embeddings
iSCMEBObj <- runPCA(iSCMEBObj, npcs = 15, pca.method = "APCA")
## Add a model setting in advance for an iSCMEBObj object. verbose = TRUE helps outputing the
## information in the algorithm.
iSCMEBObj <- SetModelParameters(iSCMEBObj, verbose = FALSE, maxIter_ICM = 4, maxIter = 20, coreNum = 4)

Fit iSC.MEB

For function iSCMEB, users can specify the number of clusters K or set K to be an integer vector by using modified MBIC(BIC) to determine K. Here, we use MBIC to select optimal number of clusters, which is the default method. User can use specifying other criteria in function SetModelParameters or using function SelectModel to select model after model fitting. Here we briefly explain how to choose the parameter c_penalty in the modified BIC. In general, when the number of cluster is considered to be small, such as less than 10 clusters, c_penalty often be large, for example 5, 10. And when the number of cluster is considered to be large, c_penalty often ranges from 0.5 to 2. Most importantly, iSC.MEB is fast, scaling well in terms of sample size, which allow the user to tune the c_penalty based on their prior knowledge about the tissues or cells.

iSCMEBObj <- iSCMEB(iSCMEBObj, K = 16:19)
iSCMEBObj <- SelectModel(iSCMEBObj, criteria = "MBIC", c_penalty = 0.2)
p1 <- SelectKPlot(iSCMEBObj, criteria = "MBIC")
p1

Evaluate performance

The function idents can extract the labels provided by iSC.MEB method. Therefore, we can evaluate the clustering performance by some metrics, such as ARI.

LabelList <- lapply(iSCMEBObj@seulist, function(seu) seu@meta.data$celltype_mapped_refined)
ARI <- function(x, y) mclust::adjustedRandIndex(x, y)
ari_sections <- sapply(1:3, function(i) ARI(idents(iSCMEBObj)[[i]], LabelList[[i]]))
ari_all <- ARI(unlist(idents(iSCMEBObj)), unlist(LabelList))
print(ari_sections)
#> [1] 0.5038898 0.5484317 0.5315789
print(ari_all)
#> [1] 0.5170617

Visualization

Spatial scatter plot

In addition to metrics, we can use some visualization functions to measure the clustering results, such as the spatial scatter plot.

p2 <- SpaHeatMap(iSCMEBObj, item = "cluster", plot_type = "Scatter", layout.dim = c(1, 3), nrow.legend = 2,
    no_axis = TRUE, point_size = 1.5)
p2

t-SNE plot.

Next, user can visualize the inferred embeddings for biological effects between cell/domain types using two components from either tSNE or UMAP. Here, wo demonstrate the clustering and batch remove performance by t-SNE plot.

iSCMEBObj <- CalculateTSNE(iSCMEBObj, reduction = "iSCMEB", n_comp = 2)
library(patchwork)
p3 <- LowEmbedPlot(iSCMEBObj, item = "cluster", reduction = "TSNE2", point_size = 0.5)
p4 <- LowEmbedPlot(iSCMEBObj, item = "batch", reduction = "TSNE2", point_size = 0.5)
p3 + p4

Combined differential expression analysis

Finally, we also provide functions to fecility user do combined differential expression analysis. The IntegrateSpaData function can integrate multiple SRT data based on the our results, function doDEG provide top differential expression genes, and function doHeatmap plot heatmap.

seuInt <- IntegrateSpaData(iSCMEBObj, "mouse")
top10 <- doDEG(seuInt, topn = 10)
p5 <- doHeatmap(seuInt, top10$gene)
p5

Session information

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> system code page: 936
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] patchwork_1.1.2    SeuratObject_4.1.3 Seurat_4.2.1       iSC.MEB_1.0       
#> [5] ggplot2_3.4.0      gtools_3.9.3      
#> 
#> loaded via a namespace (and not attached):
#>   [1] utf8_1.2.2                  spatstat.explore_3.0-5     
#>   [3] reticulate_1.26             tidyselect_1.2.0           
#>   [5] htmlwidgets_1.5.4           grid_4.1.3                 
#>   [7] BiocParallel_1.28.3         Rtsne_0.16                 
#>   [9] ScaledMatrix_1.2.0          munsell_0.5.0              
#>  [11] codetools_0.2-18            ragg_1.2.4                 
#>  [13] ica_1.0-3                   future_1.29.0              
#>  [15] miniUI_0.1.1.1              withr_2.5.0                
#>  [17] spatstat.random_3.0-1       colorspace_2.0-3           
#>  [19] progressr_0.11.0            Biobase_2.54.0             
#>  [21] highr_0.9                   knitr_1.41                 
#>  [23] rstudioapi_0.14             stats4_4.1.3               
#>  [25] SingleCellExperiment_1.16.0 ROCR_1.0-11                
#>  [27] ggsignif_0.6.4              tensor_1.5                 
#>  [29] listenv_0.8.0               labeling_0.4.2             
#>  [31] MatrixGenerics_1.6.0        GenomeInfoDbData_1.2.7     
#>  [33] polyclip_1.10-4             farver_2.1.1               
#>  [35] rprojroot_2.0.3             parallelly_1.32.1          
#>  [37] vctrs_0.5.0                 generics_0.1.3             
#>  [39] xfun_0.35                   R6_2.5.1                   
#>  [41] GenomeInfoDb_1.30.1         ggbeeswarm_0.6.0           
#>  [43] rsvd_1.0.5                  bitops_1.0-7               
#>  [45] spatstat.utils_3.0-1        cachem_1.0.6               
#>  [47] DelayedArray_0.20.0         assertthat_0.2.1           
#>  [49] promises_1.2.0.1            scales_1.2.1               
#>  [51] beeswarm_0.4.0              gtable_0.3.1               
#>  [53] beachmat_2.10.0             globals_0.16.1             
#>  [55] goftest_1.2-3               rlang_1.0.6                
#>  [57] systemfonts_1.0.4           splines_4.1.3              
#>  [59] rstatix_0.7.1               lazyeval_0.2.2             
#>  [61] spatstat.geom_3.0-3         broom_1.0.1                
#>  [63] yaml_2.3.6                  reshape2_1.4.4             
#>  [65] abind_1.4-5                 backports_1.4.1            
#>  [67] httpuv_1.6.6                tools_4.1.3                
#>  [69] ellipsis_0.3.2              jquerylib_0.1.4            
#>  [71] RColorBrewer_1.1-3          BiocGenerics_0.40.0        
#>  [73] ggridges_0.5.4              Rcpp_1.0.9                 
#>  [75] plyr_1.8.8                  sparseMatrixStats_1.6.0    
#>  [77] zlibbioc_1.40.0             purrr_0.3.5                
#>  [79] RCurl_1.98-1.9              ggpubr_0.4.0               
#>  [81] deldir_1.0-6                viridis_0.6.2              
#>  [83] pbapply_1.5-0               cowplot_1.1.1              
#>  [85] S4Vectors_0.32.4            zoo_1.8-11                 
#>  [87] SummarizedExperiment_1.24.0 ggrepel_0.9.2              
#>  [89] cluster_2.1.4               fs_1.5.2                   
#>  [91] magrittr_2.0.3              GiRaF_1.0.1                
#>  [93] data.table_1.14.4           scattermore_0.8            
#>  [95] lmtest_0.9-40               RANN_2.6.1                 
#>  [97] fitdistrplus_1.1-8          matrixStats_0.62.0         
#>  [99] mime_0.12                   evaluate_0.18              
#> [101] xtable_1.8-4                mclust_6.0.0               
#> [103] IRanges_2.28.0              gridExtra_2.3              
#> [105] compiler_4.1.3              scater_1.22.0              
#> [107] tibble_3.1.8                KernSmooth_2.23-20         
#> [109] htmltools_0.5.3             later_1.3.0                
#> [111] tidyr_1.2.1                 DBI_1.1.3                  
#> [113] formatR_1.12                MASS_7.3-58.1              
#> [115] Matrix_1.5-1                car_3.1-1                  
#> [117] cli_3.4.1                   parallel_4.1.3             
#> [119] igraph_1.3.5                DR.SC_2.9                  
#> [121] GenomicRanges_1.46.1        pkgconfig_2.0.3            
#> [123] pkgdown_2.0.6               sp_1.5-1                   
#> [125] plotly_4.10.1               scuttle_1.4.0              
#> [127] spatstat.sparse_3.0-0       vipor_0.4.5                
#> [129] bslib_0.4.1                 XVector_0.34.0             
#> [131] CompQuadForm_1.4.3          stringr_1.4.1              
#> [133] digest_0.6.30               sctransform_0.3.5          
#> [135] RcppAnnoy_0.0.20            spatstat.data_3.0-0        
#> [137] rmarkdown_2.18              leiden_0.4.3               
#> [139] uwot_0.1.14                 DelayedMatrixStats_1.16.0  
#> [141] shiny_1.7.3                 lifecycle_1.0.3            
#> [143] nlme_3.1-160                jsonlite_1.8.3             
#> [145] carData_3.0-5               BiocNeighbors_1.12.0       
#> [147] limma_3.50.3                desc_1.4.2                 
#> [149] viridisLite_0.4.1           fansi_1.0.3                
#> [151] pillar_1.8.1                lattice_0.20-45            
#> [153] fastmap_1.1.0               httr_1.4.4                 
#> [155] survival_3.4-0              glue_1.6.2                 
#> [157] png_0.1-7                   stringi_1.7.8              
#> [159] sass_0.4.2                  textshaping_0.3.6          
#> [161] BiocSingular_1.10.0         memoise_2.0.1              
#> [163] dplyr_1.0.10                irlba_2.3.5.1              
#> [165] future.apply_1.10.0