iSC.MEB: seqFISH Data Analysis
Xiao Zhang
2022-12-15
Source:vignettes/iSC.MEB.seqFISH.Rmd
iSC.MEB.seqFISH.Rmd
Intrduction
This vignette introduces the iSC.MEB workflow for the analysis of seqFISH datasets. The seqFISH dataset contains the high-resolution spatial map from 6 mouse embryo tissue sections generated by a modified version of the seqFISH (sequential fluorescence in situ hybridization) method which allows highly-effective cell segmentation Lohoff T. et al., 2020. The workflow consists of three steps
- Independent preprocessing and model setting
- Integrated clustering using iSC.MEB model
- Downstream analysis (i.e. visualization of clusters and embeddings, combined differential expression analysis)
We demonstrate the use of iSC.MEB to a subset of the seqFISH datasets that are here, which can be downloaded to the current working path by the following command:
githubURL <- "https://github.com/XiaoZhangryy/iSC.MEB/blob/main/vignettes_data/seqFISH.rda?raw=true"
download.file(githubURL, "seqFISH.rda", mode = "wb")
The package can be loaded with the command:
Then load datasets to R
load("seqFISH.rda")
Fit an iSC.MEB
model
First, we view the the spatial transcriptomics data.
seuList ## a list including six Seurat object
#> [[1]]
#> An object of class Seurat
#> 351 features across 2025 samples within 1 assay
#> Active assay: RNA (351 features, 0 variable features)
#>
#> [[2]]
#> An object of class Seurat
#> 351 features across 1530 samples within 1 assay
#> Active assay: RNA (351 features, 0 variable features)
#>
#> [[3]]
#> An object of class Seurat
#> 351 features across 2221 samples within 1 assay
#> Active assay: RNA (351 features, 0 variable features)
Prepare the iSC.MEB
Object.
Then, we can create a iSCMEBObj object to prepare for
iSC.MEB
models. And next, we performe create adjacency
matrix, dimensionality reduction and set model setting steps in turn.
Here, the adjacency matrix is built based on L2 distance.
Genelist <- row.names(seuList[[1]])
iSCMEBObj <- CreateiSCMEBObject(seuList = seuList, customGenelist = Genelist, verbose = FALSE)
## check the number of genes/features after filtering step
iSCMEBObj@seulist
#> [[1]]
#> An object of class Seurat
#> 351 features across 2012 samples within 1 assay
#> Active assay: RNA (351 features, 0 variable features)
#>
#> [[2]]
#> An object of class Seurat
#> 351 features across 1525 samples within 1 assay
#> Active assay: RNA (351 features, 0 variable features)
#>
#> [[3]]
#> An object of class Seurat
#> 351 features across 2174 samples within 1 assay
#> Active assay: RNA (351 features, 0 variable features)
## Add adjacency matrix list for a iSCMEBObj object to prepare for iSC.MEB model fitting.
iSCMEBObj <- CreateNeighbors(iSCMEBObj, radius.upper = 10)
## run PCA to get low dimensional embeddings
iSCMEBObj <- runPCA(iSCMEBObj, npcs = 15, pca.method = "APCA")
## Add a model setting in advance for an iSCMEBObj object. verbose = TRUE helps outputing the
## information in the algorithm.
iSCMEBObj <- SetModelParameters(iSCMEBObj, verbose = FALSE, maxIter_ICM = 4, maxIter = 20, coreNum = 4)
Fit iSC.MEB
For function iSCMEB
, users can specify the number of
clusters K
or set K
to be an integer vector by
using modified MBIC(BIC) to determine K
. Here, we use MBIC
to select optimal number of clusters, which is the default method. User
can use specifying other criteria in function
SetModelParameters
or using function
SelectModel
to select model after model fitting. Here we
briefly explain how to choose the parameter c_penalty
in
the modified BIC. In general, when the number of cluster is considered
to be small, such as less than 10 clusters, c_penalty
often
be large, for example 5, 10. And when the number of cluster is
considered to be large, c_penalty
often ranges from 0.5 to
2. Most importantly, iSC.MEB is fast, scaling well in terms of sample
size, which allow the user to tune the c_penalty
based on
their prior knowledge about the tissues or cells.
iSCMEBObj <- iSCMEB(iSCMEBObj, K = 16:19)
iSCMEBObj <- SelectModel(iSCMEBObj, criteria = "MBIC", c_penalty = 0.2)
p1 <- SelectKPlot(iSCMEBObj, criteria = "MBIC")
p1
Evaluate performance
The function idents
can extract the labels provided by
iSC.MEB
method. Therefore, we can evaluate the clustering
performance by some metrics, such as ARI
.
LabelList <- lapply(iSCMEBObj@seulist, function(seu) seu@meta.data$celltype_mapped_refined)
ARI <- function(x, y) mclust::adjustedRandIndex(x, y)
ari_sections <- sapply(1:3, function(i) ARI(idents(iSCMEBObj)[[i]], LabelList[[i]]))
ari_all <- ARI(unlist(idents(iSCMEBObj)), unlist(LabelList))
print(ari_sections)
#> [1] 0.5038898 0.5484317 0.5315789
print(ari_all)
#> [1] 0.5170617
Visualization
Spatial scatter plot
In addition to metrics, we can use some visualization functions to measure the clustering results, such as the spatial scatter plot.
p2 <- SpaHeatMap(iSCMEBObj, item = "cluster", plot_type = "Scatter", layout.dim = c(1, 3), nrow.legend = 2,
no_axis = TRUE, point_size = 1.5)
p2
t-SNE plot.
Next, user can visualize the inferred embeddings for biological effects between cell/domain types using two components from either tSNE or UMAP. Here, wo demonstrate the clustering and batch remove performance by t-SNE plot.
iSCMEBObj <- CalculateTSNE(iSCMEBObj, reduction = "iSCMEB", n_comp = 2)
library(patchwork)
p3 <- LowEmbedPlot(iSCMEBObj, item = "cluster", reduction = "TSNE2", point_size = 0.5)
p4 <- LowEmbedPlot(iSCMEBObj, item = "batch", reduction = "TSNE2", point_size = 0.5)
p3 + p4
Combined differential expression analysis
Finally, we also provide functions to fecility user do combined
differential expression analysis. The IntegrateSpaData
function can integrate multiple SRT data based on the our results,
function doDEG
provide top differential expression genes,
and function doHeatmap
plot heatmap.
seuInt <- IntegrateSpaData(iSCMEBObj, "mouse")
top10 <- doDEG(seuInt, topn = 10)
p5 <- doHeatmap(seuInt, top10$gene)
p5
Session information
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#> system code page: 936
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] patchwork_1.1.2 SeuratObject_4.1.3 Seurat_4.2.1 iSC.MEB_1.0
#> [5] ggplot2_3.4.0 gtools_3.9.3
#>
#> loaded via a namespace (and not attached):
#> [1] utf8_1.2.2 spatstat.explore_3.0-5
#> [3] reticulate_1.26 tidyselect_1.2.0
#> [5] htmlwidgets_1.5.4 grid_4.1.3
#> [7] BiocParallel_1.28.3 Rtsne_0.16
#> [9] ScaledMatrix_1.2.0 munsell_0.5.0
#> [11] codetools_0.2-18 ragg_1.2.4
#> [13] ica_1.0-3 future_1.29.0
#> [15] miniUI_0.1.1.1 withr_2.5.0
#> [17] spatstat.random_3.0-1 colorspace_2.0-3
#> [19] progressr_0.11.0 Biobase_2.54.0
#> [21] highr_0.9 knitr_1.41
#> [23] rstudioapi_0.14 stats4_4.1.3
#> [25] SingleCellExperiment_1.16.0 ROCR_1.0-11
#> [27] ggsignif_0.6.4 tensor_1.5
#> [29] listenv_0.8.0 labeling_0.4.2
#> [31] MatrixGenerics_1.6.0 GenomeInfoDbData_1.2.7
#> [33] polyclip_1.10-4 farver_2.1.1
#> [35] rprojroot_2.0.3 parallelly_1.32.1
#> [37] vctrs_0.5.0 generics_0.1.3
#> [39] xfun_0.35 R6_2.5.1
#> [41] GenomeInfoDb_1.30.1 ggbeeswarm_0.6.0
#> [43] rsvd_1.0.5 bitops_1.0-7
#> [45] spatstat.utils_3.0-1 cachem_1.0.6
#> [47] DelayedArray_0.20.0 assertthat_0.2.1
#> [49] promises_1.2.0.1 scales_1.2.1
#> [51] beeswarm_0.4.0 gtable_0.3.1
#> [53] beachmat_2.10.0 globals_0.16.1
#> [55] goftest_1.2-3 rlang_1.0.6
#> [57] systemfonts_1.0.4 splines_4.1.3
#> [59] rstatix_0.7.1 lazyeval_0.2.2
#> [61] spatstat.geom_3.0-3 broom_1.0.1
#> [63] yaml_2.3.6 reshape2_1.4.4
#> [65] abind_1.4-5 backports_1.4.1
#> [67] httpuv_1.6.6 tools_4.1.3
#> [69] ellipsis_0.3.2 jquerylib_0.1.4
#> [71] RColorBrewer_1.1-3 BiocGenerics_0.40.0
#> [73] ggridges_0.5.4 Rcpp_1.0.9
#> [75] plyr_1.8.8 sparseMatrixStats_1.6.0
#> [77] zlibbioc_1.40.0 purrr_0.3.5
#> [79] RCurl_1.98-1.9 ggpubr_0.4.0
#> [81] deldir_1.0-6 viridis_0.6.2
#> [83] pbapply_1.5-0 cowplot_1.1.1
#> [85] S4Vectors_0.32.4 zoo_1.8-11
#> [87] SummarizedExperiment_1.24.0 ggrepel_0.9.2
#> [89] cluster_2.1.4 fs_1.5.2
#> [91] magrittr_2.0.3 GiRaF_1.0.1
#> [93] data.table_1.14.4 scattermore_0.8
#> [95] lmtest_0.9-40 RANN_2.6.1
#> [97] fitdistrplus_1.1-8 matrixStats_0.62.0
#> [99] mime_0.12 evaluate_0.18
#> [101] xtable_1.8-4 mclust_6.0.0
#> [103] IRanges_2.28.0 gridExtra_2.3
#> [105] compiler_4.1.3 scater_1.22.0
#> [107] tibble_3.1.8 KernSmooth_2.23-20
#> [109] htmltools_0.5.3 later_1.3.0
#> [111] tidyr_1.2.1 DBI_1.1.3
#> [113] formatR_1.12 MASS_7.3-58.1
#> [115] Matrix_1.5-1 car_3.1-1
#> [117] cli_3.4.1 parallel_4.1.3
#> [119] igraph_1.3.5 DR.SC_2.9
#> [121] GenomicRanges_1.46.1 pkgconfig_2.0.3
#> [123] pkgdown_2.0.6 sp_1.5-1
#> [125] plotly_4.10.1 scuttle_1.4.0
#> [127] spatstat.sparse_3.0-0 vipor_0.4.5
#> [129] bslib_0.4.1 XVector_0.34.0
#> [131] CompQuadForm_1.4.3 stringr_1.4.1
#> [133] digest_0.6.30 sctransform_0.3.5
#> [135] RcppAnnoy_0.0.20 spatstat.data_3.0-0
#> [137] rmarkdown_2.18 leiden_0.4.3
#> [139] uwot_0.1.14 DelayedMatrixStats_1.16.0
#> [141] shiny_1.7.3 lifecycle_1.0.3
#> [143] nlme_3.1-160 jsonlite_1.8.3
#> [145] carData_3.0-5 BiocNeighbors_1.12.0
#> [147] limma_3.50.3 desc_1.4.2
#> [149] viridisLite_0.4.1 fansi_1.0.3
#> [151] pillar_1.8.1 lattice_0.20-45
#> [153] fastmap_1.1.0 httr_1.4.4
#> [155] survival_3.4-0 glue_1.6.2
#> [157] png_0.1-7 stringi_1.7.8
#> [159] sass_0.4.2 textshaping_0.3.6
#> [161] BiocSingular_1.10.0 memoise_2.0.1
#> [163] dplyr_1.0.10 irlba_2.3.5.1
#> [165] future.apply_1.10.0