HierFabs: SKCM Data Analysis with LM-GG model
Xiao Zhang
2022-10-22
Source: vignettes/HierFabs.GGLM.Rmd
This vignette introduces the HierFabs workflow for the analysis of the skin cutaneous melanoma (SKCM) dataset downloaded from The Cancer Genome Atlas (TCGA), which contains disease outcomes, environmental factors, and high-dimensional gene expression measurements. The goal of the analysis is to identify interactions that are associated with the prognosis of SKCM.
We demonstrate the use of HierFabs on the SKCM data, which are stored in the HierFabs GitHub repository and can be downloaded to the current working directory with the following command:
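# URL of the cleaned SKCM data in the HierFabs GitHub repository
githubURL <- "https://github.com/XiaoZhangryy/HierFabs/blob/master/vignettes_data/cleaned_SKCM_TCGA_Data.rda?raw=true"
# mode = "wb" writes the file in binary mode, which .rda files need on Windows
download.file(githubURL, "cleaned_SKCM_TCGA_Data.rda", mode = "wb")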
The outcome of interest is the log-transformed Breslow's thickness, a continuous variable that has been suggested as a clinicopathologic feature of cutaneous melanoma. We prescreen genes by the p-values of marginal linear models and keep the top 2,000 genes for downstream analysis. To identify gene-gene (GG) interactions under the weak hierarchy, we then need to fit a high-dimensional linear model with 2,003,000 covariates: 2,000 main effects plus 2,000 × 2,001 / 2 = 2,001,000 pairwise interactions (including the 2,000 squared terms).
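The prescreening step and the covariate count can be sketched in R as follows. This is a minimal sketch under stated assumptions: the helpers `marginal_screen()` and `n_covariates()` are hypothetical and not part of HierFabs, the screen ranks genes by the p-value of the slope in `lm(y ~ x)` fitted one gene at a time, and the data are simulated rather than the SKCM data.

```r
# Hypothetical helper (not part of HierFabs): keep the `top` genes with the
# smallest p-values for the slope in the marginal model y ~ x_j.
marginal_screen <- function(X, y, top = 2000) {
  pvals <- apply(X, 2, function(x) {
    summary(lm(y ~ x))$coefficients[2, "Pr(>|t|)"]
  })
  keep <- order(pvals)[seq_len(min(top, ncol(X)))]
  X[, keep, drop = FALSE]
}

# Hypothetical helper: number of covariates in a GG model with p genes,
# i.e. p main effects plus p(p+1)/2 interactions when squared terms are
# included, or p(p-1)/2 interactions otherwise.
n_covariates <- function(p, diagonal = TRUE) {
  n_int <- if (diagonal) p * (p + 1) / 2 else p * (p - 1) / 2
  p + n_int
}

# Simulated data: 100 samples, 50 genes, two of which drive the response
set.seed(1)
X <- matrix(rnorm(100 * 50), 100, 50,
            dimnames = list(NULL, paste0("g", 1:50)))
y <- 2 * X[, 3] - X[, 7] + rnorm(100)

X_top <- marginal_screen(X, y, top = 10)
colnames(X_top)
n_covariates(2000)                    # 2003000
n_covariates(2000, diagonal = FALSE)  # 2001000
```

The quadratic growth of `n_covariates()` in the number of genes is what makes the prescreening step necessary before fitting interactions.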
The package can be loaded with the command:
library(HierFabs)
Then load the dataset into R:
load("cleaned_SKCM_TCGA_Data.rda")
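`load()` restores the saved objects into the current environment (here, the object `data` used below) and invisibly returns their names. If you prefer not to overwrite objects already in your workspace, you can load into a separate environment instead. The sketch below uses a toy stand-in saved to a temporary file; its structure (a list with elements `Y` and `gexp`) is an assumption based only on how `data` is used in this vignette.

```r
# Toy stand-in for the SKCM object (assumed structure: a list with Y and gexp)
data <- list(Y = rnorm(5), gexp = matrix(rnorm(10), nrow = 5, ncol = 2))
tmp <- tempfile(fileext = ".rda")
save(data, file = tmp)
rm(data)

# Load into a dedicated environment; load() returns the restored names
env <- new.env()
restored <- load(tmp, envir = env)
print(restored)               # "data"
str(env$data, max.level = 1)  # shows the Y and gexp components
```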
Fit LM-GG
Fit a linear model with gene-gene interactions under the weak hierarchy constraint. The response is the log-transformed Breslow's thickness.
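Under the weak hierarchy, an interaction between genes j and k may be nonzero only if at least one of the two genes also has a nonzero main effect; the strong hierarchy requires both. The sketch below checks either condition for a given set of coefficients. The representation (a named main-effect vector `beta` and a symmetric interaction matrix `theta`) and the helper name `satisfies_hierarchy()` are hypothetical illustrations, not the internal format of a HierFabs fit.

```r
# Hypothetical helper: does a coefficient set obey the weak or strong
# hierarchy? theta[j, k] is the coefficient of gene j x gene k (the
# diagonal holds squared terms, whose only parent is gene j itself).
satisfies_hierarchy <- function(beta, theta, type = c("weak", "strong")) {
  type <- match.arg(type)
  active_main <- beta != 0
  idx <- which(theta != 0, arr.ind = TRUE)  # (j, k) of nonzero interactions
  if (nrow(idx) == 0) return(TRUE)
  parent_j <- active_main[idx[, 1]]
  parent_k <- active_main[idx[, 2]]
  if (type == "weak") all(parent_j | parent_k) else all(parent_j & parent_k)
}

# Gene A has a main effect, B does not, and A x B is selected
beta <- c(A = 0.5, B = 0, C = 0)
theta <- matrix(0, 3, 3, dimnames = list(names(beta), names(beta)))
theta["A", "B"] <- theta["B", "A"] <- 0.2

satisfies_hierarchy(beta, theta, "weak")    # TRUE
satisfies_hierarchy(beta, theta, "strong")  # FALSE
```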
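Genes <- as.matrix(data$gexp)  # gene expression matrix (samples x genes)
Y <- data$Y                    # log-transformed Breslow's thickness
# hier = "weak": weak hierarchy; model = "gaussian": linear model;
# diagonal = TRUE: also include squared terms; criteria = "BIC": criterion
# for choosing the final model; eps: step size of the stagewise updates
fit <- HierFabs(Genes, Y, eps = 0.01, hier = "weak", model = "gaussian", diagonal = TRUE, criteria = "BIC")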
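The roles of `eps` and `criteria` can be illustrated with a deliberately simplified sketch: plain incremental forward stagewise regression, which moves one coefficient by a fixed step `eps` at a time, followed by choosing the point on the resulting path with the smallest Gaussian BIC, n log(RSS/n) + df log(n). This is not the Fabs algorithm used by HierFabs (which also takes backward steps and enforces the hierarchy), and whether HierFabs uses exactly this form of BIC is an assumption. The helpers `forward_stagewise()` and `gaussian_bic()` are hypothetical, and the data are simulated.

```r
# Plain incremental forward stagewise (NOT Fabs): at each step, move the
# coefficient most correlated with the current residual by +/- eps.
forward_stagewise <- function(X, y, eps = 0.01, steps = 500) {
  Xs <- scale(X)             # standardized columns
  r <- y - mean(y)           # start from the centered response
  beta <- numeric(ncol(Xs))
  path <- matrix(0, nrow = steps, ncol = ncol(Xs))
  for (s in seq_len(steps)) {
    score <- drop(crossprod(Xs, r))  # inner products with the residual
    j <- which.max(abs(score))
    delta <- eps * sign(score[j])
    beta[j] <- beta[j] + delta
    r <- r - delta * Xs[, j]
    path[s, ] <- beta                # coefficients after step s
  }
  list(path = path, Xs = Xs, yc = y - mean(y))
}

# Classic Gaussian BIC for a fitted linear model with df parameters
gaussian_bic <- function(y, yhat, df) {
  n <- length(y)
  n * log(sum((y - yhat)^2) / n) + df * log(n)
}

set.seed(2)
n <- 200
X <- matrix(rnorm(n * 5), n, 5)
y <- X[, 1] + 0.8 * X[, 2] + rnorm(n)

fs <- forward_stagewise(X, y, eps = 0.01, steps = 500)
# df = number of nonzero coefficients + 1 for the intercept
bic <- apply(fs$path, 1, function(b) {
  gaussian_bic(fs$yc, drop(fs$Xs %*% b), df = sum(b != 0) + 1)
})
best <- fs$path[which.min(bic), ]
which(best != 0)  # indices of the selected predictors
```

A smaller `eps` gives a finer path at the cost of more steps; the BIC then trades goodness of fit against the number of active coefficients.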
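Then, we can use the print function to show the result. In the printed sparse matrix, the first column gives the estimated main effects of the row genes, each remaining column gives the interaction between the row gene and the column gene, and dots denote zero coefficients. Every selected interaction involves a row gene with a nonzero main effect, while partner genes such as SMC3 have no main effect of their own, which is consistent with the weak (but not the strong) hierarchy.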
print(fit)
#> 10 x 12 sparse Matrix of class "dgCMatrix"
#>          main effect    SMC3 RABGEF1    MLLT3   INPP1    SNX3 LINC00442   LPIN3
#> SLC8A1      -0.04617       .       .        .       .       .         .       .
#> DPYD        -0.02318       .       .        .       .       .         .       .
#> PHIP        -0.00977       .       .        .       .       .         .       .
#> SLC40A1     -0.01117 0.01167       .        .       .       .         .       .
#> PARD6G       0.00822       . 0.03023        .       .       .         .       .
#> TMEM159      0.00773       .       . -0.01449 0.03722       .         .       .
#> STAMBPL1    -0.01088       .       .        .       . 0.04428         .       .
#> INPP5K       0.00809       .       .        .       .       .  -0.05904       .
#> SERP2        0.00886       .       .        .       .       .         . 0.02399
#> NR2F1       -0.01133       .       .        .       .       .         .       .
#>          LINC00482   GLIPR2   VPS37B  SLAMF7
#> SLC8A1           .        .        .       .
#> DPYD             .        .        .       .
#> PHIP             .        .        .       .
#> SLC40A1          .        .        .       .
#> PARD6G           .        .        .       .
#> TMEM159          .        .        .       .
#> STAMBPL1         .        .        .       .
#> INPP5K           .        .        .       .
#> SERP2      0.00313 -0.04368        .       .
#> NR2F1            .        . -0.01841 0.12726
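To work with the selected effects programmatically, one option is to turn the printed matrix into a list of main effects and a table of interaction pairs. The sketch below assumes you have that coefficient matrix as an ordinary R matrix `M` (a dgCMatrix can be converted with `as.matrix()`); how to extract it from `fit` is not covered here, and the helper `list_effects()` is hypothetical. The example rebuilds three rows of the output above by hand.

```r
# Three rows of the printed output, rebuilt by hand as a dense matrix
rows <- c("SLC40A1", "SERP2", "NR2F1")
cols <- c("main effect", "SMC3", "LPIN3", "LINC00482", "GLIPR2",
          "VPS37B", "SLAMF7")
M <- matrix(0, length(rows), length(cols), dimnames = list(rows, cols))
M[, "main effect"] <- c(-0.01117, 0.00886, -0.01133)
M["SLC40A1", "SMC3"] <- 0.01167
M["SERP2", c("LPIN3", "LINC00482", "GLIPR2")] <- c(0.02399, 0.00313, -0.04368)
M["NR2F1", c("VPS37B", "SLAMF7")] <- c(-0.01841, 0.12726)

# Hypothetical helper: nonzero main effects plus interaction pairs sorted
# by absolute coefficient
list_effects <- function(M) {
  main <- M[, "main effect"]
  inter <- M[, colnames(M) != "main effect", drop = FALSE]
  idx <- which(inter != 0, arr.ind = TRUE)
  pairs <- data.frame(
    gene1 = rownames(inter)[idx[, "row"]],
    gene2 = colnames(inter)[idx[, "col"]],
    coef = inter[idx],
    row.names = NULL
  )
  list(main = main[main != 0],
       interactions = pairs[order(-abs(pairs$coef)), ])
}

eff <- list_effects(M)
eff$interactions
```

Because each interaction here is listed against a row gene that also has a main effect, `all(eff$interactions$gene1 %in% names(eff$main))` is `TRUE`, in line with the weak hierarchy.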
Session information
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19043)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#> system code page: 936
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] Matrix_1.5-1   HierFabs_0.1.0
#>
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13   knitr_1.38        magrittr_2.0.3    lattice_0.20-45
#>  [5] R6_2.5.1          ragg_1.2.2        rlang_1.0.6       fastmap_1.1.0
#>  [9] stringr_1.4.1     tools_4.1.3       grid_4.1.3        xfun_0.30
#> [13] cli_3.2.0         jquerylib_0.1.4   htmltools_0.5.3   systemfonts_1.0.4
#> [17] yaml_2.3.5        digest_0.6.29     rprojroot_2.0.3   pkgdown_2.0.6
#> [21] textshaping_0.3.6 purrr_0.3.4       formatR_1.12      sass_0.4.2
#> [25] fs_1.5.2          memoise_2.0.1     cachem_1.0.6      evaluate_0.15
#> [29] rmarkdown_2.13    stringi_1.7.8     compiler_4.1.3    bslib_0.4.0
#> [33] desc_1.4.1        jsonlite_1.8.2