A hierarchical Forward and Backward Stagewise (HierFabs) algorithm for identifying hierachical interaction of genomics data.

Usage

HierFabs(
  G,
  y,
  E,
  weight = NULL,
  model = c("gaussian", "cox", "quantile", "logistic"),
  back = TRUE,
  stoping = TRUE,
  eps = 0.01,
  xi = 10^-6,
  iter = 3000,
  lambda.min = NULL,
  hier = c("strong", "weak"),
  max_s = NULL,
  diagonal = FALSE,
  status = NULL,
  gamma = NULL,
  tau = NUL,
  criteria = c("EBIC", "BIC")
)

Arguments

G: Gene matrix, each row is an observation vector.
y: Response variable. For logistic regression model, y takes value at 1 and -1.
E: An optional environment matrix. If Z is given, the interactions between environment and gene are of interest. Otherwise, the gene-gene interactions are of interest.
weight: An optional weights. Default is 1 for each observation.
model: A character string representing one of the built-in models. 'gaussian' for linear model and 'cox' for cox model.
back: An indicator of whether to take backward steps.
stoping: An indicator of whether to stop iteration when lambda is less than lambda.min.
eps: Step size. Default is 0.01.
xi: A tolerate to improve program stability. Default is 10^-6.
iter: Maximum number of outer-loop iterations allowed. Default is 3000.
lambda.min: Smallest value for lambda, as a fraction of lambda.max.
hier: Whether to enforce strong or weak heredity. Default is 'strong'.
max_s: Limit the maximum number of variables in the model. When exceed this limit, program will early stopped.
diagonal: An indicator of whether to include "pure" quadratic terms. Work when gene-gene interactions are of interest.
status: A censoring indicator.
gamma: A tuning parameter in EBIC.
tau: parameter for quantile regression.
criteria: The criteria used to select the optimal solution.

Value

A list.

theta - The coefficients of covariates, each column is a solution.
beta - The optimal solution.
lambda - Lambda sequence.
direction - Update indicator. 1 means take a forward step, -1 means take a backward step.
iter - Number of iterations.
EBIC - EBIC for each solution.
loss - loss for each solution.
df - Number of nonzero coefficients.
opt - Position of the optimal tuning based on EBIC.
intercept - The intercept term, which appearance is due to standardization.

Examples

set.seed(0)
n = 500
p = 100
x = matrix(rnorm(n*p),n,p)
eta = x[,1:4] %*% rep(1,4) + 3*x[,1]*x[,2] + 3*x[,1]*x[,4]
y =  eta + rnorm(n)
xtest = matrix(rnorm(n*p),n,p)
eta.test = xtest[,1:4] %*% rep(1,4) + 3*xtest[,1]*xtest[,2] + 3*xtest[,1]*xtest[,4]
ytest =  eta.test + rnorm(n)
fit.gg.strong = HierFabs(x, y)
y.pred.gg.s = predict(fit.gg.strong, xtest, ytest)
y.pred.gg.s$mse
#> [1] 1.060471
print(fit.gg.strong)
#> 4 x 3 sparse Matrix of class "dgCMatrix"
#>    main effect      G2      G4
#> G1     0.86902 2.94975 2.95852
#> G2     0.84403 .       .      
#> G3     0.83112 .       .      
#> G4     0.93636 .       .      

## Weak hierarchy
fit.gg.weak = HierFabs(x, y, hier="weak")
y.pred.gg.w = predict(fit.gg.weak, xtest, ytest)
y.pred.gg.w$mse
#> [1] 1.107333
print(fit.gg.weak)
#> 4 x 3 sparse Matrix of class "dgCMatrix"
#>    main effect      G2      G4
#> G1     0.83871 2.92996 2.93195
#> G2     0.81424 .       .      
#> G3     0.80107 .       .      
#> G4     0.90827 .       .      

## Cox model with Gene-Environment interactions
pz = 10
z = matrix(rnorm(n*pz),n,pz)
eta.ge = x[,1:4] %*% rep(1,4) + z[,1] + z[,2] + 3*x[,1]*z[,1] + 3*x[,2]*z[,2]
err = log(rweibull(n, shape = 1, scale = 1))
y0 = exp(-eta.ge + err)
cens = quantile(y0, 0.9)
y.ge = pmin(y0, cens)
status = 1 * (y0<=cens)
ztest = matrix(rnorm(n*pz),n,pz)
eta.ge.test = rowSums(xtest[,1:4]) + ztest[,1] + ztest[,2]
eta.ge.test = eta.ge.test + 3*xtest[,1]*ztest[,1] + 3*xtest[,2]*ztest[,2]
err.test = log(rweibull(n, shape = 1, scale = 1))
y0.test = exp(-eta.ge.test + err.test)
cens = quantile(y0.test, 0.9)
y.ge.test = pmin(y0.test, cens)
status.test = 1 * (y0.test<=cens)
fit.ge.strong = HierFabs(x, y.ge, z, model="cox", status=status)
y.pred.ge.s = predict(fit.ge.strong, xtest, y.ge.test, ztest, status.test)
y.pred.ge.s$c.index
#> [1] 0.914657
print(fit.ge.strong)
#> 7 x 3 sparse Matrix of class "dgCMatrix"
#>             main effect      E1      E2
#> main effect     .       0.63192 0.64792
#> G1              0.60630 1.88923 .      
#> G2              0.54614 .       1.92648
#> G3              0.64086 .       .      
#> G4              0.60864 .       .      
#> G24             0.02830 .       .      
#> G53             0.02939 .       .      

## Weak hierarchy
fit.ge.weak = HierFabs(x, y.ge, z, model="cox", status=status, hier="weak")
y.pred.ge.w = predict(fit.ge.weak, xtest, y.ge.test, ztest, status.test)
y.pred.ge.w$c.index
#> [1] 0.9148998
print(fit.ge.weak)
#> 9 x 3 sparse Matrix of class "dgCMatrix"
#>             main effect      E1      E2
#> main effect     .       0.61186 0.61755
#> G1              0.57598 1.82142 .      
#> G2              0.52628 .       1.86564
#> G3              0.62083 .       .      
#> G4              0.58055 .       .      
#> G24             0.01887 .       .      
#> G53             0.01959 .       .      
#> G32             .       0.00994 .      
#> G59             .       0.02020 .

A hierarchical Forward and Backward Stagewise (HierFabs) algorithm for identifying hierachical interaction of genomics data.

Usage

Arguments

Value

See also

Examples