Skip to contents

A hierarchical Forward and Backward Stagewise (HierFabs) algorithm for identifying hierachical interaction of genomics data.

Usage

HierFabs(
  G,
  y,
  E,
  weight = NULL,
  model = c("gaussian", "cox", "quantile", "logistic"),
  back = TRUE,
  stoping = TRUE,
  eps = 0.01,
  xi = 10^-6,
  iter = 3000,
  lambda.min = NULL,
  hier = c("strong", "weak"),
  max_s = NULL,
  diagonal = FALSE,
  status = NULL,
  gamma = NULL,
  tau = NUL,
  criteria = c("EBIC", "BIC")
)

Arguments

G

Gene matrix, each row is an observation vector.

y

Response variable. For logistic regression model, y takes value at 1 and -1.

E

An optional environment matrix. If Z is given, the interactions between environment and gene are of interest. Otherwise, the gene-gene interactions are of interest.

weight

An optional weights. Default is 1 for each observation.

model

A character string representing one of the built-in models. 'gaussian' for linear model and 'cox' for cox model.

back

An indicator of whether to take backward steps.

stoping

An indicator of whether to stop iteration when lambda is less than lambda.min.

eps

Step size. Default is 0.01.

xi

A tolerate to improve program stability. Default is 10^-6.

iter

Maximum number of outer-loop iterations allowed. Default is 3000.

lambda.min

Smallest value for lambda, as a fraction of lambda.max.

hier

Whether to enforce strong or weak heredity. Default is 'strong'.

max_s

Limit the maximum number of variables in the model. When exceed this limit, program will early stopped.

diagonal

An indicator of whether to include "pure" quadratic terms. Work when gene-gene interactions are of interest.

status

A censoring indicator.

gamma

A tuning parameter in EBIC.

tau

parameter for quantile regression.

criteria

The criteria used to select the optimal solution.

Value

A list.

  • theta - The coefficients of covariates, each column is a solution.

  • beta - The optimal solution.

  • lambda - Lambda sequence.

  • direction - Update indicator. 1 means take a forward step, -1 means take a backward step.

  • iter - Number of iterations.

  • EBIC - EBIC for each solution.

  • loss - loss for each solution.

  • df - Number of nonzero coefficients.

  • opt - Position of the optimal tuning based on EBIC.

  • intercept - The intercept term, which appearance is due to standardization.

Examples

set.seed(0)
n = 500
p = 100
x = matrix(rnorm(n*p),n,p)
eta = x[,1:4] %*% rep(1,4) + 3*x[,1]*x[,2] + 3*x[,1]*x[,4]
y =  eta + rnorm(n)
xtest = matrix(rnorm(n*p),n,p)
eta.test = xtest[,1:4] %*% rep(1,4) + 3*xtest[,1]*xtest[,2] + 3*xtest[,1]*xtest[,4]
ytest =  eta.test + rnorm(n)
fit.gg.strong = HierFabs(x, y)
y.pred.gg.s = predict(fit.gg.strong, xtest, ytest)
y.pred.gg.s$mse
#> [1] 1.060471
print(fit.gg.strong)
#> 4 x 3 sparse Matrix of class "dgCMatrix"
#>    main effect      G2      G4
#> G1     0.86902 2.94975 2.95852
#> G2     0.84403 .       .      
#> G3     0.83112 .       .      
#> G4     0.93636 .       .      

## Weak hierarchy
fit.gg.weak = HierFabs(x, y, hier="weak")
y.pred.gg.w = predict(fit.gg.weak, xtest, ytest)
y.pred.gg.w$mse
#> [1] 1.107333
print(fit.gg.weak)
#> 4 x 3 sparse Matrix of class "dgCMatrix"
#>    main effect      G2      G4
#> G1     0.83871 2.92996 2.93195
#> G2     0.81424 .       .      
#> G3     0.80107 .       .      
#> G4     0.90827 .       .      

## Cox model with Gene-Environment interactions
pz = 10
z = matrix(rnorm(n*pz),n,pz)
eta.ge = x[,1:4] %*% rep(1,4) + z[,1] + z[,2] + 3*x[,1]*z[,1] + 3*x[,2]*z[,2]
err = log(rweibull(n, shape = 1, scale = 1))
y0 = exp(-eta.ge + err)
cens = quantile(y0, 0.9)
y.ge = pmin(y0, cens)
status = 1 * (y0<=cens)
ztest = matrix(rnorm(n*pz),n,pz)
eta.ge.test = rowSums(xtest[,1:4]) + ztest[,1] + ztest[,2]
eta.ge.test = eta.ge.test + 3*xtest[,1]*ztest[,1] + 3*xtest[,2]*ztest[,2]
err.test = log(rweibull(n, shape = 1, scale = 1))
y0.test = exp(-eta.ge.test + err.test)
cens = quantile(y0.test, 0.9)
y.ge.test = pmin(y0.test, cens)
status.test = 1 * (y0.test<=cens)
fit.ge.strong = HierFabs(x, y.ge, z, model="cox", status=status)
y.pred.ge.s = predict(fit.ge.strong, xtest, y.ge.test, ztest, status.test)
y.pred.ge.s$c.index
#> [1] 0.914657
print(fit.ge.strong)
#> 7 x 3 sparse Matrix of class "dgCMatrix"
#>             main effect      E1      E2
#> main effect     .       0.63192 0.64792
#> G1              0.60630 1.88923 .      
#> G2              0.54614 .       1.92648
#> G3              0.64086 .       .      
#> G4              0.60864 .       .      
#> G24             0.02830 .       .      
#> G53             0.02939 .       .      

## Weak hierarchy
fit.ge.weak = HierFabs(x, y.ge, z, model="cox", status=status, hier="weak")
y.pred.ge.w = predict(fit.ge.weak, xtest, y.ge.test, ztest, status.test)
y.pred.ge.w$c.index
#> [1] 0.9148998
print(fit.ge.weak)
#> 9 x 3 sparse Matrix of class "dgCMatrix"
#>             main effect      E1      E2
#> main effect     .       0.61186 0.61755
#> G1              0.57598 1.82142 .      
#> G2              0.52628 .       1.86564
#> G3              0.62083 .       .      
#> G4              0.58055 .       .      
#> G24             0.01887 .       .      
#> G53             0.01959 .       .      
#> G32             .       0.00994 .      
#> G59             .       0.02020 .