Title: | Group Lasso and Elastic Net Solver for Generalized Linear Models |
---|---|
Description: | Extremely efficient procedures for fitting the entire group lasso and group elastic net regularization path for GLMs, multinomial, the Cox model and multi-task Gaussian models. Similar to the R package 'glmnet' in scope of models, and in computational speed. This package provides R bindings to the C++ code underlying the corresponding Python package 'adelie'. These bindings offer a general purpose group elastic net solver, a wide range of matrix classes that can exploit special structure to allow large-scale inputs, and an assortment of generalized linear model classes for fitting various types of data. The package is an implementation of Yang, J. and Hastie, T. (2024) <doi:10.48550/arXiv.2405.08631>. |
Authors: | James Yang [aut, cph], Trevor Hastie [aut, cph, cre], Balasubramanian Narasimhan [aut] |
Maintainer: | Trevor Hastie <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.3 |
Built: | 2024-11-09 05:26:14 UTC |
Source: | https://github.com/jamesyang007/adelie-r |
Does k-fold cross-validation for grpnet
cv.grpnet(
  X,
  glm,
  n_folds = 10,
  foldid = NULL,
  min_ratio = 0.01,
  lmda_path_size = 100,
  offsets = NULL,
  progress_bar = FALSE,
  n_threads = 1,
  ...
)
X |
Feature matrix. Either a regular R matrix, or else an
|
glm |
GLM family/response object. This is an expression that
represents the family, the response and other arguments such as
weights, if present. The choices are |
n_folds |
(default 10). Although |
foldid |
An optional vector of values between 1 and |
min_ratio |
Ratio between smallest and largest value of lambda. Default is 1e-2. |
lmda_path_size |
Number of values for |
offsets |
Offsets, default is |
progress_bar |
Progress bar. Default is |
n_threads |
Number of threads, default |
... |
Other arguments that can be passed to |
The function runs grpnet n_folds+1 times; the first to get the lambda sequence, and then the remainder to compute the fit with each of the folds omitted. The out-of-fold deviance is accumulated, and the average deviance and its standard deviation over the folds are computed. Note that cv.grpnet does NOT search for values of alpha. A specific value should be supplied, else alpha=1 is assumed by default. If users would like to cross-validate alpha as well, they should call cv.grpnet with a pre-computed vector foldid, and then use this same foldid vector in separate calls to cv.grpnet with different values of alpha.
Note also that the results of cv.grpnet are random, since the folds are selected at random. Users can reduce this randomness by running cv.grpnet many times and averaging the error curves.
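A minimal sketch of that alpha-selection workflow (the variable names are illustrative):
set.seed(1)
n <- 100
p <- 50
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] * rnorm(1) + rnorm(n)
foldid <- sample(rep(1:10, length.out = n))    # one shared fold assignment
cv1 <- cv.grpnet(x, glm.gaussian(y), foldid = foldid, alpha = 1)
cv.5 <- cv.grpnet(x, glm.gaussian(y), foldid = foldid, alpha = 0.5)
cv0 <- cv.grpnet(x, glm.gaussian(y), foldid = foldid, alpha = 0)
c(min(cv1$cvm), min(cv.5$cvm), min(cv0$cvm))   # compare mean CV deviance across alpha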
an object of class "cv.grpnet"
is returned, which is a list
with the ingredients of the cross-validation fit.
lambda |
the values of |
cvm |
The mean cross-validated deviance - a vector of length |
cvsd |
estimate of standard error of |
cvup |
upper curve = |
cvlo |
lower curve = |
nzero |
number of non-zero coefficients at each |
name |
a text string indicating type of measure (for plotting purposes).
Currently this is |
grpnet.fit |
a fitted grpnet object for the full data. |
lambda.min |
value of |
lambda.1se |
largest value of |
index |
a one column matrix with the indices of |
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso
and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv doi:10.48550/arXiv.2405.08631.
Friedman, J., Hastie, T. and Tibshirani, R. (2010)
Regularization Paths for Generalized Linear Models via Coordinate
Descent, Journal of Statistical Software, Vol. 33(1), 1-22,
doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011)
Regularization Paths for Cox's Proportional
Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol.
39(5), 1-13,
doi:10.18637/jss.v039.i05.
Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and
Tibshirani, Ryan. (2012) Strong Rules for Discarding Predictors in
Lasso-type Problems, JRSSB, Vol. 74(2), 245-266,
https://arxiv.org/abs/1011.2234.
set.seed(0)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] * rnorm(1) + rnorm(n)
fit <- grpnet(X, glm.gaussian(y))
print(fit)
Solves group elastic net via the covariance method.
gaussian_cov(
  A,
  v,
  constraints = NULL,
  groups = NULL,
  alpha = 1,
  penalty = NULL,
  lmda_path = NULL,
  max_iters = as.integer(1e+05),
  tol = 1e-07,
  rdev_tol = 0.001,
  newton_tol = 1e-12,
  newton_max_iters = 1000,
  n_threads = 1,
  early_exit = TRUE,
  screen_rule = "pivot",
  min_ratio = 0.01,
  lmda_path_size = 100,
  max_screen_size = NULL,
  max_active_size = NULL,
  pivot_subset_ratio = 0.1,
  pivot_subset_min = 1,
  pivot_slack_ratio = 1.25,
  check_state = FALSE,
  progress_bar = TRUE,
  warm_start = NULL
)
A |
Positive semi-definite matrix. |
v |
Linear term. |
constraints |
Constraints. |
groups |
Groups. |
alpha |
Elastic net parameter. |
penalty |
Penalty factor. |
lmda_path |
The regularization path. |
max_iters |
Maximum number of coordinate descents. |
tol |
Coordinate descent convergence tolerance. |
rdev_tol |
Relative percent deviance explained tolerance. |
newton_tol |
Convergence tolerance for the BCD update. |
newton_max_iters |
Maximum number of iterations for the BCD update. |
n_threads |
Number of threads. |
early_exit |
|
screen_rule |
Screen rule (currently the only value is the default |
min_ratio |
Ratio between smallest and largest regularization parameter, default is |
lmda_path_size |
Number of regularization steps in the path, default is |
max_screen_size |
Maximum number of screen groups, default is |
max_active_size |
Maximum number of active groups, default is |
pivot_subset_ratio |
Subset ratio of pivot rule, default is |
pivot_subset_min |
Minimum subset of pivot rule, default is |
pivot_slack_ratio |
Slack ratio of pivot rule, default is |
check_state |
Check state, default is |
progress_bar |
Progress bar, default is |
warm_start |
Warm start, default is |
State of the solver.
set.seed(0)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] * rnorm(1) + rnorm(n)
A <- t(X) %*% X / n
v <- t(X) %*% y / n
state <- gaussian_cov(A, v)
A GLM family object specifies the type of model fit, provides the appropriate response object and makes sure it is represented in the right form for the model family, and allows for optional parameters such as a weight vector.
glm.binomial(y, weights = NULL, link = "logit")
y |
Binary response vector, with values 0 or 1, or a logical vector. Alternatively, if data are represented by a two-column matrix of proportions (with row-sums = 1), then one can provide one of the columns as the response. This is useful for grouped binomial data, where each observation represents the result of |
weights |
Observation weight vector, with default |
link |
The link function type, with choice |
Binomial GLM object.
Trevor Hastie and James Yang
Maintainer: Trevor Hastie [email protected]
glm.gaussian, glm.binomial, glm.poisson, glm.multinomial, glm.multigaussian, glm.cox.
n <- 100
y <- rbinom(n, 1, 0.5)
obj <- glm.binomial(y)
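As a hedged sketch of the grouped-binomial case described under y above: each row summarizes several Bernoulli trials, and the observed success proportion is supplied as the response. Carrying the trial counts through weights is our assumption here, not a documented contract.
n <- 50
size <- sample(5:20, n, replace = TRUE)    # trials per observation
yprop <- rbinom(n, size, 0.3) / size       # observed success proportions in [0, 1]
obj <- glm.binomial(yprop, weights = size / sum(size))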
A GLM family object specifies the type of model fit, provides the appropriate response object and makes sure it is represented in the right form for the model family, and allows for optional parameters such as a weight vector.
glm.cox( stop, status, start = -Inf, weights = NULL, tie_method = c("efron", "breslow") )
stop |
Stop time vector. |
status |
Binary status vector of same length as |
start |
Start time vector. Default is a vector of |
weights |
Observation weights, with default |
tie_method |
The tie-breaking method - one of |
Cox GLM object.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
glm.gaussian, glm.binomial, glm.poisson, glm.multinomial, glm.multigaussian, glm.cox.
n <- 100
start <- sample.int(20, size = n, replace = TRUE)
stop <- start + 1 + sample.int(5, size = n, replace = TRUE)
status <- rbinom(n, 1, 0.5)
# arguments are (stop, status, start), per the signature above
obj <- glm.cox(stop, status, start)
A GLM family object specifies the type of model fit, provides the appropriate response object and makes sure it is represented in the right form for the model family, and allows for optional parameters such as a weight vector.
glm.gaussian(y, weights = NULL, opt = TRUE)
y |
Response vector. |
weights |
Observation weight vector, with default |
opt |
If |
Gaussian GLM object.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
glm.gaussian, glm.binomial, glm.poisson, glm.multinomial, glm.multigaussian, glm.cox.
n <- 100
y <- rnorm(n)
obj <- glm.gaussian(y)
A GLM family object specifies the type of model fit, provides the appropriate response object and makes sure it is represented in the right form for the model family, and allows for optional parameters such as a weight vector.
glm.multigaussian(y, weights = NULL, opt = TRUE)
y |
Response matrix, with two or more columns. |
weights |
Observation weight vector, with default |
opt |
If |
MultiGaussian GLM object.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
glm.gaussian, glm.binomial, glm.poisson, glm.multinomial, glm.multigaussian, glm.cox.
n <- 100
K <- 5
y <- matrix(rnorm(n * K), n, K)
obj <- glm.multigaussian(y)
A GLM family object specifies the type of model fit, provides the appropriate response object and makes sure it is represented in the right form for the model family, and allows for optional parameters such as a weight vector.
glm.multinomial(y, weights = NULL)
y |
Response matrix with |
weights |
Observation weights. |
Multinomial GLM object.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
glm.gaussian, glm.binomial, glm.poisson, glm.multinomial, glm.multigaussian, glm.cox.
n <- 100
K <- 5
y <- t(rmultinom(n, 1, rep(1 / K, K)))
obj <- glm.multinomial(y)
A GLM family object specifies the type of model fit, provides the appropriate response object and makes sure it is represented in the right form for the model family, and allows for optional parameters such as a weight vector.
glm.poisson(y, weights = NULL)
y |
Response vector of non-negative counts. |
weights |
Observation weight vector, with default |
Poisson GLM object.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
glm.gaussian, glm.binomial, glm.poisson, glm.multinomial, glm.multigaussian, glm.cox.
n <- 100
y <- rpois(n, 1)
obj <- glm.poisson(y)
Computes a group elastic-net regularization path for a variety of
GLM and other families, including the Cox model. This function
extends the abilities of the glmnet
package to allow for
grouped regularization. The code is very efficient (core routines
are written in C++), and allows for specialized matrix
classes.
grpnet(
  X,
  glm,
  constraints = NULL,
  groups = NULL,
  alpha = 1,
  penalty = NULL,
  offsets = NULL,
  lambda = NULL,
  standardize = TRUE,
  irls_max_iters = as.integer(10000),
  irls_tol = 1e-07,
  max_iters = as.integer(1e+05),
  tol = 1e-07,
  adev_tol = 0.9,
  ddev_tol = 0,
  newton_tol = 1e-12,
  newton_max_iters = 1000,
  n_threads = 1,
  early_exit = TRUE,
  intercept = TRUE,
  screen_rule = c("pivot", "strong"),
  min_ratio = 0.01,
  lmda_path_size = 100,
  max_screen_size = NULL,
  max_active_size = NULL,
  pivot_subset_ratio = 0.1,
  pivot_subset_min = 1,
  pivot_slack_ratio = 1.25,
  check_state = FALSE,
  progress_bar = FALSE,
  warm_start = NULL
)
X |
Feature matrix. Either a regular R matrix, or else an
|
glm |
GLM family/response object. This is an expression that
represents the family, the response and other arguments such as
weights, if present. The choices are |
constraints |
Constraints on the parameters. Currently these are ignored. |
groups |
This is an ordered vector of integers that represents the groupings,
with each entry indicating where a group begins. The entries refer to column numbers
in the feature matrix.
If there are |
alpha |
The elasticnet mixing parameter, with 0 <= alpha <= 1. The penalty is defined as
(1-alpha)/2 * sum ||beta_g||_2^2 + alpha * sum ||beta_g||_2, where the sum is over groups.
alpha=1 (the default) gives the group lasso penalty, and alpha=0 the ridge penalty. |
penalty |
Separate penalty factors can be applied to each group of coefficients.
This is a number that multiplies |
offsets |
Offsets, default is |
lambda |
A user supplied |
standardize |
If |
irls_max_iters |
Maximum number of IRLS iterations, default is
|
irls_tol |
IRLS convergence tolerance, default is |
max_iters |
Maximum total number of coordinate descent
iterations, default is |
tol |
Coordinate descent convergence tolerance, default |
adev_tol |
Fraction deviance explained tolerance, default
|
ddev_tol |
Difference in fraction deviance explained
tolerance, default |
newton_tol |
Convergence tolerance for the BCD update, default
|
newton_max_iters |
Maximum number of iterations for the BCD
update, default |
n_threads |
Number of threads, default |
early_exit |
|
intercept |
Default |
screen_rule |
Screen rule, with default |
min_ratio |
Ratio between smallest and largest value of lambda. Default is 1e-2. |
lmda_path_size |
Number of values for |
max_screen_size |
Maximum number of screen groups. Default is |
max_active_size |
Maximum number of active groups. Default is |
pivot_subset_ratio |
Subset ratio of pivot rule. Default is |
pivot_subset_min |
Minimum subset of pivot rule. Default is |
pivot_slack_ratio |
Slack ratio of pivot rule, default is |
check_state |
Check state. Internal parameter, with default |
progress_bar |
Progress bar. Default is |
warm_start |
Warm start (default is |
A list of class "grpnet"
. This has a main component called state
which
represents the fitted path, and a few extra
useful components such as the call
, the family
name, and group_sizes
.
Users typically use methods like predict()
, print()
, plot()
etc to examine the object.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso
and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv doi:10.48550/arXiv.2405.08631.
Friedman, J., Hastie, T. and Tibshirani, R. (2010)
Regularization Paths for Generalized Linear Models via Coordinate
Descent, Journal of Statistical Software, Vol. 33(1), 1-22,
doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011)
Regularization Paths for Cox's Proportional
Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol.
39(5), 1-13,
doi:10.18637/jss.v039.i05.
Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and
Tibshirani, Ryan. (2012) Strong Rules for Discarding Predictors in
Lasso-type Problems, JRSSB, Vol. 74(2), 245-266,
https://arxiv.org/abs/1011.2234.
cv.grpnet, predict.grpnet, plot.grpnet, print.grpnet.
set.seed(0)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] * rnorm(1) + rnorm(n)
fit <- grpnet(X, glm.gaussian(y))
print(fit)
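A sketch of a grouped fit, using the convention (described under groups above) that each entry gives the starting column of a group; fit.grp is an illustrative name:
set.seed(1)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
# five groups of four consecutive columns, starting at columns 1, 5, 9, 13, 17
fit.grp <- grpnet(X, glm.gaussian(y), groups = seq(1, p, by = 4))
print(fit.grp)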
IO handler for SNP phased, ancestry matrix.
io.snp_phased_ancestry(filename, read_mode = "file")
filename |
File name. |
read_mode |
Reading mode. |
IO handler for SNP phased, ancestry data.
n <- 123
s <- 423
A <- 8
filename <- paste(tempdir(), "snp_phased_ancestry_dummy.snpdat", sep = "/")
handle <- io.snp_phased_ancestry(filename)
calldata <- matrix(
    as.integer(sample.int(2, n * s * 2, replace = TRUE, prob = c(0.7, 0.3)) - 1),
    n, s * 2
)
ancestries <- matrix(
    as.integer(sample.int(A, n * s * 2, replace = TRUE, prob = rep_len(1 / A, A)) - 1),
    n, s * 2
)
handle$write(calldata, ancestries, A, 1)
handle$read()
file.remove(filename)
IO handler for SNP unphased matrix.
io.snp_unphased(filename, read_mode = "file")
filename |
File name. |
read_mode |
Reading mode. |
IO handler for SNP unphased data.
n <- 123
s <- 423
filename <- paste(tempdir(), "snp_unphased_dummy.snpdat", sep = "/")
handle <- io.snp_unphased(filename)
mat <- matrix(
    as.integer(sample.int(3, n * s, replace = TRUE, prob = c(0.7, 0.2, 0.1)) - 1),
    n, s
)
impute <- double(s)
handle$write(mat, "mean", impute, 1)
handle$read()
file.remove(filename)
Creates a block-diagonal matrix.
matrix.block_diag(mats, n_threads = 1)
mats |
List of matrices. |
n_threads |
Number of threads. |
Block-diagonal matrix.
Trevor Hastie and James Yang
Maintainer: Trevor Hastie [email protected]
n <- 100
ps <- c(10, 20, 30)
mats <- lapply(ps, function(p) {
    X <- matrix(rnorm(n * p), n, p)
    matrix.dense(t(X) %*% X, method = "cov")
})
out <- matrix.block_diag(mats)
Creates a concatenation of the matrices.
matrix.concatenate(mats, axis = 2, n_threads = 1)
mats |
List of matrices. |
axis |
The axis along which the matrices will be joined. With axis = 2 (default) this function is equivalent to |
n_threads |
Number of threads. |
Concatenation of matrices. The object is an S4 class with methods for efficient computation in C++ by adelie. Note that for the object itself axis is represented with base 0 (so 1 less than the argument here).
Trevor Hastie and James Yang
Maintainer: Trevor Hastie [email protected]
n <- 100
ps <- c(10, 20, 30)
mats <- lapply(ps, function(p) {
    matrix.dense(matrix(rnorm(n * p), n, p))
})
out <- matrix.concatenate(mats, axis = 2)
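A companion sketch of row-wise joining, assuming axis = 1 denotes rows (the convention matrix.subset documents); the matrices must then agree in column count. mats2 and out2 are illustrative names:
p <- 10
ns <- c(30, 50, 20)
mats2 <- lapply(ns, function(n) {
    matrix.dense(matrix(rnorm(n * p), n, p))
})
out2 <- matrix.concatenate(mats2, axis = 1)    # stacks by rows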
Creates a dense matrix object.
matrix.dense(mat, method = c("naive", "cov"), n_threads = 1)
mat |
The dense matrix. |
method |
Method type, with default |
n_threads |
Number of threads. |
Dense matrix. The object is an S4 class with methods for efficient computation by adelie.
Trevor Hastie and James Yang
Maintainer: Trevor Hastie [email protected]
n <- 100
p <- 20
X_dense <- matrix(rnorm(n * p), n, p)
out <- matrix.dense(X_dense, method = "naive")
A_dense <- t(X_dense) %*% X_dense
out <- matrix.dense(A_dense, method = "cov")
Creates an eager covariance matrix.
matrix.eager_cov(mat, n_threads = 1)
mat |
A dense matrix to be used with the |
n_threads |
Number of threads. |
The dense covariance matrix. This matrix is exactly t(mat) %*% mat, computed with some efficiency.
n <- 100
p <- 20
mat <- matrix(rnorm(n * p), n, p)
out <- matrix.eager_cov(mat)
Creates a matrix with pairwise interactions.
matrix.interaction( mat, intr_keys = NULL, intr_values, levels = NULL, n_threads = 1 )
mat |
The dense matrix, which can include factors with levels coded as non-negative integers. |
intr_keys |
List of feature indices. This is a list of all features with which interactions can be formed. Default is |
intr_values |
List of integer vectors of feature indices. For each of the |
levels |
Number of levels for each of the columns of |
n_threads |
Number of threads. |
Pairwise interaction matrix. Logic is used to avoid repetitions. For each factor variable, the column is one-hot-encoded to form a basis for that feature. The object is an S4 class with methods for efficient computation by adelie. Note that some of the arguments are transformed to C++ base 0 for internal use, and if the object is examined, it will reflect that.
Trevor Hastie and James Yang
Maintainer: Trevor Hastie [email protected]
n <- 10
p <- 20
X_dense <- matrix(rnorm(n * p), n, p)
X_dense[, 1] <- rbinom(n, 4, 0.5)
intr_keys <- c(1, 2)
intr_values <- list(NULL, c(1, 3))
levels <- c(5, rep(1, p - 1))
out <- matrix.interaction(X_dense, intr_keys, intr_values, levels)
Creates a Kronecker product with an identity matrix.
matrix.kronecker_eye(mat, K = 1, n_threads = 1)
mat |
The matrix to view as a Kronecker product. |
K |
Dimension of the identity matrix (default is 1, which does essentially nothing). |
n_threads |
Number of threads. |
Kronecker product with identity matrix. If mat is n x p, the resulting matrix will be nK x np.
The object is an S4 class with methods for efficient computation by adelie.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
n <- 100
p <- 20
K <- 2
mat <- matrix(rnorm(n * p), n, p)
out <- matrix.kronecker_eye(mat, K)
mat <- matrix.dense(mat)
out <- matrix.kronecker_eye(mat, K)
Creates a lazy covariance matrix.
matrix.lazy_cov(mat, n_threads = 1)
mat |
A dense data matrix to be used with the |
n_threads |
Number of threads. |
Lazy covariance matrix. This is essentially the same matrix, but with a setup to create covariance terms as needed on the fly. The object is an S4 class with methods for efficient computation by adelie.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
n <- 100
p <- 20
mat <- matrix(rnorm(n * p), n, p)
out <- matrix.lazy_cov(mat)
Creates a one-hot encoded matrix.
matrix.one_hot(mat, levels = NULL, n_threads = 1)
mat |
A dense matrix, which can include factors with levels coded as non-negative integers. |
levels |
Number of levels for each of the columns of |
n_threads |
Number of threads. |
One-hot encoded matrix. All the factor columns, with levels>1, are replaced by a collection of one-hot encoded versions (dummy matrices). The resulting matrix has sum(levels) columns.
The object is an S4 class with methods for efficient computation by adelie. Note that some of the arguments are transformed to C++ base 0 for internal use, and if the object is examined, it will reflect that.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
n <- 100
p <- 20
mat <- matrix(rnorm(n * p), n, p)
out <- matrix.one_hot(mat)
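A sketch with a factor column coded as non-negative integers, as described above; the levels vector here follows the same convention as in the matrix.interaction example:
n <- 100
p <- 5
mat <- matrix(rnorm(n * p), n, p)
mat[, 1] <- sample(0:2, n, replace = TRUE)    # a 3-level factor coded 0, 1, 2
levels <- c(3, rep(1, p - 1))                 # 3 levels for column 1; the rest continuous
out <- matrix.one_hot(mat, levels = levels)   # result should have sum(levels) = 7 columns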
Creates a SNP phased, ancestry matrix.
matrix.snp_phased_ancestry(io, n_threads = 1)
io |
IO handler. |
n_threads |
Number of threads. |
SNP phased, ancestry matrix.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
n <- 123
s <- 423
A <- 8
filename <- paste(tempdir(), "snp_phased_ancestry_dummy.snpdat", sep = "/")
handle <- io.snp_phased_ancestry(filename)
calldata <- matrix(
    as.integer(sample.int(2, n * s * 2, replace = TRUE, prob = c(0.7, 0.3)) - 1),
    n, s * 2
)
ancestries <- matrix(
    as.integer(sample.int(A, n * s * 2, replace = TRUE, prob = rep_len(1 / A, A)) - 1),
    n, s * 2
)
handle$write(calldata, ancestries, A, 1)
out <- matrix.snp_phased_ancestry(handle)
file.remove(filename)
Creates a SNP unphased matrix.
matrix.snp_unphased(io, n_threads = 1)
io |
IO handler. |
n_threads |
Number of threads. |
SNP unphased matrix.
n <- 123
s <- 423
filename <- paste(tempdir(), "snp_unphased_dummy.snpdat", sep = "/")
handle <- io.snp_unphased(filename)
mat <- matrix(
    as.integer(sample.int(3, n * s, replace = TRUE, prob = c(0.7, 0.2, 0.1)) - 1),
    n, s
)
impute <- double(s)
handle$write(mat, "mean", impute, 1)
out <- matrix.snp_unphased(handle)
file.remove(filename)
Creates a sparse matrix object.
matrix.sparse(mat, method = c("naive", "cov"), n_threads = 1)
mat |
A sparse matrix. |
method |
Method type, with default |
n_threads |
Number of threads. |
Sparse matrix object. The object is an S4 class with methods for efficient computation by adelie.
n <- 100
p <- 20
X_dense <- matrix(rnorm(n * p), n, p)
X_sp <- as(X_dense, "dgCMatrix")
out <- matrix.sparse(X_sp, method = "naive")
A_dense <- t(X_dense) %*% X_dense
A_sp <- as(A_dense, "dgCMatrix")
out <- matrix.sparse(A_sp, method = "cov")
Creates a standardized matrix.
matrix.standardize( mat, centers = NULL, scales = NULL, weights = NULL, ddof = 0, n_threads = 1 )
mat |
An |
centers |
The center values. Default is to use the column means. |
scales |
The scale values. Default is to use the sample standard deviations. |
weights |
Observation weight vector, which defaults to 1/n per observation. |
ddof |
Degrees of freedom for standard deviations, with default 0 (1/n). The alternative is 1 leading to 1/(n-1). |
n_threads |
Number of threads. |
Standardized matrix. The object is an S4 class with methods for efficient computation by adelie.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
out <- matrix.standardize(matrix.dense(X))
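A sketch with user-supplied centers and scales, continuing with X above (out2 is an illustrative name); apply(..., sd) uses the n-1 convention, so ddof = 1 is passed for consistency:
centers <- colMeans(X)
scales <- apply(X, 2, sd)    # sample standard deviations (n-1 convention)
out2 <- matrix.standardize(matrix.dense(X), centers = centers, scales = scales, ddof = 1)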
Creates a subset of the matrix along an axis.
matrix.subset(mat, indices, axis = 1, n_threads = 1)
mat |
The |
indices |
Vector of indices to subset the matrix. |
axis |
The axis along which to subset (2 is columns, 1 is rows). |
n_threads |
Number of threads. |
Matrix subsetted along the appropriate axis. The object is an S4 class with methods for efficient computation by adelie.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
n <- 100
p <- 20
X <- matrix.dense(matrix(rnorm(n * p), n, p))
indices <- c(1, 3, 10)
out <- matrix.subset(X, indices, axis = 1)
out <- matrix.subset(X, indices, axis = 2)
Plots the cross-validation curve, and upper and lower standard deviation curves, as a function of the lambda values used.
## S3 method for class 'cv.grpnet'
plot(x, sign.lambda = -1, ...)
x |
fitted |
sign.lambda |
Either plot against |
... |
Other graphical parameters |
A plot is produced, and nothing is returned.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso
and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv doi:10.48550/arXiv.2405.08631.
Adelie Python user guide https://jamesyang007.github.io/adelie/
grpnet and cv.grpnet.
set.seed(1010)
n = 1000
p = 100
nzc = trunc(p/10)
x = matrix(rnorm(n * p), n, p)
beta = rnorm(nzc)
fx = x[, seq(nzc)] %*% beta
eps = rnorm(n) * 5
y = drop(fx + eps)
px = exp(fx)
px = px/(1 + px)
ly = rbinom(n = length(px), prob = px, size = 1)
cvob1 = cv.grpnet(x, glm.gaussian(y))
plot(cvob1)
title("Gaussian Family", line = 2.5)
frame()
set.seed(1011)
cvob2 = cv.grpnet(x, glm.binomial(ly))
plot(cvob2)
title("Binomial Family", line = 2.5)
Produces a coefficient profile plot of the coefficient paths for a fitted "grpnet" object.
## S3 method for class 'grpnet'
plot(x, sign.lambda = -1, glm.name = TRUE, ...)
x |
fitted |
sign.lambda |
This determines whether we plot against |
glm.name |
This is a logical (default |
... |
Other graphical parameters to plot |
A coefficient profile plot is produced. If x is a multinomial or multigaussian model, the 2-norm of the vector of coefficients is plotted.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv doi:10.48550/arXiv.2405.08631.
grpnet, its print and coef methods, and cv.grpnet.
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit1 = grpnet(x, glm.gaussian(y))
plot(fit1)
g4 = diag(4)[sample(1:4, 100, replace = TRUE), ]
fit2 = grpnet(x, glm.multinomial(g4))
plot(fit2, lwd = 3)
fit3 = grpnet(x, glm.gaussian(y), groups = c(1, 5, 9, 13, 17))
plot(fit3)
This function makes predictions from a cross-validated grpnet model, using the stored "grpnet.fit" object, and the optimal value chosen for lambda.
## S3 method for class 'cv.grpnet'
predict(object, newx, lambda = c("lambda.1se", "lambda.min"), ...)
object |
Fitted |
newx |
Matrix of new values for |
lambda |
Value(s) of the penalty parameter |
... |
Not used. Other arguments to predict. |
This function makes it easier to use the results of cross-validation to make a prediction.
The object returned depends on the arguments.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv doi:10.48550/arXiv.2405.08631.
grpnet, its print and coef methods, and cv.grpnet.
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
cv.fit = cv.grpnet(x, glm.gaussian(y))
predict(cv.fit, newx = x[1:5, ])
coef(cv.fit)
coef(cv.fit, lambda = "lambda.min")
predict(cv.fit, newx = x[1:5, ], lambda = c(0.001, 0.002))
Similar to other predict methods, this function predicts linear predictors, coefficients and more from a fitted "grpnet" object.
## S3 method for class 'grpnet'
predict(
  object,
  newx,
  lambda = NULL,
  type = c("link", "response", "coefficients"),
  newoffsets = NULL,
  n_threads = 1,
  ...
)

## S3 method for class 'grpnet'
coef(object, lambda = NULL, ...)
object |
Fitted |
newx |
Matrix of new values for |
lambda |
Value(s) of the penalty parameter |
type |
Type of prediction required. Type |
newoffsets |
If an offset is used in the fit, then one must be supplied
for making predictions (except for |
n_threads |
Number of threads, default |
... |
Currently ignored. |
The shape of the objects returned is different for "multinomial" and "multigaussian" objects. coef(...) is equivalent to predict(type = "coefficients", ...). The object returned depends on type.
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso
and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv doi:10.48550/arXiv.2405.08631.
Adelie Python user guide https://jamesyang007.github.io/adelie/
grpnet, its print and coef methods, and cv.grpnet.
set.seed(0)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] * rnorm(1) + rnorm(n)
fit <- grpnet(X, glm.gaussian(y))
coef(fit)
predict(fit, newx = X[1:5, ])
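Continuing from the example above, a sketch of the multi-response case mentioned in Details (fitm and pm are illustrative names; the exact shape of the returned array is not asserted here):
g4 <- diag(4)[sample(1:4, n, replace = TRUE), ]    # indicator response matrix
fitm <- grpnet(X, glm.multinomial(g4))
pm <- predict(fitm, newx = X[1:5, ])
dim(pm)    # multinomial predictions carry an extra class dimension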
Print a summary of the results of cross-validation for a grpnet model.
## S3 method for class 'cv.grpnet'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
fitted 'cv.grpnet' object |
digits |
significant digits in printout |
... |
additional print arguments |
James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie [email protected]
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv doi:10.48550/arXiv.2405.08631.
grpnet, and its predict and coef methods.
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit1 = cv.grpnet(x, glm.gaussian(y))
print(fit1)
Print a summary of the grpnet path at each step along the path.
## S3 method for class 'grpnet'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
fitted grpnet object |
digits |
significant digits in printout |
... |
additional print arguments |
The call that produced the object x is printed, followed by a three-column matrix with columns Df, %Dev and Lambda. The Df column is the number of nonzero coefficients (Df is a reasonable name only for lasso fits). %Dev is the percent deviance explained (relative to the null deviance).
The matrix above is silently returned.
Yang, James and Hastie, Trevor. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv doi:10.48550/arXiv.2405.08631.
grpnet, and its predict, plot and coef methods.
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit1 = grpnet(x, glm.gaussian(y))
print(fit1)
Set configuration settings.
set_configs(name, value = NULL)
name |
Configuration variable name. |
value |
Value to assign to the configuration variable. |
Assigned value.
set_configs("hessian_min", 1e-6) set_configs("hessian_min")
set_configs("hessian_min", 1e-6) set_configs("hessian_min")