Rbayz

Bayesian mixed models, shrinkage, sparse and interaction kernel regression



For help on specific topics, see the Help Index page.

R/bayz overview and features

R/bayz – package Rbayz – with main function bayz():

R/bayz has been developed in agriculture and ecology for various modeling and prediction needs using genomics, multi-omics, phenomics, enviromics, time-series images and spectral data, multi-time multi-tissue transcriptomics data, and more. This includes basic applications such as genomic and genomic-by-enviromic prediction and multi-trait modeling using a Bayesian factor-analytic approach. It can be extended to efficiently learn covariance structures in higher-order interactions such as trait x time x environment, and may be useful in any domain where mixed models, (multiple) kernels and/or large sets of predictors are relevant.

For further help, follow the links in the text above, or see the Quick tour and help section below, which points to the relevant documentation. If you’re reading this as part of the R package help, full documentation is available on GitHub at ljanss.github.io/Rbayz/.

R/bayz technical details

R/bayz is implemented in C++ using Rcpp and uses MCMC-based inference in Bayesian models. Current version: 0.12 (March 2026).

Downloading and installing R/bayz

R/bayz is not yet released as a CRAN package. It can be downloaded and installed using one of the options below.

1. Precompiled binary versions (Windows, Mac)

There are precompiled binary packages for Windows and MacOS, which can be installed using install.packages() or with devtools. The MacOS precompiled version is built on Apple silicon architecture, the Windows version on Windows 11. If these precompiled versions do not work on your Windows or MacOS system, use option 2 below to install from the GitHub source.

With install.packages(), set repos=NULL to bypass CRAN. Dependencies are then not resolved automatically and must be installed manually first:

install.packages("Rcpp")
install.packages("nlme")
install.packages("coda")
install.packages("https://ljanss.github.io/Rbayz/Rbayz_0.9.0.zip", repos=NULL, type="win.binary") # Windows
install.packages("https://ljanss.github.io/Rbayz/Rbayz_0.9.0.tgz", repos=NULL, type="mac.binary") # MacOS

Alternatively, if you have devtools installed, you can use devtools::install_url(), which also installs the dependencies automatically:

devtools::install_url("https://ljanss.github.io/Rbayz/Rbayz_0.9.0.zip", type="win.binary") # Windows
devtools::install_url("https://ljanss.github.io/Rbayz/Rbayz_0.9.0.tgz", type="mac.binary") # MacOS

2. Install from source from the GitHub repository (all systems, including Linux)

This requires a development environment for R, which means Rtools on Windows or the “command line tools” on MacOS, plus the devtools package. On Linux the development tools are often pre-installed. Run the commands below in the R console.

Installing devtools and package dependencies:

install.packages("devtools")
install.packages("Rcpp")
install.packages("nlme")
install.packages("coda")
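
To avoid reinstalling packages that are already present, the dependency step above can be sketched with a check first. This is a minimal base-R sketch; the names `deps`, `is_installed` and `missing_deps` are illustrative and not part of R/bayz.

```r
# Packages named in the installation instructions above.
deps <- c("devtools", "Rcpp", "nlme", "coda")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it.
is_installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!is_installed]

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
  # Uncomment to install only what is missing:
  # install.packages(missing_deps)
} else {
  message("All dependencies are already installed.")
}
```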

Download and install/compile Rbayz using:

library(devtools)
devtools::install_github("ljanss/Rbayz")

Quick tour and help

Basic syntax and Bayesian mixed model

The R/bayz main function bayz() accepts model formulas in an extended R-formula syntax in which every explanatory (right-hand-side) variable is wrapped in a term that specifies how the variable(s) are fitted in the model. For instance:

fit1 <- bayz(Yield ~ fx(Year) + rn(Variety), data=example1)

specifies a model to fit Yield with Year as a fixed factor and Variety as a random factor. The wrappers fx() and rn() coerce the data variables Year and Variety to factors.

Interactions between factors are specified using a colon, such as fx(Year:Location) for fixed effects of Year-Location interactions, with the same meaning as in other R model functions. This works likewise for random effects in rn(). Automatic expansion with main effects is not supported; main effects, if desired, must be added explicitly to the model.
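
As an illustration of what the colon means, the sketch below uses base R only: it builds the combined Year-Location factor on toy data and then writes out a formula with explicit main effects. The data frame `d` and the variable `year_by_loc` are hypothetical; the formula is only constructed and inspected, not fitted, so the sketch does not depend on how bayz() processes it internally.

```r
# Toy data; the column names follow the example in the text.
d <- data.frame(
  Year     = c("2020", "2020", "2021", "2021"),
  Location = c("North", "South", "North", "South")
)

# In base R, a:b on two factors gives one level per combination of levels.
year_by_loc <- factor(d$Year):factor(d$Location)
levels(year_by_loc)

# Main effects are not added automatically, so they are written out explicitly.
# The formula is only built and inspected here, not fitted.
f <- Yield ~ fx(Year) + fx(Location) + fx(Year:Location)
attr(terms(f), "term.labels")
```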

More on the basic syntax for model building (including using regressions) here:

Use of model output

R/bayz supports many common R methods for working with model output: summary(); the standard methods fixef() and ranef() to extract fixed and random effect estimates; vhest() to extract variance and hyper-parameter estimates; conv() for convergence diagnostics; plot() to produce trace and density plots for MCMC diagnostics; and predict(), which extracts predictions for missing data.

More on working with model output can be found here:

Use of kernels on random effects and on interactions

For random effects, variance structures can be added with a V= option inside rn(), for instance

fit1 <- bayz(Yield ~ fx(Year) + rn(Variety, V=Gmat), data=example1)

to model a variance-covariance structure Gmat $\sigma_g^2$ for Variety effects. Kernels should be prepared with rownames matching the levels of the variable in the data. Levels may be repeated in the data, and kernel levels may be missing from the data; the latter implies prediction of Variety effects not in the data.
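
The rowname matching can be checked before fitting. The sketch below uses base R only and toy data; the marker matrix `M`, the way `Gmat` is computed (a centred cross-product, one common choice among many) and the data values are assumptions for illustration, not something prescribed by R/bayz.

```r
set.seed(1)

# Toy marker matrix: 4 varieties x 10 markers, coded 0/1/2 (hypothetical data).
varieties <- c("V1", "V2", "V3", "V4")
M <- matrix(sample(0:2, 4 * 10, replace = TRUE), nrow = 4,
            dimnames = list(varieties, NULL))

# One common way to build a similarity kernel: centred cross-product,
# scaled by the number of markers. The rownames of M carry over to Gmat.
Mc   <- scale(M, center = TRUE, scale = FALSE)
Gmat <- tcrossprod(Mc) / ncol(M)

# Toy data: V1 is repeated, V4 has no record.
example1 <- data.frame(Variety = c("V1", "V1", "V2", "V3"),
                       Yield   = c(5.1, 4.8, 6.0, 5.5))
data_levels <- unique(as.character(example1$Variety))

# Data levels without a kernel row (assumption: these cannot be matched).
setdiff(data_levels, rownames(Gmat))

# Kernel rows without data: effects that would be predicted.
setdiff(rownames(Gmat), data_levels)
```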

On interactions, products of kernels can be specified, which are interpreted as Kronecker products (here with main and interaction effects):

fit1 <- bayz(Yield ~ fx(Environment) + rn(Variety, V=Gmat) +
                     rn(Variety:Environment, V=Gmat*Emat), data=example1)

This implies a variance-covariance structure for Variety-by-Environment effects Gmat $\otimes$ Emat $\sigma_{ge}^2$. The Emat could be a weather-based similarity matrix between environments, and, like other kernels, should have rownames matching the levels of (in this example) the Environment variable in the data.
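
To make the implied structure concrete, the Kronecker product can be formed explicitly for toy kernels with base R's kronecker(). The kernel values below are hypothetical, and forming the full product is only sensible for small examples; it is shown here to illustrate the structure, not how R/bayz computes it.

```r
# Toy kernels with dimnames (hypothetical values).
Gmat <- matrix(c(1.0, 0.5,
                 0.5, 1.0), 2, 2,
               dimnames = list(c("A", "B"), c("A", "B")))
Emat <- matrix(c(1.0, 0.2, 0.1,
                 0.2, 1.0, 0.3,
                 0.1, 0.3, 1.0), 3, 3,
               dimnames = list(c("E1", "E2", "E3"), c("E1", "E2", "E3")))

# Explicit Kronecker product, with dimnames of the form "Variety:Environment".
K <- kronecker(Gmat, Emat, make.dimnames = TRUE)
dim(K)

# The similarity between two Variety:Environment combinations is the product
# of the variety similarity and the environment similarity.
K["A:E1", "B:E2"]
Gmat["A", "B"] * Emat["E1", "E2"]
```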

R/bayz always fits kernels as lower-rank embeddings (‘factor analytic’). For products of kernels, it keeps the sets of embeddings as a tensor decomposition, which allows efficient computations on large kernels and in higher dimensions, and makes it possible to sparsify and re-weight kernels in each dimension.
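
The general idea behind per-dimension embeddings can be sketched in base R with an eigendecomposition. This is only an illustration of the underlying linear algebra under stated assumptions (toy kernels, a hypothetical helper `embed()`); it is not the algorithm R/bayz uses internally.

```r
set.seed(2)

# Two toy positive semi-definite kernels (hypothetical data).
A <- matrix(rnorm(5 * 8), 5, 8); G <- tcrossprod(A) / 8   # 5 x 5
B <- matrix(rnorm(3 * 8), 3, 8); E <- tcrossprod(B) / 8   # 3 x 3

# Rank-k embedding Z = U_k sqrt(Lambda_k), so that Z Z' approximates the kernel.
embed <- function(K, k) {
  e   <- eigen(K, symmetric = TRUE)
  idx <- seq_len(k)
  e$vectors[, idx, drop = FALSE] %*% diag(sqrt(pmax(e$values[idx], 0)), k)
}

Zg <- embed(G, 5)   # full rank here, so Zg Zg' reproduces G
Ze <- embed(E, 3)

# The embedding of the product kernel is the Kronecker product of the
# per-dimension embeddings: (Zg %x% Ze)(Zg %x% Ze)' equals G %x% E.
max(abs(tcrossprod(Zg %x% Ze) - G %x% E))

# Truncating to rank 2 in the first dimension gives a low-rank approximation;
# this is its relative Frobenius error.
Zg2 <- embed(G, 2)
norm(tcrossprod(Zg2) - G, "F") / norm(G, "F")
```

Because each dimension is embedded separately, storage grows with the sum of the embedding sizes rather than with the size of the full product kernel.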

More on working with kernels and interactions can be found here:

Estimated kernels and multi-trait / multi-response models

Description under development…

Sets of features and significances for individual features

R/bayz can fit sets of features directly using the rr() (ridge or random regression) wrapper term. These sets of features are supplied as matrices, with rownames to match to the data, and are fitted as random / shrunken regressions. Significances for individual features can be obtained as “p-values from the random effects” ($p_r$ values), extracted with the prval() method. Running a multiple regression on large sets of features can be computationally intensive, but can be sped up using sparse regression options, or by running the model on a singular value decomposition (SVD) of the feature matrix. When the model is run on an SVD, posterior uncertainties and significances for individual features can be obtained approximately, or exactly by back-solving.
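
Why running on an SVD loses nothing for the feature effects can be sketched with ordinary ridge regression in base R. The sketch uses a fixed, hypothetical shrinkage parameter `lambda` as a stand-in for the Bayesian shrunken regression and toy data; it illustrates the back-transformation only and is not the R/bayz implementation.

```r
set.seed(3)

# Toy data: 20 records, 50 features (more features than records).
n <- 20; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
lambda <- 2   # hypothetical fixed shrinkage parameter

# Ridge regression directly on the features (a p x p system).
b_direct <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))

# Same model on the SVD scores Tm = U D (a system of size min(n, p)).
s  <- svd(X)
Tm <- s$u %*% diag(s$d)
a  <- solve(crossprod(Tm) + lambda * diag(ncol(Tm)), crossprod(Tm, y))

# Back-transform the score effects to feature effects: b = V a.
b_svd <- s$v %*% a

# Largest difference between the two solutions.
max(abs(b_direct - b_svd))
```

With more features than records, the SVD route solves a system of size n instead of p, which is where the speed-up comes from in this simplified setting.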

More on working with sets of features and on extracting $p_r$ values can be found here: