**Help file for BXD power app**

This app (http://power.genenetwork.org/) is designed to be used to estimate power of QTL studies in recombinant inbred line derived from two inbred parents, especially the BXD recombinant inbred family. Given the estimated heritability of the trait and estimated locus effect size, the tool will show the user the study’s estimated power to detect that effect, given a number of strains used and within-strain biological replicates.

The app is build using Shiny and the ‘qtlDesign’ R package. All values were calculated using ‘qtlDesign’, and the resultant spreadsheet is queried by the Shiny app.

All code for this app can be found at https://github.com/Dashbrook/BXD_power_calculator_app.

In this document, we will first give a broad overview of the terms used, which should be sufficient for most users. We will then show copies of all the calculations used to produce estimated values, and explanations of the variables, which we hope will satisfy advanced users. All calculations can be replicated used the ‘qtlDesign’ R package.

Here we are taking an estimate of the broad sense heritability of the phenotype. Broad sense heritability (H^{2}) is the total genetic contributions to a population’s phenotypic variance including additive, dominant, and epistatic, as well as maternal and paternal effects. It is given by the variance due to genotype (gen.var) divided by the variance due to environment (env.var).

In our case gen.var was kept at 1, and the env.var was varied to give different H2 values. These gen.var and env.var values were then used for a number of calculations below.

**Biological (within-strain) replicates**

Because all members of a recombinant inbred population are fully inbred the same genome can be sampled multiple times. By doing this, the environmental variance is reduced, bringing the strain mean closer to the ‘true’ variance due to genetics alone.

This is the heritability corrected for the number of within-strain biological replicates (taken from Belknap, 1998). H^{2}_{RIx̅ }= gen.var/(gen.var+env.var/bio.reps).

This is simply the number of different strains being used. In some populations this is limited (i.e. most mouse recombinant inbred populations contain less than 40 strains). In the case of the BXD, there are ~150 extant strains, with the possibility of creating more.

**Standardized locus effect size**

This is a standardized locus effect size, that is the difference in mean trait value between animals with the B/B and D/D genotypes at this locus (i.e. the two homozygotes) as a proportion of the total genetic variance, with 1 being the total genetic variance. Note that these locus effect sizes are twice as high as will be typically estimated in a study using traditional F2 or backcrosses. For example, in GeneNetwork, if one calculated the R2 between a phenotype and the genotype at the peak locus, this R2 would be the proportion of the total genetic variance explained by the QTL,and therefore is the standardized locus effect size.

The locus effect size is the amount of the variance which is due to genetics (gen.var) explained by a single QTL. That is, a value of 0.2 would mean that a QTL explains 20% of the genetic variance, and a value of 1.0 indicates a Mendelian locus (one QTL which explains all of the genetic variance).

Two factors contribute to the much higher level of explained variance of QTLs when using inbred panels. The first is because all loci in inbred panels are homozygous, the same number of sampled animals will explain twice as much of the genetic variance as it would in an F2 cross, and quadruple the variance explained compared to a backcross (Belknap, 1998). Effectively, in an RI we are only examining the most extreme ends of the distribution, providing greater power to detect additive effects. The drawback is obviously that we cannot detect dominance effects.

The second reason is due to replicability. When effect size is treated as the proportion of total variance explained by heritable factors, effect size will increase as environmental effect decreases due to replication: that is, resampling decreases the standard error of the mean, boosting the effect size (Belknap, 1998). This is analogous to the increase in heritability above, but rather than acting on the variance explained by strain (i.e. the genome type), it is acting on the variance explained by a particular locus (i.e. the genotype at that locus).

For a large midbrain gene expression array dataset (55,683 probes, across 129 individuals of 37 BXD strains, with mean 3.5 replicates per strain, with a range of 1-5 replicates), we collected all cis-eQTLs from GeneNetwork (6867 total), and calculated the proportion of variance in gene expression explained by the cis-eQTL marker using each individual or using strain means (i.e. biological replicates of the same genome). This shows that the mean proportion of variance explained is greatly increased (Figure S2). For variants with a small effect size (≤0.2; n = 800 cis-eQTLs), the mean improvement is 2.49, that is, the average proportion of variance in phenotype explained will be 149% higher when using strain means, than when using indiviudals. Across all cis-eQTLs (n = 6867), the effect size when using each animal individually is 0.49, whereas when using strain means of probe expression, the variance explained is 0.66. The mean improvement per probe is 1.56X: a 56% increase in variance explained by the locus. All data needed to replicate this is freely and publically availabe on GN, ID GN381.

Adapted from:

Ashbrook, D. G., Arends, D., Prins, P., Mulligan, M. K., Roy, S., Williams, E. G., et al. (2019). The expanded BXD family of mice: A cohort for experimental systems genetics and precision medicine. *bioRxiv*, 672097. doi:10.1101/672097.

This is the power of the experiment to correctly detect a true positive QTL, given the other values above. Therefore, the line on the graph is the power to detect a QTL explaining a locus of the given effect size.

**Estimating heritability and effect size**

These two values are necessary for our calculations, but are often unknown. We suggest two ways to estimate them.

1) Use already gathered data. For the BXD population there are >6000 phenotypes already gathered, and publicly available on GeneNetwork (http://www.genenetwork.org). This means that for many traits, similar phenotypes may already have been gathered. Therefore, this data on GeneNetwork can be examined to get ballpark figures.

2) A diallel cross. A diallel cross is a simple F1 cross between the parents of the recombinant inbred population (in the case of the BXD the C57BL/6J and DBA/2J). The parents and reciprocal F1s are phenotyped, and from this both heritability and the effect size can be estimated. These are actually underestimates: the heritability calculated will be narrow sense heritability (h^{2}) which is only the additive effect, and therefore will be lower than H2; the genetic variance of most phenotypes within the BXD population is greater than that between the parental strains, due to the oligogenic or polygenic nature of most traits (i.e. the B6 and D2 will contain a mixture of alleles that both increase and decrease the trait, whereas some BXD strain may contain ‘all’ of the alleles that increase the trait, and other BXD strains may contain ‘all’ the alleles which decrease the trait).

** **

**Advanced users/calculations:**

**How the values were calculated **

The values used in the app were calculated using the *powercalc* function in the qtlDesign package. For recombinant inbred populations this uses the function:

function (cross, n, effect, sigma2, env.var, gen.var, thresh = 3,

sel.frac = 1, theta = 0, bio.reps = 1)

{

if (missing(sigma2)) {

if ((missing(env.var)) | (missing(gen.var)))

stop(“Need either sigma2 or both env.var and gen.var.”)

sigma2 <- error.var(cross, env.var, gen.var, bio.reps)

}

else {

if ((!missing(env.var)) | (!missing(gen.var)))

stop("Need either sigma2 or both env.var and gen.var.")

}

if (cross == "bc")

ans <- powercalc.bc(n, effect, sigma2, thresh, sel.frac,

theta)

else if (cross == "f2")

ans <- powercalc.f2(n, effect, sigma2, thresh, sel.frac,

theta)

else if (cross == "ri")

ans <- powercalc.ri(n, effect, sigma2, thresh)

else stop("Unknown cross ", cross, ".")

ans <- c(ans, 100 * prop.var(cross, effect, sigma2))

names(ans) <- c("power", "percent.var.explained")

t(ans)

}

Where the powercalc.ri is:

function (n, effect, sigma2, thresh = 3)

{

powercalc.bc(n, effect * 2, sigma2, thresh, sel.frac = 1,

theta = 0)

}

**Sigma2**

sigma2 <- error.var(cross, env.var, gen.var, bio.reps)

**error.var**

env.var/bio.reps + 1 * gen.var

**bio.reps**

The number of within strain (i.e. genetically identical) biological replicates used. Part of the user input, ranging from 1-20

**gen.var**

Total genetic variance. For the purpose of these calculations this was kept as a constant, 1, so that environmental variance (env.var) could be used to calculate heritability.

**env.var**

Total environmental variance. This was altered to produce heritability values from 0 to 1 when gen.var was kept constant.