## Abstract

Genomic prediction is a useful tool to accelerate genetic gain in selection using DNA marker information. However, this technology typically relies on standard prediction procedures, such as genomic BLUP, that are not designed to accommodate population heterogeneity resulting from differences in marker effects across populations. In this study, we assayed different prediction procedures to capture marker-by-population interactions in genomic prediction models. Prediction procedures included genomic BLUP and two kernel-based extensions of genomic BLUP which explicitly accounted for population heterogeneity. To model population heterogeneity, dissemblance between populations was either depicted by a unique coefficient (as previously reported), or a more flexible function of genetic distance between populations (proposed herein). Models under investigation were applied in a diverse switchgrass sample under two validation schemes: whole-sample calibration, where all individuals except selection candidates are included in the calibration set, and cross-population calibration, where the target population is entirely excluded from the calibration set. First, we showed that using fixed effects, from principal components or putative population groups, appeared detrimental to prediction accuracy, especially in cross-population calibration. Then we showed that modeling population heterogeneity by our proposed procedure resulted in highly significant improvements in model fit. In such cases, gains in accuracy were often positive. These results suggest that population heterogeneity may be parsimoniously captured by kernel methods. However, in cases where improvement in model fit by our proposed procedure is null-to-moderate, ignoring heterogeneity should probably be preferred due to the robustness and simplicity of the standard genomic BLUP model.

- Genomic Prediction
- population heterogeneity
- marker-by-population interaction
- kernel functions
*Panicum virgatum*- GenPred
- Shared Data Resources

Genomic prediction has proved a useful tool to predict genetic merit in plant and animal breeding (Hayes *et al.* 2009a, Lorenz *et al.* 2011). This technology consists of learning relationships between DNA markers and phenotypes, which arise from the non-random association (linkage disequilibrium; LD) between DNA markers and causal genetic variants having direct effects on the trait studied (Meuwissen *et al.* 2001). Standard genomic prediction models, including genomic BLUP (GBLUP; VanRaden 2008, Hayes *et al.* 2009b) or Bayesian linear regression (BLR) models (Meuwissen *et al.* 2001, Gianola *et al.* 2009), assume that the effects of causal variants are linear and purely additive, so estimated effects do not capture any dependence on context, arising for example from interactions of causal variants with environmental or genetic backgrounds. Initially, genomic prediction models have been proposed for applications in populations that are relatively homogeneous with respect to LD patterns and interactions involving causal variants (Meuwissen *et al.* 2001). In such situations, increasing the size of the calibration set (CS) – the set of individuals used to estimate the model’s parameters – would typically benefit accuracy of the models (Lorenzana & Bernardo 2009, VanRaden *et al.* 2009). However, in practice, increasing the CS size may often involve calibrating prediction models on individuals with inconsistent LD patterns and/or backgrounds, which may result in reduced accuracy (Wientjes *et al.* 2016). This issue will arise in the typical situation where an initially homogeneous CS is augmented with individuals from extraneous populations, that is, multi-population – or (in the animal literature) multi-breed – calibration (Lund *et al.* 2014). Recently, studies in both plant and animal breeding have assessed the usefulness of combining populations from different genetic backgrounds in genomic prediction.

Under standard genomic prediction models (where population heterogeneity is ignored), the simulation study of de Roos *et al.* (2009) suggested that adding an extraneous population to a CS may benefit prediction accuracy if the added population is not too dissimilar (in terms of divergence time) from the initial CS. These authors also suggested that high enough marker density could prevent prediction accuracy from decreasing, even in cases of strong divergence between populations. Consistently, most empirical studies of multi-population calibration with high marker density, based on standard GBLUP and/or BLR, have reported little or no gain in accuracy under strong population structure (Lehermeier *et al.* 2015, Jarquin *et al.* 2016, Hayes *et al.* 2009c, Erbe *et al.* 2012). In contrast, only a few studies have reported substantial increases in accuracy from multi-population calibration in similar conditions (Technow *et al.* 2013, Daetwyler *et al.* 2012).

In multi-population prediction models (where marker-by-population interactions are explicitly accounted for), studies have proposed to fit, to the whole set of available individuals, models that were capable of accommodating population heterogeneity explicitly. This type of models includes multi-trait GBLUP models, with “traits” corresponding to population backgrounds (Karoui *et al.* 2012, Carillier *et al.* 2014, Lehermeier *et al.* 2015), and random regression models based on markers interacting with discrete population cluster coefficients (de Los Campos *et al.* 2015, with an extension of a standard BLR model). To our knowledge, the implementation of these methods has not been adapted to contexts of admixture, where population structure variables are continuous. Furthermore, when calibration involves many populations, the increase in model complexity of these methods will make them computationally intractable and statistically inefficient. Parsimonious multi-population models, based on only a few parameters to capture population heterogeneity, have also been proposed (Zhou *et al.* 2014, Heslot and Jannink 2015). In the presence of many populations, such models are more practical and potentially more useful than multi-trait and random interaction models. Also, since they generally assume some underlying basis for population heterogeneity (*e.g.*, inconsistency in LD patterns), they may generate insight about the causes of marker-by-population interactions.

In this study, we first considered a standard GBLUP model and evaluated different types of fixed effects to reflect population structure, then we investigated the usefulness of standard GBLUP and two kernel-based extensions for coping with population heterogeneity. These procedures were compared to a standard GBLUP model under two validation schemes: whole-sample calibration, where all individuals except selection candidates are included in the calibration set, and cross-population calibration, where the target population is excluded from the calibration set. The two multi-population GBLUP extensions are one previously-reported model, derived from Heslot and Jannink (2015), and one proposed model, based on a flexible kernel function of population principal components. We applied the procedures to the analysis of three traits (plant height, heading date, and standability) in switchgrass (*Panicum virgatum* L.), an herbaceous biomass crop showing good promise for bioenergy production (Sanderson *et al.* 1996, Langholtz *et al.* 2016). This species is characterized by an extensive diversity which makes it particularly suitable for studying population heterogeneity (Casler 2012). The sample under study comprised seven population clusters from two diverse panels, assayed in the Midwestern region of the United States, which represent differentiation by ecotype (upland or lowland), geographical origin (latitudinal and longitudinal gradients) and ploidy level (tetraploid or octoploid). This sample exemplified the heterogeneity of data available for practical applications of genomic prediction, which pose both opportunities (by increased sample sizes) and challenges (by inconsistencies in marker effects across populations) for breeding based on DNA markers.

In this study, we did not fit BLR models (usually based on Markov chain Monte Carlo optimizations), since we focused on deterministic methods for model fit and considered only models based on computationally efficient best linear unbiased predictors (BLUP). Further research would be needed to develop kernel-based extensions of this type of models.

## Material and methods

### Panels and populations

In this study, two multi-population panels were assayed and considered together in one sample. The first panel was the breeding panel (BP) described in Ramstein *et al.* (2016), comprising two tetraploid breeding populations of half-sib families: WS4U-C2, which consisted of 137 half-sib families derived from a diverse upland-ecotype pool of 162 plants (Casler *et al.* 2006), and Liberty-C2, which consisted of 110 half-sib families derived from the lowland-upland cultivar Liberty (Casler and Vogel 2014). The second panel was the association panel (AP) described in Lu *et al.* (2013) and Evans *et al.* (2015), comprising six putative populations of clonally propagated genotypes of different ecotypes (U: upland; L: lowland), ploidy levels (4X: tetraploid; 8X: octoploid) and geographical origins (S: South; W: West; N: North; E: East): U4X-N (135 plants), U8X-W (129 plants), U8X-E (97 plants), U8X-S (10 plants), L4X-NE (106 plants) and L4X-S (37 plants). These populations corresponded to 66 diverse accessions (Lu *et al.* 2013, Evans *et al.* 2015) with up to 10 individuals per accession.

In WS4U-C2, one individual was discarded so as to avoid assigning it to a population in AP, since it was too distantly related to the other individuals in BP (based on principal component analysis). In total, individuals were considered in this analysis. The main goal of this study was to assess different methods for accommodating genetic heterogeneity when predicting phenotypic means in a given target population. Four targets were chosen, with a defined focus on tetraploid populations with at least 100 relatively homogeneous individuals: WS4U-C2 and Liberty-C2 (from BP), and U4X-N and L4X-NE (from AP).

### Marker data

Exome capture sequencing of individuals (parents in BP and clonally propagated plants in AP) was performed using the Roche-Nimblegen protocol for preparation of SeqCap EZ Developer libraries using the Roche-Nimblegen probeset ‘120911_Switchrass_GLBRC_R_EZ_HX1’ as described previously (Evans *et al.* 2014, 2015). Reads from sequencing were aligned to the hardmasked *P. virgatum* v1.1 reference genome (http://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Pvirgatum). Counts of reads corresponding to alternate and reference alleles for each individual were then determined as described previously (for BP, Ramstein *et al.* 2016; for AP, Evans *et al.* 2014, 2015) at 2,179,164 single nucleotide polymorphism (SNP) loci, which were identified as polymorphic in two diversity panels: the Northern Switchgrass Panel, corresponding to AP (Evans *et al.* 2014, 2015), and a southern switchgrass panel (E. C. Brummer, unpublished data). The numbers of alternate allele at the SNP loci were then called by using the expectation-maximization algorithm of Martin *et al.* (2010) fitted in each population (in BP) or accession (in AP) separately, under the assumption of disomic inheritance. Although this assumption is supported in switchgrass for tetraploid genotypes (Okada *et al.* 2010; Li *et al.* 2014), it does not hold for octoploid genotypes, which would presumably exhibit tetrasomic inheritance. However, we did not adapt the algorithm of Martin *et al.* (2010) to accommodate possible tetrasomic inheritance, as the sequencing depth of ∼24 was deemed insufficient for calling intermediate heterozygotes (simplex and triplex) with high enough accuracy (Evans *et al.* 2015). Indeed, a sequencing depth of at least 60-80 was recommended by Uitdewilligen *et al.* (2013) for accurately calling tetrasomic genotypes. Therefore, for all individuals, the resulting marker-data matrix consisted of expected allelic dosages (sums of alternate-allele counts weighted by their posterior probabilities, for every individual and SNP) between 0 and 2.

The SNP data were filtered based on the following criteria: (i) proportion of missing values strictly lower than 2%, a stringent threshold given prior filtering of SNPs on read depth ≥ 5 (Evans *et al.* 2014, 2015); (ii) minor allele frequency strictly greater than and variance of allelic dosages strictly greater than , with individuals; (iii) *p*-value for Hardy-Weinberg equilibrium strictly greater than 10^{−4} in each BP population; (iv) availability of genomic-location information (as per v1.1 of the reference genome of *P. virgatum*). Missing values at SNPs were imputed by their mode in the whole sample. The resulting filtered and imputed marker-data matrix **X** consisted of allelic dosages at SNP markers.

### Phenotypic data

Populations in BP were assayed each year between 2012 and 2014, in Arlington, WI (USA), in a randomized complete block design, with four replicates for WS4U-C2 and three replicates for Liberty-C2. Populations in AP were assayed each year between 2009 and 2011 in Ithaca, NY (USA), in a sets-in-reps design, with two replicates per individual and 10 sets within each replicate, with each set comprising at most one individual from each of the 66 accessions in AP (Lu *et al.* 2013, Evans *et al.* 2015). In each panel, three phenotypic traits were considered: plant height, heading date and standability. Plant height (PH) was measured in centimeters as the height from the ground to the top of the tallest panicle. Heading date (HD) was measured in growing degrees days as the cumulated sum of daily average temperatures (in degrees Celsius; °) above 10 °, from January 1^{st} to the day of heading, defined as the emergence of at least half of the panicles from the boot (Mitchell *et al.* 1997); daily average temperatures were estimated by the average of the minimum and maximum daily temperatures. Standability (St) was measured on a 0-10 scale to describe plants’ stature and stiffness, with 0 qualifying plants that are prostrate and 10 qualifying upright and rigid plants (Lipka *et al.* 2014).

Not all traits were measured every year in any given population: only HD was measured in all three years in AP populations and Liberty-C2. For all other cases, measurements were available for only a subset of years (Table 1).

In BP, observational units were plants within half-sib families from a given genotype (maternal parent) *i*. Half-sib families were arranged in a randomized complete block design and assayed in multiple years; so the following mixed model was fitted to phenotypic measurements , to estimate half-sib family effects *f _{i}*’s:where μ is the population mean; , and are the effects of half-sib family i (fixed), block j (random) and year l (random) respectively; × indicates interactions (random); are residuals for plant

*m*within plot

*ij*in year

*l*. In AP, observational units were clones of a given genotype

*i*. Genotypes were arranged in a sets-in-reps design and assayed in multiple years; so the following model was fitted to measurements to estimate genotype effects

*g*’s:where μ is the panel mean; , , and are the effects of genotype i (fixed), replicate j (random), set

_{i}*k*within replicate

*j*(random) and year l (random) respectively; × indicates interactions (random); are residuals for clone

*ij*in year

*l*.

The models described above conform to analyses of strip-plot (split-block) designs (Steel *et al.* 1996), in which years and genotype classes (half-sib families in BP, individual genotypes in AP) are whole-plot factors in cross-classification and sub-plot factors are combinations of years and genotype classes. For each random term, the corresponding effects were modeled as independent and identically normally distributed. The linear mixed models described above were fitted using ASREML-R (Butler *et al.* 2009).

Effects *f _{i}*’s in BP are transmitting abilities of genotypes (the mean of their half-sib progeny in their respective breeding population), so , where

*BV*is the breeding value of genotype

_{i}*i*. In comparison, effects

*g*’s in AP are genotypic values, such that , where is the deviation from additivity due to dominance and/or epistasis. Thus, outcomes of interest for genomic prediction were set to be genotype means

_{i}*y*’s such that in BP and in AP, with the estimated population mean in BP or estimated panel mean in AP.

_{i}### Population structure and relatedness

#### Admixture analysis:

The soft clustering model from the ADMIXTURE software was fitted on the whole sample and the whole set of SNPs, *i.e.*, without selection on individuals or markers (Alexander *et al.* 2009). Based on the 10-fold cross-validation implemented in ADMIXTURE (Alexander and Lange 2011), the number of population clusters in the admixture model was set to , as cross-validation error reached a plateau at that value (Figure S1). The resulting matrix **A** of admixture coefficients comprised inferred membership probabilities at each cluster (Figure 1a). For convenience (in prediction models), minimum values in **A** (10^{−5}) were set to zero while ensuring that each row still summed to one (by dividing each element in **A** by its row sum).

#### Principal component analysis:

Principal component analysis (PCA) was performed on the whole sample and the whole set of SNPs. The number of principal components (PCs) to choose for depicting population structure was chosen based on the proportion of variance explained and the grouping patterns captured by PCs (Figure 1b). The resulting PC matrix **P** consisted of coordinates for each individual at the first PCs.

#### Recent relationships among individuals:

Let be the genomic relationship matrix as defined by VanRaden (2008), where is the centered marker-data matrix and is a scaling factor depending on allele frequencies ′s (estimated on the whole sample), where is the number of SNP markers. Following Fan *et al.* (2013), **G** was decomposed as , where P consisted of the first PCs, as described above. Matrix is the dense part of the relationship matrix **G**, representing resemblance among individuals through population structure, whereas matrix represents recent relationships conditional on population structure, similarly to the adjusted relationships introduced by Thornton *et al.* (2012) and Conomos *et al.* (2016), with the difference that here coefficients in are not scaled for direct estimation of recent-kinship coefficients. Here the graphical LASSO was applied to to infer a graph of recent relationships among individuals, according to a regularization parameter *λ*. Parameter *λ* was chosen to maximize the restricted likelihood in a GBLUP model based on the regularized genomic relationship matrix , fitted to the whole sample (see Appendix 1 for technical details and discussion on graph inference). The GBLUP model depicting relationships through was fitted for each trait separately, so different optimal values of *λ* were inferred for different traits.

### Genomic prediction models

All linear mixed models described below were fitted using the R package rrBLUP (Endelman 2011).

For a given marker-data matrix **X** and vector **y** of outcomes, the standard GBLUP model is described as follows:(1)where **y** is the *n*-vector of genotype means (*y _{i}*′s, as described above);

**X**is the marker-data matrix (here consisting of allelic dosages), and is the variance of marker effects;

**Q**is the model matrix for fixed effects

**α**; is the covariance matrix for errors considered independent and identically distributed.

Hereafter, the testing set TS is defined as the set of individuals left out for model validation. The calibration set CS is the set of individuals used to fit the prediction models, which excludes the TS but does not necessarily consist of all remaining (available) individuals.

We defined the mean structure in fitted models by **Q** being one of the following: *(Intercept*) a *n*-vector of ones , such that fixed effects consisted of a single intercept; (*PCA*) the matrix of column vector of ones and first four PCs; (*Panel*) the model matrix attributing observations to panel AP or BP, such that fixed effects reflected differences in genetic compositions and environments across panels; or (*Group*) the matrix model matrix attributing individuals to the following putative population groups: WS4U-C2, Liberty-C2, U4X-N, U8X-W+U8X-S, U8X-E, L4X-NE and L4X-S. Genotypes from U8X-S were grouped with U8X-W on the basis of their proximity according to the first 4 PCs, to avoid having one group with too few observations.

In this study, we first compared mean structures with respect to prediction accuracy under the standard prediction procedure (*GBLUP*, as described below). Then, we focused on and compared prediction procedures for accommodating population heterogeneity (see below; *GBLUP*, *GBLUP-Target*, *MPM-Mixture*, *MPM-Matérn*). Prediction accuracy of models (differing either by mean structure or prediction procedure) was assessed by cross-validation as described in the next subsection (Validations).

#### Whole-sample model: GBLUP:

In the whole-sample model (*GBLUP*), we fitted model (1) to all available individuals, thereby assuming that the whole sample consists of only one population. This method consists of ignoring population heterogeneity and relying on robustness of standard GBLUP to interactions between markers and population backgrounds.

#### Target-population model: GBLUP-Target:

In the target-population model (*GBLUP-Target*), we fitted model (1) to individuals belonging to the same population as the TS, when possible (see below). This method corresponds to a typical choice of reducing population heterogeneity and basing predictions only on individuals that have genetic backgrounds that are *a priori* similar to those in the TS.

#### Multi-population models: MPM-Mixture, MPM-Matérn:

Multi-population models (*MPM*) were extensions of model (1) intended to accommodate population heterogeneity. The following general model was fitted:(2)where ○ is the element-wise (Hadamard) product, and is a *n* × *n* covariance matrix depicting population differentiation among individuals (see Appendix 2 for derivations and technical details). To parsimoniously estimate , we used two different procedures: *MPM-Mixture* (based on **A**) and *MPM-Matérn* (based on **P**). In both procedures, we did not model any heteroscedasticity for additive genetic effects **u**.

In *MPM-Mixture* (the reference *MPM* procedure), , where is the *n* × *n* matrix of ones and is a *K* × *K* matrix depicting relationships among population clusters as inferred in **A**. Here, we simply set ( is the *K* × *K* identity matrix), so . Therefore in this procedure, set a trade-off between the case where relationships were cluster-specific () and the case where relationships assumed one single homogeneous population for all individuals (). This approach is similar (but not exactly equivalent) to the *K*-kernel method of Heslot and Jannink (2015), which set a similar balance between cluster-specific and overall relationships, but using G for relationships (VanRaden 2008), instead of , and considering only discrete population clusters (in which case values in **A** would then be only 0 or 1). Alternatively, *MPM-Mixture* may be viewed as a multi-kernel model where and are the variance components respectively associated to cluster-specific and main marker effects.

In *MPM-Matérn* (the proposed *MPM* procedure),, where is a Matérn kernel function of and : , is the Euclidean distance between the *d*-vectors of PC coordinates for any pair (*i*, *j*) of individuals, is a shape parameter, is a scale parameter, and is the modified Bessel function of the second kind, of order *ν* (Abramowitz and Stegun 1984, Ober *et al.* 2011). Matérn functions have been used in various contexts, including in genomic prediction for depicting relationships among individuals (Ober *et al.* 2011). Here, we used Matérn functions to depict relationships among populations, with the input representing differentiation with respect to population structure in orthogonal directions. We used Matérn functions instead of more typical kernel functions (*e.g.*, an exponential or Gaussian kernel function) to allow for some flexibility in the shape of the correlation in : and correspond respectively to the exponential and Gaussian kernels as special cases, while different shapes can also be fitted (Ober *et al.* 2011).

The parameter *ρ* in *MPM-Mixture* was estimated by maximizing the restricted likelihood of model (2) using the optimization algorithm implemented in the R function *optimize*. The parameters *ν* and *h* in *MPM-Matérn* were estimated by maximizing the restricted likelihood of model (2) using the Nelder-Mead algorithm implemented in the R function *constrOptim*, with constraints for positivity. In order to control (to some extent) for the possible presence of local maxima in the restricted likelihood surface in *MPM-Matérn*, we used four different starting points : , , and , with *D _{max}* the maximum distance observed over pairs of individuals (

*i*,

*j*). In cross-validation (see next section), parameters

*ρ*,

*ν*and

*h*were estimated in each CS separately.

### Validations

We assessed the accuracy of our prediction procedures by cross-validation (CV) under two schemes: whole-sample calibration, where all individuals except the TS are included in the CS, and cross-population calibration, where the target population (the population to which the TS belongs) is excluded from the CS. The target-population model *GBLUP-Target* was only assayed in whole-sample calibration, since this model could only rely on individuals from the target population for calibration (in *GBLUP-Target*, the CS could only consist of individuals in the target population, which was not possible in cross-population calibration).

For each target population (L4X-NE, U4X-N, Liberty-C2 or WS4U-C2), we used as the TS a random subset of the target sample. The size of the TS was one fifth of the target sample size. All remaining individuals were used as input to the prediction procedures. Such validations were replicated times for each target.

Prediction procedures were evaluated for accuracy by , *i.e.*, the correlation between actual and predicted outcomes in a given TS. To assess the significance of differences in prediction accuracy between two procedures, we performed a *t*-test on , where is the average of ; () is the vector of prediction accuracies over testing sets for the tested procedure (baseline procedure); and z is the Fisher transformation. The standard error of the mean difference in prediction accuracy, , was estimated in two different ways: (liberal *t*-test) where is the standard deviation of δ, with all testing sets assumed to be independent datasets; (conservative *t*-test) based on the first method of Nadeau and Bengio (2003), , where redundancy over testing sets is accounted for by the additional term , with *o* being the expected fraction of overlap among testing sets; here and because testing sets were random subsets consisting of a fifth of any given target sample. We considered that this method for estimating was conservative in whole-sample calibration because Nadeau and Bengio (2003) derived it by assuming that the CV criterion (the “loss function”, analog here to , for a given TS) did not depend on the CS instances, given a particular CS size. Therefore the adjustment from Nadeau and Bengio (2003) may have overestimated the correlation among values of the CV criterion across replicates, in whole-sample calibration, since prediction procedures are probably quite sensitive to differences in the composition of the CS. In all comparisons between procedures, we reported the results from both tests in order to characterize the significance of differences in prediction accuracy.

### Data availability

Population information (population assignment and geographical origin of genotypes, when available), raw phenotypic data (trait measurements at individual plants) and estimated genotype means (for maternal parents in BP and individuals in AP) are available in Files S1, S2 and S3, respectively. These supplementary files as well as the marker data (allelic dosages at the 717,814 selected SNP markers; in .rds format readable in R) are available from figShare. Supplemental material available at Figshare: https://doi.org/10.25387/g3.7464863.

## Results

### Population structure in the sample

#### Population-level differentiation:

Seven population clusters were inferred from the ADMIXTURE software (Figure S1; Alexander *et al.* 2009). These clusters corresponded roughly to populations L4X-NE, L4X-S, Liberty-C2 and U4X-N, WS4U-C2, U8X-E, U8X-W. One population with little representation in our sample, U8X-S, appeared to be of mixed origin (Figure 1a). The other populations generally displayed a low level of admixture, with relatively few individuals having intermediate admixture coefficients. There seemed to be some admixture involving upland populations (WS4U-C2 and U4X-N, WS4U-C2 and U8X-W, U8X-E and U8X-W), with even some shared ancestry between WS4U-C2 and U4X-N. The PCA confirmed that population structure was relatively discrete (Figure 1b). Expectedly, the first PC separated genotypes by ecotype while the second PC reflected geographical origin within the lowland ecotype (Lu *et al.* 2013, Evans *et al.* 2015). The third and four PCs discriminated upland genotypes by geographical origin and ploidy level, and distinguished L4X-S from the two other lowland populations (L4X-NE and Liberty-C2).

Differences in mean and range among populations were quite typical of previously reported differences between ecotypes (Table 1; Casler 2012). Indeed, L4X-S and Liberty-C2 (populations of lowland origin) had high mean values and range values for PH, HD and St, compared to upland populations (excluding U8X-S). However, L4X-NE stood out as a lowland population for being relatively short, early-flowering, and prone to lodging, with corresponding values for PH, HD, and St more similar to those of the upland populations.

#### Recent relationships in the sample:

Here, marginal genomic relationships were defined as the elements of , with consisting of centered marker variables, and *ν* being some scaling factor. The strong and quite discrete population structure in the sample translated into multimodal marginal genomic relationship coefficients, with the multiple peaks in off-diagonal elements of **G** reflecting differentiation of population with respect to allele frequencies (Figure S2a). Conditioning relationships on population structure (as depicted by the first four PCs of ) yielded the matrix , with and **P** reflecting structure in **G** due to population-level variation (Fan *et al.* 2013). The conditional genomic relationships seemed sparser, in the sense that they appeared to cluster around zero, so most individuals could be assumed to be unrelated after accounting for population structure in the sample (Figure S2b). Conditional relationships in were particularly relevant in this study, since among-population variation, captured by , contributed little to variation within any given TS. Indeed, any TS generally consisted of selection candidates from a relatively homogeneous target sample (made of individuals from WS4U-C2, Liberty-C2, U4X-N or L4X-NE), where variation with respect to P was minimal. Graphs of recent relationships, inferred by the graphical LASSO, were rather dense, with average degrees (number of neighbors by node/individual in the graph) ranging from 217 to 458 (Figure 2). However, some noticeable features of populations emerged from the inferred graphs (Figure 2): WS4U-C2, U4X-N and U8X-E appeared quite connected to one another; U8X-W also showed some connection with other upland populations but seemed more distinct, as reflected by a relatively lower average degree (Figure S3); Liberty-C2 and L4X-S were somewhat connected to both upland and lowland populations, which certainly explains why their individual degrees were generally high (Figure S3); most notably, L4X-NE displayed an outstandingly low level of connection with the other populations, which translated in a clear separation of this population in the graph, after placing the nodes based on a force-directed algorithm (Fruchterman and Reingold 1991). These features exemplify the usefulness of conditional relationships and their associated graphs for describing relationships among individuals.

### Impact of mean structure on prediction accuracy

Prior to assessing different prediction procedures accommodating population heterogeneity, models were compared for different fixed-effect specifications used to characterize population structure (mean structures of models). Mean structures were tested for prediction accuracy under *GBLUP*, in which the whole sample, excluding the TS, was used to fit a standard GBLUP model. In whole-sample calibration (where the target population was included in the CS), the various mean structures assayed differed marginally with respect to their prediction accuracy. There were improvements over *Intercept* (only intercept as fixed effect) by mean structures which explicitly captured population structure, *i.e.*, *PCA* (intercept and effects of PCs), *Panel* (effects of panels AP and BP) and *Group* (effects of putative population groups). However, those were small, inconsistent and moderately significant (hereafter “moderately significant” refers to based on the liberal “naïve” *t*-test) (Table 2a). Conversely, in cross-population calibration (where the target population was excluded from the CS), fixed effects explicitly depicting population structure resulted in highly significant decreases in prediction accuracy compared to *Intercept* (hereafter “highly significant” refers to based on the conservative *t*-test adapted from Nadeau and Bengio 2003; see *Material and methods* for details). In particular, prediction accuracy was substantially lower with *PCA* for L4X-NE (HD, St), as well as with *Group* for U4X-N (PH, HD), L4X-NE (PH, HD) and Liberty-C2. Mean structure *Panel* was not as sensitive to cross-population calibration compared to *PCA* and *Group*, and even showed one highly significant increase in prediction accuracy compared to *Intercept*, for Liberty-C2 (PH). But it also showed decreases in prediction accuracy, with U4X-N (PH) and L4X-NE (HD), which were stronger and highly significant (Table 2b). Interestingly, the deterioration of prediction accuracy in cross-population calibration by *PCA* and *Group* may be due to different factors. Indeed, while *PCA* may fail to properly extrapolate effects of PCs outside the set of populations represented in the CS, *Group* would fail to capture any difference due to population differentiation in the TS (since all individuals in the TS belong to the same unobserved population group). Due to the relative stability in performance of *Intercept*, hereafter we chose to focus on this mean structure when comparing prediction procedures.

### Impact of prediction procedures on prediction accuracy

#### Target-population model:

For prediction in a given TS, the target-population model (*GBLUP-Target*) consisted in restricting the CS to the subset of the sample belonging to the same population as the TS. Compared to *GBLUP*, the target-population model yielded decreases in prediction accuracy which appeared moderately significant for PH (WS4U-C2, U4X-N) and St (Liberty-C2) (Table 3a, Table S1). However, prediction accuracy for St (WS4U-C2) was higher, with a moderately significant difference. More intriguing is the consistent increase in prediction accuracy with L4X-NE, with differences being small yet highly significant for PH and HD, and moderately significant for St. It is unclear whether these differences are due to the consistently higher accuracies achieved with L4X-NE (in *GBLUP-Target*) compared to other populations, or a result of L4X-NE being relatively under-connected to the other populations in the sample (Figure 2, Figure S3). Both factors could very well contribute to the observed decreases in accuracy when incorporating information from the whole sample.

#### Multi-population models and marker-by-population interactions:

The inferred mixing parameter *ρ* from the *MPM-Mixture* model was null (or close to null), low and intermediate, for PH, St and HD respectively, with estimations being quite consistent over CV replicates (Table 4). The improvement in fit, relatively to *GBLUP*, was non-significant for PH, rather significant (*P* < 0.05) for St, and strongly significant (*P* < 0.001) for HD (Table 4). In *MPM-Matérn*, the inferred correlation functions differed substantially across traits (Table 4), while being quite consistent over CV replicates in whole-sample calibration and across validation schemes, with similar shapes of the correlation function in whole-sample calibration and cross-population calibration (Figure 3): roughly resembled an exponential kernel with PH and HD, and was more similar to a Gaussian kernel with St, for which a “shoulder” maintained high correlation in marker effects for individuals that were relatively close to each other, based on their PCs. Remarkably, the shapes of inferred correlation functions were quite consistent in cross-population calibration, despite entire populations being left out from one CS to another (Figure 3b). Inferences regarding among-population correlations () in *MPM-Matérn* were weakly significant for PH and St, with *p*-values close to 0.05; in contrast, inferences regarding for HD were strongly significant, with *P* < 0.001 (Table 4). Interestingly, distances based on PCs may be equivalent to distances based on allele frequencies. Specifically, , where () is the *m*-vector of individual-specific allele frequencies of individual *i* (*j*) as described by Conomos *et al.* (2016), with population structure described by (Appendix 3). Therefore, the significant relationship between PC-based distances and correlations in marker effects (depicted by ) for HD in *MPM-Matérn* indicates that marker effects for this trait were highly sensitive to variation in allele frequencies across genetic backgrounds.

In whole-sample calibration, the performance of *MPM-Mixture* was very similar to that of *GBLUP*, with differences in accuracy ranging from -0.018 to +0.009 (Table 3a). Quite surprisingly, *MPM-Mixture* displayed slightly deteriorated accuracies for HD (with the exception of U4X-N), despite the strongly significant improvement in fit for this trait. In contrast, *MPM-Matérn* yielded larger differences in accuracy, ranging from -0.019 to +0.060 in whole-sample calibration (Table 3a). With the two upland target populations (WS4U-C2 and U4X-N), noteworthy increases in prediction accuracy (+0.060 and +0.030 respectively) were observed for HD. But with the two other target populations (Liberty-C2 and L4X-NE), smaller differences in accuracy (-0.008 and +0.007 respectively) were observed for HD.

In cross-population calibration, *MPM-Mixture* showed more differences in accuracy compared to *GBLUP*, with differences in accuracy ranging from -0.150 to 0.032 (Table 3b). Again, *MPM-Mixture* displayed deteriorated accuracy for HD despite a strong improvement in fit, with a dramatic decrease by 0.15 for U4X-N. Such decrease in accuracy may be due to the lack of flexibility of *MPM-Mixture* in depicting among-population resemblance, as it only fits one correlation coefficient for all pairs of populations. In cross-population calibration, *MPM-Matérn* also resulted in large differences in accuracy compared to *GBLUP*, ranging from -0.304 to 0.147. The dramatic decreases in prediction accuracy with PH (-0.304 and -0.250 for WS4U-C2 and L4X-NE, respectively) could be explained by the relatively weak improvement in model fit from *GBLUP* to *MPM-Matérn* (Table 4). Interestingly, large and significant improvements in prediction accuracy were observed with HD, similarly to the results from whole-sample calibration, with nonetheless more dramatic increases in accuracy. While *MPM-Mixture* simply estimates a general coefficient for among-population resemblance, based on the CS, *MPM-Matérn* may be more suitable for extrapolation to unobserved population backgrounds, as it estimates the resemblance between any two populations as a function of their specific properties (here, PC coordinates). Consistently, the relative improvement in accuracy from *GBLUP* to *MPM-Matérn* seemed more predictable based on the relative improvement in model fit. Specifically, a decrease in Bayesian information criterion (BIC) seemed to discriminate cases where an improvement in accuracy could be achieved by *MPM-Matérn*, especially in cross-population calibration where a correct depiction of population heterogeneity seemed more critical (Figure S4).

## Discussion

### Conclusions

The present study assessed different mean structures to represent population differentiation and evaluated various procedures to accommodate population heterogeneity in diverse samples, with an application in switchgrass. We considered different approaches to reflect population structure, *i.e.*, characterizing it implicitly by random marker effects, using only an intercept as fixed effect (*Intercept*), or characterize it explicitly, by continuous differentiation (*PCA*) or discrete effects at the level of panels (*Panel*) or putative population groups (*Group*). Furthermore, we employed three typical strategies for dealing with marker-by-population interactions, *i.e.*, ignoring (*GBLUP*), reducing (*GBLUP-Target*), or modeling (*MPM*) the source of heterogeneity in the data.

Our assessment of mean structures points to a simple fixed-effect specification being preferable in genomic prediction analyses, since accuracies from *Intercept* were relatively high across populations and traits, and relatively stable across validation schemes (whole-sample calibration or cross-population calibration). These conclusions are consistent with those of Phocas and Laloë (2004), who showed larger prediction errors in cattle when including putative genetic groups as fixed effects (comparable to *Group* and *Panel*). Notably, deteriorations of prediction accuracies from *PCA* and *Group* were especially large in cross-population calibration in which entire populations were excluded from the CS. Moreover, these were often noted for L4X-NE which was under-connected to other populations in the sample (Figure 2). Decreases in accuracy with *PCA* suggest that linear fixed effects capturing population structure may fail to properly extrapolate on unobserved populations whose genetics may differ markedly from other populations in the sample (Figure 1b). However, it is worth noting that the switchgrass sample under study was highly structured. Samples in other species, *e.g.*, in maize or rice, may not display such discrete population differentiation and therefore may not suffer as much from fixed effects at population level (Guo *et al.* 2014).

In whole-sample calibration, *GBLUP* often seemed robust to population heterogeneity, regarding prediction accuracy (Table 3a). This robustness was certainly due to the ability of GBLUP models to combine information from individuals according to the specified relationship matrix, by transferring information preferentially from the more related individuals (Searle *et al.* 2009, Habier *et al.* 2013). Furthermore, GBLUP models were probably all the more robust as marker density was high, such that genomic relationships were accurately estimated (Casella and Berger 2002, Endelman and Jannink 2012). However, some decreases in prediction accuracy compared to *GBLUP-Target* suggest that robustness of *GBLUP* may have been affected by other factors. Such factors may be related to relationships within the sample, *i.e.*, under-connectedness of some populations with others (Figure 2), or differences in accuracy of the prediction model across populations, as reflected by *GBLUP-Target* being more accurate in certain populations (Table 3a).

In whole-sample calibration, prediction was mostly determined by individuals in the CS belonging to the same population as the TS. Consistently, *MPM* procedures, which shrink relationships involving individuals from distantly-related populations, did not dramatically affect prediction accuracy (Table 3a). However, in cross-population calibration, decreasing the contribution of individuals from distantly-related populations must have been more pertinent, so that there were more opportunities for improvement of prediction accuracy by *MPM* procedures. In this context, *MPM-Matérn* proved more useful than *MPM-Mixture*, especially with HD for which *MPM-Matérn* resulted in a dramatic improvement of fit (Table 3b). Importantly, this relative superiority may be due to the fact that *MPM-Matérn* extrapolated correlations between populations, through leading PCs, while *MPM-Mixture* merely interpolated such correlations, by estimating a common coefficient of correlation across populations. This lack of flexibility, and the subsequent inability to extrapolate to unobserved populations, must have resulted in high sensitivity of *MPM-Mixture* to the composition of the CS, making it particularly inappropriate in a cross-population context.

Marker-by-population interactions captured by *MPM-Mixture* and *MPM-Matérn* were presumably not confounded by marker-by-environment interactions, since interactions between panel and markers were not significant (*P* > 0.25 in a model, similar to *MPM-Mixture*, which depicted correlation in marker effects between BP, assayed in WI, and AP, assayed in NY; Figure S5). Therefore, models analyzed in this study would reflect actual differences in genetic bases across populations. Moreover, for every trait, genomic variability (variance of marker effects) would be similar across panels. Indeed, the non-significant improvement in fit from an extension of a GBLUP model where genomic variance can vary by panel (*P* ≥ 0.18), suggested limited differences in variance of marker effects by panel (Figure S6). This result further implied that estimation of genotype effects and half-sib family effects (in AP and BP, respectively) and scaling of half-sib family effects (multiplied by two, so they corresponded to breeding values) were effective to ensure concordance in genomic variability across panels.

Here, we modeled interactions between markers and population structure through products of relationships at markers, which were linear (), and relationships about population structure, which were linear in *MPM-Mixture* and nonlinear in *MPM-Matérn* (; Appendix 3). Using kernel functions to estimate relationships dispensed us from fitting effects of many variables, by estimating instead *n* breeding values directly from relationships. A similar strategy was adopted quite recently by Jarquín *et al.* (2014), who modeled genotype-by-environment interactions for genomic prediction, through products of linear kernels at markers and linear kernels at environmental covariates. As noted by these authors, such decomposition with respect to interactions had been introduced in quantitative genetics much earlier, by Kempthorne (1954) and Cockerham (1954) for depicting epistatic effects, based on expected relationships under an infinitesimal model. Importantly, relationships about population structure not only allowed us to efficiently specify a genome-wide marker-by-population interaction model, but they also enabled the use of nonlinear kernels at the population level (with produced by nonlinear functions in *MPM-Matérn*). Matérn kernels, introduced to genomic prediction research by Ober *et al.* (2011), were used here to estimate covariance among individuals in a flexible yet parsimonious way (Figure 3), while still using simple linear kernels for depicting within-population variability (by ). Our results exemplify the potential usefulness of parsimonious multi-population models, which are all the more interesting that they can be applied on samples comprising many populations. In contrast, typical multi-trait models would be computationally intractable or statistically inefficient here, since those would rely on one parameter for each population pair to model correlations among populations in (*e.g.*, 21 parameters for population clusters). As a matter of fact, multivariate genomic BLUP models fitted by ASREML-R to estimate such correlations among putative population groups (WS4U-C2, Liberty-C2, U4X-N, U8X-W+U8X-S, U8X-E, L4X-NE and L4X-S) failed to converge.

### Improvement of procedures

Our results suggest that a very high increase in quality of fit, as was observed for HD with *MPM-Matérn*, may allow for an increase in accuracy, especially in a cross-population context. In the analysis of Heslot and Jannink (2015) across various multi-population contexts, there seemed to be a positive relationship between differences in quality of fit, as measured by the Akaike information criterion (AIC), and differences in prediction accuracy. Although this relationship was quite loose, it could be noted that for very high increases in AIC (), gains in accuracy were null to high, similarly to the situation of *MPM-Matérn* with HD, for which increases in AIC varied from 28.68 to 42.88, across CV replicates in whole-sample calibration, and from 24.57 to 36.25 in cross-population calibration. Therefore, stringent thresholds on AIC increases could probably be used in *MPM* to avoid relative decreases in accuracy. In this study, one criterion more conservative than the AIC, the BIC, could discriminate cases where prediction accuracy was improved by *MPM-Matérn*, compared to *GBLUP* (Figure S4). Therefore, a possible improvement of *MPM* procedures could simply come from model selection as an integral part of the fitting process, based for example on the BIC. The BIC differences relative to *GBLUP* were almost always negative for PH and St in *MPM*. For these two traits, differences in prediction accuracy from *GBLUP* to *MPM* were quite inconsistent, especially with *MPM-Matérn*, so model selection could probably have made *MPM* procedures more robust. However, such conclusions are based on a restricted set of populations and genetic architectures. So future studies on other datasets would certainly be necessary to test this *post hoc* hypothesis and determine whether criteria such as the BIC can indicate cases where *MPM-Matérn* should be used instead of *GBLUP*.

Another way of potentially improving *MPM* procedures would be to use other types of kernels than those used here. For example, one may use linear kernels based on population-level covariates (*e.g.*, PCs) in place of in *MPM-Mixture*, hence taking an approach similar to that of Jarquín *et al.* (2014). Besides, modeling resemblance among population clusters in *MPM-Mixture*, by in place of (where captures similarity based on metrics at the population level), could be useful to increase quality of fit, and possibly prediction accuracy. Finally, an interesting way of extending the *MPM* procedures described here would be to incorporate more information at the population level. Here in *MPM*, population homogeneity was captured through admixture coefficients (*MPM-Mixture*) or differences in PC coordinates (*MPM-Matérn*), the latter reflecting differences in allele frequencies (Appendix 3). However, marker-by-population interactions may also be due to differences in LD patterns (Wientjes *et al.* 2016). Therefore metrics depicting such differences could be particularly appropriate for capturing population heterogeneity. Further research would be necessary to determine the type of metrics to use for reflecting differences in LD patterns, and the appropriate way to parsimoniously combine the different types of information regarding population differentiation in *MPM*. Interestingly, geographical distance may succinctly depict population differentiation, due to differences in allele frequencies and/or differences in LD patterns. Fitting population-level correlations in as a function of distance of origin would then be particularly useful in species under strong geographical structure, which include switchgrass (Grabowski *et al.* 2014), but also human (Coop *et al.* 2009), as was clearly shown in samples from Europe (Novembre *et al.* 2008), Africa (Bryc *et al.* 2010) and Latin America (Ruiz-Linares *et al.* 2014). Models such as *MPM-Matérn*, which are parsimonious yet flexible in the shape of the fitted correlation function (Figure 3), are promising in various applications on diverse samples, in prediction studies, but also in inferential studies aiming at characterizing the basis for population differentiation.

In this study, marker data were based on exome capture sequencing, which targets a selected subset of exons for sequencing and subsequent SNP calling (Hirsch *et al.* 2014, Evans *et al.* 2014). The potential lack of representation of causal variants by our assay may have resulted in loss of prediction accuracy. While total lack of representation of some genomic regions imposes a limit on prediction accuracy achievable by our procedures, the relative overrepresentation of some genomic regions could be, to some extent, alleviated by genomic relationships which account for correlation among markers and differential degrees of tagging of loci in the marker data (Speed *et al.* 2012, Ramstein *et al.* 2016, Wang *et al.* 2017).

Another limitation in our study is the assumed homogeneity of genetic and residual variances across populations. Here we focused on parsimonious models estimating genetic correlations (not covariances) between populations. Extending *MPM* models to capture variance heterogeneity across populations and/or environments would certainly deserve further investigation. Such models ought to fit functions of variance over genetic and/or environmental variables, similarly to Ou *et al.* (2015) who reported improvements in fit and marginal gains in prediction accuracy in swine, by modeling residual variance over sexes and slaughter dates. Indeed, a decisive advantage of models like *MPM-Matérn* (for correlations) and those of Ou *et al.* (2015) (for variances) is their ability to extrapolate population covariances to genotypes from unobserved population backgrounds.

### Applications and prospects

Based on our case study, we would recommend using *MPM* whenever a strong improvement in model fit is achieved. Otherwise *GBLUP* would be the method of choice, since it is often robust enough to perform at least as well as *GBLUP-Target*. However, fitting a GBLUP model to a CS restricted to the target population may be preferred when making predictions on “outlier populations” such as L4X-NE, which are under-connected to other populations and are characterized by relatively high prediction accuracy in a single-population context. Nevertheless, more empirical studies on population heterogeneity should follow to support the conclusions from our specific application. Such studies could apply to various contexts: in particular, predictions on diverse samples and dynamic breeding programs. The former includes analyses similar to our case study as well as analyses on more complex data, such as historical datasets, in which not only population heterogeneity but also genotype-by-environment interactions must be taken into account (Dawson *et al.* 2013, Rutkoski *et al.* 2015). The latter involves selection across multiple breeding generations, which might not necessarily suffer from strong population heterogeneity (Sallam *et al.* 2015, Auinger *et al.* 2016) but could nonetheless benefit from robust multi-population models for potential increase in persistency of accuracy over generations (Habier *et al.* 2007). In the context of diverse samples or dynamic breeding, simulation studies could also be useful for assessing the suitability of procedures to accommodate population heterogeneity. Differences in allele frequencies and differential LD patterns could be simulated by various genealogies, as was done for example by de Roos *et al.* (2009). Additionally, dependency of marker effects on allele frequencies could be simulated through underlying non-additive genetic effects, as well as allele fixation in specific populations. Indeed, marginal additive marker effects, as captured by linear models such as standard GBLUP, depend on allele frequencies at the loci with which they interact (Hill *et al.* 2008, Mäki-Tanila and Hill 2014, Hill and Mäki-Tanila 2015). Therefore, dominance and epistatic effects could be simulated to generate dependency of marker effects on allele frequencies, which may then be captured by methods such as *MPM-Matérn* (Appendix 3). Though investigations based on simulations would be complex and, to some extent, arbitrary by their choice of genealogies and genetic architectures, they would provide useful frameworks for assessing procedures, such as those presented in this study, in contexts of population heterogeneity.

## Acknowledgments

We are grateful to Jeremy Schmutz of the Department of Energy Joint Genome Institute and Hudson Alpha for his work on the switchgrass genome. We are also grateful to Nick Baker and Joseph Halinar, USDA-ARS, Madison, WI for assistance with field operations and data collection. This research was funded in part by the following agencies and organizations: the U.S. Department of Energy Great Lakes Bioenergy Research Center, DOE Office of Science BER DE-FC02-07ER64494 (laboratory operations, genotyping, and bioinformatics), the U.S. Department of Energy Joint Genome Institute supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 (sequencing), Agriculture and Food Research Initiative Competitive Grant No. 2011-68005-30411 from the USDA National Institute of Food and Agriculture (CenUSA; field operations and phenotypic measurements), USDA-ARS Congressionally allocated funds (field operations, technical support, and logistics), and the University of Wisconsin Agricultural Research Stations (field operations). Mention of commercial products and organizations in this manuscript is solely to provide specific information. The USDA is an equal opportunity provider and employer. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## Conflict of interest

The authors declare that they have no conflict of interest.

## Appendix 1

### Inference of recent relationships by the graphical LASSO

Let be a regularized form of , with G the genomic relationship matrix as defined by VanRaden (2008), and **P** the matrix of PCs. We applied the graphical LASSO to to infer a sparse matrix , which yielded a graph of relationships among individuals. Indeed, a zero *ij*-element in indicates that individuals *i* and *j* are unrelated conditionally on all other individuals, which corresponds to no edge between individuals *i* and *j* in the underlying graph of recent relationships.

For a given sample covariance matrix Σ, the graphical LASSO infers a sparse precision matrix by maximizing the Gaussian likelihood of the data (as represented by Σ), penalized by a *L _{1}*-norm penalty , where

*λ*is the regularization parameter and is the sum of absolute values in (Friedman

*et al.*2008).

In this study, regularization of was performed as follows:

Standardizing to obtain the corresponding correlation matrix :

Applying the graphical LASSO algorithm to , to obtain the regularized correlation matrix

Rescaling to obtain

The graphical LASSO algorithm was run using the R package huge (Zhao *et al.* 2012). The regularization parameter *λ* was chosen to maximize the restricted likelihood for the following model:where **y** is the *n*-vector of genotype means at a given trait; , depending on *λ*, consists of regularized relationships; is the variance of breeding values (here not equal to the variance of marker effects due to scaling and regularization on ); is the *n*-vector of intercept values; is the covariance matrix for errors considered independent and identically distributed. Here we chose the value of *λ* for which the corresponding matrix resulted in the highest restricted likelihood for the aforementioned model fitted to the whole sample. Possible values of *λ* were the *q*-quantiles of absolute values in , with *q* varying from 0.05 to 1 by step of 0.05. The restricted likelihood as a function of *λ* depended on **y**, so optimization of *λ* was performed for each trait separately.

## Appendix 2

### Multi-population GBLUP models for heterogeneous calibration sets

In this section, , , and refer to the vector of ones, identity matrix, and matrix of ones, respectively, of dimensions t, and (where *t* is specified).

Consider the following model for population-specific marker effects with respect to *K* populations and *n* genotypes:where ⊗ indicates the Kronecker product; is the *Kn* × *p* design matrix for the *p*-vector α of fixed effects; is the *Kn* × *Km* marker-data matrix for the *Km*-vector of marker effects at each of the *K* populations, with variance . The matrix reflects covariances in marker effects between populations. The *Kn*-vector , containing the phenotypic values for the *n* genotypes at each of the *K* populations, is hypothetical (and ill-defined from a practical standpoint), since genotypes typically do not belong to more than one population. The *Kn*-vector of residuals , with unspecified variance , is assumed to be uncorrelated to marker effects .

Let be the *Kn*-vector of additive genetic effects at each of the *K* populations, as a linear combination of a normally-distributed vector (), follows a normal distribution with expectation and variance as follows (Lehermeier *et al.*, 2015):So a multi-population model for breeding values that is equivalent to the model described above, by identical mean and variance structures, is as follows:Now assume that , and each population corresponds to the specific genetic background of each individual separately. By considering only observations at every individual’s specific genetic background, the above model reduces to: where ○ is the Hadamard product; y is the typical *n*-vector of observed phenotypic values; u and e are the corresponding additive genetic effects and residuals, respectively. Individual-specific marker effects are therefore accounted for by multiplying each element of the relationship matrix by the corresponding element of , thereby reflecting correlations in marker effects among individuals’ genetic backgrounds.

In general, we propose to infer by , where ϕ is some function of the *m*-vectors of marker variables and , for any pair of individuals *i* and *j*, and *κ* is a valid kernel function guaranteeing that be positive semi-definite. In *MPM-Mixture*, (*K*-vector of admixture coefficients for *i*) for any individual *i*, and the kernel function is , so that is a “mixture” between a matrix of correlations restricted to population clusters and a matrix allowing full exchange of information across clusters, as in a standard GBLUP model. More generally, one could define the kernel function as , where is a *K* × *K* matrix depicting relationships among clusters. Here, we simply set and adjusted the kernel function (by restricted maximum likelihood) for *ρ* only.

In *MPM-Matérn*, (*d*-vector of PC coordinates for *i*) for any individual *i*, and the kernel function is a Matérn function of , where is the Euclidean norm. Notably, it can be shown that is proportional to , where () is the *m*-vector of individual-specific allele frequencies for individuals *i* (*j*), defined by projection of matrix **X** onto the column space of (Appendix 3). So , which reflects differentiation with respect to coordinates at the leading PCs of **X**, also reflects differentiation with respect to individual-specific allele frequencies, with an underlying population structure represented by the same PCs. The allele frequencies have been introduced by Conomos *et al.* (2016), in a study where they also recommended using principal components from a subset of unrelated individuals in **X**. Here, we simply applied PCA on the whole matrix **X**.

## Appendix 3

### Relationship between Euclidean distance based on principal components and Euclidean distance based on individual-specific allele frequencies

In this section, , , and refer to the vector of ones, identity matrix, matrix of ones and matrix of zeros, respectively, of dimensions t, , and (where *s* and *t* are specified).

We will consider the case where the PC matrix **P** consists of the first *d* PCs of **X**, and individual-specific allele frequencies are defined as (Conomos *et al.* 2016):where represents population structure through an intercept and the effects of the first *d* PCs of **X**. Vector () then consists of individual-specific allele frequencies (with respect to ) for individual *i* (*j*), such that:and similarly for ( refers to the -vector of population-structure variables from for individual *i*).

We will show that for any pair (*i*, *j*), *i.e.*, Euclidean distances based on *d* PCs are equivalent, by proportionality, to those based on *m* individual-specific allele frequencies, with such frequencies as defined above.where .

Below, we will specify **M** more explicitly, to subsequently show that .

Let be the matrix of marker variables centered around their respective overall mean. Assuming , by eigenvalue decomposition , with **U** the matrix of eigenvectors and Λ the diagonal matrix of eigenvalues of ; and , where is the matrix of leading eigenvectors and is the diagonal matrix of corresponding singular values, assumed strictly positive.

Because consist of left-eigenvectors of a column-centered matrix (associated with strictly positive eigenvalues), so .

Besides, .

So:Moreover because .

Sowhere

Finally,

with:Therefore, for any pair of individuals (*i*, *j*):So:

## Footnotes

Supplemental material available at Figshare: https://doi.org/10.25387/g3.7464863.

*Communicating editor: G. de los Campos*

- Received December 13, 2018.
- Accepted January 10, 2019.

- Copyright © 2019 Ramstein, Casler

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.