## Abstract

This paper gives a short review of the development of genetic parameter estimation over the last 40 years. This shows the development of more statistically and computationally efficient methods that allow the fitting of more biologically appropriate models. Methods have evolved from direct methods based on covariances between relatives to methods based on individual animal models. Maximum-likelihood methods have a natural interpretation in terms of best linear unbiased predictors. Improvements in iterative schemes to give estimates are discussed. As an example, a recent estimation of genetic parameters for a British population of dairy cattle is discussed. The development makes a connection to relevant work by Bill Hill.

## 1. Introduction

One important strand of Bill Hill's research has been the application of genetics to livestock improvement. This requires the accurate prediction of breeding values, which in turn requires knowledge of the variances and covariances of random genetic and environmental effects. This paper gives a short review of the development of genetic parameter estimation. Again, this is an area to which Bill Hill has made major contributions. Typically, phenotypic covariances between relatives are partitioned into causal components such as variances due to additive effects, and temporary and permanent environmental effects (Falconer & Mackay 1996). In this paper, because of space limitations, general principles are illustrated using an infinitesimal additive model for normally distributed traits. In this model, the estimation of genetic parameters is often equivalent to the estimation of variance and covariance components. The book of Lynch & Walsh (1998) gives a very readable and comprehensive account of most of the statistical and genetic principles in this review. As an example, a recent estimation of genetic parameters using random regression models for a British population of dairy cattle is discussed.

## 2. Estimation of genetic parameters

### (a) Direct methods based on covariance between relatives

At the simplest levels, analysis of variance methods can provide estimates of phenotypic covariances when groups can be constructed with the same degree of genetic relationship for all members of a group. In the 1960s, at the beginning of Bill Hill's career, estimation was commonly based on paternal half-sib groups or groups of parents and offspring. Once phenotypic covariances between relatives had been estimated, these estimates were interpreted to give estimates of genetic parameters. Some designs generated both types of relationship, and it is sometimes appropriate to combine the estimates. For example, if several offspring are raised from each sire, sire–offspring and half-sib covariances could be combined. These estimates are correlated (Hill & Nicholas 1974), and weighted estimates can be derived. However, sib covariance estimates are biased by the selection of parents (e.g. Robertson 1977). Henderson *et al*. (1959) and Curnow (1961), using a sequential decomposition of the likelihood, show how maximum-likelihood (ML) techniques can be used to correct for this bias (provided the base animals are unselected). Optimal designs are given by Hill & Nicholas (1974) for unselected parents, and by Thompson (1976) for assortative mating and the selection of parents in a variety of hierarchical mating structures with measurement on parent and offspring. The general conclusions are that ML gives the most gain (about 20%) in accuracy from combining the estimates at intermediate heritabilities (*h*^{2}=0.2) when there is no selection of parents. At low values of *h*^{2}, most of the information comes from sib covariances, and at high values, from offspring–parent regression. When there is selection of parents, the combination of estimates is more useful at low heritabilities.

In the field it is uncommon to obtain equal sized families, and it is common to need to adjust data for environmental effects. Henderson (1953) described general methods for these situations, based on the analysis of variance idea of constructing sensible partitions of sums of squares and equating these sums of squares to their expectation. This, in general, was not optimal, as it used a fixed effect model to construct the sums of squares and then a mixed model to interpret them. In some special cases it was found that incorporating data with small family sizes gave larger sampling variances than when data on the smallest family sizes were removed. This was partly a consequence of inappropriate weighting to the family means based on different family sizes, and Robertson (1962) suggested a way of alleviating this problem by giving optimal weights for different family sizes for a one-way classification.

### (b) Maximum-likelihood methods

This section is based on a recent review by Thompson & Mäntysaari (2004) who give a more algebraic presentation. Patterson & Thompson (1971) introduced a residual maximum likelihood (REML) method based on maximizing the likelihood of error contrasts, i.e. contrasts that contributed no information on fixed effects. This REML method agreed with analysis of variance methods in balanced cases and effectively eliminated bias in variance estimation due to not knowing the fixed effects. The implementation of REML estimation depends intimately on the mixed model equations (MME) introduced by Henderson (1973) to construct best linear unbiased predictors (BLUP). To allow the accurate prediction of breeding values, it is important to take account of genetic relationships between all animals. Animal models have been introduced to allow the specification of all additive covariances between relatives by incorporating a random effect for the breeding value of each animal. In the formation of the MME it is the inverse of the numerator relationship matrix, ** A**, that is needed. Algorithms have been developed (Henderson 1976; Quaas 1976) to calculate the inverse directly from a list of pedigrees. Since the only non-zero elements of the inverse are those linking parents to offspring and individuals to themselves,

*A*^{−1}is very sparse. One way of explaining the sparsity is to say that given all ancestors’ breeding values, the breeding value of an individual depends only on its parents’ breeding values. Before the introduction of the animal model, parameter estimation had been a three-stage procedure: adjusting data for environmental effects, estimating covariance between relatives, and interpreting them in terms of genetic parameters. Now, it consisted of fitting a mixed linear model with specification-of-expectation and variance submodels.

The easiest way to maximize the likelihood is to find the zeros of the first differential. In the simplest case, this is equivalent to equating a residual sum of squares and a sum of squares of predicted values to their expectation. These expectations are functions of the inverse of the coefficient matrix, *C*^{−1}, in the MME. For a one-way classification, the method is equivalent to Robertson's (1962) suggestion. In most cases explicit estimates are not available and iterative schemes are needed to locate the maximum of the likelihood. A Fisher scoring scheme based on the expected value of the second differentials, again a function of *C*^{−1}, was used by Patterson & Thompson (1971).

An alternative ‘expectation–maximization’ (EM) algorithm based on thinking of the random effects as ‘missing’ was suggested by Dempster *et al*. (1977). This algorithm gives an iterative scheme for updating the variance parameters that can be found from the manipulation of the equations through equating the first differentials of the likelihood to zero. Two advantages of this EM algorithm are that variance parameters stay positive, and there is a guaranteed increase in likelihood at each M-step. However the method can be slow to converge, especially if the heritabilities are small or if there are missing data in multivariate cases. Indeed, this method is said to be the most widely used in animal breeding studies in terms of numbers of iterations. It also requires the inversion of ** C** in each iteration.

An important development was the introduction by Smith & Graser (1986) of an alternative form for the likelihood that naturally leads to a sequential formation of the likelihood. The likelihood can be thought of as a log determinant term (*D*) and a residual sum of squares (*S*). The terms *D* and *S* can be formed sequentially by using terms associated with eliminating the fixed and random effects from the MME, or equivalently, in terms of the factorization of ** C**. An advantage of this is that

*C*^{–1}is not calculated and, because of the large number of zero elements, computation can be substantially reduced by using sparse matrix methods. One disadvantage was that the differentials of the likelihood were not easily available. To maximize the likelihood with one parameter, Smith & Graser (1986) suggested using a quadratic approximation. With more than one parameter, methods that avoid calculating derivatives become a popular, flexible alternative. These methods were used for animal and reduced animal models, and for both univariate and multivariate data (Meyer 1989; Thompson & Meyer 1990; Meyer 1991). The basic framework was extended to include more biologically appropriate models with genetic components, including maternal models with both Wilham and Falconer terms (Koerhuis & Thompson 1997), and models with mutation terms (Wray 1990). A major disadvantage of derivative-free methods is that the computational effort increases dramatically as the number of variance parameters increases.

An important advance in derivative-based methods was the rediscovery (Misztal & Perez-Enriso 1993) of an algorithm (Takahashi *et al*. 1973) allowing calculation of the terms in the inverse of ** C** that are required for forming the first differentials (a ‘sparse’ inverse of

**) without calculating all the elements of the inverse. The sequential formation of this sparse inverse of**

*C***can be described in a way that parallels the sequential formation of the likelihood (Thompson**

*C**et al*. 1994). Meyer & Smith (1996) introduced an alternative way of calculating these first differentials by performing the ‘automatic’ differentiation of the Cholesky decomposition of

**These techniques for forming first derivatives both require twice the computational effort (at least in terms of the number of multiplications) compared with forming the likelihood.**

*C.*This result allowed the implementation of EM algorithms that could estimate variance parameters (Misztal 1994). These were an improvement on derivative-free methods, but could still be slow to converge. It is possible to calculate second differentials using automatic differentiation (Smith 1995), but the computation of each second differential requires six times as many multiplications as those involved in a single likelihood calculation (Smith 1995), and this becomes more costly as the number of parameters increases. There are various suggestions on approximating the second differential. Mäntysaari & Van Vleck (1989) suggest accelerating the EM algorithm based on the observed geometric rate of convergence. A quasi-Newton scheme using first differential values to build up an approximate second differential was suggested by Neumaier & Groeneveld (1998). A third method, based on manipulation of the alternative information matrices, was suggested by Thompson and co-workers (Gilmour *et al*. 1995; Johnson & Thompson 1995; Jensen *et al*. 1997). They show that the average of the observed and expected information (AI) is relatively easy to calculate. The AI matrix is based on the differential of the sum of squares *S*, and can be written in a similar way to *S* itself, using ‘working’ variables based on estimates of the fixed effects and predictors of the random effects.

A synthesis of comparisons of the different algorithms used to compute REML estimates was carried out by Hofer (1998), and was updated by Thompson & Mäntysaari (2004). These comparisons show the expected improvement of EM methods over derivative-free methods. The comparisons also show that most second differential methods converge in relatively few iterations.

### (c) Other parameterizations and iterative schemes

In some cases, transformations can aid in estimation. If we have multivariate data with two (*p*×*p*) variance matrices to estimate, say ** G** and

**, then a canonical transformation (Hayes & Hill 1980; Meyer 1985, 1997) can help in reducing one**

*R**p*×

*p*estimation into

*p*independent analyses.

A related problem is that often we require ** G** and

**to be positive definite, and schemes based on second differentials do not necessarily lead to positive definite matrices. One suggestion is to use transformed parameters. In the univariate case (**

*R***=**

*R**σ*

^{2}

**),**

*I**σ*or log

*σ*have been suggested instead of

*σ*

^{2}. Multivariate analogues such as Cholesky transformations have been suggested by Lindstrom & Bates (1988), and Groeneveld (1994). Recent work on EM algorithms (Foulley & Quaas 1995; Meng & Van Dyk 1998) has suggested that this Cholesky, or linear, parameterization has a natural interpretation and can lead to faster convergence.

For example, Foulley & Quaas (1995) use a model with residual variance *σ*_{E}^{2} and genetic variance *σ*_{G}^{2}, and the breeding values *u*** *** scaled by

*σ*

_{G}. Estimation of

*σ*

_{G}follows a two stage procedure of first predicting

*u***with mixed model equations. Then, regression of data on breeding values (taking into account the uncertainty of the breeding values) gives a natural way of estimating**

****σ*

_{G}and keeps

*σ*

_{G}

^{2}within the parameter space. For a balanced sire model, Foulley & Quaas (1995) note that this algorithm has a geometric rate of convergence with the distance of a parameter from its final value approximately reducing by a factor,

*r*, in each iteration. This factor

*r*is a function of

*σ*=

*n*/(

*n*+

*σ*), with

*σ*=

*σ*

_{E}

^{2}/

*σ*

_{G}

^{2}. For

*σ*=0.2, 0.5 and 0.8, estimation of

*σ*

_{G}using an EM algorithm gives

*r*=0.27, 0.45 and 0.31 respectively, compared with 0.03, 0.25 and 0.63 for a scheme based on updating

*σ*

_{G}

^{2}. This shows the advantage of the

*σ*

_{G}parameterization for small values of

*σ*. The first scheme estimates a linear parameter

*σ*

_{G}and the second scheme estimates a quadratic parameter

*σ*

_{G}

^{2}.

A more recent development is the parameter extension or PX–EM algorithm suggested by Liu *et al*. (1998). In our case it involves estimating *σ*_{G}^{2}=(*σ*_{G1})^{2} *σ*_{G2}^{2} and *σ*_{G1} by the first linear parameterization scheme, and *σ*_{G2}^{2} by the second quadratic parameterization scheme. This scheme is at first sight counter-intuitive in that *σ*_{G1} and *σ*_{G2}^{2} look confounded, but each is individually updated and *σ*_{G}^{2} is estimable. The scheme has a rate of convergence that again depends on *σ*, but is faster than the two previous schemes, with rates of convergence of 0.30, 0.60 and 0.80 for *σ*=0.2, 0.5 and 0.8 for the balanced sire case (Thompson & Mäntysaari 2004).

There has been interest recently in parameterizing variance matrices in different ways to lead to more parsimonious models. One way that has similarities with PX–EM methods is to write the *p* breeding values on one animal as ** u**=

**+**

*Sf***with var(**

*e***)=**

*f***and var(**

*U***)=**

*e***, a diagonal matrix. The matrix**

*D***(**

*S**p*×

*f*) could be thought of as representing a set of

*f*factors, and the parameterization can represent a reduced rank or latent regression parameterization. Then an estimation of

**can be found from the prediction of**

*S***and then regressing data on**

*f***, taking into account the uncertainty in**

*f***(Thompson**

*f**et al*. 2003). This provides insight into the PX–EM schemes and might suggest alternative parameterizations or hybrid schemes that have better iterative convergence properties.

We think that AI iterative schemes are attractive in that they usually only need a small number of iterations. The two drawbacks are that they do not always improve the likelihood, and that they can lead to estimates outside the parameter space. The first difficulty reduces as parameters get nearer to a maximum value of the likelihood. One hybrid scheme that is attractive is the suggestion (Cullis *et al*. 2004) of starting with a PX–EM scheme and switching to an AI scheme when the iterates are ‘close’ to a REML solution.

### (d) Sampling approaches

Thus far we have concentrated on exact methods of analysis. A recent, excellent book (Sorensen & Gianola 2002) has discussed Bayesian and Markov chain Monte Carlo (MCMC) methods. One reason often advocated for using a Bayesian approach is that it allows the natural use of ‘prior’ information on genetic parameters. This prior evidence is not often well explained. For example, in the recent 2002 World Congress on Genetics Applied to Livestock Production, over thirty papers presented Bayesian analyses of quantitative genetic parameters. Over half the papers did not quantify prior knowledge, and a third used vague or flat prior knowledge for the genetic parameters. In some papers at least it seems that only lip service was paid to the Bayesian paradigm.

In a sense there is a direct analogy between direct and iterative estimation in linear estimation, and exact and sampling-based methods in quadratic estimation. One can think of Gibbs sampling methods as adding noise at every step of a simplified exact analysis. For instance, estimate fixed effects and add noise, predict random effects and add noise, form sums of squares for breeding values and add noise to give an estimate of *σ*_{G}^{2}. One does not need to be Bayesian to use MCMC methods: Guo & Thompson (1994) use the above paradigm, with the estimation of *σ*_{G}^{2} given by an EM step. In a sense the difficulties of calculating prediction error variances are replaced by sampling them. Thompson (1994) and Groeneveld & García-Cortés, (1998) have pointed out that the sampling error can be reduced when updating *σ*_{G}^{2} by taking account of the variance of the noise added to random effects, although this is simpler to do for uncorrelated effects. One can also get nearer to exact methods by using block updating, but this leads to more complicated variance correction formula. It is not always clear which computational scheme, exact, Gibbs sampling or intermediate, will minimize computational effort.

A recent suggestion by Clayton & Rasbash (1999) for imputation can also reduce the computational effort. In a simple application, their idea suggests fitting two models. In the first model the data are adjusted for predicted random effects, and the fixed effects are estimated; and in the second model the data are adjusted for fixed effects, and breeding values are predicted and genetic variance and residual variance are estimated. The procedure is repeated many times, at each stage adding noise to the estimates of the fixed and random effects. The iterative estimates have some sampling variation and this can be reduced by averaging a sequence of estimates after ‘burn-in’. This scheme is similar to Gibbs sampling but should have less sampling variation. Thompson (1994), and more recently Garcia-Cortes & Sorensen (2001) and Harville (2004), have discussed related ideas.

### (e) Validation of models

The purpose of a large amount of parameter estimation is to allow the efficient prediction of breeding values and efficient selection procedures. In some ways it is more important to know if we are selecting the best animals, rather than whether we are fitting the best model. In some ways, using Falconer's realized heritability, relating selection response to selection differential could be thought of as more important than estimates of heritability based on covariances between relatives in unselected base populations. Hill (1971) put the estimation of realized heritability from the comparison of divergent lines, or a selected line and a control line, on a sound and rigorous scientific and statistical basis, in particular deriving the appropriate sampling variance matrix for the generation-line means. Sheridan (1988) reviewed experiments in all animal species in which base population and realized heritability estimates were available, and concluded that there was not a strong agreement between the realized genetic and base genetic parameters. James (1990) suggested that a major cause of this discrepancy appeared to be sampling errors.

One concern with the validation of animal models is whether different types of relationship provide consistent information. Thompson & Atkins (1994) developed methods to allow within- and between-line information on genetic variation to be disentangled. This lead to an empirical verification of Hill's analytical justification for the sampling variances of the generation-line means. Thompson & Atkins (1994), found there was substantial agreement between the two types of estimates for an extensive sheep-selection experiment. Another attempt to disentangle the contribution of different components of the individual animal model to heritability estimation was due to Visscher & Thompson (1992), who fitted a model with different male and female heritabilities in the analysis of a British dairy cattle population. They showed that approximately two-thirds of the information came from female (mostly dam–offspring) relationships.

Another way of partially validating a model was the method R introduced by Reverter (Reverter *et al*. 1994*a*, *b*). The initial suggestion was to check whether a model is appropriate by comparing predictions based on ‘early’ data with predictions based on ‘recent’ data. Reverter *et al*. (1994*a*) show that the regression of ‘recent’ prediction on ‘early’ predictions is 1. Informally, this statistic is asking the question: does recent data change the prediction of early animals? In a sense this is looking backwards. By contrast, the analysis of selection experiments and the partitioning of likelihood often look forward and ask: does response agree with prediction for ‘recent’ animals? Reverter *et al*. (1994*b*) have suggested that the method could be extended to estimate genetic parameters, essentially choosing an estimated heritability to make the regression coefficient 1. One problem with ML techniques is the dependence of the procedures on having estimates of the prediction error variances. Method R introduced an ingenious algorithm that avoids calculation of prediction error variances. This method has been used in several large genetic situations (e.g. Misztal 1997; Van Tassell *et al*. 1999), but the method is not completely understood. A concern is whether or not the method can be used with selected data. Recent simulation evidence (Cantet *et al*. 2000; Schenkel & Schaeffer 2000) and theoretical evidence (Thompson 2001) has suggested that method R estimates are biased in selected populations.

The introduction of multiple-trait across country evaluation (MACE; Schaeffer 1994) has also led to interest in validating models. One concern is that if the genetic trend is not well estimated it might influence predictions of breeding values and their comparison. Boichard *et al*. (1995) introduced three methods for checking on the genetic trend. In the first, predictions from first parity records are compared with prediction from all records using a repeatability model, on the basis that the first parity model is less affected by parity effect biases. In the second method, daughter yield deviations are analysed within sire by calving-year to determine if these remain stable over time or if there is evidence of selection biases. The third method, an extension of that of Reverter *et al*. (1994*b*), compares recent with early predictions, but includes terms to measure differences between the two predictions and a term to allow the estimation of genetic trend. The methods have been influential in validating and improving predictions of data used by INTERBULL, although the results of the tests have not been published (U. Emanuelson, personal communication).

## 3. Genetic parameters for the uk test day model

Current software programs such as Asreml (Gilmour *et al*. 2002) for fitting the mixed model incorporate many of the developments described above. This allows problems involving a very large number of variance components to be tackled which would have been impossible even a few years ago. As an example, we consider work in preparation by I. M. S. White, S. Brotherstone and R. Thompson concerning the estimation of about 800 variances and covariances for the UK test day model.

Data were available on 22 343 daughters of 823 bulls. Milk (M), fat (F) and protein (P) yield in each of the first three lactations were analysed as nine traits. The mean lactation curve was fitted as a quartic polynomial function of days in milk (DIM). Sire and cow (permanent environmental) deviations from the mean curve were fitted as Legendre polynomials (*L*_{0}, *L*_{1} and *L*_{2}) with random coefficients. Covariance matrices were therefore 27×27 (3 variables×3 lactations×3 coefficients). The advantages of such random regression models are that they allow more appropriate modelling of environmental effects, more flexible modelling of the lactation curves, and the prediction of functions of these curves, including persistency. Space does not allow us to adequately discuss these random regression models, so we refer the reader to Meyer & Hill (1997) and Meyer & Kirkpatrick (2005) for a more extensive discussion.

The traits were analysed in sets of three, with milk in first lactation always included to allow for selection. There were thus 28 runs required to analyse all 9 traits, each run estimating a 9 × 9 submatrix of the complete 27×27 between- and within-sire matrices. The contemporary group (CG) effect was allowed to depend linearly on DIM, with an intercept and slope for each of milk, fat and protein (independent of lactation). Variation in intercept and slope was treated as fixed between herds and random within herds. This required the estimation of a 6×6 within-herd covariance matrix (intercept and slope for each of milk, fat and protein). Complete matrices were reconstituted from overlapping submatrices so that they were positive definite. This was achieved by using the Cholesky factors of submatrices as ‘pseudo-data’ for the full set of traits. This computationally attractive procedure is similar to the ‘iterative summing of expanded part matrices’ of Mäntysaari (1999). The idea can be illustrated with a simple example of 3 traits (*X*, *Y*, *Z*) with covariance matrix * V* (3×3). Suppose the traits are analysed in pairs, and the analysis of (

*X*,

*Y*), (

*X*,

*Z*) and (

*Y*,

*Z*) produces estimated covariance matrices (2×2) with Cholesky roots

**,**

*A***and**

*B***. The pseudo-data take the form**

*C*Here * represents a missing value. This pseudo-data is presented to a REML program as multivariate normal data with zero mean and covariance matrix ** V**. For the test day model, between- and within-sire matrices were dealt with simultaneously, with constraints to ensure that both genetic (4×between-sire) and permanent environment (within-sire−3×between-sire) components were positive definite. Residual variances and covariances were estimated separately for each stage of each lactation (i.e. there were twelve 3×3 residual covariance matrices for milk, fat and protein).

Each of the 28 runs estimated up to 135 variance–covariance components and involved up to 836 500 mixed model equations. Each iterative round took approximately 5 min on a Pentium 3, 850 MHz personal computer. Approximately 20% of the time taken in each iteration was concerned with the likelihood calculation and approximately 40% was concerned with the calculation of the sparse inverse of ** C**, in agreement with the theoretical suggestions. The other major proportion of time, 30%, was associated with the computation of the first differentials of the likelihood. The number of iterations varied from run to run and ranged from 10 to over 30 iterations. The increased iterations occurred when the estimated matrices were nearly singular.

### (a) Legendre polynomial coefficients

Figure 1 shows a colour image of the genetic correlation matrix for the Legendre polynomial coefficients, ordered as intercept, slope and curvature, then by lactation, then by the traits milk, fat and protein yield. Within each type of coefficient (intercept, slope or curvature) there are some strong positive correlations between the traits (M1–P3). Correlations between intercepts or curvatures and slopes tend to be small, while correlations between intercepts and curvature coefficients are generally substantial and negative, reflecting a tendency for deviation curves above the mean curve to be concave, and *vice versa*. The structure of the 27×27 matrices is approximately that of a Kronecker product of a 3×3 lactations matrix and a 9×9 traits matrix. Visscher (1991) and Veerkamp & Goddard (1998) have found similar multiplicative models useful in the analysis of multivariate dairy-cattle data. To a first approximation, the correlation between parameters in different lactations (e.g. intercept for milk in lactation 1 and slope for fat in lactation 2) is the product of two correlations, one depending only on the parameters chosen (milk intercept and fat slope), and one depending only on the two lactations (e.g. 1 and 2).

### (b) Heritabilities

Heritabilities calculated from the variance components are shown below. For milk, the average heritability in the first two lactations is about 0.45, with the maximum about day 150 (table 1). In lactation 3 the heritability is slightly higher on average and more nearly constant over the lactation. For protein, heritability drops from about 0.4 in lactation 1 to about 0.35 in lactation 3. Heritabilities for fat show a rising trend within lactations, but average values drop from about 0.35 in lactation 1 to about 0.25 in lactation 3.

### (c) Genetic correlations

Table 2 gives estimated genetic correlations between the traits at DIM=100 days.

Numbers on the diagonal are heritabilities. In each lactation there are strong correlations between protein and the other traits, and a weaker correlation between milk and fat. Cross-correlations between traits in different lactations follow the same pattern, but are reduced by a factor of approximately 0.8 for adjacent lactations and 0.6 for lactations 1 and 3. Lagged correlations (not shown) show a similar pattern, but are reduced by a factor depending on the separation of the time points.

## 4. Conclusions

We have shown that the area of genetic parameter estimation has advanced tremendously over the last 40 years, allowing more appropriate models to be fitted to larger datasets.

What do we expect to happen in the next 10 years? We expect to see even more complex models fitted using molecular information. The infinitesimal model is often a useful first step in understanding genetic data and remains the basis of practical improvement schemes. Even when more precise molecular models are fitted, it is often useful to include infinitesimal model terms to sweep up unexplained genetic variation.

Fernando & Grossman (1989) have shown how to incorporate marker information into prediction, essentially extending the Quaas (1976) results on the inverse of the ‘average’ relationship matrix to the case when there is information on markers in pedigrees. Meuwissen & Goddard 2000 show how to incorporate this work into fine scale mapping using both linkage and linkage disequilibrium information.

In some ways, work on estimation of variance parameters, validation of models and prediction of breeding values are thought of as separate exercises, partly because of the difficulty of estimating variance parameters on complete populations. Faster genetic progress might be possible if we could integrate the methods, for example through a synthesis of sampling and analytic methods. Work is needed to reconcile the sometimes discrepant results obtained from sampling variances of parameters, for example those calculated with the average information method, and confidence intervals obtained from profile likelihoods. This work may highlight cases where it is difficult to disentangle parameter estimates. Sales & Hill (1976) suggested formulae for genetic progress taking account of uncertainty in parameters in simple cases. We expect this to be extended to more complicated models, allowing alternative genetic models to be compared based on measures of genetic progress, rather than the statistical model selection criteria (e.g. Akaike's information criterion; Akaike 1973) that correct the model likelihood for the estimation of parameters.

## Acknowledgments

R.T. would like to thank Bill Hill for his multifaceted and unstinting support throughout his career; this has ranged from wise counsel to the Animal Breeding Research Organization and its successors and involvement in the joint supervision of over fifteen Ph.D. students (when R.T. learnt more than the students), to encouragement to learn at the feet of Oscar Kempthorne. The authors would like to acknowledge the comments and patience of Peter Visscher and the two referees.

## Footnotes

One contribution of 16 to a Theme Issue ‘Population genetics, quantitative genetics and animal improvement: papers in honour of William (Bill) Hill’.

- © 2005 The Royal Society