## Abstract

If genetic constraints are important, then rates and direction of evolution should be related to trait evolvability. Here we use recently developed measures of evolvability to test the genetic constraint hypothesis with quantitative genetic data on floral morphology from the Neotropical vine *Dalechampia scandens* (Euphorbiaceae). These measures were compared against rates of evolution and patterns of divergence among 24 populations in two species in the *D. scandens* species complex. We found clear evidence for genetic constraints, particularly among traits that were tightly phenotypically integrated. This relationship between evolvability and evolutionary divergence is puzzling, because the estimated evolvabilities seem too large to constitute real constraints. We suggest that this paradox can be explained by a combination of weak stabilizing selection around moving adaptive optima and small realized evolvabilities relative to the observed additive genetic variance.

## 1. Introduction

Linking macro- to microevolution is one of the fundamental challenges in evolutionary theory. Population and quantitative genetics provide precise predictions for the short-term dynamics of allele frequencies and phenotypes, but how far can these predictions be extrapolated? It is customary to distinguish two extreme positions. The first is the extrapolationist view that macroevolution is microevolution writ large, or simply that macroevolution can be fully understood by use of concepts and parameters from quantitative genetic theory (e.g. [1–6]). The alternative extreme is that macroevolution is decoupled from microevolution in such a way that microevolutionary theory is largely irrelevant, and different conceptual tools must be used when studying the two levels (e.g. [7–10]). Most biologists, including those cited above, would probably agree that the truth is somewhere in between these extremes, but exactly how far microevolutionary models can be extended remains an open question [11].

The research paradigm in evolutionary quantitative genetics initiated by Lande and Arnold (e.g. [12,13]) is a good illustration of the extrapolationist view. The fundamental assumptions of this approach include the view that quantitative genetic parameters such as the additive genetic, or at least the mutational, variance parameters remain stable over long stretches of time, allowing rather simple extrapolations of single-generation responses to selection. On this basis, predictions have been derived for patterns of among-species variation based on a variety of models from life-history theory, sexual selection, behavioural ecology or neutral theory (e.g. [14–16]).

A key test of the macroevolutionary relevance of evolutionary quantitative genetics is to see whether macroevolutionary divergence is influenced by patterns of genetic variation as measured in contemporary populations. If there is no such relationship, then either genetic constraints are not important or they are not captured by the observed patterns of genetic variation. Many studies have asked this question, most concluding qualitatively in support of constraint (see Discussion). However, such studies face substantial conceptual and methodological challenges [17–20].

The most important determinant of phenotypic divergence among populations is likely to be the dynamics of local peaks in the adaptive landscape [3,6,21–23]. Low evolvability will affect the degree of divergence by creating a lag or even precluding populations from tracking moving peaks in a changing environment. Only if evolvabilities are small relative to the rate of change in the adaptive landscape do we expect a real constraint on divergence. Thus, looking for a relationship between evolvability and divergence constitutes a test of the importance of constraints in evolution. In particular, it may help clarify the relevant timescales at which genetic constraints are important and thereby the generality of microevolutionary models.

Here, we use recently developed theory on the measurement of evolutionary potential in a multivariate context [17], and connect this with patterns of divergence by explicit evolutionary models. Our approach enables us to investigate how empirical data fit a range of evolutionary scenarios. We also investigate the effect of integration on divergence by comparing independent trait evolution within two sets of traits differing in their degree of evolutionary integration. To do this, we have estimated G-matrices of floral traits in two distinct, albeit unrecognized, species in the Neotropical *Dalechampia scandens* species complex, which we then compare with among-population divergence in 24 populations (12 each from the two species).

## 2. Theory

### (a) Measuring evolvability

Trait evolvability can be measured as the expected proportional response per generation to linear directional selection of unit strength [17,24,25]. Unit strength of selection is the strength of selection on relative fitness as a trait and is given by a (mean-scaled) selection gradient of unity. We denote this measure as *e*, and from the standard equations of quantitative genetics, we get *e* ≡ Δ*z/β* = *I*_{A}, where Δ*z* is the mean-scaled selection response, *β* is the mean-scaled selection gradient and *I*_{A} is the mean-standardized additive genetic variance [25]. A value of *e* of, say, 0.01 means that the expected response per generation per unit directional selection is 1% of the trait mean. In the following, we will drop the conceptual distinction between *e* and *I*_{A}, and just use the symbol *e*.

While *e* is a straightforward measure of evolvability for a univariate trait, the measurement of multivariate evolvability is more complicated, because the response to selection may then deviate from the direction of the selection gradient and the evolvability may be different in different directions in phenotype space. Hansen & Houle [17] proposed three measures of multivariate evolvability that we will consider here. These are all computed as functions of a given selection gradient, **β** (a column vector of partial regression coefficients), standardized to unit length and the additive genetic variance matrix, **G**. The ‘respondability’, *r*(**β**) = √(**β**‘**G**^{2}**β**) is defined as the expected length of the response vector; the ‘evolvability’, *e*(**β**) = **β**‘**Gβ**, is defined as the expected length of the projection of the response vector on the selection gradient; and the conditional evolvability, *c*(**β**) = (**β**‘**G**^{−1}**β**)^{−1}, is defined as the expected length of the response vector when the directional selection along **β** has come to a balance with assumed stabilizing selection orthogonal to **β** (*c*(**β**) depends only on the existence and not on the strength of the stabilizing selection [26]). The respondability may be interpreted as the ability to change in response to selection, the evolvability as the ability to change in the direction of selection, and the conditional evolvability as the ability to change in the direction of selection when there is stabilizing selection on the perpendicular directions. All these measures reduce to *e* when only a single trait is concerned.

### (b) Relating evolvability to evolutionary divergence

How standing genetic variation relates to macroevolutionary divergence is an open question. Simple models based on extrapolating constant selection and evolvability show that very large changes can be produced from typical estimates of selection strength and evolvability. For example, the mean trait value expected after *t* generations of constant evolvability, *e*, and selection gradient, *β*, is
2.1where *z*_{a} is the ancestral trait value. If we combine the median estimate of univariate evolvability for morphological traits from Hansen *et al*. [25] of *e* = 0.1% with the median mean-scaled selection gradient from Hereford *et al*. [27] of *β* = 0.9, we get a doubling of the trait value after 770 generations. Even if selection gradients of this strength are not likely to remain constant over long time periods (e.g. [28,29], but see [30]), this illustrates that typical evolvabilities are not likely to generate macroevolutionary constraints by themselves. The naive expectation from this is that among-species variation is generated by differences in adaptive optima, and that phylogenetic effects have to do with similarities in the optimal states of related species [22,23]. For example, under simple quadratic stabilizing selection, the rate of evolution towards an optimum, measured in generations, would be *βe* = –2*s*(*z* − *θ*)*e*, where *s* is the mean-standardized curvature of the fitness function, *θ* is the optimum, and the distance from the optimum is also measured in units of the trait mean. The time it would take to move half the distance towards the optimum under this model would be *t*_{1/2} = ln2/(2*se*) [23]. With *e* = 0.1% and even a relatively small *s* = 1 (which implies the mean would have to be shifted 45% of the optimum to give *β* = 0.9) it would give *t*_{1/2} ≈ 350 generations, which is again nearly instantaneous on macroevolutionary timescales.

Still, there are many indications of correlations between measures of evolvability and among-population variation (see Discussion). Hence, it is at least possible that evolvabilities, and particularly conditional evolvabilities, of some trait combinations may be small enough to constitute detectable constraints on macroevolutionary timescales. If so, we may find a relationship between measured evolvabilities and among-species variation. Note that proportional changes scale with evolvability, so that we expect among-species variances to scale with the square of evolvability. From equation (2.1), we get
2.2where *ρ* is a constant, and the variance in selection gradients may result from different directions of selection in different populations or from fluctuating selection gradients. For stable differences in direction of selection, we expect scaling with the square of the time since divergence (a scaling exponent of *ρ* = 2), while for fluctuating selection gradients, we expect linear scaling with the time since divergence (a scaling exponent of *ρ* = 1) because the trait mean will then evolve as a Brownian motion. The scaling with the square of evolvability differs from predictions under neutral models, where among-species variance scale linearly with evolvability (e.g. [31–33]). If the trait is tracking a moving optimum, we get different scaling relationships (e.g. [34,35]). Under a simple model of quadratic selection, outlined above (*βe* = −2*s*(*z* − *θ*)*e*), around an optimum, *θ*, that moves according to an Ornstein–Uhlenbeck process (see appendix A in the electronic supplementary material for the analytical derivation), the equilibrium among-species variance in the trait mean becomes
2.3where *V* is the stationary variance of the optimum and *α* is the pull parameter in the Ornstein–Uhlenbeck process. This yields a positive relationship between among-population variance and evolvability that eventually flattens out at an asymptote (figure 1). Note that if , the among-population variance goes to zero. The optima move too fast to be tracked and the populations will experience this as a constant (multiplicative average) optimum. If, on the other hand, the population can track faster than the optimum moves , then among-population variance converges on the variance of the optima, and the relationship with evolvability disappears. Hence, stationary fluctuating optima can explain a relationship between evolvability of traits and among-population variance if at least some of the traits have rates of adaptation (2*es*) of the same order of magnitude as the rates of movement in the optimum (*α*). Note also that this common rate would have to be consistent with observed phylogenetic signal in the data. This may require phylogenetic half-lives (*t*_{1/2} = ln2/*α* ≈ ln2/2*es*) on the order of 100 000 generations or more.

These considerations concern the divergence of a one-dimensional trait. Linking evolvability to patterns of multivariate divergence is more complicated, because we rarely have direct information about the multivariate directions of selection or the positions of optima. In most cases, we only have differences between populations to go by. This makes it unclear whether it is the respondability, the evolvability or the conditional evolvability that is most relevant statistic to use. We will assess all of these and test which one gives the best predictions.

## 3. Material and methods

### (a) Study species and blossom traits

*Dalechampia scandens* L. (Euphorbiaceae) is a Neotropical vine distributed from Mexico to Argentina. Its blossoms (pseudanthial inflorescences) comprise a cluster of three pistillate flowers situated below a cluster of 10 staminate flowers (figure 2) [36,37]. Each female flower contains three ovules so that each blossom can produce a maximum of nine seeds. The flower cluster is subtended by two involucral bracts that may provide a signal to pollinators [38], and may also have a protective role as they close to protect the whole structure at night and during fruit maturation [39]. A gland that produces terpenoid resin is associated with the staminate flowers. The resin varies in colour among *Dalechampia* species and is collected for use in nest construction by various bees in the genera *Eulaema, Eufriesea, Euglossa* (Apoidea: Euglossini), *Hypanthidium* (Megachilidae: Anthidiini) and/or *Trigona* (Apidae: Meliponini). Which bees are attracted depends on characteristics of the blossom and location of the population [40]. Larger glands produce more resin [41], and blossoms with large glands attract larger bees than blossoms with smaller glands [40,42,43]. The efficiency with which different bees transfer pollen is influenced by the distances between the resin gland and the stigma (GSD) and anthers (GAD) [40,42,44].

*Dalechampia scandens* contains at least two reproductively isolated groups with overlapping geographical distributions (figure 3). The two ‘species’ differ in blossom size, and particularly in the size of the resin gland. Microsatellite analysis show that they fall out as well-separated groups on the phylogeny (figure 3). Despite many attempts, we have never managed to obtain fertile hybrids between these two species [45,46]. Judging from blossom morphology, the small-glanded species seems to be able to self-pollinate more easily than the large-glanded species.

The morphological measurements used in this study are illustrated in figure 2 and summarized in table 1. One observer (C.P.) measured all plants in the quantitative genetics experiments, and a second observer (G.H.B.) measured all plants in the among-population dataset. Two blossoms were measured for most plants.

### (b) Quantitative genetics experiments

The plants used in the quantitative genetics analyses were derived from seeds collected in two distinct populations, a large-glanded population near Tulum, Mexico (20°13′ N; 87°26′ W) and a small-glanded population near Tovar, Venezuela (8°21′ N, 71°46′ W). Fruits with seeds were collected from separate individuals in these two populations in 1998. We germinated one seed per maternal family and conducted two separate block diallels in which 12 and nine sets of five parents, in Tulum and Tovar, respectively, were combined into complete 5 × 5 diallels with both reciprocals and selfed crosses. Two individuals were raised from each cross and subsequently measured. The first diallel (Tulum) was conducted between 1999 and 2000 and results from this have been published [24,47,48]. The second diallel (Tovar) was conducted between May 2005 and June 2006. The measurements in the two diallels were similar, but while blossoms with one to three open male flowers were measured in the Tulum diallel, only blossoms with a single open male flower were measured in the Tovar diallel.

### (c) Among-population data

In total, we obtained data on trait means from 24 populations (electronic supplementary material, tables S1–S3), including the two populations on which the quantitative genetics experiments were conducted (see above). The measured plants in the remaining 22 populations were from fruits with seeds sampled from roadsides in Mexico (states of Veracruz, Tabasco, Campeche, Yucatán and Quitana Roo; see the electronic supplementary material, table S1 for exact locations) during the autumn of 2007. All plants were grown in the same greenhouse in Trondheim (Norway) during the same time period. Sample sizes ranged from one to 33 (median 12) plants per population (electronic supplementary material, table S2).

We constructed a neighbour-joining tree (figure 3) based on the genetic-distance measure *D*_{A} [49] using 70 microsatellite markers developed for *D. scandens* [50]. The genetic distances had a perfect tree structure, suggesting limited gene flow between populations. We therefore interpret the genetic distances as reflecting time since divergence. The program Populations 1.2.31 was used to estimate this tree, which we interpret as a phylogeny. See appendix B in the electronic supplementary material for details.

### (d) Data analysis

All analyses were conducted on two sets of traits (table 1). The first set included five functionally related traits: gland–anther distance, gland–stigma distance, style width, gland size (square root of gland area) and bract size (square root of bract area). The second set included six morphologically integrated bract traits: upper-bract-length centre, upper-bract-length sides, upper-bract width, lower-bract-length centre, lower-bract-length sides and lower-bract width.

#### (i) Within-population variation

For the quantitative genetic experiments, we fitted mixed models with the R package `MCMCglmm` [51]
where *z* is the trait value, *u* is the trait mean, *a* is the breeding value, *b* is the non-genetic plant-level effect, *d* is the measurement-date effect and *q* is the residual within-plant effect. The subscripts, *i*, *j*, *k* and *l* represent trait type, plant, day and blossom, respectively. The random effects are assumed to be distributed as **a** ∼ *N*(**0**, **G** ⊗ **A**), **b** ∼ *N*(**0**, **B** ⊗ **I**), **d** ∼ *N*(**0**, **F** ⊗ **I**) and **q** ∼ *N*(**0**, **E** ⊗ **I**), where **A** is the additive relatedness matrix, **I** is the identity matrix and ⊗ is the Kronecker product. The model estimates the additive genetic variance matrix **G**, the among-plant environmental variance matrix **B**, the among-date variance matrix **F** and the residual variance matrix **E**. The traits were mean standardized before the analyses to obtain mean-standardized variance matrices. The complete posterior distributions of all G-matrices are reported in the electronic supplementary material, tables S4–S7.

As priors for the Bayesian mixed models (`MCMCglmm`), we used zero-mean normal distributions with very large variances (10^{8}) for the fixed effects, half-Cauchy distributions with scale parameter 20 [52] for the variance components, and inverse-Wishart distributions with parameters **V** and *n* for the residuals. The matrix parameter **V** was a crude guess based on the phenotypic variance matrix, and the value of *n* was set to *x* − 0.998, where *x* is number of traits in the analysis [51]. The models were robust against changes in the priors, but note that the influence of these priors on functions of variance components is not well understood (JD Hadfield 2014, personal communication). These models ran for 1 100 000 iterations, with a burn-in phase of 100 000 and a thinning interval of 1000. Visual inspection of trace plots showed that the posterior distributions had good convergence and mixing of chains. The autocorrelation was less than 0.1 per sampled iteration for almost all chains.

#### (ii) Evolvability measures

To calculate evolvability measures from the estimated G-matrices, we followed the approach of Hansen & Houle [17]. We have implemented functions to calculate these measures in the R package `evolvability` (see appendix C, electronic supplementary material). The measures *e*(**β**), *r*(**β**) and *c*(**β**) are explained in the theory section. We also use a measure of evolutionary integration, *i*(**β**) = 1 − *c*(**β**)/*e*(**β**), that measures the fraction of additive genetic variance bound up in the other traits. This integration index varies between zero, no integration, and one complete integration. To calculate the average, the minimum and the maximum evolvability (*e*_{mean}, *e*_{min} and *e*_{max}) for each G-matrix, we used the average, the minimum and the maximum of the eigenvalues. The minimum and maximum evolvability correspond to the evolvability in the directions of the smallest and largest eigenvector of **G**, while the average evolvability corresponds to the expected evolvability in a random direction. To calculate the average respondability, conditional evolvability and integration (*r*_{mean}, *c*_{mean} and *i*_{mean}) we took an average over 1000 random selection gradients uniformly distributed on the unit sphere for *r*(**β**), *c*(**β**) and *i*(**β**) instead of using the analytical approximations in Hansen & Houle ([17], see also [53]). Note that some of these measures are biased due to estimation error in the estimated matrices. For example, the largest eigenvalue is overestimated and the lowest eigenvalue is underestimated. The mean of the eigenvalues is not biased, however.

To generate the set of random selection gradients uniformly distributed in *k* dimensions, we used the function `randomBeta` in the `evolvability` R-package. This function samples each element of each vector (selection gradient) independently from a zero-mean Gaussian distribution with unit variance, and subsequently normalizes each vector to unit length.

For direct comparison of G-matrices, we calculated the squared correlation coefficient, *R*^{2}, between evolvabilities, *e*(**β**), *r*(**β**) and *c*(**β**), computed along 1000 random selection gradients. These estimates of *R*^{2} describe the amount of variance across directions in the evolvability parameter of one G-matrix that can be explained by the same parameter from another G-matrix. All measures were calculated at each iteration of the posterior distribution of the G-matrices to include uncertainty.

#### (iii) Among-population variation

We estimated the among-population variance using phylogenetic mixed models [54,55]. Because of the small number of populations considered, we fitted only univariate models for the among-population variance. We used the natural logarithm of the trait values in the analysis because variances on this scale are directly comparable to evolvability measures. The small- and large-glanded populations were analysed separately. The phylogenetic mixed models were specified as
where *z* is the trait value, *u* is the trait mean, *a* is the phylogenetic effect, *p* is the residual population effect, *b* is the plant-level effect, *d* is the measurement-date effect and *q* is the residual within-plant effect. The subscripts, *i*, *j*, *k* and *l* represent population, plant, day and blossom, respectively. Random effects were assumed to be identically independently normally distributed, with the exception of the phylogenetic effects for which we allowed phylogentic correlations as , where **A** is the phylogenetic relatedness matrix composed of shared branch lengths between populations. We also fitted models in which the residual population effects (*p*) were excluded, which gave us an estimate of the evolutionary rates , that is, the phylogenetically corrected among-population variance. This is the parameter we used when comparing population divergence and evolvability. Evolutionary rates are measured as increase in variance per unit of time, here the length of the phylogeny, among taxa evolving independently as if by a Brownian motion. The phylogenetic mixed models were fitted using the R-package `MCMCglmm` with the same specifications as in the genetic models.

#### (iv) Population and species divergence

To understand whether population differentiation had happened along lines of high evolvability, we compared evolutionary rates and evolvability (*e*, *r* and *c*) for all traits and for 1000 random directions (unit-length selection gradients). The different evolutionary models predict different scaling relationships between evolvability and evolutionary divergence (figure 1). These scaling relationships were investigated by plotting the log evolutionary rates against log evolvability and comparing this to isometry (a scaling exponent of 1). We did not estimate the scaling exponents directly because a rigorous statistical method for this has not yet been developed. The direction of the vector of species divergence (**β**_{species}) was compared to the range of evolvabilities of the G-matrices to see whether this direction had high or low evolvability. Each element of **β**_{species} was calculated by subtracting the average trait values (on the natural log scale) of the small species from the average trait values of the large species estimated in the phylogenetic mixed models, and dividing by the norm of this vector for standardizing to unit length. Uncertainty was assessed by evaluating the complete posterior distributions.

Morphological integration may constrain the independent evolution of individual traits [56]. We investigated the effect of integration by comparing *i*(**β**) with the fraction of independent among-population variance (the ratio of conditional variance among populations over the total among-population variance). Because of statistical power, we were only able to estimate two-dimensional among-population variance matrices using the phylogenetic mixed model described above (not including the independent population effects, *p*). We therefore only compared the pairwise combinations of all traits for autonomy and fraction of independent among-population variance. All analyses were done in R v. 2.15.2 [57].

## 3. Results

### (a) Patterns of evolvability

Average functional-trait evolvabilities were *e* = 0.55% and 0.23% for the small-glanded and large-glanded species, respectively (table 2). These averages are well within the normal range, but larger than the median *e* = 0.09% for linear traits reported in a recent compilation [25]. The traits ranked similarly regarding evolvability in the two species, with the striking exception of gland–anther distance being the most evolvable (*e* = 0.85%) in the small-glanded species, but the least evolvable in the large-glanded species (*e* = 0.06%; table 3). Much of the variation in trait evolvabilities was probably due to estimation error. The functional traits were not very integrated, with mean integration across random directions of *i* = 0.31 and 0.44 for the small- and the large-glanded species, respectively (table 2). This means that conditional evolvabilities on average were as high as 69% and 56% of the unconditional evolvabilities. Hence, most combinations of functional traits had a high degree of independent evolutionary potential.

Bract traits had average evolvabilities of *e* = 0.40% and 0.27% for the small- and large-glanded species, respectively (table 2). These averages were similar to those of the functional traits, but the bract traits were much more integrated and had more equal evolvabilities (tables 2 and 4). The high average integration of *i* = 0.91 and 0.92 means that conditional evolvabilities would be very low for most combinations of bract traits. The averages of *c* ≈ 0.01% indicate a potential change of only a hundredth of a per cent per generation under unit selection, and the minimum evolvabilities were almost an order of magnitude below this. Note, however, that bract traits may still be highly evolvable along a few directions. Indeed, the maximum evolvabilities, which equal the maximum conditional evolvabilities, were 2.1% and 1.5% for the small- and large-glanded species, respectively (table 2).

Although the small-glanded species was roughly twice as evolvable as the large-glanded species, their G-matrices were qualitatively similar in the general patterns and levels of integration. There were a lot of particular differences, however, and, for the functional traits, the evolvability measures along different directions were poorly correlated between the matrices (*R*^{2} = 12%, 12%, 7% for *r*, *e* and *c*, respectively). The bract-trait matrices were more consistent (*R*^{2} = 96%, 97%, 36% for *r*, *e* and *c*, respectively).

In both small- and large-glanded populations, the patterns of respondability were similar to the patterns of evolvability. We will therefore not discuss respondability further (but see the electronic supplementary material). As for non-genetic variance components, we note that there were large components of temporal variance (‘day’) and that most of the residual variance was among individual blossoms and not among plants.

### (b) Patterns of evolutionary rates

The phylogenetic structure explained a substantial amount of variation in all the measured traits, with phylogenetic heritabilities [54] ranging from approximately 0.5 to 0.8 (tables 5 and 6). For this reason, we focus on estimated evolutionary rates along the phylogeny instead of the raw among-population variances. These evolutionary rates are the phylogenetically corrected among-population variances . We focus on the square roots of these variances (i.e. CV_{rate}) because they scale isometrically with the evolvabilities under linear selection (see equations (2.1) and (2.2)). The mean-scaled variance accumulated over the length of the phylogeny was around 0.02 (ln mm)^{2} for all traits; this equals a CV_{rate} of around 14%. The exception was bract traits in the small-glanded species, which had rates of evolution an order of magnitude lower than the other traits (tables 5 and 6).

### (c) Relationship between evolvability and divergence

Note first that the rates of evolution were very small relative to the estimated evolvabilities. With our estimated evolvabilities, changes of this magnitude could be produced by natural selection over just a few generations. We do not have direct information about the absolute age of the phylogeny, but because the deepest split in the phylogeny is between species, we regard it as likely that the populations within each species have been separated by hundreds of thousands of years. Hence, there is no obvious reason to expect an influence of genetic constraints. However, table 7 shows that there was high evolvability in the direction of the species divergence, and figures 4 and 5 show that there was a relationship between evolvability and population divergence, with populations having diverged more in directions of high evolvability. This holds true for conditional and unconditional evolvability in both functional traits and bract traits in both species. Note in particular the strong, nearly isometric, relationship between evolvability and evolutionary rate in the bract traits shown in figure 5. These relationships are not just due to a vague general match between the G-matrices and the among-population variance matrices. If we swap the G-matrices, and try using the small-glanded G-matrix to predict population divergence in the large-glanded species or vice versa, the relationships disappear for the functional traits (electronic supplementary material, figures S1 and S2). This underscores that **G** can change over time, and thereby changing the predictions of among-population divergence. In general, divergence is best predicted by evolvability, less well by conditional evolvability and hardly at all by respondability (see the electronic supplementary material, figures S3 and S4 for results on respondability).

General estimates of conditional evolvability involving many traits are error prone, and it is not surprising that the relationship with among-population variation was noisy. We can get more precise estimates by conditioning single traits on each other. This is asking how much one trait is likely to constrain the evolution of another trait. We did this for all pairwise combinations of traits and then tested whether the integration (*i* = 1−conditional evolvability/evolvability) between pairs of traits predicts the independent evolution of the traits. There was a strong negative relationship between integration and independent evolution for the large-glanded species, but a less clear relationship for the small-glanded species (figure 6).

## 4. Discussion

Our results are consistent with genetic constraints on trait divergence. Both the direction with highest among-population divergence and the direction of divergence between the two species had high evolvabilities compared to average directions (figures 4 and 5; table 7). This was true for both the functionally related pollination traits and for the morphologically related bract traits. The two trait groups differed strikingly in their patterns of integration, however, with the morphologically related bract traits being much more integrated, with much lower conditional evolvabilities.

### (a) Modes of evolution in *Dalechampia scandens*

*Dalechampia* blossom morphology is under direct selection from both pollinators and seed predators [38,58–60]. Comparative analyses show that the fitness optima of *Dalechampia* blossom traits are influenced by several factors, including bee community composition, availability of other resin sources for the bees, presence of other *Dalechampia* species and energetic constraints [40,42,44,61], but only a small part of the interpopulation variation has been explained by models of selective factors [44,61]. This may be due to the crude way the selective factors have been modelled, or it may be due to genetic constraints. Previous studies have shown that pleiotropic constraints can be important in the evolution of blossom traits in *D. scandens* [47,62] and in *Dalechampia* in general [63]. This is supported by this study.

The scaling exponent between evolvability and evolutionary rates for the functional traits was clearly below one for the large-glanded populations and closer to one for the small-glanded populations (figure 4). Such scaling relationships are consistent with models of moving optima in which the population means can almost keep pace with their optima (*α* < 2*es*, figure 1). The similarity in the scaling relationship for unconditional and conditional evolvability makes it hard to judge if stabilizing selection has constrained evolution in certain directions or not.

Among the functional traits, bract size in the small-glanded populations stands out. This trait has a similar amount of additive genetic variance as the other functional traits, but much less among-population variance. This suggests that there is less dispersion of optima for this trait. This pattern is consistent across all bract traits in the small-glanded populations.

The relatively tight scaling relationship, with a scaling exponent below one, between evolutionary rate and evolvability for the bract traits in the small-glanded populations (figure 5) indicates that the population means lag behind their moving optima (*α* ≈ 2*es*, figure 1). With an evolvability of say 0.1%, and moderately weak stabilizing selection (e.g. *s* = 1), the value of *α* will be too large to be consistent with the observation of phylogenetic signal (*α* ≈ 2*es* = 0.001 gives *t*_{½} ≈ 700 generations). For very weak selection (e.g. *s* = 0.01), however, this model may be plausible (*α* ≈ 2*es* = 0.00001 gives *t*_{½} ≈ 70 000 generations). The tight isometric relationship observed for the large-glanded populations (figure 5) can also be consistent with the same model, but the stabilizing selection needs to be even weaker, because *α* would need to be of the order of 10*es* to be consistent with the isometric relationship (figure 1). Such an isometric relationship is also consistent with models of neutral evolution, but the ratio of among-population variance to evolvability, which equals the ratio of generations to effective population size under drift, is orders of magnitude too small.

The main difference between functional and bract traits was their degree of evolutionary integration. This had a strong effect on the relationship between **G** and divergence in both the small- and large-glanded species. The effect of integration on evolution was also reflected in the correspondence between the integration index and independent evolution of the traits in the large-glanded populations (figure 6). This reinforces previous results indicating correlated evolution among blossom traits in *D. scandens* [47,62]. Note, however, that several traits achieved independent evolution despite a high level of integration in the small-glanded populations.

### (b) The paradoxical relationship between **G** and divergence

Taken at face value our evolvability estimates and even most of our conditional-evolvability estimates predict very rapid evolution at macroevolutionary timescales. Yet, we also find clear evidence for a relationship between evolvabilities and patterns of evolution indicating that genetic constraints may be important. How can these two results be reconciled? It is not that this finding is unique to our study. We know of many studies reporting a relationship between patterns of genetic variation and population divergence or evolutionary rates [17,24,64–80] and also many studies reporting a relationship between phenotypic variation and divergence [69,81–92]. The macroevolutionary relevance of evolvability is not that clear cut, however, and some studies have failed to find a relationship and concluded that genetic constraints are not important for divergence [33,84,85,93–100]. Before generalizing from this body of work it is important to realize that there are many unsolved methodological problems stemming both from the difficulties of constructing quantitative measures of constraints in absence of a realistic quantitative theory of macroevolutionary change on a wide range of timescales, and from statistical difficulties with achieving reasonably accurate estimates of **G**. The field is also marred by fundamental measurement/theoretical problems such as use of inappropriate or incommensurate scales, use of theory-free indices and use of statistical significance testing in place of estimation [101]. Hence, the seemingly clear evidence that evolutionary divergence is often constrained must be regarded as tentative. However, we think that the problems will tend to obscure the relationships between evolvability and divergence rather than enhance them. We will briefly go through some such problems and evaluate the studies that have found evidence against the constraint hypothesis in this light.

A common problem, especially with the early studies, is the use of correlation matrices (e.g. [84,94,95,97]). Correlation matrices are poorly suited for investigating the relationship between evolutionary potential and divergence, because they may obscure any order among the measured traits both in amount of divergence and in amount of genetic variation by standardizing all these values to one. The severity of this problem can be seen from the finding that there is no correlation between mean-scaled and variance-scaled additive variances, i.e. no correlation between ‘evolvability and heritability’ [25,102]. Hence, any relationship between ‘evolvability’ and divergence is predicted to be completely randomized after variance standardization such as forming a correlation matrix. We suspect that some studies have failed to find evidence for constraint due to variance standardization.

Schluter's [79] test of constraints based on estimating the angle between the direction of divergence and the largest eigenvalue of **G**, *g*_{max}, is a common denominator across many studies not finding evidence for constraints [85,93,96,97,99,100]. However, this method cannot falsify the constraint hypothesis, because there may be more than one direction with high evolvability [17,66,87,103]. Note that estimating the angle to the directions of several eigenvalues of **G** cannot falsify the constraint hypothesis either, because there may be directions of high genetic variance in between the eigenvectors.

Many studies test constraints by use of various matrix comparison methods to assess the similarity between the G-matrix and among-population D-matrices [33,84,93–95]. Some of these methods such as correlation of matrix elements, often in combination with the overused Mantel tests, have obscure meaning and little statistical justification, but others such as common-principal-component analysis may at least be statistically correct (but see [104]). It is hard to interpret the results from such methods, however, because there is no established theoretical link between the dissimilarity statistics and evolutionary models. Two matrices may be simultaneously similar and dissimilar in many different ways. We do not know how to recognize the influence of genetic constraints in such studies.

Note that *Q*_{ST} − *F*_{ST} studies are not directly relevant for testing the genetic constraint hypothesis, because these are designed to test the null hypothesis of neutral divergence and usually not the relationship between **G** and **D** beyond this ([105–108], but see [98]).

We are left with the two studies of Chenoweth & Blows [98] and Kimmel *et al*. [100] that convincingly demonstrate no constraining effect of **G** on the evolutionary divergence. However, one of these, Chenoweth & Blows [98], is not consistent with the conclusion of a reanalysis of the same data [72] regarding evidence for constraints. We therefore conclude that there is little evidence against and quite a lot of evidence for, the genetic constraint hypothesis, although the methodological problems are also abundant in several of the studies that report a relationship between **G** and divergence. At the same time, quantitative genetic estimates of additive variance generally support high evolvabilities [25,102], and this sends us back to the question of how seemingly high evolvabilities can still be correlated with evolutionary change on million-year timescales.

We consider three possible explanations for the paradoxical relationship between **G** and divergence. First, natural selection may shape within- and between-population variation in a similar manner. This is hard to rule out in the absence of direct information about historical patterns of selection or the movement of fitness optima, but in our opinion theory does not support a strong match between **G** and patterns of selection [109,110]. Also, Blows *et al*. [111] did not find any relationship between a G-matrix and estimated patterns of selection. It is particularly hard to believe that natural selection can explain the paradox when there is strong match between the patterns. For example, in the case of our bract traits in the large-glanded population (figure 5), the distribution of fitness peaks must be almost exactly proportional to **G**.

Second, adaptive optima may move within a restricted area at a pace at which they can be tracked, but not reached. If so, there will be a correlation between evolvability and divergence because populations will track better in directions with high evolvability. This requires, however, that stabilizing selection is very weak; otherwise, the models predict that the population means would perfectly track the optima for any observed levels of evolvability (see equation (2.3)).

The third and last alternative is that realized evolvabilities are much smaller than measured evolvabilities, yet correlated with them. For example, it is possible that a conditional evolvability relative to a set of unmeasured traits under stabilizing selection could be quite small, due to a high degree of pleiotropy, and also correlated with the unconditional evolvabilities, since they both depend on the total variation. Similarly, for macroevolutionary changes, standing genetic variation may be less relevant for long-term response than the supply of ‘mutational evolvability’ based on how much genetic variation is generated each generation, and the mutational and standing evolvabilities are likely correlated. This last alternative is in line with several recent reviews concluding that there is good evidence that genetic constraints are important in evolution [11,18,112,113].

Distinguishing between these and other explanations is an empirical question, but in our opinion, a combination of relatively weak stabilizing selection and small realized evolvabilities is tentatively the most plausible. This can explain why some populations evolve fast on microevolutionary timescales [4,6,114,115], which would not have been possible if realized evolvabilities are always very small, and, at the same time, not completely at odds with the general notion of strong natural selection [18,27,29,116].

## Data accessibility

Data available from the Dryad Digital Repository doi:10.5061/dryad.4ff1t.

## Funding statement

We thank the Norwegian Research Council (grant no. 196494/V40 to C.P.) and the US National Science Foundation (grant nos. DEB-0444157 to T.F.H. and DEB-0444745 to W.S.A.) for financial support.

## Acknowledgements

We thank Liv Antonsen and Grete Rakvaag for maintaining the plants in the greenhouse at the Department of Biology, Norwegian University of Science and Technology, the Instituto de Ecologíca (INECOL) in Jalapa, and J. Garcia-Franco, V. Rico-Gray, J. Lopez-Portillo, V. Vazques and E. Bjørkvoll for logistic support.

## Footnotes

One contribution of 14 to a Theme Issue ‘Phenotypic integration and modularity in plants and animals’.

© 2014 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/3.0/, which permits unrestricted use, provided the original author and source are credited.