Sources of non-independence in population data. (a) Diagrammatic representation of the history of population splits and gene flow linking populations within species. This history results in three genetic contributions to measured population phenotypes, shown for a set of four populations: (i) contributions owing to shared common ancestry (represented by the colours of internal branches in the population tree), (ii) evolution specific to each population owing to selection and drift (represented by colour changes along terminal branches), and (iii) impacts of gene flow (exchange of migrants or gametes) between populations (indicated by arrows, for simplicity shown only for population 1). (b) Gene flow brings into a recipient population a subset of the genetic variation in source populations. Three source populations (1–3) contribute migrants to a recipient population (4). Suppose that, under selection/drift equilibrium, recipient population 4 has a higher trait value (distribution x in the frequency distribution diagram at right) than the source populations (which, for simplicity, all share distribution y). Migration into population 4 followed by interbreeding displaces the trait value distribution for this population downwards to a new equilibrium (distribution z). The impact of gene flow is greatest when source populations have equilibrium trait distributions very different from that of the recipient population and contribute large numbers of migrants. Under such circumstances, the phenotypes measured in any population may be a poor guide to the selective forces acting on it, so migration effects must be accounted for before local selective effects can be estimated. (c) Population models assumed by different analytical approaches discussed in the text. Assuming population independence implies no impact of either gene flow or history. This holds when there is no gene flow and populations are either entirely unrelated (i) or influenced only by population-specific processes (ii), as might happen when selection acting on populations is so rapid and strong that ancestral states can be ignored. Analyses that incorporate only population history (iii) assume no gene flow, while analyses that incorporate only gene flow (iv) assume no population similarity through common ancestry.
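The displacement described in (b) can be illustrated with a minimal sketch. It is not the model used in the text: it assumes a deliberately simple recursion on the mean trait value, in which a fraction `m` of the recipient population is replaced by migrants each generation and selection then closes a fraction `s` of the remaining gap to the local optimum. The function name, the parameters `m` and `s`, the pooling of all source populations into a single mean, and the numerical values are all hypothetical.

```python
def equilibrium_mean(local_optimum, source_mean, m, s, generations=1000):
    """Iterate migration, then selection, on a population's mean trait value.

    local_optimum: mean the recipient would hold without gene flow (distribution x).
    source_mean:   mean shared by the source populations (distribution y).
    m: assumed fraction of the recipient replaced by migrants each generation.
    s: assumed fraction of the gap to the local optimum closed by selection.
    Returns the mean after the given number of generations (distribution z).
    """
    z = local_optimum
    for _ in range(generations):
        z = (1 - m) * z + m * source_mean   # migration followed by interbreeding
        z = z + s * (local_optimum - z)     # selection pulls the mean back
    return z


# Hypothetical values: recipient optimum 10, source mean 6.
no_flow = equilibrium_mean(10.0, 6.0, m=0.0, s=0.2)
some_flow = equilibrium_mean(10.0, 6.0, m=0.1, s=0.2)
more_flow = equilibrium_mean(10.0, 6.0, m=0.3, s=0.2)
print(no_flow, some_flow, more_flow)
```

Under these assumptions the recursion settles at (s·x + (1 − s)·m·y) / (s + m − s·m), so the measured mean moves further from the local optimum as `m` grows or as the source mean diverges from it, which is the sense in which the measured phenotype becomes a poor guide to local selection.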
Consequences of phylogenetic non-independence for inferring relationships between variables across populations. Consider four populations, with mean values for two variables (independent variable x and dependent response variable y) as shown at top right. Setting aside gene flow for the moment, if these populations are equally unrelated phylogenetically (a), data for them can be considered independent, and the relationship across all four populations is a positive correlation (b). However, suppose that populations 1–2 and 3–4 comprise two pairs of closely related populations (c). The high trait values shared by populations 1 and 2 (and the low values shared by populations 3 and 4) are unlikely to be independent; instead, they probably reflect low divergence within each pair from a common ancestor with high and low trait values, respectively. Now the relationship between x and y is negative within each population pair (black lines in (d)), but positive when analysed across the ancestors of each population pair (red line). Each of these three relationships is phylogenetically independent. A different pattern of relationships among the same set of populations can generate diametrically opposite relationships between x and y, as shown in (e). Now the relationship within each population pair is positive (black fitted lines in (f), right), while the relationship across the ancestors of the two population pairs is negative. These issues pertain whether the populations are sampled in the wild or grown in a common garden or provenance trial.
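The contrast between panels (b) and (d) can be reproduced numerically with a minimal sketch. The trait values below are hypothetical, chosen only to mimic the pattern described, and each ancestor is estimated as the plain average of its two descendant populations, which assumes equal branch lengths within a pair.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical (x, y) means for populations 1-4.
pops = {1: (8.0, 9.0), 2: (9.0, 8.0), 3: (2.0, 3.0), 4: (3.0, 2.0)}
pairs = [(1, 2), (3, 4)]  # closely related pairs, as in panel (c)

# Treating all four populations as independent, as in panel (b).
naive = slope([p[0] for p in pops.values()], [p[1] for p in pops.values()])

# One relationship within each pair (black lines in panel (d)).
within = [
    slope([pops[a][0], pops[b][0]], [pops[a][1], pops[b][1]])
    for a, b in pairs
]

# Ancestors estimated as pair averages (assumes equal branch lengths).
ancestors = [
    ((pops[a][0] + pops[b][0]) / 2, (pops[a][1] + pops[b][1]) / 2)
    for a, b in pairs
]
between = slope([a[0] for a in ancestors], [a[1] for a in ancestors])

print("all four populations:", naive)
print("within pairs:", within)
print("across ancestors:", between)
```

With these values the slope across all four populations is positive, both within-pair slopes are negative, and the slope across the two estimated ancestors is positive, giving three phylogenetically independent relationships rather than four independent data points.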