## Abstract

Hamilton's formulation of inclusive fitness has been with us for 50 years. During the first 20 of those years attention was largely focused on the evolutionary trajectories of different behaviours, but over the past 20 years interest has been growing in the effect of population structure on the evolution of behaviour, and that is our focus here. We discuss the evolutionary journey of the inclusive-fitness effect over this epoch: nurtured as it was in an essentially homogeneous environment (that of ‘transitive’ structures), it has had to adapt in different ways to meet the expectations of heterogeneous structures. We pay particular attention to the way in which the theory has managed to adapt the original constructs of relatedness and reproductive value to provide a formulation of inclusive fitness that captures a precise measure of allele-frequency change in finite-structured populations.

## 1. Introduction

‘Hamilton's Rule’ [1] states that a trait will be favoured by selection when the overall fitness benefit to the recipient multiplied by its genetic relatedness to the actor is greater than the overall fitness cost to the actor [2]. In symbols,
$$R\,\mathbf{b} > \mathbf{c}, \tag{1.1}$$

and here we use boldface notation for **b** and **c** to emphasize the principle that they must represent the total fitness effect on the individual [3,4]. This will distinguish them from the fecundity effects *b* and *c*, which will generally represent partial fitness effects. This rule is a mantra for many evolutionary biologists and there is good reason for that. Although the formulation covers only the fitness effects on two individuals (actor and recipient), it is simple and compelling and does capture Hamilton's fundamental insight that the effect on the recipient should carry a weight representing the relative probability that the recipient carries the gene responsible for the behaviour. More generally, of course, particularly in evolutionary graphs, an item of behaviour will typically affect the fitness of many individuals, who can all be regarded as ‘recipients’, and what is called *the inclusive-fitness effect* is the weighted sum of these fitness effects $\mathbf{b}_i$, where the weights are the relatedness coefficients $R_i$ of the actor to the $i$th recipient (one of whom will be the actor itself):

$$W_{\mathrm{IF}} = \sum_i R_i\,\mathbf{b}_i. \tag{1.2}$$

In his 1964 paper, Hamilton argued that under certain general assumptions, an allele causing such behaviour will increase in frequency precisely when $W_{\mathrm{IF}}$ is positive. In rare cases, there might be only two affected recipients, the actor 0 and its ‘partner’ 1, and in that case

$$W_{\mathrm{IF}} = R_0\,\mathbf{b}_0 + R_1\,\mathbf{b}_1, \tag{1.3}$$

and with $\mathbf{b}_0 = -\mathbf{c}$ and the relatedness of an actor to itself equal to $R_0 = 1$, we recover the simple version.

Our purpose here is to trace the development of the theory of inclusive fitness with particular attention paid to the effect on the evolutionary outcome of population structure, all the way from the implicit homogeneity of the early models to a study of the explicit effects of heterogeneity and class structure. Our setting is finite evolutionary graphs and our particular focus is on recent developments around coefficients of relatedness and reproductive values (RVs). We illustrate our remarks with two examples, one homogeneous (the 5-cycle) and the other heterogeneous (the 3-star).

## 2. Evolutionary graphs

We work with a finite-structured population of constant size. We represent it as a graph, that is, a finite set of nodes *i* with edges (*i*, *j*) between certain pairs of nodes [5]; see, for example, figure 1. Nodes that are joined by an edge are called neighbours. Each node is occupied by a haploid asexual individual and at the end of each time interval, one node is chosen to produce a single offspring, which displaces one of its neighbours. Each individual carries one of two possible alleles *A* and *B* assorting at a fixed locus. Offspring carry the parental allele except for mutation, which occurs at birth with a small probability *u*. In that case, both *A* and *B* mutate to an *A* form with probability $p_N$ and to a *B* form with probability $1 - p_N$. We choose this notation as $p_N$ turns out to be the long-term allele frequency (frequency of *A*) in the neutral population.

We suppose that in each time interval each *A*-individual, as actor, gives pay-off *b* to each of its neighbours at cost *c* per neighbour*.* We assume that these pay-offs represent small increments in fecundity, small enough that we can ignore second-order effects (w-weak selection as in [6]). These pay-offs are added to a baseline fecundity of 1 to give each individual a *relative fecundity*. To be clear, this assumption of additivity applies to pay-offs received from multiple *A*-nodes. For example, an *A*-node with three neighbours, one of which is *A*, will have relative fecundity 1 + *b* – 3*c*.
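The additive pay-off bookkeeping can be written down directly. The sketch below is only an illustration, assuming numpy and a graph stored as a symmetric 0/1 adjacency matrix; the helper name `relative_fecundity` is our own and hypothetical. Each node receives *b* from every *A*-neighbour, and an *A*-node pays *c* per neighbour.

```python
import numpy as np

def relative_fecundity(adj, x, b, c):
    """Relative fecundity 1 + b * (number of A-neighbours) - c * k_i * x_i.

    adj : symmetric 0/1 adjacency matrix (k_i = row sum = degree of node i)
    x   : genotype vector, 1 for an A-node and 0 for a B-node
    """
    adj = np.asarray(adj)
    x = np.asarray(x)
    degree = adj.sum(axis=1)
    received = b * (adj @ x)   # pay-offs are additive over A-neighbours
    paid = c * degree * x      # an A-actor pays c for each of its neighbours
    return 1.0 + received - paid

# Four-node star: hub 0 (three neighbours) is A, leaf 1 is A, leaves 2 and 3 are B
star4 = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
F = relative_fecundity(star4, [1, 1, 0, 0], b=0.1, c=0.02)
```

The hub in this usage example is the *A*-node with three neighbours, one of which is *A*, so its entry is 1 + *b* – 3*c* as in the text.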

At the end of each time interval, the realized fecundity (probability of giving birth) of each node is determined by an ‘update rule’, and here we will work with a Moran process with either birth–death (BD) or death–birth (DB) updating [7,8]. Under BD updating, the birth node is chosen at random using the relative fecundities as weights, and its offspring displaces a random neighbour. Under DB updating, a random (unweighted) node is chosen and the occupying individual is removed; the replacement offspring is chosen at random from the neighbours of the chosen node using the relative fecundities as weights. In other treatments, these pay-offs might instead represent increments in survival, but we do not consider that here. It turns out that in the class of models we work with here there is a symmetry between the results for fecundity and survival pay-offs [9].
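The two update rules can be sketched as a matrix of one-step replacement probabilities. This is a minimal illustration, assuming numpy, a connected graph given as a 0/1 adjacency matrix, and a vector of relative fecundities; `replacement_probs` is a hypothetical name. Entry `e[i, j]` is the probability that, in one step, the offspring of node *i* replaces the occupant of node *j*.

```python
import numpy as np

def replacement_probs(adj, fecundity, rule):
    """e[i, j] = P(offspring of node i replaces node j) in one Moran step.

    BD: breeder i chosen with probability proportional to its relative
        fecundity; its offspring displaces a uniformly chosen neighbour j.
    DB: node j chosen uniformly to die; a neighbour i fills the site with
        probability proportional to its relative fecundity.
    """
    adj = np.asarray(adj, dtype=float)
    F = np.asarray(fecundity, dtype=float)
    n = len(F)
    degree = adj.sum(axis=1)
    if rule == "BD":
        # P(i breeds) * P(its offspring lands on neighbour j)
        return (F / F.sum())[:, None] * adj / degree[:, None]
    if rule == "DB":
        # weights[i, j] = F_i if i is a neighbour of j, else 0
        weights = adj * F[:, None]
        # P(j dies) * P(neighbour i wins the competition for j's site)
        return weights / weights.sum(axis=0)[None, :] / n
    raise ValueError("rule must be 'BD' or 'DB'")

# Three-node star: hub 0, leaves 1 and 2
star = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
e_bd = replacement_probs(star, [1, 1, 1], "BD")
e_db = replacement_probs(star, [1, 1, 1], "DB")
```

At neutrality on the three-node star, the column sums of `e_bd` show the hub being replaced with probability 2/3 per step and each leaf with probability 1/6, whereas under DB every node is replaced with probability 1/3.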

It is clear that as a result of these interactions the fitness effect on each individual will in general be the result of the actions of a number of *A*-nodes. However, Hamilton's inclusive-fitness effect, briefly described above, considers the effects of only one *A*-actor. For this to capture the overall picture, this ‘focal’ actor must somehow represent all actors and for that to happen we need some assumptions of symmetry in the population structure. This question of structure, and the different forms of internal symmetry it might exhibit, is our main objective of study in this paper.

## 3. An example: the 5-cycle

As an example, we consider the cycle graph (figure 1) [10] with both BD and DB updating and fecundity pay-offs. We fasten attention on an *A*-individual at node 0 and examine the inclusive-fitness effect of its behaviour [11]. The primary effects are fecundity gifts of *b* to each of nodes 1 and 4 and a consequent fecundity loss of 2*c* for node 0.

From this point on, the story is different for the two update rules (table 1). Under BD, the gifts represent changes in the probability that the nodes will be selected to reproduce. These fecundity effects are called *primary*, as they are directly connected to the pay-offs. They will, however, lead to additional effects on survival classed as *secondary*. For example, the gain of *b* offspring to node 1 will give nodes 0 and 2 a survival loss of *b*/2 each, and the loss of 2*c* offspring from node 0 will give nodes 1 and 4 a survival gain of *c* each. However, under DB, no reproduction is possible until we have a death. For example, the fecundity gift of *b* to node 1 must await a death from either node 0 or node 2 to be realized, and in that case the competitive effect of that gift will be to reduce the effective fecundity of the node ‘on the other side’. For example, if node 0 dies, the competition to colonize the vacant node is between nodes 1 and 4, so that node 1's *gain* of *b* will create an effective *loss* to node 4's fecundity. Actually, that only affects node 4's reproduction in half of its opportunities; the other half being the death of node 3. Thus, the overall effect on node 4 is –*b*/2. Similar stories apply to the fecundity gift of *b* to node 4 and the fecundity loss of 2*c* to node 0. The take-away message is that the secondary effects of primary effects on fecundity act on survival under BD and on fecundity under DB.

The inclusive-fitness effect of all this is obtained by adding these effects up, each weighted by the relatedness of node 0 to the recipient node. The details are laid out in table 1 and the coefficients of relatedness are calculated in appendix A. The calculations give us the inclusive-fitness effects *W*_{IF} = –*b*/2 – 2*c* for BD updating and *W*_{IF} = *b*/2 – 3*c* for DB updating.
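The bookkeeping of table 1 can be reproduced in a few lines. This is a sketch under stated assumptions: primary effects are −2*c* on node 0 and +*b* on nodes 1 and 4; secondary effects are −(f<sub>i−1</sub> + f<sub>i+1</sub>)/2 under BD and −(f<sub>i−2</sub> + f<sub>i+2</sub>)/2 under DB, as described above. Because appendix A is not reproduced here, the relatedness vector R = (1, 0, −1/2, −1/2, 0) is hard-coded; it is what we obtain ourselves from the consanguinity recursion in the rare-mutation limit *u* → 0, and it should be treated as an assumption. The function names are hypothetical.

```python
import numpy as np

def cycle_fitness_effects(n, b, c, rule):
    """Fitness effect on each node of one A-actor at node 0 of an n-cycle
    (first order in b and c, bookkeeping as in table 1)."""
    f = np.zeros(n)
    f[0] -= 2 * c          # primary cost: c per neighbour
    f[1] += b              # primary fecundity gift to each neighbour
    f[-1] += b
    if rule == "BD":
        # an extra birth at j displaces each of j's two neighbours w.p. 1/2
        secondary = -(np.roll(f, 1) + np.roll(f, -1)) / 2
    elif rule == "DB":
        # a fecundity change at j competes with the node two steps away
        secondary = -(np.roll(f, 2) + np.roll(f, -2)) / 2
    else:
        raise ValueError("rule must be 'BD' or 'DB'")
    return f + secondary

# Assumed relatedness of node 0 to nodes 0..4 (rare-mutation limit)
R_5CYCLE = np.array([1.0, 0.0, -0.5, -0.5, 0.0])

def inclusive_fitness(n, b, c, rule, R):
    """W_IF = sum_i R_i * (fitness effect on i)."""
    return float(R @ cycle_fitness_effects(n, b, c, rule))

w_bd = inclusive_fitness(5, 1.0, 0.1, "BD", R_5CYCLE)  # -b/2 - 2c
w_db = inclusive_fitness(5, 1.0, 0.1, "DB", R_5CYCLE)  # b/2 - 3c
```

In both cases the fitness effects sum to zero across the five nodes, as they must in a population of constant size.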

As we have mentioned, *W*_{IF} gives us the sign of the rate of increase of the allele *A* and, for the 5-cycle, the conditions for selection to act to increase the frequency of *A* are
$$-b > 4c \quad \text{(BD)} \tag{3.1}$$

and

$$b > 6c \quad \text{(DB)}. \tag{3.2}$$

Under DB updating, altruism (*b*, *c* > 0) will be selected if the benefit conferred on a neighbour is more than six times the cost. However, under BD updating, altruism can never be selected, but spite (–*b*, *c* > 0) will be selected if the *harm* inflicted on a neighbour is more than four times the cost. It turns out that these are examples of a general result for ‘homogeneous’ populations, those with a special type of internal symmetry called transitivity.

## 4. Transitive graphs

A graph is called *transitive* [12,13] if, given any pair *i*, *j* of nodes, there is a bijection *T* of the node set mapping *i* to *j* which preserves all edges (that is, *T*(*u*) is a neighbour of *T*(*w*) if and only if *u* is a neighbour of *w*; such bijections are called *isomorphisms*). Clearly our 5-cycle is transitive (the isomorphisms include the rotations) and some additional examples are provided in figure 2. With this assumption, the graph ‘looks the same’ from every node and that allows us to work with a single ‘focal’ node, as we have done in the example above. But a striking result is that in this case of a transitive graph, the condition $W_{\mathrm{IF}} > 0$ is independent of the detailed structure of the graph and depends only on the number *N* of nodes, their common degree *k* and the update rule. For example, for the Moran process with fecundity pay-offs, the inclusive-fitness effect is positive exactly when [9,13–16]

$$-b > (N-1)\,c \quad \text{(BD)} \tag{4.1}$$

and

$$b\left(\frac{N}{k} - 2\right) > (N-2)\,c \quad \text{(DB)}, \tag{4.2}$$

where *k* is the degree of each node (the number of neighbours), so that for a cycle graph, *k* = 2. For the case *N* = 5, *k* = 2, we get the conditions above for the 5-cycle. A transparent explanation in terms of ‘circles of compensation’ of why the DB protocol is friendlier to altruism than BD is found in Grafen & Archetti [13]. A further discussion of the relationship between these two protocols is found in Taylor [9]; for example, if pay-offs affect survival rather than fecundity, the transitive population conditions above are reversed: equation (4.1) belongs to DB and equation (4.2) belongs to BD.
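For quick checks, the two transitive-graph conditions can be wrapped in a predicate. This sketch assumes that condition (4.1) reads −*b* > (*N* − 1)*c* under BD and that (4.2) reads *b*(*N*/*k* − 2) > (*N* − 2)*c* under DB, with fecundity pay-offs of *b* to each neighbour at cost *c* per neighbour. These forms reduce to −*b* > 4*c* and *b* > 6*c* when *N* = 5 and *k* = 2. The function name is hypothetical.

```python
def favoured_on_transitive_graph(N, k, b, c, rule):
    """True when W_IF > 0 for fecundity pay-offs on a transitive graph
    with N nodes, each of degree k."""
    if rule == "BD":
        # BD: only spite (b < 0) can be favoured when c > 0
        return -b > (N - 1) * c
    if rule == "DB":
        # DB: altruism needs N > 2k; otherwise N/k - 2 <= 0
        return b * (N / k - 2) > (N - 2) * c
    raise ValueError("rule must be 'BD' or 'DB'")

# 5-cycle (N = 5, k = 2): DB threshold b > 6c, BD threshold -b > 4c
print(favoured_on_transitive_graph(5, 2, 6.5, 1.0, "DB"))
print(favoured_on_transitive_graph(5, 2, -4.5, 1.0, "BD"))
```

The DB branch also makes visible that altruism cannot be favoured with *c* > 0 when *N* ≤ 2*k*, for example on a complete graph.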

It turns out, not surprisingly, that the transitive graphs are precisely those for which the simple (one might say ‘initial’) form of Hamilton's inclusive-fitness analysis works. For these graphs, the fitness effects on the population of the actions of a single focal *A*-individual are able to capture the selective advantage of an *A*-allele. Our objective is to see how inclusive fitness can be generalized to apply to graphs with more complex symmetry patterns. The short answer is that the analysis works fine, but we need to incorporate RVs and consider a focal individual for each reproductive class [17].

## 5. Allele-frequency change: the Price equation

Price's [18] covariance formula for the selective allele-frequency change (ignoring mutation) over a single time step can be written as
$$\Delta\bar{x} = \operatorname{cov}(W, x), \tag{5.1}$$

where *W* is individual fitness effect (the overall fitness effect of all pay-offs), *x* is individual genotype (= 1 for an *A*-node and 0 for a *B*-node) and $\bar{x}$ is the population-wide allele frequency (the average genotype). To calculate the overall fitness effect $W_i$ on each node *i*, we need to know the *state* of the population, that is, which nodes are *A* and which are *B*; the fitness effect on each node is then calculated as the difference between its fecundity and mortality effects. For example, for the 5-cycle (figure 1) with BD updating, suppose that nodes 0 and 2 are *A* and the other three nodes are all *B*. As a result of the assumption of additivity among effects, we can calculate $W_i$ by adding the separate effects of the behaviour of nodes 0 and 2. The calculation is given in table 2 and the resulting selective change of allele frequency is

$$\Delta\bar{x} = \operatorname{cov}(W, x) = \operatorname{E}(Wx) - \operatorname{E}(W)\operatorname{E}(x) = \operatorname{E}(Wx) = \tfrac{1}{5}\left(W_0 + W_2\right), \tag{5.2}$$

where we have used the fact that $\operatorname{E}(W) = 0$, as the population size is constant.
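Since table 2 is not reproduced here, the following sketch recomputes the state-specific change (5.2) under our reading of the BD bookkeeping of section 3. We assume primary fecundity effects f<sub>i</sub> = *b*·(number of *A*-neighbours) − 2*c*·x<sub>i</sub> and secondary survival effects −(f<sub>i−1</sub> + f<sub>i+1</sub>)/2. The helper name is hypothetical. Under these assumptions W = (−3*b*/2 − 2*c*, 2*b* + 2*c*, −3*b*/2 − 2*c*, *b*/2 + *c*, *b*/2 + *c*), and the covariance works out to −(3*b* + 4*c*)/5.

```python
import numpy as np

def cycle_state_covariance_bd(x, b, c):
    """cov(W, x) over the nodes of a cycle in genotype state x, BD updating,
    with W_i = f_i - (f_{i-1} + f_{i+1}) / 2 to first order."""
    x = np.asarray(x, dtype=float)
    # primary effects: b from each A-neighbour, cost 2c (c per neighbour) if A
    f = b * (np.roll(x, 1) + np.roll(x, -1)) - 2 * c * x
    # secondary survival effects of the neighbours' extra (or lost) births
    W = f - (np.roll(f, 1) + np.roll(f, -1)) / 2
    cov = float(np.mean(W * x) - np.mean(W) * np.mean(x))
    return cov, W

cov, W = cycle_state_covariance_bd([1, 0, 1, 0, 0], b=1.0, c=1.0)
```

With *b* = *c* = 1 the covariance is −7/5, and the entries of W sum to zero.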

The calculation in equation (5.2) gives the allele-frequency change in a particular state. However, over time, the population will wander among many possible states, each with its own long-term frequency. One might well ask, in this case, how we might construct a reasonable measure of the overall selective advantage of the allele *A*. The generally accepted answer to this was provided in 2000 by Rousset & Billiard: take the state-specific selective allele-frequency change and average it over all states, each state weighted by its long-term frequency of occurrence. That is not an easy calculation, as the state frequencies are not generally analytically accessible, but under certain general conditions this average can be calculated with the inclusive-fitness effect.

The transition from this average allele-frequency change to the inclusive-fitness effect is technically rather interesting. We summarize the process here and provide the missing calculation in appendix B. For simplicity, we work with a transitive population but we show later how the analysis generalizes to heterogeneous populations. What we need to calculate is
$$\sum_s \pi_s \operatorname{cov}_s(W, x), \tag{5.3}$$

where $\pi_s$ is the frequency of state *s* and $\operatorname{cov}_s$ is the covariance over all nodes of the population in state *s*. The notation can get a bit hard to keep track of, so we adopt a notational device found in Taylor *et al.* [16]. We use round brackets to represent an average or a covariance over all nodes in the population, and square brackets to represent an average or a covariance over all population states. With this notation, equation (5.3) is written

$$\operatorname{E}\big[\operatorname{cov}(W, x)\big]. \tag{5.4}$$

The expression on the right-hand side asks us to calculate the covariance of fitness with genotype over the population in a fixed state and then average these over all states. In this form, the calculation is intractable. The trick, and the essence of Hamilton's 1964 approach, is to argue that under certain assumptions it is permissible to interchange the order of the operations and to calculate instead

$$\operatorname{E}\big(\operatorname{cov}[W, x]\big). \tag{5.5}$$

The equivalence of (5.4) and (5.5) is established in appendix B. In equation (5.5), we take a fixed node and calculate the covariance between its genotype and fitness over all possible states, and then average this over all nodes.

In the case of a transitive population, the result of this first covariance will be the same for all nodes and the final average is unnecessary. In this case, the formulation for average allele-frequency change becomes
$$\operatorname{E}\big[\operatorname{cov}(W, x)\big] = \operatorname{cov}[W_0, x_0], \tag{5.6}$$

where *i* = 0 is a randomly chosen ‘focal’ node. We see later how to extend this formulation to heterogeneous graphs.

## 6. The inclusive-fitness effect: relatedness

In a graph-structured population, the overall fitness $W_i$ of each node *i* will be, to first order in the pay-offs, a linear function of the node genotypes

$$W_i = \sum_j b_{ij}\,x_j, \tag{6.1}$$

where $b_{ij}$ is the fitness effect on *i* of an *A*-allele at node *j*. It is important to note that the $b_{ij}$ are independent of state, so they can be pulled out of the square brackets in equation (5.5):

$$\operatorname{E}\big(\operatorname{cov}[W_i, x_i]\big) = \operatorname{E}\Big(\sum_j b_{ij}\operatorname{cov}[x_j, x_i]\Big). \tag{6.2}$$

Here, for clarity, we have replaced the random variables *W* and *x* in equation (5.5) with $W_i$ and $x_i$, and the expectation E is over all nodes *i*. The covariance in equation (6.2) takes a fixed pair of nodes and asks how their genotypes covary over the long-term life of the population, and this can be calculated with a simple argument. As a result of our assumption on the mutation process, the nodes *i* and *j* are either identical by descent (IBD) or independent [16]. The probability of the first case is the coefficient of consanguinity (CC) $G_{ij}$ (appendix A), and in this case the covariance will be the genic variance $\operatorname{var}[x]$; in the second case the covariance is zero. Thus,

$$\operatorname{cov}[x_i, x_j] = G_{ij}\operatorname{var}[x] \tag{6.3}$$

and

$$\operatorname{E}\big[\operatorname{cov}(W, x)\big] = \operatorname{var}[x]\,\operatorname{E}\Big(\sum_j b_{ij}G_{ij}\Big) = \operatorname{var}[x]\,\operatorname{E}\Big(\sum_i b_{ij}G_{ij}\Big). \tag{6.4}$$

In the middle term of equation (6.4), the *j*-summation ranges over the actors and the final expectation is over all recipients *i.* The final term moves towards an inclusive-fitness formulation by reversing them so that the summation ranges over the recipients and the expectation is taken over the actors [17,19].

The $G_{ij}$ are sometimes used as coefficients of relatedness, and indeed in infinite population models they do play that role, though under diploidy they should be normalized by dividing by $G_{jj}$, the CC of the actor with itself [20]. However, in finite population models they should not, strictly speaking, be considered as relatedness coefficients. For one thing, they are always non-negative, whereas in finite populations both altruism and spite are often studied [21] and negative relatedness plays an important conceptual role. In the literature, there seems to be a diversity of candidates for relatedness in finite population models [4,22–24]. Such formulations subtract from $G_{ij}$ an average value $\bar{G}$ calculated over some reference population, so that the relatedness of actor *j* to recipient *i* has the form

$$R_{ij} = \frac{G_{ij} - \bar{G}}{G_{jj} - \bar{G}}, \tag{6.5}$$

where the denominator ensures that the relatedness of the focal actor to itself is 1. This average $\bar{G}$ might be calculated over the whole population or over an ‘economic neighbourhood’ [25], the set of nodes that experiences the negative effects of the primary pay-offs. With a graph structure, such a node set is not so easy to pin down, as the secondary effects are not uniform. In a transitive population, a standard normalization takes $\bar{G}$ to be the average CC of the focal actor to the whole population; an advantage of this choice is that the focal node has average relatedness zero to the population.
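In the rare-mutation limit the normalization in (6.5) can be computed from expected coalescence times. The sketch below is our own, under the assumption *u* → 0, writing $G_{ij} \approx 1 - u\,t_{ij}$ so that, with $\bar{G}$ averaged over the whole population (focal node included), $R_{ij} \to (\bar{t} - t_{ij})/\bar{t}$. The $t_{ij}$ solve the one-step IBD recursion $q_{ij}t_{ij} = q_{ij} + \sum_l e_{li}t_{lj} + \sum_l e_{lj}t_{il}$ with $t_{ii} = 0$, where $e_{lj}$ is the neutral probability that *l*'s offspring replaces *j* and $q_{ij}$ is the probability that *i* or *j* is replaced. All function names are hypothetical.

```python
import numpy as np

def neutral_replacement(adj, rule):
    """e[l, j] = P(offspring of l replaces j) in one neutral step."""
    adj = np.asarray(adj, dtype=float)
    n, deg = len(adj), adj.sum(axis=1)
    if rule == "BD":
        return adj / (n * deg[:, None])  # l breeds (1/n), picks neighbour j (1/k_l)
    if rule == "DB":
        return adj / (n * deg[None, :])  # j dies (1/n), neighbour l refills (1/k_j)
    raise ValueError("rule must be 'BD' or 'DB'")

def coalescence_times(adj, rule):
    """t[i, j] = lim_{u -> 0} (1 - G_ij) / u from the IBD recursion."""
    e = neutral_replacement(adj, rule)
    n = len(e)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    idx = {p: k for k, p in enumerate(pairs)}
    M = np.zeros((len(pairs), len(pairs)))
    rhs = np.zeros(len(pairs))
    for (i, j), k in idx.items():
        q = e[:, i].sum() + e[:, j].sum()  # P(i or j is replaced this step)
        M[k, k] = rhs[k] = q
        for l in range(n):
            # i replaced by l's offspring -> pair (l, j); j replaced -> pair (i, l)
            for p1, p2, w in ((l, j, e[l, i]), (i, l, e[l, j])):
                if w > 0 and p1 != p2:     # terms with t_aa = 0 drop out
                    M[k, idx[(p1, p2)]] -= w
    t = np.zeros((n, n))
    for (i, j), val in zip(pairs, np.linalg.solve(M, rhs)):
        t[i, j] = val
    return t

def relatedness(adj, rule, focal):
    """R_i of the focal node, normalized by the whole-population average."""
    t = coalescence_times(adj, rule)[focal]
    return (t.mean() - t) / t.mean()

ring = np.roll(np.eye(5, dtype=int), 1, axis=1) + np.roll(np.eye(5, dtype=int), -1, axis=1)
star = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
```

For the 5-cycle this gives R = (1, 0, −1/2, −1/2, 0) from node 0 under both rules. On the three-node star the hub–leaf time comes out as 5/3 under DB and 7/3 under BD, a first sign that relatedness depends on the update rule in heterogeneous graphs.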

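As we have assumed the population size is constant, the fitness effects of any actor *j* on all recipients must sum to zero, $\sum_i b_{ij} = 0$, so subtracting the constant $\bar{G}$ from $G_{ij}$ leaves the final term of equation (6.4) unchanged. We can therefore (except for the denominator in equation (6.5)) replace $G_{ij}$ in equation (6.4) by $R_{ij}$. We define the inclusive-fitness effect to be

$$W_{\mathrm{IF}} = \operatorname{E}\Big(\sum_i b_{ij}R_{ij}\Big), \tag{6.6}$$

and then from equation (6.4), using $G_{jj} = 1$ for haploids,

$$\operatorname{E}\big[\operatorname{cov}(W, x)\big] = \operatorname{var}[x]\,(1 - \bar{G})\,W_{\mathrm{IF}}, \tag{6.7}$$

so that the inclusive-fitness effect has the same sign as the long-term average one-step change in allele frequency.

As above, in a transitive population, the summation in the round brackets of equation (6.6) will be the same for each node *j*, and we can take a focal actor and eliminate the expectation:

$$W_{\mathrm{IF}} = \sum_i b_i R_i. \tag{6.8}$$

Here $b_i$ is the fitness effect of the focal actor on recipient *i* and $R_i$ is the relatedness (6.5) of the focal actor to recipient *i*. We now look for an analogue of equation (6.8) in a heterogeneous population.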
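The chain (5.3) to (6.7) can be checked numerically on a small graph. As a sanity check of our own (not a calculation from the paper), the sketch below enumerates all 2<sup>5</sup> states of the 5-cycle under neutral DB updating with a small but finite mutation rate, and finds the stationary state frequencies. It then averages the state-specific covariance, and compares the result with var[x](1 − $\bar{G}$)W<sub>IF</sub>, where G is obtained from the exact finite-*u* IBD recursion and var[x] = p<sub>N</sub>(1 − p<sub>N</sub>). The fitness effects b<sub>ij</sub> follow the DB bookkeeping of section 3; the parameter values are arbitrary illustrations.

```python
import numpy as np

N, U, P_N = 5, 0.01, 0.3   # cycle size, mutation rate u, mutation-to-A probability p_N
B, C = 10.0, 1.0           # pay-off b and cost c per neighbour (illustrative)

ring = np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)
e = ring / (N * ring.sum(axis=0))   # neutral DB: e[l, j] = P(l's offspring replaces j)

# b_ij: DB fitness effect on i of an A-actor at j (section 3 bookkeeping)
F = B * ring - 2 * C * np.eye(N)
b_mat = F - (np.roll(F, 2, axis=0) + np.roll(F, -2, axis=0)) / 2

# Stationary distribution over all 2^N states of the neutral chain with mutation
S = 2 ** N
P = np.zeros((S, S))
for s in range(S):
    for l in range(N):
        for j in range(N):
            if e[l, j] == 0:
                continue
            set_A, set_B = s | (1 << j), s & ~(1 << j)
            inherited = set_A if (s >> l) & 1 else set_B
            P[s, inherited] += e[l, j] * (1 - U)  # offspring keeps parental allele
            P[s, set_A] += e[l, j] * U * P_N      # mutation to A
            P[s, set_B] += e[l, j] * U * (1 - P_N)
balance = P.T - np.eye(S)
balance[-1, :] = 1.0                  # replace one balance equation by sum(pi) = 1
pi = np.linalg.solve(balance, np.eye(S)[-1])

# (5.3): state-specific covariance averaged over states
lhs = 0.0
for s in range(S):
    x = np.array([(s >> i) & 1 for i in range(N)], dtype=float)
    W = b_mat @ x
    lhs += pi[s] * (np.mean(W * x) - np.mean(W) * np.mean(x))

# Exact CC at finite u from the one-step IBD recursion
pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
idx = {p: k for k, p in enumerate(pairs)}
M = np.zeros((len(pairs), len(pairs)))
g_rhs = np.zeros(len(pairs))
for (i, j), k in idx.items():
    M[k, k] = e[:, i].sum() + e[:, j].sum()
    for l in range(N):
        for p1, p2, w in ((l, j, e[l, i]), (i, l, e[l, j])):
            if w == 0:
                continue
            if p1 == p2:
                g_rhs[k] += w * (1 - U)   # one node copies the other: G = 1
            else:
                M[k, idx[(p1, p2)]] -= w * (1 - U)
G = np.eye(N)
for (i, j), val in zip(pairs, np.linalg.solve(M, g_rhs)):
    G[i, j] = val

# (6.7) for the focal actor at node 0
G_bar = G[:, 0].mean()
R = (G[:, 0] - G_bar) / (1 - G_bar)
W_IF = float(R @ b_mat[:, 0])
rhs = P_N * (1 - P_N) * (1 - G_bar) * W_IF
print(lhs, rhs, W_IF)
```

The two sides agree to rounding error because, with this mutation model, non-IBD genes are independent draws, so $\operatorname{E}[x_i x_j] = p_N^2 + p_N(1-p_N)G_{ij}$ exactly.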

## 7. Heterogeneous populations: reproductive value

The RV of a node is roughly defined as its expected contribution to the future gene pool of the population. Fisher [26] was perhaps the first to understand that individuals with different RV need to be treated as belonging to different ‘reproductive classes.’ As an illustration, he produced a plot of RV against age in humans (p. 28) and pointed out that ‘The value shown is probably correct … for assessing how far it is worthwhile to give assistance to immigrants in respect of infants … for such infants will usually emigrate with their parents’ (p. 29).

Possibly the first appearance of RV in a calculation of inclusive fitness is found in Hamilton's [27] work with altruistic behaviour in eusocial species, in which, in the case of the haplodiploid genetic system, the diploid females have twice the RV of the haploid males. In this paper, Hamilton corrects (p. 204) an error he made in his 1967 paper on extreme sex ratios in which a neglect of this RV ratio led him to an incorrect value for the evolutionarily stable strategy sex allocation under haplodiploidy (p. 485). To our knowledge, this is the first example of an inclusive-fitness calculation in a heterogeneous population in which the wrong result is obtained if RVs are not used.

In the post-Hamiltonian world, it was assumed, without much in the way of formality, that inclusive-fitness arguments need to take account of variations in RV—after all, it is surely the future gene pool that counts [28]. As an ‘obvious’ example, consider a mother feeding her nestlings giving total benefit *b* at cost *c*. Suppose that half her nestlings will die. Suppose that we are at the point where the mother can tell which half will die. Then, we might imagine two alternative strategies: feed only those who will live and feed only those who will die. An inclusive-fitness calculation that did not take account of RV would provide the same condition for both cases, but clearly they have very different implications for gene-frequency increase. Later (tables 3 and 4), we provide specific results for the *N* = 3 star graph.

Another early example is found in West-Eberhard's [29] suggestion that altruistic acts by individuals of low RV towards related individuals of high RV are more likely to be selected for than is altruism in the reverse direction. This example is taken from Charlesworth & Charnov [30], who provided the first rigorous connection between RV-weighted fitness effects and gene-frequency increase with an inclusive-fitness model in an age-structured population. A nice example of the deployment of RVs in a class-structured population (in which there are breeders, helpers and waiters) is found in Pen & Weissing [31].

In the more recent graph-structured evolutionary models, the class structure derives not so much from different roles (male–female, parent–offspring, breeder–worker) as different edge configurations, but the same general considerations apply. To define these classes, we return to our concept of graph isomorphism, and define the reproductive classes to be the orbits under the set of all graph isomorphisms. That is, two nodes are in the same class if there is an isomorphism mapping one to the other. Thus, a transitive graph has only one reproductive class; otherwise, there is more than one class and the graph is called heterogeneous (figure 3).
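For small graphs the reproductive classes can be found by brute force. The sketch below enumerates every permutation of the nodes, keeps those that map edges to edges (the isomorphisms of the text), and collects the orbits. It uses only the standard library; the function names are hypothetical, and the approach is practical only for a handful of nodes because it visits all *N*! permutations.

```python
from itertools import permutations

def automorphisms(edges, n):
    """Permutations T of range(n) that map every edge onto an edge."""
    E = {frozenset(edge) for edge in edges}
    return [T for T in permutations(range(n))
            if all(frozenset((T[u], T[w])) in E for u, w in edges)]

def reproductive_classes(edges, n):
    """Orbits of the nodes under the graph's isomorphisms."""
    autos = automorphisms(edges, n)
    classes, seen = [], set()
    for i in range(n):
        if i not in seen:
            orbit = sorted({T[i] for T in autos})
            classes.append(orbit)
            seen.update(orbit)
    return classes

cycle5 = [(i, (i + 1) % 5) for i in range(5)]
print(reproductive_classes(cycle5, 5))            # one class: transitive
print(reproductive_classes([(0, 1), (0, 2)], 3))  # 3-star: hub and leaves
```

Because a bijection that maps each edge onto an edge maps the finite edge set onto itself, checking edges in one direction is enough.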

Our rough definition above of RV seems to neglect mutation. In our finite graph model, the long-term future of a population without mutation is simply fixation of one of the current genes, and thus our definition of RV is equivalent to a fixation probability. Place an allele *A* at node *i* with *B* at all other nodes. Let the population run with neutral selection (all pay-offs zero) and no mutation. Then the RV $v_i$ of node *i* is the probability that the population will become fixed in a pure *A* state. With this definition, the $v_i$ are normalized to have sum 1.

It should be clear that nodes in the same reproductive class will have the same RV, but the converse is not true. In fact, it has been shown [34] that the RV of a node is proportional under DB updating to its degree and under BD updating to the reciprocal of its degree. Note that this makes qualitative sense. Under DB, it is good for a node to have high degree, as this provides many opportunities for reproduction; conversely, under BD, it is good for a node to have low degree, as this reduces the probability of death. But note that degree does not determine class, as seen, for example, in the Frucht graph [33] (figure 3*c*). Here, all nodes have the same degree, and hence the same RV, but they are all in different reproductive classes. The same behaviour at two such nodes will have the same effect on the population allele frequency, but in other ways can affect the population differently.
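The fixation-probability definition of RV can be checked directly on small graphs. This sketch assumes numpy and a connected graph. It builds the neutral, mutation-free Markov chain over all 2<sup>N</sup> genotype states, solves for the probability of reaching the all-*A* state, and reads off the value for a single *A* at each node. The function names are hypothetical.

```python
import numpy as np

def neutral_replacement(adj, rule):
    """e[i, j] = P(offspring of i replaces j) in one neutral step."""
    adj = np.asarray(adj, dtype=float)
    n, deg = len(adj), adj.sum(axis=1)
    if rule == "BD":
        return adj / (n * deg[:, None])
    if rule == "DB":
        return adj / (n * deg[None, :])
    raise ValueError("rule must be 'BD' or 'DB'")

def fixation_probabilities(adj, rule):
    """v_i = P(A fixes | single A at node i), neutral, no mutation."""
    e = neutral_replacement(adj, rule)
    n = len(e)
    S = 2 ** n
    P = np.zeros((S, S))
    for s in range(S):
        for i in range(n):
            for j in range(n):
                if e[i, j] > 0:
                    # node j takes on the allele of node i
                    t = (s | (1 << j)) if (s >> i) & 1 else (s & ~(1 << j))
                    P[s, t] += e[i, j]
    M = np.eye(S) - P
    rhs = np.zeros(S)
    for s, value in ((0, 0.0), (S - 1, 1.0)):  # absorbing all-B and all-A states
        M[s, :] = 0.0
        M[s, s] = 1.0
        rhs[s] = value
    phi = np.linalg.solve(M, rhs)  # phi[s] = P(fixation of A | start in s)
    return np.array([phi[1 << i] for i in range(n)])

star = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
path4 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(fixation_probabilities(star, "DB"), fixation_probabilities(star, "BD"))
```

On the three-node star the values are (1/2, 1/4, 1/4) under DB and (1/5, 2/5, 2/5) under BD, proportional to degree and to reciprocal degree respectively.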

## 8. The heterogeneous price equation

Now let us return to the Price equation for the one-step selective allele-frequency change. We must first ask, in a heterogeneous graph, what we should even mean by allele frequency. A gene on a more valuable node will contribute many more copies of itself to the future than a gene on a less valuable node. It would thus appear that any measure of the ‘effective’ frequency of an allele *A* should take account of the value of the nodes it occupies. That has prompted the definition of *RV-weighted allele frequency* in any state *S* as $\hat{x}(S) = \sum_i v_i\,x_i(S)$ [35–37]. A closely related way to realize the vector $\hat{x}$ is as a left eigenvector. Regard $\hat{x}(S)$ as the total RV of all *A*-genes in state *S*. Under neutrality, any copy of *A* must hold its RV over each update step, and that gives us a system of forward recursive equations for $\hat{x}$. Now those equations, when framed in vector-matrix form, tell us that $\hat{x}$ is the left eigenvector of the state transition matrix *A* at neutrality [35].
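The statement that each copy of *A* holds its RV at neutrality can be checked state by state. This sketch assumes the RVs given in section 7: proportional to degree under DB and to reciprocal degree under BD. It computes the expected one-step change of $\hat{x} = \sum_i v_i x_i$ with no mutation. Written with a row-stochastic transition matrix *P*, a zero change in every state is the statement $P\hat{x} = \hat{x}$; in the transposed, column-stochastic convention it appears as a left eigenvector. The helper names and the optional `weights` argument are hypothetical.

```python
import numpy as np

def neutral_replacement(adj, rule):
    """e[i, j] = P(offspring of i replaces j) in one neutral step."""
    adj = np.asarray(adj, dtype=float)
    n, deg = len(adj), adj.sum(axis=1)
    if rule == "BD":
        return adj / (n * deg[:, None])
    if rule == "DB":
        return adj / (n * deg[None, :])
    raise ValueError("rule must be 'BD' or 'DB'")

def reproductive_values(adj, rule):
    """RV proportional to degree (DB) or reciprocal degree (BD), summing to 1."""
    deg = np.asarray(adj).sum(axis=1).astype(float)
    v = deg if rule == "DB" else 1.0 / deg
    return v / v.sum()

def expected_rv_change(adj, rule, x, weights=None):
    """Expected one-step change in sum_i w_i x_i at neutrality, no mutation."""
    e = neutral_replacement(adj, rule)
    w = reproductive_values(adj, rule) if weights is None else np.asarray(weights, float)
    x = np.asarray(x, dtype=float)
    # event (i -> j): node j takes i's allele, changing the sum by w_j (x_i - x_j)
    return float(np.sum(e * w[None, :] * (x[:, None] - x[None, :])))

star = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
path4 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
```

As a contrast, on the three-node star under BD the unweighted frequency (all weights 1/3) is not conserved: with a single *A* at the hub its expected one-step change is −1/9.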

To calculate the one-step selective change in $\hat{x}$, fitness effects must be RV-weighted: when an individual dies the RV of its node is lost, and when an offspring is born the RV *of the node it occupies* is gained. Using $\hat{W}$ for this RV-weighted form of fitness, Price's formula for the RV-weighted allele-frequency change is

$$\Delta\hat{x} = \operatorname{cov}(\hat{W}, x). \tag{8.1}$$

This formula calculates the one-step selective change in RV-weighted allele frequency of the population in any particular state. Its long-term average is

$$\operatorname{E}\big[\operatorname{cov}(\hat{W}, x)\big], \tag{8.2}$$

and this is the general version of equation (5.4).

## 9. Heterogeneous inclusive fitness

Our transformation above of Price's formula into an inclusive-fitness effect will also work with RV-weighted allele frequency. In particular, the argument in appendix B which leads to equation (5.5) remains valid with *W* replaced by $\hat{W}$. The analogue of equation (6.4) is

$$\operatorname{E}\big[\operatorname{cov}(\hat{W}, x)\big] = \operatorname{var}[x]\,\operatorname{E}\Big(\sum_j \hat{b}_{ij}G_{ij}\Big) = \operatorname{var}[x]\,\operatorname{E}\Big(\sum_i \hat{b}_{ij}G_{ij}\Big), \tag{9.1}$$

where $\hat{b}_{ij}$ is the effect of an *A*-allele at node *j* on the RV-weighted fitness of individual *i*, and, following equation (6.6), the inclusive-fitness effect is more generally defined as

$$W_{\mathrm{IF}} = \operatorname{E}\Big(\sum_i \hat{b}_{ij}R_{ij}\Big), \tag{9.2}$$

where $R_{ij}$ is again given by equation (6.5) with some suitable state-independent average CC value $\bar{G}$. We return to this in the Discussion.

Now we generalize equation (6.8). In a heterogeneous population, the *i*-sum in equation (9.2) will be the same for nodes *j* in the same reproductive class, so we can take the expectation over each class of actor using the relative class size as a weight. We get

$$W_{\mathrm{IF}} = \sum_J \frac{N_J}{N}\sum_i \hat{b}_{iJ}R_{iJ}, \tag{9.3}$$

where $N_J$ is the size of class *J*, $\hat{b}_{iJ}$ is the effect of an *A*-allele at a class-*J* focal node on the RV-weighted fitness of individual *i*, and $R_{iJ}$ is the relatedness of the class-*J* focal actor to recipient *i*. One further comment: the *J*-sum in equation (9.3) need only be taken over those classes that exhibit the behaviour being studied. We return to this in the Discussion.

## 10. Example. The star graph with *N* = 3 nodes

The graph (figure 4) has a hub and two leaves; the hub is connected to both leaves such that in each time step the hub has two interactions and each leaf has one. Offspring dispersal from a leaf is only to the hub and from the hub is to each leaf with neutral probability 1/2. There are clearly two reproductive classes, class *H* containing the hub and class *L* containing the two leaves. In tables 3 and 4, we present the inclusive-fitness analyses for both BD and DB updating. Both follow the pattern found in the homogeneous case of table 1 with a few notable differences.

- *Focal actors.* As we have two reproductive classes, we need two focal actors, a hub actor and a leaf actor. We do the analysis for each actor class separately and then add the results, with each class weighted by its size (the number of actors in the class).
- *RV.* Fitness effects are weighted with RV: a death is weighted by the RV of its node and a birth by the RV of the node the offspring occupies. The way we have set up the calculations facilitates this weighting: each fitness effect occupies its own line and displays both the primary and secondary relatedness coefficients. The RVs differ for BD and DB updating, being proportional to either the degree (DB) or the reciprocal of the degree (BD) of the node [34].
- *Relatedness.* Perhaps the first thing to note is that the relatedness coefficients change between BD and DB updating. This is not the case in a homogeneous graph. The difference can be seen in the structure of the recursive equations. To form a recursive equation for $G_{ij}$, the CC between nodes *i* and *j*, we look at where the most recent offspring replacement came from. In a homogeneous graph, the symmetry between nodes *i* and *j* means that we need only look at one of them. But in a heterogeneous graph, not only will the new CC depend on which node is replaced (dies), but the death probability can also differ between the nodes. Take, for example, the CC $G_1$ between hub and leaf. Under DB (table 4), the two nodes have equal probability of death, but under BD (table 3) the hub is replaced four times as often as the leaf, giving us probabilities of 4/5 and 1/5 in the $G_1$ recursion.
- *Secondary effects.* As is the case for homogeneous graphs, when the primary effects are on fecundity, the secondary effects act on survival under BD and on fecundity under DB.
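The 4/5 : 1/5 split in the hub–leaf recursion follows from the per-step replacement rates at neutrality. This short sketch uses exact fractions for a star with one hub and a given number of leaves; the function name is hypothetical. Under BD the hub is replaced whenever any leaf breeds, whereas a given leaf is replaced only when the hub breeds and picks it; under DB deaths are uniform.

```python
from fractions import Fraction

def star_replacement_rates(n_leaves, rule):
    """Per-step neutral probabilities that (the hub, one given leaf) is replaced."""
    n = n_leaves + 1
    if rule == "BD":
        hub = Fraction(n_leaves, n)                    # any leaf breeds; its only neighbour is the hub
        leaf = Fraction(1, n) * Fraction(1, n_leaves)  # hub breeds and picks this leaf
    elif rule == "DB":
        hub = leaf = Fraction(1, n)                    # deaths are uniform
    else:
        raise ValueError("rule must be 'BD' or 'DB'")
    return hub, leaf

hub, leaf = star_replacement_rates(2, "BD")
print(hub, leaf, hub / (hub + leaf))  # hub is replaced 4 times as often as a leaf
```

Conditioning on one of the pair being replaced gives the weights 4/5 (hub) and 1/5 (leaf) under BD and 1/2 each under DB.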

## 11. Discussion

### (a) Class-specific behaviour

In early models of heterogeneous populations, the classes were role based and actors typically belonged to only one class. More precisely, even when studying interaction between classes the behaviours belonged to different sets of genes. For example, in parent–offspring conflict the genes for offspring begging were different from the genes for parental resistance. In studying the coevolution of these traits, we would work with two inclusive-fitness effects, *W*_{IF}(P) and *W*_{IF}(O), each depending upon the level of behaviour of both, and we would set them both to zero and solve this system for the equilibrium for both behaviours. Thus, the problem of how to combine the two effects into one did not arise. Indeed, both the general class-structured analysis in Taylor & Frank [17] and the model of helping at the nest of Pen & Weissing [31] assumed that there was only one actor class, although recipients belonged to different classes. Even in more general studies of altruism, one might generally suppose that the altruism of a parent was different from that of an offspring.

In evolutionary graph studies, it has often been the case that all nodes have the same behavioural repertoire, determined at the same locus or loci. In that case, the inclusive-fitness effects belonging to different actor classes would be added together, weighted by class size (equation (9.3)), and that is the assumption presented in tables 3 and 4. One can of course suppose that facultative behaviour is possible and a hub-dweller might want to use a different level of altruism (or spite) from a leaf-dweller. In that case, we would have a system of two interacting equations as in the parent–offspring example above.

### (b) The calculation of *W*_{IF}

The calculations we have presented here are ‘heuristic’ and a careful analysis is needed to justify them. For example, our analysis of the 5-cycle (figure 1) presented in table 1 worked directly with the fecundity effects *b* and *c* rather than with the realized fecundity (the probability of reproducing). Is that valid? Suppose that with BD updating node 1 gets a fecundity gift of *b* and all others have only their neutral fecundity. In that case, the realized fecundity of node 1 is $(1+b)/(5+b) \approx \tfrac{1}{5}\left(1 + \tfrac{4b}{5}\right)$ to first order in *b*, while each of the other four nodes has realized fecundity $1/(5+b) \approx \tfrac{1}{5}\left(1 - \tfrac{b}{5}\right)$, again to first order. Thus, measured in units of the neutral value 1/5, the effective fecundity increase to node 1 is only 4*b*/5, while the others have an effective loss of *b*/5. The adjustment can be regarded as a secondary *fecundity* effect of the original gift, but in our analysis of table 1 we do not bother to account for it. In fact it turns out that one does not need to take account of effects that apply equally to all individuals in the population, but one needs to ‘know’ that this is the case, or else at some point do a calculation to verify it. In a heterogeneous population this principle, that one does not need to account for effects that apply to all individuals, need *not* hold if fitness effects are not weighted by RV; it is the RV-weighting that allows us to ignore such effects [35]. The take-home message is that there is a ‘lore’ surrounding the formulation of inclusive fitness and one must think carefully. Hamilton himself (PDT Hamilton 1987, personal communication) said that he was not absolutely sure an inclusive-fitness calculation was correct until he had checked it by doing it another way.

### (c) Relatedness

What should we use for coefficients of relatedness in a heterogeneous graph? The problem is what to choose for $\bar G$ in equation (6.5). In a homogeneous population, a standard choice is the average CC of a focal node to the whole population, but in a heterogeneous population this average will generally differ between focal nodes in different classes. If the behaviour is expressed by only one class *K*, then the specification

$$\bar G = \bar G_K \qquad (11.1)$$

for *j* a focal class-*K* node [37] will serve well, where $\bar G_K$ is the average CC of a focal node of class *K* to the whole population. In this case, the average relatedness of a class-*K* actor to the whole population will be zero. But if several classes act from the same allele, we cannot use these class-specific coefficients and still combine the contributions of different classes of actors in the simple additive way given in tables 3 and 4. A reasonable solution is to use the class-size-weighted average of the $\bar G_K$ and define

$$\bar G = \frac{1}{N}\sum_K N_K \bar G_K,$$

where $N_K$ is the size of class *K* and $N = \sum_K N_K$. This is what we have done for the calculations presented in tables 3 and 4 (appendix A).
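The two choices of $\bar G$ can be compared in a small Python sketch. It assumes that the relatedness of equation (6.5) has the form $R_{ij} = (G_{ij} - \bar G)/(1 - \bar G)$ and that CCs are expanded to first order as $G_{ij} = 1 - g_{ij}u$ (as in appendix A), in which case $R_{ij} = (\bar g - g_{ij})/\bar g$. The coefficients in the usage lines (5/3 hub–leaf, 8/3 leaf–leaf, on a star with one hub and two leaves) are our own first-order values; they reproduce the DB relatedness $R_1 = -1/4$, $R_2 = -1$ quoted in appendix A. Function names are hypothetical.

```python
from fractions import Fraction


def relatedness_class_specific(g, j):
    """R_ij of every node i to focal actor j, taking Gbar = Gbar_K, the focal
    node's own average CC (the single-class choice, equation (11.1)).
    First order: R_ij = (gbar_j - g_ij) / gbar_j with gbar_j = mean_i g_ij."""
    n = len(g)
    gbar_j = sum(g[i][j] for i in range(n)) / Fraction(n)
    return [(gbar_j - g[i][j]) / gbar_j for i in range(n)]


def relatedness_pooled(g, class_of):
    """R_ij using one Gbar for all actor classes: the class-size-weighted
    average (1/N) * sum_K N_K * Gbar_K of the class averages."""
    n = len(g)
    row_mean = [sum(row) / Fraction(n) for row in g]
    gbar = Fraction(0)
    for k in set(class_of):
        members = [i for i in range(n) if class_of[i] == k]
        gbar_k = sum(row_mean[i] for i in members) / len(members)
        gbar += len(members) * gbar_k
    gbar /= n
    return [[(gbar - v) / gbar for v in row] for row in g]


F = Fraction
g = [[F(0), F(5, 3), F(5, 3)],
     [F(5, 3), F(0), F(8, 3)],
     [F(5, 3), F(8, 3), F(0)]]
R = relatedness_pooled(g, ["hub", "leaf", "leaf"])
print(R[0][1], R[1][2])  # -1/4 -1
# With the class-specific choice, relatedness to each focal actor averages to zero:
print(all(sum(relatedness_class_specific(g, j)) == 0 for j in range(3)))  # True
```

The pooled version gives a single set of coefficients that can be used for hub and leaf actors alike, at the cost of losing the zero-average property that the class-specific choice enjoys.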

### (d) Inclusive-fitness effect and fixation probability

The fixation probability $\rho_A$ (or $\rho_B$) of the allele *A* (or *B*) is the probability that an *A*-mutation (or *B*-mutation) arising in a pure *B*-state (or *A*-state) will take over the population, producing the next pure state. In a homogeneous (transitive) population, it has been shown [12,22] that the inclusive-fitness effect of the allele *A* has the same sign as the difference $\rho_A - \rho_B$ in the fixation probabilities, but as yet no parallel result has been obtained for heterogeneous populations. Tarnita and Taylor have a forthcoming paper that investigates this interesting question.
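The definition of $\rho_A$ can be made concrete with a Monte Carlo sketch in Python. This is not the social-behaviour model of this paper: it assumes BD updating with constant, genotype-dependent fecundities (a hypothetical `fecundity` mapping) and places the initial *A*-mutant at a uniformly chosen node. Under neutrality the estimate should be close to 1/*N* on any connected graph, because exactly one node's lineage eventually takes over, so the fixation probabilities from the *N* possible starting nodes sum to 1.

```python
import random


def fixation_probability(neighbours, fecundity, runs, rng):
    """Monte Carlo estimate of rho_A on a graph under BD updating (assumed).

    neighbours: adjacency lists; fecundity: {True: A fecundity, False: B
    fecundity}, constant and frequency-independent (a simplifying assumption).
    Each run starts from a single A-mutant at a uniformly chosen node of an
    all-B population and stops when one allele is lost.
    """
    n = len(neighbours)
    fixed = 0
    for _ in range(runs):
        state = [False] * n              # True marks an A individual
        state[rng.randrange(n)] = True
        count = 1                        # number of A individuals
        while 0 < count < n:
            weights = [fecundity[s] for s in state]
            parent = rng.choices(range(n), weights=weights)[0]
            child = rng.choice(neighbours[parent])
            if state[child] != state[parent]:
                count += 1 if state[parent] else -1
                state[child] = state[parent]
        fixed += count == n
    return fixed / runs


cycle5 = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
neutral = fixation_probability(cycle5, {True: 1.0, False: 1.0}, 4000,
                               random.Random(1))
print(neutral)  # close to 1/5
```

Swapping in a heterogeneous graph (for example a star) and comparing the estimates of $\rho_A$ and $\rho_B$ is one way to explore numerically the open question mentioned above.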

### (e) Inclusive fitness at age 50

The theoretical development of inclusive fitness over the past 50 years has enhanced and sharpened many of its positive features and has clarified its limitations. The essence of the method is to turn the standard (direct) way of calculating allele-frequency change inside out. For example, Price's equation (equations (5.1) and (8.1)) adds up the fitness effects on a focal allele of the behaviour of many actors, whereas inclusive fitness adds up the behavioural effects of a focal allele on the fitness of many recipients, each effect weighted by a measure of genetic closeness to the actor. A powerful consequence of this point of view is its construction of a ‘maximizing agent’ [38], whereby an actor is able to attach to its own fitness various fractions of the effects of a proposed action, and thereby obtain a quantity that its behaviour should maximize.
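The ‘inside-out’ bookkeeping can be illustrated with a toy Python sketch under an additive model we assume for illustration: actor *j* changes recipient *i*'s fitness by `effect[j][i]` if *j* carries the allele ($x_j = 1$). The recipient-centred, Price-style sum $\sum_i (x_i - \bar x) W_i$ and the actor-centred sum $\sum_j x_j \sum_i \mathrm{effect}_{ji}(x_i - \bar x)$ are the same double sum taken in different orders. The recipient weights $x_i - \bar x$ here are only a crude, single-state stand-in for genetic closeness, not the relatedness coefficients of this paper.

```python
from fractions import Fraction


def price_style(effect, x):
    """Recipient-centred sum: sum_i (x_i - xbar) * W_i, where
    W_i = sum_j effect[j][i] * x_j is the fitness change of recipient i
    caused by all allele-carrying actors j (additive effects assumed)."""
    n = len(x)
    xbar = Fraction(sum(x), n)
    return sum((x[i] - xbar) * sum(effect[j][i] * x[j] for j in range(n))
               for i in range(n))


def inclusive_style(effect, x):
    """Actor-centred sum of the same terms: for each allele-carrying actor j,
    add its effects on every recipient i, weighted by (x_i - xbar)."""
    n = len(x)
    xbar = Fraction(sum(x), n)
    return sum(x[j] * sum(effect[j][i] * (x[i] - xbar) for i in range(n))
               for j in range(n))


# Hypothetical example: on a 5-cycle each actor pays c and gives b to
# each of its two neighbours.
n, b, c = 5, Fraction(2), Fraction(1)
effect = [[Fraction(0)] * n for _ in range(n)]
for j in range(n):
    effect[j][j] = -c
    for i in ((j - 1) % n, (j + 1) % n):
        effect[j][i] = b
x = [1, 1, 0, 0, 0]
print(price_style(effect, x), inclusive_style(effect, x))  # -2/5 -2/5
```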

For us, the most interesting challenge that inclusive fitness has faced in its 50-year journey has been the incorporation of population structure, and this is the main theme of our paper. Already in 1964 Hamilton speculated that population structure could well have a moderating effect on the fitness benefits and costs of a trait, particularly in what he called ‘viscous’ populations [1,39], but it was only many years later [25,40–42] that a more formal study of the ‘secondary’ effects of population structure began. Later still, a formal distinction between homogeneous and heterogeneous structures was made, and it was realized that general results on evolutionary stability could be formulated for all homogeneous populations [13,16]. These results also suggested how a class structure could be formally defined in abstract populations (graphs) that mirrored, in some sense, the traditional structure based on ‘roles’ (male–female, parent–offspring, breeder–helper), and that Fisher's original concept of RV could be incorporated into the formulation of inclusive fitness such that it is able to predict changes in allele frequency under heterogeneity [17]. Currently, the effect of heterogeneity on the formulation and nature of inclusive fitness is one of its most active areas of investigation, and it is giving us new insights into the evolutionary process.

## Funding statement

P.D.T. acknowledges support from the Natural Sciences and Engineering Research Council of Canada.

## Acknowledgements

The authors thank Corina Tarnita for insightful comments that helped in shaping this work. Two anonymous reviewers provided hard-hitting and well-deserved valuable comments.

## Appendix A. Relatedness

The coefficient of consanguinity (CC) between two nodes is the probability that they are identical by descent (IBD), that is, that they are descended from a common ancestor without an intervening mutation. To first order in the mutation rate *u*, these coefficients are expected to have the form *G* = 1 – *gu*, where *g* is the coefficient of the first-order term in *u*, since when *u* = 0 the population will drift to a state in which all nodes are IBD.

A significant property of *G* is that it can be readily calculated recursively; examples are found in tables 1, 3 and 4. (But see Maciejewski [43] for an interesting alternative approach that uses established results from the theory of random walks.) The equations are obtained by asking for the *G*-coefficient just before the most recent replacement affecting the pair. For example, for *G*_{1} in the 5-cycle (below), the replacement offspring came either from the other member of the pair or from the neighbour on the far side, each with probability 1/2, so that the average previous coefficient was (*G*_{0} + *G*_{2})/2. This applies only if the replacement offspring did not mutate; if it did, the coefficient is zero, and so *G*_{1} = (1 − *u*)(*G*_{0} + *G*_{2})/2.

An important remark is that these recursive equations do not take account of small selective variations in fecundity or survival; that is, they assume a neutral distribution of alleles. This introduces an error in the coefficients that is first order in the pay-offs *b* and *c*. But in the equation for the inclusive-fitness effect *W*_{IF}, each relatedness coefficient multiplies a fitness effect that is itself first order in *b* and *c*, so the resulting error in *W*_{IF} is of *second* order and can be ignored.

**(a) The 5-cycle (figure 1 and table 1)**

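Let $G_k$ (and $R_k$) be the CC (and relatedness) between nodes at distance *k*, with $G_0 = 1$. We have the recursive equations

$$G_1 = (1-u)\,\frac{G_0 + G_2}{2}, \qquad G_2 = (1-u)\,\frac{G_1 + G_2}{2}.$$

These solve to give $G_1 = 1 - 4u$ and $G_2 = 1 - 6u$.

Then, the average CC of a node to the whole population is

$$\bar G = \frac{G_0 + 2G_1 + 2G_2}{5} = 1 - 4u.$$

Finally,

$$R_k = \frac{G_k - \bar G}{1 - \bar G},$$

giving us $R_0 = 1$, $R_1 = 0$, $R_2 = -1/2$.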
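Recursions of this kind can be set up and solved mechanically for any small graph. The Python sketch below is one way to do it, under stated assumptions: neutral BD updating (parent chosen uniformly, offspring replaces a uniform neighbour) or DB updating (uniform death, replacement by the offspring of a uniform neighbour); $G_{ij} = 1 - g_{ij}u$ to first order; and $\bar G$ taken as the average CC over all nodes, which equals the class-size-weighted average of the class averages. It solves for the first-order coefficients $g_{ij}$ exactly with fractions; all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def neutral_events(neighbours, rule):
    """List (parent, child, rate) for every replacement in a neutral population.

    BD: a parent is chosen uniformly and its offspring replaces a uniformly
    chosen neighbour.  DB: a node dies uniformly at random and is replaced by
    the offspring of a uniformly chosen neighbour.
    """
    n = len(neighbours)
    if rule == "BD":
        return [(p, c, Fraction(1, n * len(neighbours[p])))
                for p in range(n) for c in neighbours[p]]
    if rule == "DB":
        return [(p, c, Fraction(1, n * len(neighbours[c])))
                for c in range(n) for p in neighbours[c]]
    raise ValueError("rule must be 'BD' or 'DB'")


def solve(a, rhs):
    """Exact Gauss-Jordan elimination (a is assumed non-singular)."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]


def first_order_cc(neighbours, rule):
    """Return g with G_ij = 1 - g_ij * u + O(u^2) and g_ii = 0.

    Conditioning on the most recent replacement of i or j gives
    G_ij = (1 - u) * sum_e w_e * G_new, so to first order
    g_ij - sum_e w_e * g_new = 1, where w_e are the normalised event rates.
    """
    n = len(neighbours)
    events = neutral_events(neighbours, rule)
    pairs = list(combinations(range(n), 2))
    index = {pair: k for k, pair in enumerate(pairs)}
    a = [[Fraction(0)] * len(pairs) for _ in pairs]
    for k, (i, j) in enumerate(pairs):
        # (parent, surviving member of the pair, rate) for each event hitting i or j
        hits = [(p, j if c == i else i, rate)
                for p, c, rate in events if c in (i, j)]
        total = sum(rate for _, _, rate in hits)
        a[k][k] += 1
        for p, other, rate in hits:
            if p != other:  # if p == other the new pair is IBD (g = 0)
                a[k][index[tuple(sorted((p, other)))]] -= rate / total
    solution = solve(a, [Fraction(1)] * len(pairs))
    g = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), value in zip(pairs, solution):
        g[i][j] = g[j][i] = value
    return g


def relatedness(g):
    """R_ij = (G_ij - Gbar)/(1 - Gbar) = (gbar - g_ij)/gbar to first order,
    with Gbar the average CC over all nodes (self-pairs included)."""
    n = len(g)
    gbar = sum(map(sum, g)) / Fraction(n * n)
    return [[(gbar - v) / gbar for v in row] for row in g]


cycle5 = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]
g = first_order_cc(cycle5, "BD")
R = relatedness(g)
print(g[0][1], g[0][2], R[0][1], R[0][2])  # 4 6 0 -1/2
```

On the 5-cycle the BD and DB rules give the same coefficients, since every node has two neighbours. Applied to a three-node star (`[[1, 2], [0], [0]]`, hub first), the same functions give hub–leaf relatedness −5/16 under BD and −1/4 under DB, the values reported in this appendix.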

**(b) The N = 3 star (figure 4)**

The star has one hub and two leaves. Let *G*_{1} (and *R*_{1}) be the hub–leaf CC (and relatedness), *G*_{2} (and *R*_{2}) the CC (and relatedness) between the two leaves, and *G*_{0} = 1 the CC of a node with itself.

**(i) Birth–death updating (table 3)**

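The recursive equations are

$$G_1 = (1-u)\,\frac{3G_0 + 2G_2}{5}, \qquad G_2 = (1-u)\,G_1.$$

These solve to give $G_1 = 1 - \tfrac{7}{3}u$ and $G_2 = 1 - \tfrac{10}{3}u$. Then, the average CCs of a focal hub ($\bar G_H$) and a focal leaf ($\bar G_L$) to the whole population, and their class-size-weighted average $\bar G$, are

$$\bar G_H = \frac{G_0 + 2G_1}{3} = 1 - \tfrac{14}{9}u, \qquad \bar G_L = \frac{G_0 + G_1 + G_2}{3} = 1 - \tfrac{17}{9}u, \qquad \bar G = \frac{\bar G_H + 2\bar G_L}{3} = 1 - \tfrac{16}{9}u.$$

Finally, $R_k = (G_k - \bar G)/(1 - \bar G)$, giving us $R_0 = 1$, $R_1 = -5/16$, $R_2 = -7/8$.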

**(ii) Death–birth updating (table 4)**

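The recursive equations are

$$G_1 = (1-u)\,\frac{3G_0 + G_2}{4}, \qquad G_2 = (1-u)\,G_1.$$

These solve to give $G_1 = 1 - \tfrac{5}{3}u$ and $G_2 = 1 - \tfrac{8}{3}u$.

Then, the average CCs (focal hub $\bar G_H$, focal leaf $\bar G_L$ and their class-size-weighted average $\bar G$) are

$$\bar G_H = \frac{G_0 + 2G_1}{3} = 1 - \tfrac{10}{9}u, \qquad \bar G_L = \frac{G_0 + G_1 + G_2}{3} = 1 - \tfrac{13}{9}u, \qquad \bar G = \frac{\bar G_H + 2\bar G_L}{3} = 1 - \tfrac{4}{3}u.$$

Finally, $R_k = (G_k - \bar G)/(1 - \bar G)$, giving us $R_0 = 1$, $R_1 = -1/4$, $R_2 = -1$.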
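Both star calculations follow one pattern, which the closed-form Python sketch below makes explicit. It is our own generalization, derived from the neutral event rates, to a star with one hub and `n_leaves` leaves under BD or DB updating, and should be read as an illustration under those assumptions. A leaf is only ever replaced by the hub's offspring, so $G_2 = (1-u)G_1$, i.e. $g_2 = 1 + g_1$ to first order; if *a* is the share of events affecting a hub–leaf pair in which *another* leaf replaces the hub, then $g_1 = 1 + a\,g_2$, so $g_1 = (1+a)/(1-a)$.

```python
from fractions import Fraction


def star_relatedness(n_leaves, rule):
    """First-order hub-leaf (R1) and leaf-leaf (R2) relatedness on a star with
    one hub and n_leaves leaves, neutral process, Gbar averaged over all
    nodes (class-size weighting).  Returns (g1, g2, R1, R2)."""
    n = n_leaves
    # a = share of events affecting a (hub, leaf) pair in which ANOTHER leaf
    # replaces the hub; in the remaining events the pair becomes IBD (G_0 = 1).
    if rule == "BD":    # parent uniform, offspring replaces a uniform neighbour
        a = Fraction(n * (n - 1), 1 + n * n)
    elif rule == "DB":  # uniform death, replaced by a uniform neighbour
        a = Fraction(n - 1, 2 * n)
    else:
        raise ValueError("rule must be 'BD' or 'DB'")
    # g2 = 1 + g1 and g1 = 1 + a * g2  =>  g1 = (1 + a) / (1 - a)
    g1 = (1 + a) / (1 - a)
    g2 = 1 + g1
    # mean of g over all (n + 1)^2 ordered pairs; self-pairs contribute 0
    gbar = (2 * n * g1 + n * (n - 1) * g2) / (n + 1) ** 2
    return g1, g2, (gbar - g1) / gbar, (gbar - g2) / gbar


for rule in ("BD", "DB"):
    g1, g2, r1, r2 = star_relatedness(2, rule)
    print(rule, g1, g2, r1, r2)
# BD 7/3 10/3 -5/16 -7/8
# DB 5/3 8/3 -1/4 -1
```

With `n_leaves=2`, *a* = 2/5 under BD and *a* = 1/4 under DB, and the returned relatedness values match those above.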

## Appendix B

Let fitness *W* and genotype *x* depend on both node *i* and population state *S*. We use round brackets to represent an average or a covariance over all nodes in the population, and square brackets to represent an average or a covariance over all population states *endowed with the neutral distribution*.

### Theorem B.1.

*Suppose that for each node the genotype value averaged over all states with the neutral distribution is* $p_N$:

$$E[x] = p_N,$$

*and that in each state S the average fitness effect is zero*:

$$E(W) = 0.$$

*The latter will hold in a population of constant size. Then*,

### Proof

## Footnotes

One contribution of 14 to a Theme Issue ‘Inclusive fitness: 50 years on’.

- © 2014 The Author(s) Published by the Royal Society. All rights reserved.