## Abstract

In natural populations, dispersal tends to be limited so that individuals are in local competition with their neighbours. As a consequence, most behaviours tend to have a social component, e.g. they can be selfish, spiteful, cooperative or altruistic as usually considered in social evolutionary theory. How social behaviours translate into fitness costs and benefits depends considerably on life-history features, as well as on local demographic and ecological conditions. Over the last four decades, evolutionists have been able to explore many of the consequences of these factors for the evolution of social behaviours. In this paper, we first recall the main theoretical concepts required to understand social evolution. We then discuss how life history, demography and ecology promote or inhibit the evolution of helping behaviours, but the arguments developed for helping can be extended to essentially any social trait. The analysis suggests that, on a theoretical level, it is possible to contrast three critical benefit-to-cost ratios beyond which costly helping is selected for (three quantitative rules for the evolution of altruism). But comparison between theoretical results and empirical data has always been difficult in the literature, partly because of the perennial question of the scale at which relatedness should be measured under localized dispersal. We then provide three answers to this question.

## 1. Introduction

Many of the behaviours expressed by an individual during its lifetime are statistically influenced by its genes. The change in allele frequency in a population over time is generally referred to as evolution (Fisher 1930; Wright 1931; Haldane 1932). There are four fundamental evolutionary forces resulting in a change in allele frequency: natural selection, which favours those genes conferring to their carriers higher vital rates (fecundity and/or survival; Caswell 2000) than alternative genes; random genetic drift, which results in fluctuations of allele frequency owing to sampling effects in finite populations; recombination, which reshuffles genes within individuals; and mutation, which introduces new genetic material into the population (Crow & Kimura 1970; Bürger 2000; Kirkpatrick *et al.* 2002; Ewens 2004). Understanding the ultimate factors driving the evolution of a behaviour boils down to understanding how the demographic forces (selection and genetic drift) and the organismal ones (recombination and mutation) interact to drive the changes in the gene pool underpinning the behaviour and how the resulting changes feed back on the evolutionary forces themselves.

Most natural populations do not consist of a randomly mixing gene pool. Instead, they tend to consist of a series of demes connected by dispersal, the level of which depends on the geographic distance and the environmental conditions between demes. Such population subdivision has important consequences for the evolution of behaviours and other phenotypes. The change of allele frequency in the population then depends on the interactions between the evolutionary forces at a local scale (the scale of the deme when space is discrete), instead of the forces interacting at the global, total population scale, with dispersal tuning the magnitude of this effect.

Many of the behaviours expressed by one individual also affect the vital rates of others. Such traits are called ‘social traits’ in evolutionary biology and were classified by Hamilton (1964*a*, 1970) into four categories: selfishness, spite, cooperation and altruism (see also the introduction to this volume, Brosnan & Bshary 2010). In a subdivided population, where local population size tends to be small, essentially any behaviour expressed by one individual is likely to affect the vital rates of another. This is a consequence of the fact that resource availability follows a conservation law, implying that the gains or losses in resources to one individual are balanced by the losses or gains to others. Most life-history behaviours, such as dispersal, sex ratio or senescence, may then have a social component.

In natural populations, the vital rates of one individual are thus likely to depend on the phenotype of others and, therefore, on the distribution of genotypes within and between demes. The force of directional selection on an allele affecting a social behaviour will thus be determined by how the evolutionary forces interact at a local scale. In order to understand how evolution shapes sociality, it is thus necessary to understand how life-history (or life cycle) features affect this local interaction.

A particular class of social behaviours has received a lot of attention over the last decades: helping behaviours by which individuals tend to increase the vital rates of recipients (cooperation and altruism). In this paper, we first recall the main theoretical concepts usually used to formalize the evolution of helping behaviours in the presence of local interactions. We focus on inclusive fitness theory as it allows us to conveniently address the evolution of the diversity of social traits considered by behavioural ecologists. We then discuss how life cycle features promote or inhibit the force of directional selection (inclusive fitness effect) on helping and compare the outcomes for the evolution of this trait for a large number of models developed over the last decades.

This analysis leads us to distinguish three types of quantitative outputs for the selective pressure on costly helping behaviours (altruism), which are characterized by the critical benefit-to-cost ratio beyond which helping is selected for in evolutionary models (three quantitative rules for the evolution of helping). While we highlight the effect on selective outcomes of varying the assumptions of various models, we do not discuss here directly the empirical relevance of endorsing different assumptions. Although different empirical studies favour different scenarios, we find it difficult to reach firm conclusions for each empirical model and the more so to obtain a global picture. However, we discuss how the relatedness coefficients involved in inclusive fitness calculations can be estimated empirically under localized dispersal, and provide three answers to the perennial question of the scale at which relatedness should be measured.

Although we focus on the consequences of limited dispersal, family-structured populations involving a stage of complete dispersal, which have often been the main focus for understanding the evolution of castes in insects and communal breeding (Wilson 1975; Bourke & Franks 1995; Clutton-Brock 2002), can be seen as special cases of spatially structured populations. Hence, the arguments developed below can be thought to apply to both family-structured and spatially structured populations and can be extended to essentially any trait, as the substantial literature on sex ratio (e.g. Hardy 2002; West 2009), dispersal (e.g. Ronce 2007) or foraging (e.g. Giraldeau & Caraco 2000) demonstrates.

## 2. Essential biological features

In order to discuss the factors promoting or inhibiting the evolution of helping behaviours, we assume that the population consists of a discrete number of individuals, which reproduce at different positions in space. The population can typically be envisioned as a certain number (possibly infinite) of demes located in a one-, two- or three-dimensional habitat, where each deme consists of one or more individuals. We consider that there are three types of baseline biological events that affect the individuals in this population:

- *Reproduction and survival*. Each adult individual in the population may reproduce and the number of offspring produced by an individual is a variable that can take different values (i.e. a random variable). Hence, the number of offspring produced by an individual follows some probability distribution, for instance, a Poisson or a negative-binomial distribution. After reproduction, an individual may either die or survive to the next reproductive period and survival induces overlapping generations.

- *Competition*. Resources come in finite supply so that competition for resources used for reproduction and survival occurs between the individuals within and/or between demes. Competition may occur for abiotic resources (space) or for biotic resources (those that can be transformed into gametes). The main consequence of competition is that the population is regulated at some point or another during the life cycle.

- *Dispersal*. Each individual, adult or newborn, may either stay in its deme (natal deme for the offspring) or disperse to another spatial position where it may or may not reproduce. The distance of dispersal from the current to the new spatial position follows some distribution; for instance, a geometric or an exponential distribution if dispersal is localized. Dispersal results in gene flow in the population and it shifts competition from being local to being global. Dispersal, therefore, tends to reduce competition between neighbours.
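To make the three events concrete, the following is a minimal sketch of one iteration of such a life cycle for a neutral allele. It is an illustration under loudly simplifying assumptions that are ours and not part of the models reviewed here: a finite number of demes of equal size, non-overlapping generations, very large and equal fecundity (so that each breeding spot is won by an immigrant with probability `m`), and immigrants drawn from a deme chosen uniformly at random. The function name `life_cycle_step` is hypothetical.

```python
import random


def life_cycle_step(demes, m, rng):
    """One generation of a neutral island-type life cycle (illustrative sketch).

    demes: list of demes, each a list of allele labels (0 = resident, 1 = mutant).
    m:     probability that the juvenile winning a breeding spot is an immigrant.
    rng:   a random.Random instance, passed in so that runs are reproducible.
    """
    n_demes = len(demes)
    new_demes = []
    for d in range(n_demes):
        new_deme = []
        # Competition: each deme is regulated back to its original size,
        # one winning juvenile per breeding spot.
        for _ in range(len(demes[d])):
            if rng.random() < m:
                # Dispersal: the winner was born in a randomly chosen deme.
                source = demes[rng.randrange(n_demes)]
            else:
                # The winner is a philopatric juvenile born in this deme.
                source = demes[d]
            # Reproduction: under neutrality every adult of the source deme
            # is equally likely to be the parent of the winning juvenile.
            new_deme.append(rng.choice(source))
        new_demes.append(new_deme)
    return new_demes


if __name__ == "__main__":
    rng = random.Random(0)
    population = [[0, 1, 0, 1] for _ in range(5)]
    for _ in range(10):
        population = life_cycle_step(population, m=0.2, rng=rng)
    print(population)
```

Selection could be layered onto this sketch by weighting the choice of parent by fecundity, which is where social interactions would enter.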

The reproduction, survival, competition and dispersal events experienced by an individual may all depend on its interactions with others. Hence, the fitness of a focal individual, which is defined here as the total number of its descendants after one full iteration of the life cycle of the organism (thus including itself through survival and its offspring in order to have a full count of genotype frequencies over one life cycle iteration), depends not only on the focal individual's phenotype (and thus genotype) but also on the phenotype of others. Understanding how behavioural effects translate into allele dynamics thus requires a careful account of how such effects convert into fitness costs and benefits. To that aim, we now introduce the notions of fitness function, gradient of selection, relatedness and local competition.

## 3. Theoretical survival kit

### (a) Selection strength and gene action

Since we are mainly interested in the effect of life history and demographic features for the evolution of a focal phenotype (e.g. provisioning of care to offspring, probability of becoming a worker, strategy in a multimove game, learning rule for imitating neighbours, etc.), we endorse the most minimalist genetic assumptions. In particular, we consider that a single locus controls the expression of the focal phenotype, that gene action is additive and that only two alleles segregate in the population. Those individuals that carry a mutant allele express a mutant phenotype denoted *z*_{•}, while those individuals that carry a wild-type, resident allele express a phenotype denoted *z*, whose magnitude differs from that of the mutant phenotype (a list of symbols is given in table 1).

The above assumptions are implicit in most models of social evolution considered by behavioural ecologists, which, therefore, ignore (for worthy reasons) the complexity introduced by adding recombination and mutation. These assumptions allow one to explicitly evaluate the evolutionary dynamics of a focal phenotype in the presence of the demographic forces (selection and genetic drift) under a very large class of biological scenarios involving social interactions. Moreover, models involving phenotypic gradient approximations, where gene action is additive, remain often the most useful simple approximations for evolution of traits with a multilocus genetic basis.

In the light of the continued confusion about inclusive fitness theory, it is worthwhile to emphasize that the above assumptions are not integral to the theory, which can actually take into account any strength of selection and gene action (Queller 1992; Frank 1997; Gardner *et al.* 2007; Roze & Rousset 2008); rather, they are only the most useful simplifications used by behavioural ecologists. Further, the concepts and techniques reviewed below remain quite useful when the assumptions are relaxed, such as when there is dominance in diploid populations, a multilocus basis of the trait or stronger selection is considered, and various forms of frequency dependence result from the departures from the simplest assumptions (Ajar 2003; Roze & Rousset 2004; Lessard & Ladret 2007; Lehmann *et al.* 2007; Rousset & Roze 2007).

### (b) Notions of fitness function and selection gradient

#### (i) Fitness in a panmictic population

Suppose that the phenotype under focus represents the expression of an act of helping, which reduces the level of competitiveness (varying between 0 and 1) placed into the extraction of a common resource. Higher competitiveness is assumed to result in a cost to others because it may cause fights or scrambles between interactants (e.g. social carnivores fighting over a kill). We consider that the number of offspring produced by a focal individual is given by *z*_{•}(*K* − *z*), where *K* is a constant. Thus, the number of offspring produced increases with the level of competitiveness *z*_{•} of the focal individual and decreases with the average level of competitiveness *z* expressed by the individuals in the population.

If the population is panmictic and of constant and very large size (say infinite), the fitness of a focal individual is given by the expected number of offspring it produces, *z*_{•}(*K* − *z*), relative to the expected number *z*(*K* − *z*) of offspring produced by an average individual in the population:

$$w(z_{\bullet}, z) = \frac{z_{\bullet}(K - z)}{z(K - z)} = \frac{z_{\bullet}}{z}, \tag{3.1}$$

which is equal to unity when everybody carries the same phenotype (when *z*_{•} = *z*).

#### (ii) Selection gradient in a panmictic population

The change in the frequency *p* over one generation of the mutant allele (frequency of individuals expressing phenotype *z*_{•}), which results in a small phenotypic deviation *δ* relative to the phenotype expressed by individuals carrying the resident allele, and causes selection to be weak, can be written, to first order in *δ*, as

$$\Delta p = \delta\, p(1 - p)\, S(z), \tag{3.2}$$

where *p*(1 − *p*) is the genetic variance in the population and *S*(*z*) is the force of directional selection on the phenotype (selection gradient), which is frequency independent in the presence of additive gene action (Rousset 2004). The selection gradient is given by *S*(*z*) = ∂*w*/∂*z*_{•}, the partial derivative of the fitness function with respect to the mutant phenotype evaluated at the resident value (at *z*_{•} = *z*). Hence, the fate of a mutant allele depends only on the effect of its expression by the carrier on its fitness and its speed of advance depends on the genetic variance in the total population.
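The dynamics implied by equation (3.2) can be illustrated by iterating the recursion directly. The sketch below assumes the first-order form Δ*p* = *δ* *p*(1 − *p*) *S*, with *S* held constant (frequency-independent selection); the function name and the numerical values are ours and purely illustrative.

```python
def allele_frequency_trajectory(p0, delta, S, generations):
    """Iterate Delta p = delta * p * (1 - p) * S for a number of generations.

    p0:    initial mutant allele frequency (between 0 and 1).
    delta: small phenotypic deviation of the mutant (weak selection).
    S:     selection gradient, assumed constant over the trajectory.
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # The genetic variance p * (1 - p) sets the speed of change;
        # the sign of S sets its direction.
        p = p + delta * p * (1.0 - p) * S
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    rising = allele_frequency_trajectory(p0=0.01, delta=0.05, S=1.0, generations=200)
    falling = allele_frequency_trajectory(p0=0.5, delta=0.05, S=-1.0, generations=200)
    print(rising[-1], falling[-1])
```

The change per generation is largest at intermediate frequencies, where the genetic variance *p*(1 − *p*) peaks, and vanishes as the allele approaches loss or fixation.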

By a gradual, step-by-step transformation caused by the successive invasion of mutant alleles resulting in different phenotypic values from resident alleles fixed in the population, the focal phenotype will progressively converge to an equilibrium point (e.g. Eshel 1996; Geritz *et al.* 1997); namely, a candidate evolutionarily stable strategy (Maynard-Smith 1982). For the example of competitiveness, the selection gradient is positive for all values of *z* between 0 and 1: *S*(*z*) = 1/*z*. Assuming that the ecological dynamics reach an equilibrium before new mutations arise (e.g. Vincent & Brown 2005), successive invasion of mutants will then cause competitiveness to increase until the point where the population will eventually go extinct because the fecundity of individuals is lower than unity (*K* − *z* < 1).
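The value *S*(*z*) = 1/*z* quoted for the competitiveness example can be checked numerically. The sketch below assumes the fitness function described in the text (focal fecundity *z*_{•}(*K* − *z*) relative to the average fecundity *z*(*K* − *z*)) and approximates the selection gradient by a central finite difference; the function names, the step size `h` and the value of `K` are illustrative choices of ours.

```python
def fitness_panmictic(z_focal, z_resident, K):
    """Focal fecundity z_focal*(K - z) relative to the average fecundity z*(K - z)."""
    return z_focal * (K - z_resident) / (z_resident * (K - z_resident))


def selection_gradient(z, K, h=1e-6):
    """Central finite difference of fitness with respect to the focal phenotype,
    evaluated at the resident value (z_focal = z)."""
    upper = fitness_panmictic(z + h, z, K)
    lower = fitness_panmictic(z - h, z, K)
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    for z in (0.1, 0.5, 0.9):
        # The two printed numbers should agree up to rounding error.
        print(z, selection_gradient(z, K=5.0), 1.0 / z)
```

Because *K* cancels in the relative fitness, the gradient is positive for every resident value between 0 and 1, whatever the value of *K*.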

The competitiveness example shows that, even in an elementary scenario driven by frequency-independent selection at the genetic level, the mean fecundity of the population does not increase with time. Further, despite the fact that in every generation, individuals with higher fitness better transmit their genes to the next generation, selection does not increase the mean fitness of individuals in the population over evolutionary time. Claims to the contrary must refer to concepts of fitness other than the number of settled offspring measured by *w*.

### (c) Notions of inclusive fitness effect and relatedness

#### (i) Fitness in a structured population

In the competitiveness model, the selection gradient on the phenotype *z* depends only on the change in the fitness of a focal individual resulting from it expressing the mutant allele, although the phenotype under focus has a social component. We now introduce the concept of inclusive fitness effect, where various categories of actors expressing the mutant allele may change the fitness of a focal individual. To that end, we introduce a reference life cycle, where individuals live in a population with an infinite number of demes, each of finite size *N* (Wright's infinite island model, Wright 1931), and where social interactions occur between individuals within demes (Taylor 1992*a*). Each individual in a deme produces a large number of offspring (ideally infinite), and offspring disperse independently of each other with probability *m* to some new random deme. In each deme, only *N* offspring reach adulthood.

Individuals that bear the mutant allele express an act (or a series of actions during their lifetime) that reduces their reference fecundity by some cost *C*, and which increases the summed fecundities of their neighbours by *B*. Importantly, both *C* and *B* can take both positive and negative values, and we refer to §7 for a more detailed interpretation of these two variables. A focal individual then produces a relative number 1 + *Bz*_{0} − *Cz*_{•} of offspring, where *z*_{0} is the average phenotype in the focal deme, excluding the focal individual. A fraction 1 − *m* of these offspring remain philopatric and then compete with (1 − *m*)[1 + (*B* − *C*)*z*_{0}^{R}] juveniles produced in the focal deme, where *z*_{0}^{R} = *z*_{•}/*N* + (*N* − 1)*z*_{0}/*N* is the average phenotype in the focal deme including the focal individual, which takes into account the fact that the focal individual contributes to focal patch productivity in proportion to 1/*N*.

The focal individual's philopatric offspring also compete against (a relative number) *m*[1 + (*B* − *C*)*z*] immigrant juveniles, where *z* is the average phenotype in the population. Finally, a complementary fraction *m* of the offspring of the focal individual disperse, in which case they compete only against juveniles produced in other demes by individuals with phenotype *z*. Collecting all terms then gives the fitness of the focal individual as a function of all phenotypes

$$w(z_{\bullet}, z_{0}, z) = \frac{(1 - m)\,(1 + B z_{0} - C z_{\bullet})}{(1 - m)\left[1 + (B - C) z_{0}^{\mathrm{R}}\right] + m\left[1 + (B - C) z\right]} + \frac{m\,(1 + B z_{0} - C z_{\bullet})}{1 + (B - C) z}. \tag{3.3}$$

Comparing this fitness function with equation (3.1) illustrates that it depends not only on the phenotype *z*_{•} of the focal individual and the average phenotype *z* in the population, but also on the average phenotype *z*_{0} of the neighbours of the focal individual in its deme.
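The verbal construction of the fitness function (3.3) can be transcribed term by term. The following sketch follows the description given above (relative fecundity 1 + *Bz*_{0} − *Cz*_{•}, a philopatric fraction 1 − *m* competing with local and immigrant juveniles, and a dispersing fraction *m* competing in demes with the average phenotype *z*); the function and argument names are ours.

```python
def fitness_island(z_focal, z_neighbours, z_pop, B, C, m, N):
    """Fitness of a focal individual in the reference island life cycle.

    z_focal:      phenotype of the focal individual.
    z_neighbours: average phenotype of its deme mates, excluding the focal.
    z_pop:        average phenotype in the population.
    """
    # Relative number of offspring produced by the focal individual.
    fecundity = 1.0 + B * z_neighbours - C * z_focal
    # Average phenotype in the focal deme, including the focal (weight 1/N).
    z_deme = z_focal / N + (N - 1) * z_neighbours / N
    # Juveniles competing in the focal deme: locally produced plus immigrants.
    local_juveniles = (1.0 - m) * (1.0 + (B - C) * z_deme)
    immigrant_juveniles = m * (1.0 + (B - C) * z_pop)
    # Juveniles competing in any other deme.
    juveniles_elsewhere = 1.0 + (B - C) * z_pop
    philopatric_part = (1.0 - m) * fecundity / (local_juveniles + immigrant_juveniles)
    dispersing_part = m * fecundity / juveniles_elsewhere
    return philopatric_part + dispersing_part


if __name__ == "__main__":
    # In a monomorphic population every individual exactly replaces itself.
    print(fitness_island(0.3, 0.3, 0.3, B=2.0, C=1.0, m=0.2, N=5))
```

As a sanity check, the function returns a value of one (up to rounding) whenever the three phenotypes are equal, as required of a fitness function in a population of constant size.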

Regardless of the exact demographic assumptions, when dispersal is limited and demes are of small size, genetic drift will result in fluctuations of allele frequencies within demes. Two individuals from the same deme are then more likely to carry the same genotypes (and thus express similar phenotypes) than are two individuals sampled from different demes. In other words, relatedness between group members is likely to build up (Hamilton 1971). This consequence of local genetic drift must be taken into account when evaluating the force of directional selection on the focal phenotype.

#### (ii) Inclusive fitness effect

The change in the frequency *p* over one generation of the mutant allele can now be written as

$$\Delta p = \delta\, p(1 - p)\, S_{\mathrm{IF}}(z), \tag{3.4}$$

where

$$S_{\mathrm{IF}}(z) = -c + R\,b \tag{3.5}$$

is the so-called inclusive fitness effect, and it depends on three terms (Hamilton 1964*a*, 1970). First, the change −*c* = ∂*w*/∂*z*_{•} in the fitness of a focal individual stemming from it expressing the mutant allele during its lifetime, where the derivative is evaluated at the point where all phenotypes are the same (at *z*_{•} = *z*_{0} = *z*). Second, the change in the fitness of the focal individual *b* = ∂*w*/∂*z*_{0} stemming from all its neighbours expressing the mutant allele. Finally, the relatedness *R* between the focal individual and a randomly sampled neighbour from its patch. Equation (3.5) also illustrates that inclusive fitness is a decomposition of (the average) individual fitness of the carrier of some gene into sources of variation given by the gene of the carrier and those of other categories of individuals (note that strictly speaking, inclusive fitness is given by 1 + *δS*_{IF}(*z*)).

It follows from equations (3.4) and (3.5) that the mutant allele may invade the population when Hamilton's rule is satisfied:

$$R\,b - c > 0. \tag{3.6}$$

Because the inclusive fitness effect (*S*_{IF}) is independent of allele frequency, empirical estimates of *R*, *b* and *c* allow one to assess the direction of selection on a social trait, regardless of the current allele frequencies.

#### (iii) Interpretation of relatedness

The relatedness coefficient, *R*, can be thought of as a ratio of two standardized transmission coefficients. It measures the extent to which the recipient of the act of the focal individual is more likely to transmit the mutant allele to the next generation than an individual sampled at random from the population, relative to the extent to which the actor is more likely to transmit the allele than a random individual (Frank 1998). Relatedness is, therefore, a three-parties concept, involving a focal actor, a recipient and a randomly sampled individual from the population (Grafen 1985).

Relatedness can also be interpreted in two different ways. First, as a correlation, where it is given in terms of the covariance between the mutant allele frequency in a focal individual and that in a recipient relative to the variance in mutant allele frequency in the population. Second, in terms of coalescence events, as the probability that a gene copy from the focal individual, and a gene copy from a recipient of the act, have their most recent common ancestor (coalesce) in the deme of the focal individual.

The classical computation of relatedness from pedigrees rests on a similar interpretation. If fitness (*w*) depends on (say) half-sisters' interactions, then the inclusive fitness effect depends on a relatedness coefficient that depends on half-sister ‘identity by descent’, which can be understood as the probability that gene copies from half-sisters coalesce in their common parent. For more general family relationships, identity by descent is the probability that the gene copies coalesce within the pedigree defined by the relationship considered.

Compared with the classical pedigree relationships, however, it is important to note that both relatedness, *R*, and the fitness function, *w*, depend on life cycle features. In equation (3.3), the fitness depends on the dispersal rate and so will relatedness (see equation (3.7) below).

### (d) Notion of local competition

Is it worthwhile to pay a direct fitness cost in order to help neighbours under limited dispersal? The answer to this question is not straightforward. When neighbours are helped to produce more offspring, the intensity of competition experienced by the offspring of the focal individual and by those of its neighbours is increased. Helping neighbours thus leads to an increase in local competition, here understood as the extent to which an actor and a recipient (or their offspring) are more likely to compete against each other for the same resources than are two adult individuals (or offspring) sampled at random from the population. This tends to inhibit the evolution of helping.

Under the demographic scenario described by equation (3.3), the additional number of offspring produced by neighbours through helping (each weighted by their relatedness to the focal individual) is exactly offset by the increase in local competition. In order to prove this, one needs to substitute equation (3.3) into equation (3.5) and use the equilibrium value of relatedness for the island model, which is given by Wright's (1951) measure of population structure (*R* = *F*_{ST}). Standard calculations (reviewed for example in Rousset 2004, p. 28) then show, first, that

$$R = \frac{(1 - m)^{2}}{N - (N - 1)(1 - m)^{2}}, \tag{3.7}$$

which increases as both *m* and *N* decrease, and, second, that the direction of selection on the mutant allele takes the form

$$-C > 0 \tag{3.8}$$

(see equations (A1)–(A5) of the electronic supplementary material).
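The equilibrium relatedness of equation (3.7) can be recovered as the fixed point of a one-generation recursion. The sketch below assumes the standard haploid island-model argument, which is our illustration and not spelled out in the text: two distinct adults of a deme are both philopatric with probability (1 − *m*)², in which case they share a parent with probability 1/*N* and otherwise descend from two distinct adults of the previous generation, themselves related by *R*. The function names are ours.

```python
def relatedness_closed_form(m, N):
    """Equilibrium relatedness R = (1-m)^2 / (N - (N-1)(1-m)^2)."""
    a = (1.0 - m) ** 2
    return a / (N - (N - 1) * a)


def relatedness_by_iteration(m, N, generations=10_000):
    """Iterate R' = (1-m)^2 * (1/N + (1 - 1/N) * R) starting from R = 0."""
    a = (1.0 - m) ** 2
    R = 0.0
    for _ in range(generations):
        # Both individuals philopatric (prob. a); same parent with prob. 1/N,
        # otherwise two distinct parents related by the current value of R.
        R = a * (1.0 / N + (1.0 - 1.0 / N) * R)
    return R


if __name__ == "__main__":
    for m, N in ((0.1, 5), (0.5, 10), (0.01, 2)):
        # The iterated value should match the closed form.
        print(m, N, relatedness_by_iteration(m, N), relatedness_closed_form(m, N))
```

The recursion makes explicit that relatedness here is a product of the life cycle: it is generated by local genetic drift (the 1/*N* term) and eroded by dispersal (the (1 − *m*)² factor).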

Inequality (3.8) shows that helping neighbours is selected for only insofar as the actor's fecundity (number of juveniles produced and counted before the competition stage) is increased as a result of it expressing the mutant allele (Taylor 1992*a*,*b*). Regardless of the level of migration and deme size (value of *R* in equation (3.7)), the focal individual gets no benefits from helping neighbours if the act of helping reduces its lifetime fecundity. Costly helping is thus selected against.
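The cancellation between kin benefits and local competition can be verified numerically rather than algebraically. The sketch below assumes the fitness function of the reference life cycle as described in §3c, the island-model relatedness *R* = (1 − *m*)²/(*N* − (*N* − 1)(1 − *m*)²), and finite-difference approximations of −*c* and *b*; all names and parameter values are illustrative choices of ours.

```python
def w(zf, z0, z, B, C, m, N):
    """Fitness in the reference island life cycle (focal zf, neighbours z0, population z)."""
    fecundity = 1.0 + B * z0 - C * zf
    z_deme = zf / N + (N - 1) * z0 / N
    competitors = (1.0 - m) * (1.0 + (B - C) * z_deme) + m * (1.0 + (B - C) * z)
    return (1.0 - m) * fecundity / competitors + m * fecundity / (1.0 + (B - C) * z)


def inclusive_fitness_effect(z, B, C, m, N, h=1e-6):
    """S_IF = -c + R*b, with -c and b obtained by central finite differences."""
    minus_c = (w(z + h, z, z, B, C, m, N) - w(z - h, z, z, B, C, m, N)) / (2.0 * h)
    b = (w(z, z + h, z, B, C, m, N) - w(z, z - h, z, B, C, m, N)) / (2.0 * h)
    a = (1.0 - m) ** 2
    R = a / (N - (N - 1) * a)  # equilibrium relatedness in the island model
    return minus_c + R * b


if __name__ == "__main__":
    for B in (0.5, 2.0, 10.0):
        # With a fecundity cost C > 0 the effect stays negative however large B is.
        print(B, inclusive_fitness_effect(z=0.2, B=B, C=0.1, m=0.3, N=4))
```

In this sketch the benefit *B* affects only the overall scaling of the inclusive fitness effect (through mean fecundity), not its sign, which is set by −*C*.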

Inequality (3.8) illustrates the general fact that limited dispersal has two major but antagonistic consequences for the evolution of social behaviours (Hamilton 1964*b*; Grafen 1984; Taylor 1992*a*; Queller 1994). First, because social interactions take place between related individuals, organisms may benefit from increasing the vital rates of neighbours. Second, since related neighbours are also more likely to compete for the same local resources, increasing the vital rates of neighbours is likely to hurt those of the focal individual.

## 4. Factors promoting and inhibiting selection on helping

Because the increase in local competition exactly balances out the increase in the benefits to neighbours in Taylor's (1992*a*) model, it provides an ideal reference model for relaxing life cycle assumptions in order to identify those life history and demographic factors that promote or further inhibit the evolution of social behaviours (timing of social interactions, modes of competition and dispersal, social structures, environmental and demographic dynamics, and so on). We now turn to a discussion of the effect of these factors for the evolution of helping.

Although our discussion will focus mainly on qualitative effects, it is useful to gain some quantitative insights into how varying life cycle assumptions affects the selective pressure on helping. We therefore provide, where simple enough, analytical expressions of the fecundity or survival (vital rates) cost-to-benefit ratio *κ* under which selection favours the mutant allele. That is, we always rearrange Hamilton's rule (equation (3.6)) such that the form of the invasion condition of the mutant allele can be written in terms of changes in vital rates:

$$\kappa B - C > 0, \tag{4.1}$$

where *κ* depends on life cycle parameters and can be positive, negative or take the value 0 (as in equation (3.8)). Because *B* and *C* are not the *fitness* costs and benefits considered in Hamilton's rule, but costs and benefits measured in units of vital rates, care must be taken with the interpretation of the *κ* coefficient, which may be thought of as a scaled relatedness coefficient, where the effect of competition has been included (Queller 1994). We return to this issue in §5 below.

In order to be able to easily identify the factors leading to high and low selective pressures on helping, we also evaluate the *κ* coefficients under the weak migration large population size limit (as *m* → 0 and *N* → ∞ while holding *Nm* constant), which we refer to as the ‘*Nm* limit’. In order to facilitate comparison between models, we always consider (unless specified) that *B* is an effect on neighbours of the focal individual (thus excluding the focal; that is, ‘others-only’ helping; Pepper 2000). We refer to the appendix for a list of the fitness functions (referred to as ‘F: equation AX’ in the main text for equation AX in the appendix) leading to the *κ* coefficients presented below, and table 2 lists the *κ* coefficients evaluated under the weak migration, large population size limit.

### (a) Timing of life cycle events

#### (i) Regulation before dispersal

In the reference model (equation (3.3)), density-dependent competition (regulation) occurs after the dispersal of offspring, but it may also occur before their dispersal, or be a mixture of these two cases. When competition occurs only before dispersal, it occurs solely between the individuals from the same deme. A focal individual producing more gametes than another from another deme will not have a higher fitness than the latter, whenever the two individuals have the same productivity relative to their own deme productivity. Because any individual receiving help receives it in the same amount as any other individual in the focal deme, except the focal individual, the relative fecundity of an individual being helped is only increased relative to that of the focal individual, but not relative to that of an individual from another deme. Hence, the contribution of a deme with helpers to the population is not greater than that of a deme with defectors, and helping does not increase the inclusive fitness of a focal individual.

It follows from these considerations that when competition occurs only before dispersal, that is, only at a local scale, and all demes contribute equally to the population, the selective pressure on (unconditional) helping depends only on the direct consequences of the behaviour of the focal individual on its fitness (e.g. −*c*) and not on the indirect effects on the fitness of neighbours (e.g. *b*). Costly helping is then selected against and harming, which reduces the fecundity of neighbours, may be selected for when deme size is small (F: equation (A6), *κ* = −1/(*N* − 1); Rousset 2004, p. 125).

For costly helping to evolve, some competition (or regulation) must occur at a global scale, between individuals from different demes. Hence, holding everything else constant, some regulation must occur after dispersal. Although this condition is necessary (Wade 1985), the reference model discussed above shows that this is not sufficient (Taylor 1992*a*,*b*), and we now relax further assumptions of this model.

#### (ii) Helping after reproduction and before dispersal

In the reference model, social interactions occur only between the *N* adults in a deme before reproduction and after dispersal and regulation. But social interactions may also occur after reproduction and before the dispersal of juveniles; either among the juveniles in a deme, or between individuals of the parental and the offspring generation before the latter disperse and then compete (regulation). Under these two cases, the benefits of helping are directed towards individuals that are on average more related to a focal individual than when helping occurs before reproduction. In effect, a focal individual benefits more from helping its offspring, or its offspring helping it, or even its offspring helping each other, than it benefits from increasing the offspring production of other adults in the focal patch. Because the intensity of local competition is not affected by the timing of social interactions, the selective pressure on helping is increased under this scenario (F: equation (A9), *κ* = 1/*N*; Taylor 1992*a*, p. 355).

### (b) Modes of reproduction, dispersal and competition

#### (i) Propagule dispersal: competition between individuals

In the reference model, each individual disperses independently of each other to a new, randomly chosen deme. But individuals might also disperse jointly with other members of their natal deme, which leads to propagule pool or budding dispersal (Slatkin 1977; Clobert *et al.* 2001). In the presence of propagule dispersal, the relatedness between individuals is maintained during dispersal, so the relatedness between group members is likely to be higher under propagule than under independent dispersal. But propagule (budding) dispersal also implies that individuals from the same propagule (bud) are more likely to compete against each other after dispersal for resources or vacant breeding spots. Hence, the benefits to neighbours are not more decoupled from local competition than in the reference model, with the result that propagule dispersal does not in itself promote selection on helping (F: equation (A11), *κ* = 0; Lehmann *et al.* 2006).

#### (ii) Propagule dispersal: competition between groups

Individuals might not only disperse as a group but may also compete as a group against other groups for access to whole group breeding spots (competition occurs *stricto sensu* between groups). The winners of such group contests can then occupy whole demes. If propagule dispersal is coupled with competition between buds (or propagules), then competition within groups is greatly reduced because individuals only compete against other individuals from other groups. If demes of helpers produce more propagules than demes of defectors, then helping can invade the population. In the absence of dispersal between demes, relatedness within groups will take its maximum value of unity and groups can be seen as functioning like clones. Because local competition is not increased as a result of the expression of helping, this biological scenario may lead to the strongest possible selective pressure on helping (F: equation (A14), Nm: *κ* = 1/(1 + 2*Nm*); Gardner & West 2006; Lehmann *et al.* 2006; Traulsen & Nowak 2006).

#### (iii) Selective emigration

Benefits to neighbours and local competition are also decoupled when helping specifically affects the number of emigrant juveniles produced (but not philopatric ones) and when dispersers compete only with dispersers from other demes. This results in ‘selective emigration’ (Rogers 1990), where groups with more helpers produce more dispersers but not more philopatric individuals. Selective emigration may occur if helping specifically increases the survival rates of dispersing progeny. As was the case for competition occurring only between propagules, this process does not affect the level of local competition, with the result that the selective pressure on helping is increased relative to that occurring in the reference model (F: equation (A16), *κ* = (1 − *m*)^{2}/{(2 − *m*)(*N* − 1)}, Nm: *κ* → 1/(2*N*); Rogers 1990, p. 402).

#### (iv) Variance in vital rates

In the reference model, the coalescence rate per generation, which increases relatedness, is equal to the inverse of the local census size (i.e. 1/*N* in equation (3.7)). It might be felt that relatedness may further increase if the local effective size is lower than the census deme size. This may occur if the variance in fecundity or mating is in excess of that of a Poisson distribution, for instance, because the mating system is skewed or females have a high variance in fecundity (note that fecundity in the reference model follows a Poisson distribution, either with infinite mean or with finite mean, where in the latter case the concomitant demographic fluctuations are neglected). Importantly, such features will not only affect the dynamics of relatedness, *R*, but also the expression of the fitness function, *w*, which depends on the variance in vital rates (Gillespie 1975, 1977). An increase in the fecundity variance may then increase the selective pressure for helping by raising relatedness (Nm: *κ* → *σ*_{v}^{2}/*N*, where *σ*_{v} = *σ*/*f* is the coefficient of variation (Lynch & Walsh 1998, p. 23), which is assumed to be small relative to *N*, and where *σ* and *f* are, respectively, the standard deviation and the mean of the fecundity distribution; Lehmann & Balloux 2007, eqn 16).

### (c) Demographic structures

#### (i) Age structure

In the reference model, each individual dies after reproduction but individuals may also survive from one generation to the next. If surviving adults remain in their natal patch and only juveniles disperse, the average relatedness between patch members builds up relative to that in the reference model because the effective dispersal rate is lower when adults do not disperse than when they do. At the same time, the probability that an offspring from the focal individual competes for the same local breeding spot as the offspring from another individual still depends on the probability that both offspring are philopatric. A consequence of this feature is that the benefits to neighbours now tend to be more decoupled from local competition.

Another factor promoting the selective pressure on helping is thus the presence of overlapping generations (Taylor & Irwin 2000; Irwin & Taylor 2001), or, in other words, the presence of ‘asynchronous’ rather than ‘synchronous’ updating (Nakamaru *et al.* 1997; Koella 2000; Ohtsuki *et al.* 2006). If in the reference life cycle, each adult individual survives independently with probability *s* to the next generation, then the selective pressure on costly helping is increased (F: equation (A18), *κ* = {2*s*(1 − *m*)}/{2*s*(1 − *m*) + *N*[2 − *m*(1 − *s*)]}, Nm: *κ* → *s*/*N*; Taylor & Irwin 2000).
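The expression for *κ* under overlapping generations is easy to evaluate numerically, which may help to see how quickly it approaches the *s*/*N* approximation. The following is a minimal sketch that simply transcribes the formula quoted above; the function names and the parameter values are illustrative assumptions, not part of the original analysis.

```python
def kappa_overlapping(s: float, m: float, n: float) -> float:
    """kappa for fecundity effects when adults survive with probability s,
    juveniles migrate at rate m and demes hold n adults (formula quoted in the text)."""
    numerator = 2 * s * (1 - m)
    return numerator / (numerator + n * (2 - m * (1 - s)))


def kappa_overlapping_limit(s: float, n: float) -> float:
    """Weak-migration, large-deme approximation s/N."""
    return s / n


# Hypothetical parameter values, chosen only to illustrate the convergence.
for m, n in [(0.5, 10), (0.1, 100), (0.001, 10_000)]:
    exact = kappa_overlapping(0.5, m, n)
    approx = kappa_overlapping_limit(0.5, n)
    print(f"m={m}, N={n}: exact={exact:.3e}, s/N={approx:.3e}")
```

Setting *s* = 0 in this function returns zero, and the value increases with *s* for fixed *m* and *N*.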

But the overlapping generation effect would not work if all adults also dispersed independently of each other to new demes at the same rate as juveniles. In this case, the average relatedness between patch members would be lower than in the reference model because the effective migration rate would be the same but the coalescence probability lower. Dispersal of adults may then select for harming (reducing the survival of neighbours) instead of helping (F: equation (A21) with equation (A20), Nm: *κ* = {*s*(*m*(1 − *s*) − *m*_{a})}/{*N*(*m*(1 − *s*) + *m*_{a}*s*)}, where *m*_{a} is the migration rate of adults).
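Two special cases of this expression can be checked by hand, and they may help to locate where the sign change occurs. The algebra below uses only the weak-migration, large-deme expression for *κ* quoted above, with *m*_{a} the adult migration rate; no further assumption is introduced.

```latex
\begin{align*}
m_{a} = 0:&\quad \kappa = \frac{s\,m(1-s)}{N\,m(1-s)} = \frac{s}{N},\\[4pt]
m_{a} = m:&\quad \kappa = \frac{s\,[\,m(1-s)-m\,]}{N\,[\,m(1-s)+ms\,]}
          = \frac{-s^{2}m}{Nm} = -\frac{s^{2}}{N},\\[4pt]
\kappa < 0 &\iff m_{a} > m(1-s).
\end{align*}
```

The first line recovers the *s*/*N* result obtained when adults are philopatric, the second shows that *κ* is negative for any *s* > 0 when adults and juveniles disperse at the same rate, and the third gives the adult migration rate above which harming rather than helping is favoured.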

With overlapping generations, one may also suppose that the expression of the mutant allele increases the survival probability *s* of neighbours by *B*, and decreases the survival probability of the focal individual by *C*. Such effects on survival actually result in a weaker selective pressure on helping than effects on fecundity because they increase local competition more than fecundity effects do. By reducing the probability that neighbours die, fewer breeding spots are vacated and available to the offspring of the focal individual. But, in contrast to fecundity effects, where the intensity of local competition depends on the probability that two offspring from the focal patch compete against each other (effect of order (1 − *m*)^{2}), the intensity of local competition under survival effects depends on the generally higher probability that an offspring from the focal patch settles locally (effect of order 1 − *m*). As a result, the selective pressure on helping with effects on survival is lower than that with effects on fecundity (Nakamaru *et al.* 1997; Taylor & Irwin 2000), and harming is again selected for (F: equation (A19), *κ* = −(1 − *s*)(1 − *m*)/{2*N* − (1 − *s*)[1 + *m*(*N* − 1)]}, Nm: *κ* → −(1 − *s*)/(2*N*)).

The distinction between effects on fecundity and effects on survival (effect on *s* in the last paragraph) also helps us to understand the difference of selective pressure resulting from different reproductive schemes under overlapping generations with exactly one individual dying per generation (the so-called Moran process; Ewens 2004). Under this life history, it was observed that one demographic regime, the so-called death–birth protocol (DB), allows for costly helping, whereas another, the so-called birth–death protocol (BD), does not (Ohtsuki *et al.* 2006; Grafen 2007; Taylor *et al.* 2007*a*). Under the DB protocol, an individual sampled at random from the population dies and the neighbours then compete to replace the vacant spot with their relative pay-off affecting those chances of replacement. This corresponds to effects on fecundity. By contrast, under the BD protocol, a random individual is chosen to reproduce, with a probability equal to its relative pay-off. A random neighbour of the reproducer is then killed to make a space for its offspring. This can be interpreted as effects on survival because the act of helping by a focal individual increases the average lifespan of its neighbours, as it increases their chances of not being killed and reproducing instead.
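The two updating protocols can be written down in a few lines, which makes the difference in who competes with whom explicit. The sketch below is only an illustration under stated assumptions: individuals sit on a cycle with two neighbours each, pay-off is a baseline of 1 plus `b` per helping neighbour minus `c` for a helper, and all function and parameter names are hypothetical. It is not the model analysed in the papers cited above.

```python
import random


def payoffs(pop, b, c):
    """Pay-off of each site on a cycle: baseline 1, plus b per helping
    neighbour, minus c if the occupant is itself a helper (1 = helper)."""
    n = len(pop)
    return [1.0 + b * (pop[i - 1] + pop[(i + 1) % n]) - c * pop[i] for i in range(n)]


def death_birth_step(pop, b, c, rng):
    """DB: a random individual dies, then its two neighbours compete for the
    vacant site with probabilities proportional to their pay-offs."""
    n = len(pop)
    pay = payoffs(pop, b, c)
    dead = rng.randrange(n)
    neighbours = [(dead - 1) % n, (dead + 1) % n]
    parent = rng.choices(neighbours, weights=[pay[j] for j in neighbours])[0]
    new_pop = list(pop)
    new_pop[dead] = pop[parent]
    return new_pop


def birth_death_step(pop, b, c, rng):
    """BD: an individual is chosen to reproduce with probability proportional
    to its pay-off, and its offspring replaces a random neighbour."""
    n = len(pop)
    pay = payoffs(pop, b, c)
    parent = rng.choices(range(n), weights=pay)[0]
    victim = rng.choice([(parent - 1) % n, (parent + 1) % n])
    new_pop = list(pop)
    new_pop[victim] = pop[parent]
    return new_pop


# Illustrative run with hypothetical parameter values (c < 1 keeps pay-offs positive).
rng = random.Random(1)
pop = [1] * 5 + [0] * 15
for _ in range(200):
    pop = death_birth_step(pop, b=0.5, c=0.1, rng=rng)
print(sum(pop), "helpers after 200 death-birth updates")
```

In the DB step, pay-off only enters the competition among the neighbours of the vacated site, whereas in the BD step it determines who reproduces and the neighbour that is replaced is drawn at random.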

#### (ii) Sex structure

In the reference model, both males and females have exactly the same behaviours. However, the dispersal rate of males and females might differ. In this case, the relatedness asymmetries between the sexes stemming from sex-specific dispersal may select for altruism under certain conditions, and asymmetries in the number of adult individuals may also do so (Johnstone & Cant 2008; Gardner 2010). Selection may then favour the sex that disperses less to help both males and females. However, when the sex bias in dispersal becomes extreme, selection will favour harming behaviour, so that the set of parameter values where sex-specific dispersal results in a higher selective pressure on helping than in the reference model (and holding everything else unchanged) is rather small (Nm: *κ* → 0; Johnstone & Cant 2008, p. 323), while that for sex-specific adult number is larger (Nm: *κ* → (1/*N*_{m} − 1/*N*_{f})/2 for an action performed by a female, where *N*_{i} is the number of individuals of sex *i*; Johnstone & Cant 2008, p. 323).

#### (iii) Social structure

Under the reference model, each adult individual carrying the mutant helping allele helps its neighbours to produce more offspring and bears the cost of helping in terms of reduced reproduction. But each such adult has to reproduce, otherwise forgoing reproduction results in demographic fluctuations, a feature that greatly complicates the analysis of the selective pressure on social behaviour (demographic fluctuations are discussed below). An adult individual in the reference model cannot be interpreted as being a sterile worker like those occurring in social insects (Wilson 1975; Bourke & Franks 1995).

In order to get a representation of sterile workers, one has to introduce castes into the model and a simple way to do this is to assume that the *N* adult individuals within groups are all queens and that they produce both queens and workers. One can then consider that workers help to raise the brood of all queens in their natal patch before any dispersal of juveniles occurs, which is commonly observed in social insects (Wilson 1975; Bourke & Franks 1995). This is equivalent to ‘helping before dispersal’ as discussed above, and a worker caste is selected for (*κ* = 1/*N*). But the worker caste would also evolve when helping occurs after the dispersal of both workers and queens (*κ* = (1 − *m*)^{2}/*N*, Nm: *κ* → 1/*N*; Lehmann *et al.* 2008). Further, costly helping would also evolve if it occurred between dominant and subordinate individuals (Johnstone 2008).

#### (iv) Geographic structure: explicit versus implicit space

The discussion so far has been centred only around ‘patch-structured’ populations, where well-defined boundaries separate the individuals from the same group and where dispersal is random between groups (Wright's 1931 island model). By contrast, in natural populations dispersal is usually localized; that is, migrants preferentially move nearby rather than homogeneously over the landscape, a feature accounted for in models of isolation by distance (e.g. Malécot 1973, 1975). In such models, the relatedness between two individuals taken from different groups typically decreases as the distance between the groups increases, as more distant individuals are less likely to share recent ancestors than closer ones.

From the point of view of social behaviours, introducing explicit space is akin to introducing additional categories of actors. The fitness of a focal individual then no longer depends only on its own phenotype, the average phenotype of patch mates and the average phenotype of individuals in the population (see equation (3.3)), but may be affected differently by individuals living at different spatial locations, so that Hamilton's rule now needs to be evaluated with multiple classes of recipients (see §5). For instance, competition between plants (for light or nutrients in the soil) might decrease with the spatial distance between them.

Although there is no doubt that spatially explicit models are more realistic than patch-structured models with random migration (e.g. Comins *et al.* 1980; Rogers 1990; Taylor 1992*b*; Irwin & Taylor 2001; Hauert & Doebeli 2004; Rousset 2004; Ohtsuki *et al.* 2006), they add substantial mathematical and dynamic complexity without necessarily leading to new insights concerning the conditions favouring or inhibiting the evolution of helping. For instance, spatial pattern formation can lead to intricate temporal dynamics in deterministic models, but it has been investigated mainly in models where pure Defect is opposed to Tit-for-Tat in the Prisoner's Dilemma game, rather than a continuum of ‘mixed’ strategies as in, e.g. Taylor & Irwin (2000). Beyond such pattern formation, discrepancies between the island and the isolation by distance setting are essentially quantitative, and occur in models of the evolution of the dispersal rate (Gandon & Rousset 1999), of the distribution of dispersal distance (Rousset & Gandon 2002) and of costly helping itself (Lehmann *et al.* 2007).

But importantly, the qualitative features exposed in spatially explicit models for the evolution of helping behaviours can generally already be observed in the simpler island models. For instance, the direct generalization to isolation by distance of the overlapping generation model with fecundity effects discussed above shows that the selective pressure on helping has the same qualitative and quantitative features as under the island model in the weak-migration, large-population-size limit (F: equation (A23), Nm: *κ* → *s*/*N*). For these reasons, we will ignore the more realistic features of isolation by distance models, and continue our discussion of the life cycle factors affecting the evolution of helping mainly within the context of the island model.

### (d) Environmental dynamics

So far, the environment was assumed to be constant. Each individual in each deme in each generation faces exactly the same environmental conditions as any other individual from any other generation. But biotic and abiotic environments are unlikely to remain constant over time and they may change owing to fluctuations, for instance, in resources, weather, diseases, predation, or even the behaviour of conspecifics. Such environmental fluctuations are likely to affect the fitness of several or of all individuals within a group, which may then change the selective pressure on helping.

#### (i) Environmental stochasticity

A simple way to introduce environmental fluctuations into the reference model is to assume that each deme may go extinct in each generation with probability 1 − *s*_{d} (where *s*_{d} is the survival probability of a deme), a formulation that underlies the classic metapopulation models (Slatkin 1977; Hanski & Gilpin 1997). Such a patch destruction rate continuously generates empty breeding spots (empty demes), which can be re-colonized. It might then be expected that demes with more helpers are more likely to re-colonize empty patches. Introducing metapopulation dynamics does not in itself change the intensity of benefits to neighbours relative to the concomitant increase in local competition, so that adding extinction does not in itself select for higher levels of helping (*κ* = 0; Lehmann *et al.* 2006).

Phenotypic effects may not only affect the fecundity of neighbours but may also reduce the intensity of environmental fluctuations by increasing the survival probability of whole demes. For instance, the construction of nests and burrows may buffer individuals from temperature changes or may allow them to store food, which reduces extinction risks from starvation. One can then suppose that the expression of the mutant allele may increase the deme survival rate, *s*_{d}, by *B*. This effect on patch demography results in an inclusive fitness benefit to all patch members (including the focal) because the chance of them reproducing is increased. At the same time, the intensity of kin competition is not increased because reducing patch extinction does not in itself increase the productivity of neighbours relative to that of the focal individual. As a result, the selective pressure on helping is much increased (F: equation (A27), *κ* = 1/[{1 − *s*_{d}(1 − *m*)^{2}}*N*], Nm: *κ* → 1/{(1 − *s*_{d})*N*}; Eshel 1972; Aoki 1982; Lehmann *et al.* 2006).

#### (ii) Niche construction

Individuals might not only alter the environmental conditions generated by exogenous abiotic or biotic factors but may also generate or construct the environments to which they and other conspecifics are exposed (Dawkins 1982; Odling-Smee *et al.* 2003). For instance, the construction of a nest or a dam, the emission of detritus, or even the behaviour of an individual, can be seen as an environment affecting other individuals, in which case the environment can be thought of as being endogenously determined (to some extent at least). Such extended phenotypic effects might not only change the vital rates of others living in the generation of the actor, but also that of individuals living in the next, or subsequent generations. Because limited dispersal generates relatedness between actors and recipients both within and across generations (Malécot 1973, 1975), even if there is a multigenerational gap between behavioural modification of the environment and fitness consequences on recipients, selection may favour social behaviours that are costly to the actor and increase the fitness of individuals living in downstream generations.

Suppose that the phenotypic effect *B* on other individuals affects the reproduction of individuals living in the focal deme in future generations and that it decays over time, a fraction *λ* of the effect persisting from one generation to the next (when *λ* = 0 the effect, e.g. a nest, is erased from one generation to the next, while when *λ* = 1 the nest stays forever). This effect on the vital rates of future generations does not increase the intensity of competition experienced by the focal individual or that by its offspring and thus decouples benefits to recipients and local competition. Consequently, the presence of long-lasting effects increases selection on helping (F: equation (A30), *κ* = *λ*(1 − *m*)/[{1 − *λ*(1 − *m*)}*N*]; Nm: *κ* → *λ*/{(1 − *λ*)*N*}; Lehmann 2007; Wakano 2007; Sozou 2009). It is worth recalling that counting the number of offspring in the next generation is still sufficient for the computation of the selection gradient on long-lasting behaviours: multigenerational effects are taken into account as effects of actors from earlier generations on the one-generation fitness *w* of a focal individual (equation (A30)).
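The closed form for *κ* under long-lasting effects can be rewritten as a geometric series, which may make the role of *λ* more transparent. The identity below is elementary algebra applied to the expression quoted above; reading the term of order *t* as the contribution of an effect that has persisted for *t* generations is one possible interpretation and is not taken from the appendix.

```latex
\[
N\kappa \;=\; \frac{\lambda(1-m)}{1-\lambda(1-m)}
\;=\; \sum_{t=1}^{\infty}\bigl[\lambda(1-m)\bigr]^{t},
\qquad 0 \le \lambda(1-m) < 1 .
\]
```

Setting *λ* = 0 makes every term vanish, so that *κ* = 0, while letting *m* tend to zero turns the sum into *λ*/(1 − *λ*), in line with the weak-migration approximation given in the text.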

### (e) Population dynamics

Until now, we have considered that the number of individuals in each group is fixed. Such constant group size follows from assuming that, first, there is some ceiling number of individuals that can reach adulthood in each deme or, second, that reproductive output is so large that groups of individuals will always be saturated. As a result, there are no fluctuations in patch size in the population. But in natural populations, fecundity is neither infinitely large nor is regulation necessarily of the ceiling type. Hence, deme size may actually fluctuate between a whole spectrum of sizes, which may affect selection on social traits.

#### (i) Demographic stochasticity

While environmental stochasticity refers to situations where several individuals are affected by a common factor, demographic stochasticity refers to hazards experienced independently by each individual. Under demographic stochasticity, a maximum number of breeding spots need not be imposed to regulate the population. But there is an intermediate number of settled individuals in a deme that would maximize its future genetic contribution to the population as a result of a trade-off between number of settled individuals and fecundity or survival of offspring. At equilibrium, the population may be undersaturated, i.e. average deme size may be below this maximizing number, and the difference is analogous to empty breeding spots, which may be filled if individuals produced more offspring as a result of helping. Filling empty local breeding spots functions like reducing group extinction, as in both cases the average focal group size is increased relative to that of other groups as a result of helping behaviours.

Because this situation is difficult to analyse formally, models taking demographic fluctuations into account often assume that population demography follows a so-called birth and death process (only one individual in a group or in the total population reproduces or dies per unit time; Grimmett & Stirzaker 2001), which induces overlapping generations. Such models may be thought of as demographically explicit versions of the overlapping generation models discussed above when *s* becomes close to unity. Under these birth and death processes, it has recurrently been found that helping can evolve under limited dispersal (e.g. van Baalen & Rand 1998; Le Galliard *et al.* 2003; Lion & van Baalen 2007; Alizon & Taylor 2008; Lion & Gandon 2009). However, overlapping generations is a feature that in itself greatly increases the selective pressure on helping (see §4*c*(i)), which raises the question of the extent to which open breeding spots, rather than overlapping generations, increase the selective pressure on helping in these models.

An analytical discrimination of the effects of overlapping generations and open breeding spots can in theory be performed (Rousset & Ronce 2004). In the presence of demographic stochasticity, the inclusive fitness effect can be decomposed into two terms: *S*_{IF} = *S*_{f} + *S*_{Pr}, where *S*_{f} is a demographic average of the selective pressure encountered so far (e.g. demographic average of equation (3.5)), while *S*_{Pr} captures the additional selective pressure on the mutant allele that stems from its changing the local demographic states, and quantifies the strength of selection on helping stemming from filling open breeding spots. In the appendix, we compare these two components of selection, *S*_{Pr} and *S*_{f}, for demographically explicit models based on the infinite island population structure (see equations (A33)–(A59)). This allows us to clarify the common features of the inclusive fitness effects arising under birth–death reproduction (e.g. van Baalen & Rand 1998; Le Galliard *et al.* 2003; Lion & van Baalen 2007; Alizon & Taylor 2008; Lion & Gandon 2009) and under semelparous reproduction (Rousset & Ronce 2004; Lehmann *et al.* 2006), and suggests that overlapping generations contribute substantially to *S*_{IF}.

Open breeding spots may promote selection on helping behaviours (i.e. *S*_{Pr} > 0) only insofar as populations are undersaturated. In patch-structured models without overlapping generations (semelparous populations), costly helping then evolves under rather stringent conditions because populations are found close to saturation (in which case equation (3.8) applies and *S*_{Pr} ≈ 0), unless fecundity is very low or positive density dependence (Allee effect) interferes with demographic stochasticity (Lehmann *et al.* 2006). But models built on birth and death processes tend to bring in additional demographic stochasticity relative to semelparous reproduction. Under a birth and death process, there is a variance in both the survival and the reproduction of individuals, which increases the demographic variance in the population and may lead to more frequent undersaturation. The demographic component of inclusive fitness, *S*_{Pr}, may be stronger under birth and death processes than under semelparous populations, where all individuals die with certainty in each generation, which may then increase the selection pressure on social behaviours filling open breeding spots.

An estimate of the overall strength of selection on helping under a birth and death demographic process with fecundity effects can be found from lattice models, where a focal individual may interact with up to *N* nearest neighbours; that is, each site on the lattice is connected to *N* other sites. Under these assumptions, *κ* → 1/*N* in the *Nm* limit (Lion & Gandon 2009, eqn (14) with relatedness given on p. 1501). But the different quantitative results stemming from assuming different demographies raise the question of which demographic model is relevant in which situation. A synthesis remains to be done in order to assess the importance of the role of empty breeding spots generated by demographic stochasticity alone for the evolution of helping behaviours.

#### (ii) Niche and range expansion

Average group size might be increased not only as a result of filling empty breeding spots generated by demographic stochasticity, but also by changing the number of local breeding spots or the number of individuals surviving competition. This might occur if social interactions allow individuals to access new resources (niche expansion) or new territories (range expansion), thereby changing the local ecological conditions in which groups are constrained to live. The spatial distribution of resources or the size of prey might prevent their exploitation by isolated individuals, but by mutual cooperation such resources might be seized, which may result in an increase in local group size. Because such group size expansion results in a higher contribution of a focal group to the ancestry of the population, helping behaviours leading to group size expansion can result in fitness benefits without concomitantly increasing kin competition.

Although several models have considered the evolution of optimal group size (Clark & Mangel 1986; Giraldeau & Caraco 2000; Kokko *et al.* 2001), few models have considered the benefits of group size expansion in a structured population setting, which necessarily leads to indirect fitness benefits when total group size remains finite. Nevertheless, it has been shown that, regardless of the level of saturation of a focal deme, a mutant allele increasing the average number of individuals reaching adulthood in the focal deme, for instance because of a reduction of density-dependent competition for resources among juveniles, is under higher selection (Lehmann *et al.* 2006). A rough estimate of the strength of this effect is given by considering that a mutant allele may increase the probability of transition of a focal deme from size *N* to a larger size, say *N*_{+}, with this transition probability being equal to unity when every individual in the focal patch carries the mutant allele, in which case the strength of selection on patch size expansion can be high (Nm: *κ* → (1/*k*^{2} − 1)/(2*N*), where *k* = *N*/*N*_{+}; Lehmann & Keller 2006, eqn (46)).
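To compare the scenarios of §4 side by side, the weak-migration, large-deme expressions for *κ* quoted in the text can be collected in a single function. This is only a bookkeeping sketch: the function name, the scenario labels and the parameter values are illustrative assumptions, the approximations are only meaningful for small *m* and large *N*, and the ranking it prints depends entirely on the values chosen.

```python
def nm_limits(n, m, s, s_d, lam, cv, k):
    """Approximate kappa values quoted in section 4 for the Nm limit.

    n: deme size, m: migration rate, s: adult survival, s_d: deme survival,
    lam: persistence of long-lasting effects, cv: coefficient of variation of
    fecundity, k: ratio N/N_+ for group size expansion.
    """
    return {
        "helping before dispersal": 1 / n,
        "propagule dispersal, individual competition": 0.0,
        "propagule dispersal, group competition": 1 / (1 + 2 * n * m),
        "selective emigration": 1 / (2 * n),
        "fecundity variance": cv ** 2 / n,
        "overlapping generations, fecundity effects": s / n,
        "overlapping generations, survival effects": -(1 - s) / (2 * n),
        "deme survival": 1 / ((1 - s_d) * n),
        "niche construction": lam / ((1 - lam) * n),
        "group size expansion": (1 / k ** 2 - 1) / (2 * n),
    }


# Hypothetical parameter values, for illustration only.
example = nm_limits(n=20, m=0.01, s=0.5, s_d=0.9, lam=0.5, cv=0.3, k=0.8)
for scenario, kappa in sorted(example.items(), key=lambda item: -item[1]):
    print(f"{kappa:+.4f}  {scenario}")
```

With these values, the only negative entry is the one for survival effects under overlapping generations, which corresponds to the case where harming rather than helping is favoured.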

## 5. Empirical tests and the conundrum of localized dispersal

The models discussed above show that the strength of selection on helping behaviours depends critically on life history and demographic factors. In order to identify those factors favouring helping in natural populations, one approach is to seek those factors that appear correlated with the occurrence of sociality. For example, correlative studies show that helping behaviours are more common in variable environments (Rubenstein & Lovette 2007). This goes well with some of the results of the models discussed above (e.g. §4*d*). However, the same theoretical models show that the phenotypic consequences of helping (on fecundity of adults versus survival of juveniles) as well as the mode of population recolonization (e.g. propagule mode of dispersal versus individual dispersal) are critical in determining the strength of selection on helping. The relevance of these factors for selection on helping in natural populations remains to be investigated. The propagule (or budding) mode of dispersal may be common in social species (e.g. Sharp *et al.* 2008; further references in Lehmann *et al.* (2006) and Cornwallis *et al.* (2009)) but its importance in promoting helping has been little studied empirically.

Another correlative approach is to measure relatedness under limited dispersal in order to compare the expected magnitude of indirect effects on helping across different demographic conditions. Despite the counter-example provided by the reference model (equation 3.8; Taylor 1992*a*), the idea that higher relatedness favours higher levels of helping has remained prevalent. This is because it works both in family-structured models as originally considered by Hamilton (1964*a*,*b*), and in many of the scenarios encountered above. Experimental studies with bacteria are consistent both with the results of the reference model when its assumptions are enforced in the experimental protocol, and with the idea that relatedness otherwise favours helping (Kümmerli *et al.* 2009*a*,*b*).

The interpretation of estimates of relatedness under limited dispersal is not straightforward. First, kin discrimination may blur relationships between relatedness and helping, as the relatedness between interacting pairs within groups will not be well predicted from spatial patterns alone (e.g. Cornwallis *et al.* 2009). Second, such studies are confronted with a natural feature absent from the simplest model, namely that dispersal is generally localized (§4*c*(iv)). This affects the ‘scale of competition’, i.e. who competes with whom, and whose relatedness should be computed (see studies of unicoloniality in ants by Helanterä *et al.* (2009) for a recent example). In particular, since relatedness is a three-party concept, this raises the problem of assessing the reference population relative to which the relatedness of a pair of individuals is measured. In the remainder of this section, we will show how the problem of the reference population should be addressed in empirical studies, by contrasting three different answers to this question. Readers not interested in the estimation of relatedness can skip this section and directly go to §6.

### (a) Relatedness: island model

We first recall some statistical definitions of relatedness that apply to the simple island model. One definition of relatedness is Wright's classic statistic *R* = *F*_{ST} of population structure, which one can write as
$$F_{\mathrm{ST}} = \frac{p_{0} - p}{1 - p}, \tag{5.1}$$
where *p*_{0} is the average, over focal individuals *that bear the mutant allele*, of the frequency of the mutant allele among patch neighbours, and *p* is this allele's frequency in the total population (i.e. regression definition of relatedness: *p*_{0} = *F*_{ST} + (1 − *F*_{ST})*p*; Grafen 1985; Rousset 2002). The frequency *p*_{0} is increased above *p* only to the extent that a focal mutant individual and a neighbour have a common ancestor in the focal's patch, which matches our earlier probabilistic interpretation of relatedness (and, in a neutral model, is independent of *p*).

The same expression for relatedness can be written as
$$F_{\mathrm{ST}} = \frac{q_{0} - q}{1 - q}, \tag{5.2}$$
where *q*_{0} is the frequency of pairs of gene copies from two neighbours within a patch that bear the same allele (the mutant or the resident one), and *q* is the same frequency for pairs of genes taken from the whole population. Because such ratios of frequencies of identical pairs estimate relatedness defined as the probability of coalescence within the deme, relatedness can be estimated using the same formula now applied to frequencies of identical pairs at ideally neutral loci, not involved in the determination of a given social trait.
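As an illustration of how frequencies of identical pairs translate into a relatedness value, the following sketch computes *q*_{0} and *q* from a sample of haploid genotypes at one locus and returns the ratio (*q*_{0} − *q*)/(1 − *q*). It is a naive plug-in calculation under stated assumptions (haploid individuals, every pair weighted equally, *q* taken over all pairs of distinct gene copies in the pooled sample), not one of the bias-corrected estimators used in practice, and the function names are hypothetical.

```python
from itertools import combinations


def identity_frequency(pairs):
    """Fraction of pairs of gene copies that carry the same allele."""
    pairs = list(pairs)
    return sum(a == b for a, b in pairs) / len(pairs)


def relatedness_from_pairs(demes):
    """Plug-in ratio (q0 - q) / (1 - q) from a list of demes, each deme being
    a list of alleles sampled from distinct haploid individuals.
    Undefined (division by zero) when the pooled sample is monomorphic."""
    within = [pair for deme in demes for pair in combinations(deme, 2)]
    pooled = [allele for deme in demes for allele in deme]
    q0 = identity_frequency(within)
    q = identity_frequency(combinations(pooled, 2))
    return (q0 - q) / (1 - q)


# Two hypothetical samples: complete differentiation, then identical mixed demes.
print(relatedness_from_pairs([["A", "A", "A"], ["a", "a", "a"]]))
print(relatedness_from_pairs([["A", "a"], ["A", "a"]]))
```

The second sample returns a negative value, which is a reminder that this ratio measures identity within demes relative to the reference population rather than an absolute probability.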

In diploid models, or under isolation by distance as considered below, other functions of frequencies of identical pairs for a social mutant allele may need to be considered, but they can be estimated by the same function of sample frequencies of identical pairs of genes at neutral loci. This forms the basis of widely used estimators of relatedness (Queller & Goodnight 1989) and more generally of moment estimators of Wright's *F*-statistics (e.g. Weir & Cockerham 1984) as further detailed elsewhere (Rousset 2007). More powerful estimators can be defined when additional information is used (e.g. pedigree reconstructions, or when only a small number of kinship ties have to be distinguished, such as sisters versus cousins).

### (b) Relatedness: localized dispersal

We now discuss three different choices of reference population that can be used to evaluate relatedness under localized dispersal: the total population (relatedness is measured relative to the global scale), the deme of a focal individual (relatedness is measured relative to the local scale) and the competitive neighbourhood (relatedness is measured relative to some specific class of individuals).

#### (i) Relatedness relative to the global scale

Under localized dispersal, the change of mutant allele frequency can still be written under the same form as encountered above (see equations (3.2) and (3.4)). Namely
5.3
where *σ*^{2} is a measure of genetic variation in the total population, which reduces to *p*(1 − *p*) in the island model, and
5.4
is a direct generalization of the inclusive fitness effect for the island model (Rousset 2006). The first term in the sum, *b*_{k} = ∂*w*/∂*z*_{k}, is the effect on the focal individual's fitness of all neighbours at distance *k* from the focal deme (all individuals at distance *k* are treated symmetrically). The second term in the sum, *R*_{k} = (*p*_{k} − *p**)/(1 − *p*), is a measure of relatedness, which is expressed in terms of the frequency *p*_{k} of the mutant allele among distance-*k* neighbours, the frequency *p* of the mutant allele in the population, and the frequency *p** in *any* given class of actors.

The reference class in equation (5.4) does not matter because, by necessity, the sum of the partial derivatives of *w*, with respect to all phenotypes involved, is null, which follows from the fact that the evolutionary dynamics is zero-sum (Rousset & Billiard 2000): when one allele increases in frequency, the other must decrease in frequency. In particular, in the island model ∂*w*/∂*z*_{•} + ∂*w*/∂*z*_{0} + ∂*w*/∂*z* = 0. Hence, the derivative relative to the mean population phenotype ∂*w*/∂*z* is −∂*w*/∂*z*_{•} − ∂*w*/∂*z*_{0} = *c* − *b*.
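The invariance to the reference class can be checked in a few lines. This is a sketch, assuming that the sum in equation (5.4) runs over every class of actors (the focal individual included) and that all relatedness coefficients share the denominator 1 − *p*, as in the definition of *R*_{k} given above:

```latex
\begin{align}
\sum_k b_k R_k
  &= \sum_k b_k \,\frac{p_k - p^{*}}{1 - p} \\
  &= \frac{1}{1 - p}\sum_k b_k\, p_k \;-\; \frac{p^{*}}{1 - p}\sum_k b_k \\
  &= \frac{1}{1 - p}\sum_k b_k\, p_k ,
  \qquad \text{since } \sum_k b_k = 0 .
\end{align}
```

Under these assumptions the reference frequency *p** drops out of the total, although it changes the value of each individual term.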

If we let *p** = *p*, the relatedness coefficients in *S*_{IF} are defined relative to the total population, which matches the original formulation of Hamilton's rule. In particular, in the island model, the inclusive fitness effect takes the same form as that given by equation (3.5):
5.5
where ‘[0]’ makes explicit the null relatedness between the focal individual and the population average. Such relatedness coefficients have also been used in theoretical analyses of localized dispersal (e.g. Grafen 2007; Taylor *et al.* 2007*b*). This comes at a cost, however: data analyses based on this formulation have to identify something that matches the models' concept of total population size.

#### (ii) Relatedness relative to the local scale

There is a second interpretation of relatedness, relative to the focal deme, and that follows from using the formula for *F*_{ST} (equation (5.1)) and rewriting equation (5.3) as
5.6
where
5.7

Now let *p** = *p*_{0}, so that relatedness coefficients are defined relative to the focal deme. In particular, the relatedness between the focal individual and its deme neighbours is (*p*_{0} − *p*_{0})/(1 − *p*_{0}) = 0, which means that the neighbours are not more related than themselves to the focal individual. In the island model, we then recover the standard inclusive fitness effect in the form *S*_{IF}(*z*) = (1 − *F*_{ST})*S̃*_{IF}(*z*) where
5.8

With localized dispersal, the relatedness coefficients in *S̃*_{IF} can be estimated from local data only because they are of the form *R*_{k} = (*p*_{k} − *p*_{0})/(1 − *p*_{0}) = −*F*_{STk}/(1 − *F*_{STk}) in terms of the *F*_{ST} between pairs of demes at distance *k*: *F*_{STk} = (*p*_{0} − *p*_{k})/(1 − *p*_{k}). Hence, *S̃*_{IF} can be thought of as a localized selection gradient. The expression for the change of allele frequency (equation (5.6)) then conveys two important messages. First, that the fate of the mutant depends essentially only on local features (as quantified by *S̃*_{IF}). Second, that its speed of advance in the total population depends also on its frequency and spatial distribution in the total population (as quantified by (1 − *F*_{ST})*σ*^{2}). This distribution cannot usually be estimated from local data only but it does not affect the direction of selection on helping.
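The two identities used in this subsection can be checked numerically. The sketch below assumes hypothetical mutant frequencies (conditional on the focal individual bearing the mutant) and hypothetical values of *c* and *b*; the function names are illustrative. The island-model check uses the fitness effects −*c*, *b* and *c* − *b* for the focal individual, its deme neighbours and the population average, respectively.

```python
def relatedness(p_k, p_ref, p_scale):
    """(p_k - p_ref) / (1 - p_scale): mutant frequency among class-k actors
    measured against the reference frequency p_ref."""
    return (p_k - p_ref) / (1.0 - p_scale)


def pairwise_fst(p_0, p_k):
    """F_STk = (p_0 - p_k) / (1 - p_k) between demes at distance k."""
    return (p_0 - p_k) / (1.0 - p_k)


def local_relatedness_from_fst(fst_k):
    """R_k = -F_STk / (1 - F_STk), relatedness relative to the focal deme."""
    return -fst_k / (1.0 - fst_k)


# Hypothetical frequencies: deme neighbours, then neighbours at distance 1, 2, 3
p_0 = 0.40
p_by_distance = [0.30, 0.22, 0.15]

for p_k in p_by_distance:
    direct = relatedness(p_k, p_0, p_0)
    via_fst = local_relatedness_from_fst(pairwise_fst(p_0, p_k))
    print(f"p_k={p_k:.2f}  R_k={direct:+.4f}  via F_STk={via_fst:+.4f}")

# Island-model check with hypothetical c, b and total-population frequency p
c, b, p = 1.0, 3.0, 0.10
f_st = (p_0 - p) / (1.0 - p)
s_global = -c + b * f_st                        # reference class: total population
s_local = (-c * relatedness(1.0, p_0, p_0)      # focal individual itself
           + b * relatedness(p_0, p_0, p_0)     # deme neighbours: zero by construction
           + (c - b) * relatedness(p, p_0, p_0))  # population average
print(abs(s_global - (1.0 - f_st) * s_local) < 1e-12)
```

Because 1 − *F*_{ST} is positive, the local and global forms always agree in sign, which is why the direction of selection can be read from local data.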

#### (iii) Relatedness relative to the competitive neighbourhood

There is a third interpretation of relatedness, in terms of a competition neighbourhood relative to which relatedness should be measured (Queller 1994). This interpretation can be reached from equation (5.4) as follows. Suppose we can distinguish among the recipients of an act of helping two categories of adult individuals: competitors and another class that is *a priori* more related to the focal individual and that we call ‘beneficiaries’. These two classes of recipients can be obtained from the fitness effects (the ∂*w*/∂*z*_{k}'s) of class-*k* neighbours in various ways. For instance, one may pool all the classes into the two categories, beneficiaries and competitors. Alternatively, one may split the individuals of each class *k* into the two non-symmetric categories of beneficiaries and competitors (e.g. the patch mates of the focal individual consist of a mixture of more- and less-related individuals, such as its siblings and immigrants), and then pool over the classes all individuals belonging to a given category. Either way, equation (5.4) can then be written as
5.9
where the first sum is over all beneficiaries, while the second sum is over all competitors.

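We can then define a weighted mean allele frequency among beneficiaries as
$$p_b = \frac{\sum_{\text{class } k \text{ of beneficiaries}} (\partial w/\partial z_k)\, p_k}{\sum_{\text{class } k \text{ of beneficiaries}} \partial w/\partial z_k} \qquad (5.10)$$
and a weighted mean allele frequency among competitors as
$$p_c = \frac{\sum_{\text{class } k \text{ of competitors}} (\partial w/\partial z_k)\, p_k}{\sum_{\text{class } k \text{ of competitors}} \partial w/\partial z_k} \qquad (5.11)$$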

If we let *p** be equal to *p*_{c}, we can eliminate the last term in equation (5.9), and the selection gradient can be written as
5.12
so that, regardless of the number of categories of recipients, the last factor looks like −*c* + *rb* for relatedness given by *r* = (*p*_{b} − *p*_{c})/(1 − *p*_{c}) and the benefit is given by *b* = ∑_{class k of beneficiaries} ∂*w*/∂*z*_{k}, the sum of the benefits over all beneficiaries.
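A small numerical check may help to see why the competitor term vanishes. This is a sketch under stated assumptions: the weights in the mean frequencies are taken to be the fitness effects ∂*w*/∂*z*_{k} themselves, the focal individual contributes −*c* with mutant frequency 1, and all numbers and function names are hypothetical.

```python
def inclusive_fitness(effects, freqs, p_ref, p):
    """Sum of b_k * (p_k - p_ref) / (1 - p) over all classes of actors."""
    total = sum(b_k * (p_k - p_ref) for b_k, p_k in zip(effects, freqs))
    return total / (1.0 - p)


def weighted_frequency(effects, freqs):
    """Mean mutant frequency over classes, weighted by their fitness effects."""
    return sum(b_k * p_k for b_k, p_k in zip(effects, freqs)) / sum(effects)


# Hypothetical fitness effects; together with -c they sum to zero (zero-sum dynamics)
c, p = 1.0, 0.10
ben_effects, ben_freqs = [2.0, 1.0], [0.50, 0.30]       # beneficiaries
comp_effects, comp_freqs = [-1.5, -0.5], [0.20, 0.10]   # competitors

p_b = weighted_frequency(ben_effects, ben_freqs)
p_c = weighted_frequency(comp_effects, comp_freqs)
b = sum(ben_effects)
r = (p_b - p_c) / (1.0 - p_c)

# Direct sum with the total population as reference class (p* = p)
effects = [-c] + ben_effects + comp_effects
freqs = [1.0] + ben_freqs + comp_freqs
direct = inclusive_fitness(effects, freqs, p, p)

# Partitioned form with the competitors as reference class (p* = p_c)
partitioned = (1.0 - p_c) / (1.0 - p) * (-c + r * b)

print(abs(direct - partitioned) < 1e-12)
```

The two forms agree, but the value of *r* obtained this way depends on how recipients were assigned to the two categories.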

We see that in equation (5.12), the fitness cost to the actor is the same as in Hamilton's rule but relatedness is now expressed in terms of a weighted allele frequency, which is specific to each biological scenario (i.e. the weights are different for each distinct scenario discussed above) and depends on how individuals are partitioned into the two categories. Hence, while the whole expression (equation (5.12)) gives the same direction of selection as Hamilton's rule (equation (3.6)), its terms do not match those of the *Rb* − *c* formula (equations (3.6) or (5.4)).

The terms of equation (5.12) do not match those of the *Bκ* − *C* formula (equation (4.1)) either, because each fitness effect ∂*w*/∂*z*_{k} may involve both the cost, *C*, and the benefit, *B*, measured in units of vital rates, as each actor may on one side increase the vital rates of a recipient in proportion to *B* and at the same time decrease the competition experienced by this recipient in proportion to its own cost *C*. However, each fitness effect ∂*w*/∂*z*_{k} may itself be partitioned into a ‘beneficial effect’ and a ‘competitive effect’ separating *B* and *C* terms. By using this alternative partition and following the same argument as above, one can obtain an expression like equation (5.12), but separating *B* and *C* terms rather than beneficiaries and competitors, whose terms match those of the *Bκ* − *C* formula.

Despite the popularity of the interpretation of relatedness measured relative to the scale of competition (e.g. West *et al.* 2002; Helantera *et al.* 2009; Platt & Bever 2009), inspection of equations (5.9)–(5.12) suggests that it may generate confusion because: (i) it remains unclear which of the partitions are actually envisioned by its practitioners; (ii) its formulation may raise concerns about the interpretation of relatedness coefficients; and (iii) different traits operate within different economic neighbourhoods (Gardner & West 2006). It is thus important to realize that there are many possible partitions of the total fitness effect of a mutant allele (e.g. many different ways of taking the sum in equation (5.9)), but the terms in different partitions cannot have consistent meanings across partitions. We are also unaware of any study that has tried to estimate weighted probabilities of identity as suggested by equations (5.10) and (5.11) or those resulting from other partitions of the fitness effects ∂*w*/∂*z*_{k}.

In addition to the discrepancy with Hamilton's rule, the interpretation of relatedness in terms of scale of competition may raise additional concerns. For instance, a slight generalization of Taylor's (1992*a*) model (Taylor 1992*b*; Rousset 2004, eqn 7.21) shows that the result that costly helping is not favoured (equation (3.8)) holds whatever the relative sizes of the ‘scale of cooperation’ (the maximum distance of neighbours benefiting from a focal individual's helping act) and of the ‘scale of competition (or regulation)’. The idea that relatedness has to be low when competition occurs over a small scale (Helantera *et al.* 2009) then does not fit with the fact that, to the extent that a ‘scale of competition’ depends on a scale of dispersal, a small scale of competition would imply a small scale of dispersal and then a strong local genetic structure (high relatedness).

## 6. Discussion

The models discussed in this paper illustrate that the selective pressure on helping behaviours under limited dispersal depends considerably on life history and demographic factors (table 2). While the idea that the increase in local competition cancels out the benefit of helping under limited dispersal has become popular (equation (3.8)), we saw that this result relies on very specific assumptions. These assumptions are unlikely to be exactly met in natural populations, and when they are relaxed a situation where costly helping can be selected for usually emerges (table 2). A main message of our analysis is thus that under many conditions (if not most) limited dispersal and small deme size may favour selection on unconditional costly helping (altruism). This fits well with the intuitive notion that higher relatedness between neighbours should lead to higher levels of altruism. More generally, this implies that the selection pressure on most social traits will vary directly with relatedness under limited dispersal.

### (a) Three types of quantitative outcomes

Analysis of the models presented in this paper illustrates that variations in life history and demographic factors may lead to many different selection gradients on helping (and hence *κ* coefficients, table 2). These rules of invasion of costly helping can be divided into three quantitative categories, based on the value that *κ* takes under strong population structure and large deme size (*Nm* limit, see table 2).

The first category encompasses situations leading to vanishingly low selective pressure for costly helping or selection on harming (*κ* → 0 or *κ* < 0). This encapsulates all situations where helping neighbours only increases local competition but not much productivity relative to other demes (e.g. regulation before dispersal, effects on survival, sex-specific dispersal). The second category of invasion rules encompasses situations leading to selection on helping being proportional to the inverse of deme or neighbourhood size (*κ* → *q*/*N*, for some *q* ≤ 1 depending on life cycle features). This encapsulates the cases where benefits to neighbours are partially decoupled from local competition (e.g. selective emigration, social structures, explicit population dynamics, above Poisson fecundity or mating distribution, niche construction if *λ* is small). Finally, the third category encompasses situations leading to a strong selective pressure on helping (*κ* can be arbitrarily larger than 1/*N*). Here, the benefits to neighbours are strongly decoupled from local competition (e.g. propagule or budding dispersal and competition, effects on group extinction, niche construction if *λ* is large).

Among all models encountered so far in the literature, the most frequent quantitative outcome is the second; that is, when the selective pressure on helping is at most 1/*N*. This supports the idea that selection for costly helping is negligible when population structure, here characterized by deme size, is weak. Yet, cases where helping evolves may overall be of the third type. In particular, it may be that modelling efforts have been driven away from some important cases, partly for technical reasons. For example, local extinctions and recolonization can both lead to strong spatial relatedness (mainly determined by the minimal deme size) and favour helping (Lehmann *et al.* 2006, p. 1145), yet they do not easily lead to simple theoretical results as presented in table 2. Such metapopulation processes are nevertheless common (Clobert *et al.* 2001; Hanski & Gaggiotti 2004), but their effects on the evolution of social behaviours have not been much investigated, and neither has the evolution of social interactions in the presence of age classes with ageing (senescence).

An assumption that has been relaxed even less often in models of the evolution of social behaviours is that traits affect fitness continuously, as in a chemical law of mass action. This may be appropriate for many traits such as dispersal or sex ratio, but may be less appropriate for agonistic interactions between groups, where it may be most important to be bigger than the competitors, and where fitness may be a steep function of the difference between the phenotypes of competitors.

### (b) Other features affecting outcomes

We now spell out some other features, which may markedly affect the evolution of helping, and that have been left out of our discussion so far. For instance, mating systems such as polyandry affect social behaviours in insect colonies (Bourke & Franks 1995). It is thus important to keep in mind that the models discussed here were haploid without Mendelian segregation and that features of the genetic system, such as diploidy or haplo-diploidy, the mode of control of the expression of traits (e.g. parental versus offspring, imprinting), may also affect the selective pressure on helping, or more generally the evolution of any social behaviour (e.g. Hamilton 1979; Taylor 1988; Haig 1997).

We have also not discussed the conditional expression of helping, which is useful to divide into at least two categories. First, the behaviour of an individual may be conditional on the behaviour of its social partner(s), as occurs, for instance, in multimove games such as the repeated Prisoner's Dilemma game (Trivers 1971; Axelrod & Hamilton 1981; Leimar 1997), the Bargaining game (Binmore *et al.* 1997; McNamara 1999) or the Foraging game detailed in §7 (a variety of game-theoretic concepts are also presented in various contributions to this volume, see Connor 2010; Leimar & Hammerstein 2010). This type of ‘strategic conditionality’ is implicitly taken into account in the models discussed above. Hence, if selection is weak and gene action is additive, different strategic situations will lead to different values of the cost *C* and the benefit *B*, without the need to re-evaluate the consequences of the various life history and demographic factors for selection on strategies for each new behavioural scenario (see §7 for an example, Taylor & Irwin (2000) and Lehmann & Keller (2006) for others, and Day & Taylor (1997, 2000) for a more general formulation of dynamic games).

The second category of conditional helping involves those situations resulting in some form of kin recognition and where the behaviour of an individual is expressed conditionally on some demographic feature. For instance, this may be the case when helping is expressed conditionally on a focal individual being philopatric, or conditionally on the focal individual's social partner(s) being philopatric, or even on the partner having identical recognition tags/markers to the focal individual (e.g. Frank 1998; Axelrod *et al.* 2004; Jansen & van Baalen 2006; Rousset & Roze 2007; El Mouden & Gardner 2008; Johnstone & Cant 2008). In all these cases, the expression of the behaviour is conditional on variables that are themselves functions of demographic or life-history features (migration rate, population size, survival). These kinds of scenario are not implicitly taken into account in the models discussed above but usually involve direct extensions of them (e.g. Rousset & Roze 2007; El Mouden & Gardner 2008; Johnstone & Cant 2008).

### (c) Measuring relatedness

Although we saw that there is a large variety of life history and demographic scenarios for the evolution of helping, some may be more plausible than others. The most direct test of a given scenario leading to the evolution of helping may actually be a test of its life history and demographic assumptions. Another approach is to measure relatedness under limited dispersal in order to compare the expected magnitude of indirect effects on helping across different demographic conditions. We have seen that there are different, equally valid ways to represent allele frequency changes for social traits in spatially structured populations (e.g. equations (5.4), (5.7) and (5.12)), but they suggest more or less appropriate data analyses. They rest on at least three different interpretations of ‘relatedness’, which imply different quantities to be estimated empirically.

Relatedness may first be defined relative to the total population allele frequency, which matches the terms of Hamilton's rule but can hardly be estimated in practice because the match between the idealized ‘total population’ size of theoretical models and the ‘total population’ of any real species is too poor. Further, the data may simply not be there to analyse the ‘total population’ of a species of interest; that is, the scale of intraspecific competition.

Another way to define relatedness (the third one considered above) is relative to a competitive neighbourhood, which depends on local allele frequencies, but will be specific to each new biological scenario rather than a measure common to a wide range of models, and therefore it will not bear a single relationship with relatedness in Hamilton's rule. Different choices of the reference frequency *p** (equation (5.4)) should lead to the same conclusions about inclusive fitness (as only its magnitude, not its sign, would be affected) but may render across-species (and even within-species) comparisons of relatedness meaningless.

Finally, we have considered relatedness measured relative to the local scale, which solves the above difficulties: the cost and benefits are those of Hamilton's rule, the relatedness coefficients are local and they bear a consistent relationship with relatedness in Hamilton's rule. Hence, it seems that anyone willing to estimate ‘inclusive fitness’ should focus on estimating *S̃*_{IF}(*z*) (equation (5.7)) rather than *S*_{IF}(*z*) (equation (5.4)).

### (d) Conclusion

In summary, with the assumptions of additive gene action and weak selection, theoreticians have been able to derive the consequences of many life history and demographic scenarios for the evolution of helping behaviours. This has provided an increased understanding of how selection and genetic drift interact at a local scale in order to shape the force of directional selection on social behaviours, which may often result in the evolution of unconditional costly helping. However, owing partly to the complications raised by localized dispersal, analyses of spatial variation in relatedness have provided comparatively little insight. A synthesis is still needed in order to better understand the relative importance of different demographic factors on the evolution of social traits.

## 7. What do the fitness effects *C* and *B* represent: the link between demography and strategic behaviour

### (a) Multimove social interactions

Behavioural ecologists tend to consider that the behaviour of an organism can be predicted from knowledge about a set of external stimuli and internal states of the organism (e.g. McFarland & Houston 1981; Leimar 1997; Enquist & Ghirlanda 2005). One can then model behaviour as a function *M* that maps states, *s* (internal and external inputs) to behavioural responses or action, *a*, as
$$a = M(s) \qquad (7.1)$$

This is the so-called state–space approach to behaviour (McFarland & Houston 1981; Leimar 1997; Enquist & Ghirlanda 2005), where the function *M* describes how a focal individual responds to its environment (abiotic and biotic) at any point in time, and will result in a sequence of behavioural actions *a*_{0}, *a*_{1}, *a*_{2}, … , which will affect the vital rates of the focal individual and possibly those of its neighbours.

The phenotype *z* defined in the main text may affect the states of the organism, the transitions between the states and/or the function *M* that maps states into actions. In other words, the evolving phenotype *z* may affect either directly or indirectly the actions taken by an individual at any point in time, e.g. *a*_{t}(*z*). For instance, if individuals interact repeatedly (e.g. repeated Prisoner's Dilemma game, Bargaining or Negotiation game, repeated rounds of cultural transmission, etc.) the sequence *a*_{0}, *a*_{1}, *a*_{2}, … of actions expressed by a focal individual during a period of time is affected by *z*, and will then change its vital rates (by magnitude *C*) and possibly that of its neighbours (by magnitude *B*).
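The state–space description can be made concrete with a toy example. The sketch below is purely illustrative: the response rule, the choice of state (the partner's previous action) and every name and number are assumptions made here, not part of any model in the text. It shows only how an evolving trait *z* that parametrizes *M* changes the whole sequence of actions.

```python
def make_rule(z):
    """Hypothetical response rule M for an individual with trait value z.

    The state s is the partner's previous action (a number in [0, 1]);
    the returned action is the individual's own level of helping.
    """
    def M(s):
        # Baseline helping z, plus partial matching of the partner's last action
        return min(1.0, z + 0.5 * s)
    return M


def play(z_focal, z_partner, rounds=5):
    """Iterate a = M(s) for both individuals and return their action sequences."""
    M_focal, M_partner = make_rule(z_focal), make_rule(z_partner)
    a_focal, a_partner = M_focal(0.0), M_partner(0.0)  # no history in round 0
    focal_seq, partner_seq = [a_focal], [a_partner]
    for _ in range(rounds - 1):
        # Each individual's state is the action its partner took in the last round
        a_focal, a_partner = M_focal(a_partner), M_partner(a_focal)
        focal_seq.append(a_focal)
        partner_seq.append(a_partner)
    return focal_seq, partner_seq


focal_seq, partner_seq = play(z_focal=0.2, z_partner=0.1)
print([round(a, 3) for a in focal_seq])
print([round(a, 3) for a in partner_seq])
```

Summing the vital-rate consequences of such sequences over one iteration of the life cycle is what would produce the total effects *C* and *B*.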

Importantly, *C* and *B* capture the total change in the relative fecundity (or survival) of a focal individual stemming from it and its neighbours expressing the mutant phenotype, respectively. The interpretation of *C* and *B* is thus not limited to the outcomes of one-shot social interactions with direct genetic effects but also captures the outcomes of multimove social interactions, which may be directly or indirectly influenced by *z* and which occur over one iteration of the life cycle.

### (b) Example: cooperative cleaners

In order to illustrate these concepts, we extend the ‘foraging in pair non-cooperative cleaning model’ of Bshary *et al.* (2008, p. 3, electronic supplementary material) to interactions occurring between pairs of individuals in a patch-structured population. The model describes the foraging behaviour of two cleaner fishes on a single client. The assumption for the foraging strategy is that a focal cleaner consumes encountered ectoparasites but may ‘cheat’ by taking a bite of mucus with a probability *z*_{•} per unit time (denoted *λ*_{1} in Bshary *et al.* 2008, p. 2, electronic supplementary material), while *z*_{0} denotes the probability that the partner of the focal individual, here an average patch neighbour, takes a bite of mucus per unit time (denoted *λ*_{2} in Bshary *et al.* 2008, p. 2, electronic supplementary material). After a bite by either individual, the client terminates the interaction with probability 1/2. The expected cleaning duration is then *t* = 2/(*z*_{•} + *z*_{0}) and, when an interaction ends, the expected time until a new client arrives is *t*_{0}.
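The expected cleaning duration follows from a short geometric argument. This is a sketch, assuming that bites by the two cleaners arrive as independent events at total rate *z*_{•} + *z*_{0} (treating the per-unit-time probabilities as rates, an approximation introduced here) and that each bite independently ends the interaction with probability 1/2; *N* denotes the number of bites until the client leaves.

```latex
\begin{align}
\Pr(N = n) &= \left(\tfrac{1}{2}\right)^{n-1}\tfrac{1}{2}
            = \left(\tfrac{1}{2}\right)^{n}, \qquad n = 1, 2, \dots \\
\mathbb{E}[N] &= \sum_{n \ge 1} n \left(\tfrac{1}{2}\right)^{n} = 2 \\
t &= \frac{\mathbb{E}[N]}{z_{\bullet} + z_{0}} = \frac{2}{z_{\bullet} + z_{0}}
\end{align}
```

The last line uses the fact that the mean waiting time between successive bites is 1/(*z*_{•} + *z*_{0}), so that on average two bites occur per interaction.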

The fecundity of the focal individual is assumed to be given by eqn (2) of Bshary *et al.* (2008, p. 2, electronic supplementary material), which, with the present notation becomes
7.2
where *g*_{p}(*t*) is the expected energy gain that accrues to the focal cleaner from consuming ectoparasites, *z*_{•}/(*z*_{•} + *z*_{0}) is the fraction of bites of mucus taken by the focal cleaner, *β* is the expected energy gain from such a bite and the factor 2 reflects the fact that on average two bites occur before the interaction with the client ends. Note that strictly speaking one has to describe how different pairs of individuals interact in a patch in order to write the fecundity function *f* (as, e.g. in Lehmann *et al.* 2007; Rousset & Roze 2007), but we ignore these details as they do not affect the results given below.

For this strategic situation, one has:

7.3
where the prime denotes a derivative, and which gives the change in the relative fecundity of the focal individual stemming from it increasing its mucus biting probability, while
7.4
which is the change in the relative fecundity of the focal individual stemming from its partner increasing its mucus biting probability, and which did not appear in the original formulation of the model as cleaner fish are likely to interact in a panmictic way.

Using equation (4.1), a candidate evolutionarily stable state is found at the point where *B*(*z*)*κ* − *C*(*z*) = 0. Substituting equations (7.3) and (7.4) into the latter equation, we find that the candidate optimal *z* satisfies
7.5
which shows, first, that when *κ* = 0, equation (7.5) reduces to the ‘non-cooperative’ solution of Bshary *et al.* (2008, eqn (3)), and, second, that when *κ* = 1, equation (7.5) reduces to the ‘cooperative’ solution of Bshary *et al.* (2008, eqn (2)). Hence, depending on the demographic assumptions, spatial structure can tilt the optimal biting rate from the ‘non-cooperative’ to the ‘cooperative’ solution, in which case individuals provide a better service than if they were alone (Bshary *et al.* 2008).
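When the condition *B*(*z*)*κ* − *C*(*z*) = 0 has no closed-form solution, a candidate state can be located numerically. The following is a minimal sketch using bisection; the functions `B` and `C` below are hypothetical placeholders for a generic helping trait, chosen only so that the root is known, and they are not the cleaner-fish expressions of equations (7.3) and (7.4).

```python
def candidate_ess(B, C, kappa, lo, hi, tol=1e-10, max_iter=200):
    """Find z in [lo, hi] such that B(z) * kappa - C(z) = 0, by bisection."""
    def gradient(z):
        return B(z) * kappa - C(z)

    g_lo, g_hi = gradient(lo), gradient(hi)
    if g_lo == 0.0:
        return lo
    if g_hi == 0.0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("selection gradient does not change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = gradient(mid)
        if g_mid == 0.0 or hi - lo < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid                 # the root lies in the lower half
        else:
            lo, g_lo = mid, g_mid    # the root lies in the upper half
    return 0.5 * (lo + hi)


# Hypothetical placeholder functions: the benefit falls and the cost rises with z
def B(z):
    return 1.0 - z


def C(z):
    return z


for kappa in (0.0, 0.5, 1.0):
    print(kappa, round(candidate_ess(B, C, kappa, 0.0, 1.0), 6))
```

For these placeholders the root is *κ*/(1 + *κ*), so the candidate trait value increases with *κ*. A root found this way is only a candidate: whether it is convergence stable still has to be checked from the sign change of the gradient around it.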

## Acknowledgements

We thank A. Gardner, L. Keller, S. Lion, R. Bshary and an anonymous reviewer for helpful comments on the manuscript. L.L. is supported by a grant from the Swiss NSF. This is publication ISEM 10-045.

## Footnotes

One contribution of 14 to a Theme Issue ‘Cooperation and deception: from evolution to mechanisms’.

- © 2010 The Royal Society