## Abstract

Humans cooperate in large groups of unrelated individuals, and many authors have argued that such cooperation is sustained by contingent reward and punishment. However, such sanctioning systems can also stabilize a wide range of behaviours, including mutually deleterious behaviours. Moreover, it is very likely that large-scale cooperation is derived in the human lineage. Thus, understanding the evolution of mutually beneficial cooperative behaviour requires knowledge of when strategies that support such behaviour can increase when rare. Here, we derive a simple formula that gives the relatedness necessary for contingent cooperation in *n*-person iterated games to increase when rare. This rule applies to a wide range of pay-off functions and assumes that the strategies supporting cooperation are based on the presence of a threshold fraction of cooperators. This rule suggests that modest levels of relatedness are sufficient for invasion by strategies that make cooperation contingent on previous cooperation by a small fraction of group members. In contrast, only high levels of relatedness allow the invasion by strategies that require near universal cooperation. In order to derive this formula, we introduce a novel methodology for studying evolution in group structured populations including local and global group-size regulation and fluctuations in group size.

## 1. Introduction

Unlike other mammals, humans cooperate in large groups of unrelated individuals. Examples include warfare, the construction of roads, canals and other capital facilities, and risk buffering behaviours such as food sharing and mutual aid. It seems likely that our ability to cooperate played a crucial role in the rapid growth and spread of human populations over the past 50 000 years [1,2]. Beginning with Trivers's seminal paper [3], many authors have argued that human cooperation is explained by reciprocity and other forms of contingent behaviour. Because people can recognize a sizable number of individuals and remember their previous behaviour, selection leads to a psychology in which the behaviour of actors is contingent on the previous behaviour of others. Individuals help only those who have helped them in the past, or punish those who do not cooperate in mutually beneficial activities. If, in the long run, benefits of sustained cooperation exceed the short-term benefits of defection, then contingent strategies supporting cooperation can be evolutionarily stable. Such equilibria can explain the persistence of cooperation among unrelated individuals.

However, showing that cooperation can persist is not enough. Under plausible conditions, contingent strategies can stabilize virtually any behaviour including non-adaptive and maladaptive behaviours [4,5]. A complete account must explain why contingent cooperation is a likely evolutionary outcome. Moreover, contingent cooperation, especially in sizable groups, appears to be very rare among primates [6], and thus it is very likely that the ancestral condition in the human lineage is non-cooperative. This means it is not enough to explain the stability of contingent cooperation [7–10]; we must also explain how contingent strategies supporting cooperation can increase when rare. This is problematic because such strategies are altruistic when rare. Because other group members are unconditional defectors, rare contingent cooperators pay the cost of cooperation and benefit others, but do not gain any long-run benefit. In a similar way, strategies that punish contingent on others punishing must punish, or make a costly signal of intent to punish, in order to determine how many punishers there are in the group.

For reciprocity among pairs, kinship provides an easy solution to this problem. If interactions are repeated many times, the benefits to reciprocity can be very large. This means that rare reciprocators can increase even if they have only a small chance of interacting with another reciprocator, and thus even low levels of relatedness can allow reciprocating strategies to increase [11]. Since population structure often leads to low but positive background levels of relatedness, there is a plausible explanation for the evolution of pairwise reciprocity.

It is not clear whether relatedness can play a similar role in the evolution of contingent cooperation in larger groups. Boyd and co-workers [8,12] have presented models which suggest that the effect of relatedness diminishes rapidly with group size. However, these models assumed that groups are formed by sampling individuals with a constant relatedness to each other. Basic models of population structure are not consistent with this assumption because the biological processes that generate relatedness lead to interdependencies, so that knowing that two individuals share a gene by common descent increases the probability that other members of the group also share that gene by common descent. For a given relatedness, this increases the likelihood that groups will contain enough cooperators to sustain cooperation. As a result, existing work underestimates the possibility that contingent cooperation can increase when rare as a result of assortment due to population structure [13,14].

Here, we derive a rule (5.7) that gives the relatedness necessary for contingent cooperation in *n*-person iterated games to increase when rare. This rule applies to a wide range of pay-off functions, but requires that the strategies supporting cooperation are based on a threshold. Such strategies are common in the literature [5,7,12,15,16]. For example, in the iterated public goods game, a plausible strategy is to cooperate during the first period, and then cooperate if at least a fraction *θ* of the *n* individuals in the group cooperate. The derivation of this rule also assumes that groups are very large, that relatedness is low and generated by an elastic island model population structure [14], or by budding viscosity population structure [17] (propagule dispersal with group competition [18], two-level Fisher–Wright [13]). We will present numerical results which suggest that this rule also provides useful estimates when some of these assumptions are relaxed and the demographic parameters are in the biologically relevant range, including levels of relatedness in the range from 2% to about 10%. In order to derive this rule in §5, we will, in §3 and §4, introduce a novel methodology for studying evolution in group structured populations including local and global group-size regulation and fluctuations in group size. This methodology is also useful for studying other problems and provides new insights about how migration and local regulation affect the evolution of cooperation and altruism.

## 2. The model

Individuals live in groups of a size that may fluctuate, but is usually close to a common value *n*. During a life cycle, they interact *T* times, and in each interaction they can express either a cooperative behaviour A or a non-cooperative behaviour N. Let be the incremental effect of an interaction on the fitness of an individual expressing A given that a fraction *x* of the individuals in the group express A. By fitness, we mean the expected number of adult offspring of an individual. Here, *δ* is a constant that gives the strength of selection and that we will always suppose to be small (weak selection). The cooperative behaviour may also affect the fitness of individuals in the group that do not cooperate; let be the incremental effect of an interaction on the fitness of an individual not expressing A given that a fraction *x* of the individuals in the group express A. Non-cooperators neither produce benefits nor experience any personal cost, so that For technical reasons, we also assume, without restricting the applicability of the model, that and are piecewise continuous and always continuous from the right. And we suppose that meaning that social interaction reduces the fitness of an individual behaving cooperatively in a group in which few others behave cooperatively.

There are two heritable strategies. Cooperators express behaviour A during the first interaction and continue to express A during future interactions if the fraction of individuals in the group expressing A during the previous interaction is greater than or equal to *θ*. This means if the fraction of cooperators in the group is at least *θ*, cooperators behave cooperatively during all *T* interactions. We assume that so that such sustained cooperation is mutually beneficial to the cooperators. Defectors never express the cooperative behaviour.
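The threshold rule just described can be traced round by round. The following is a minimal sketch, not the authors' code: the function name `simulate_group` and its arguments are hypothetical, and it assumes a group of fixed size `n` containing `n_cooperators` cooperators and `n - n_cooperators` defectors.

```python
def simulate_group(n_cooperators: int, n: int, theta: float, T: int) -> list[float]:
    """Fraction of group members expressing behaviour A in each of T interactions.

    Hypothetical helper: cooperators express A in the first interaction and
    keep expressing A only if the fraction expressing A in the previous
    interaction was at least theta. Defectors never express A.
    """
    fractions = []
    expressing = n_cooperators  # first interaction: every cooperator expresses A
    for _ in range(T):
        frac = expressing / n
        fractions.append(frac)
        # Cooperators condition the next interaction on the one just observed.
        expressing = n_cooperators if frac >= theta else 0
    return fractions


# A group at or above the threshold sustains cooperation for all T interactions;
# a group below it cooperates once and then stops.
above = simulate_group(n_cooperators=3, n=10, theta=0.2, T=4)
below = simulate_group(n_cooperators=1, n=10, theta=0.2, T=4)
```

Because defectors never express A, the fraction observed after the first interaction equals the fraction of cooperators, so the outcome is all-or-nothing: either *T* rounds of cooperation or exactly one.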

When cooperators are rare and groups are formed at random, virtually all cooperators are in groups without any other cooperators. Thus, cooperation cannot increase because cooperators experience a reduction in fitness in the first round compared to defectors, and thereafter the two types behave identically and receive no pay-off.

Cooperators can increase when rare only if groups are formed assortatively so that there is some chance that they benefit from long-term cooperation. This means that relatedness in the groups is key to the evolution of cooperation. However, knowing the coefficient of relatedness within groups (*R*) alone is not, in principle, enough to determine whether cooperation can increase because the fitness functions that we are considering are nonlinear functions of the frequency of cooperators in a group. To calculate the expected fitness of rare cooperators, the entire probability distribution of frequencies is required [13,14,19]. This distribution depends on the population structure.

Here, we assume non-overlapping generations and that groups are linked by migration so that each generation each individual migrates with probability *m*. And we assume one of the following two kinds of population structure.

(1) Groups form an island model with group size elasticity. This population structure was introduced in [14] under the assumption of purely local regulation. Here, we will extend it to include also global regulation and call it ‘islands with local and global regulation’ (ILGR). The population structure and the relevant results will be summarized in §3. More detail and self-contained derivations that extend the results from [14] are provided in the electronic supplementary material. (This differs from the inelastic island model [20] which assumes completely fixed group sizes and thus cannot accommodate average fitness different from 1. Other approaches to the effects of group elasticity on the evolution of cooperation can be found, for instance, in [21–23] and references therein.)

(2) Groups compete among themselves for the production of new groups in the next generation. This is the ‘budding viscosity’ population structure of Gardner & West [17], called ‘propagule dispersal with group competition’ in [18], and ‘two-level Fisher–Wright’ (2lFW) in [13]. The idea is that cooperators, at a cost to themselves, help their group in its competition with other groups.

After discussing these population structures in the next section, we will indicate why we conjecture that the results and methods that we use should apply to a broader class of population structures.

## 3. Population structures: general results

First, we describe ILGR and summarize the main results derived in the electronic supplementary material. In ILGR, the population consists of *g* groups with a common carrying capacity *n*_{0}. We will assume that *g* and *n*_{0} are large. In the absence of selection, when all the groups have *n*_{0} individuals, the individuals have fitness 1. When the total population size differs from *N*_{0} = *gn*_{0}, fitnesses are modified through global regulation, and when the size of a group differs from *n*_{0}, fitnesses of its members are also modified through local regulation. We model these effects, including also selection, by setting the absolute fitness (expected number of adult offspring) of a focal individual of type * (there are two types, A and N) as

(3.1)

where *x* is the fraction of types A in the focal's group, *s* = *n*/*n*_{0} is the scaled size of the focal's group, assumed to currently have size *n*, and *S* = *N*/*N*_{0} is the scaled size of the complete population, assumed to currently have size *N*. The pay-off indicates how the fitness of the focal individual is modified by the behaviour of types A and N in its group, and we assume that *h*(*s*) is differentiable and *h*(1) = 1. We assume that is strictly decreasing in *s* and in *S*, is continuously differentiable, takes the value 1 at and that its partial derivatives at this point, and satisfy These assumptions mean that describes local regulation, with strength *λ*, towards group size *n*_{0}, and global regulation, with strength *λ*_{g}, towards average group size *n*_{0}. Note that in the absence of selection (when *δ* = 0), all the individuals have fitness given by and that if *s* and *S* are close to 1, then so that is a stable equilibrium. The fitness (3.1) is the expected number of adult offspring of each individual, and a full specification of the model must include the choice of the offspring distribution with this mean (e.g. Poisson).

The model assumes non-overlapping generations, and random migration at rate *m* after reproduction in each generation, meaning that with probability *m* each individual born in this generation leaves its group once it reaches adulthood and relocates in a randomly chosen group. Since larger groups produce more migrants, migration as well as local regulation drive groups towards the average size *n*_{0}. The comparison between these two forces, quantified by *m* and *λ*, is crucial in the results described below. Selection will be assumed to be weak, meaning that *δ* is positive but small (the precise conditions are discussed in §5 of the electronic supplementary material). This implies that regulation and migration act faster than selection and drive the system to a quasi-equilibrium in which the distribution of group sizes varies little over time (at any time ), while the fraction *p* of types A changes at a rate of order *δ*, and therefore may change substantially in the long time-scale 1/*δ*.
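The migration step alone can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: the function name `migrate` is hypothetical, individuals are represented by type labels, and a migrant's destination is drawn uniformly from all groups, possibly including its group of origin.

```python
import random


def migrate(groups: list[list[str]], m: float, rng: random.Random) -> list[list[str]]:
    """Apply one round of random migration at rate m.

    Hypothetical helper: each individual independently leaves its group with
    probability m and relocates to a uniformly chosen group (assumed here to
    include the group it came from).
    """
    g = len(groups)
    new_groups: list[list[str]] = [[] for _ in range(g)]
    for i, members in enumerate(groups):
        for individual in members:
            if rng.random() < m:
                new_groups[rng.randrange(g)].append(individual)  # migrant
            else:
                new_groups[i].append(individual)  # stays in its natal group
    return new_groups


# A large and a small group: the large group sends out more migrants in
# expectation, while both receive the same expected number of immigrants.
rng = random.Random(0)
groups = [["A"] * 5 + ["N"] * 45, ["N"] * 10]
after = migrate(groups, m=0.5, rng=rng)
```

In this sketch a group of size *n* sends out about *mn* emigrants, whereas every group receives the same expected number of immigrants, which is why migration pulls group sizes towards the average.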

In the electronic supplementary material, we study how in quasi-equilibrium *p* changes over one generation. We show that

(3.2)

and

(3.3)

where is the probability density of a beta distribution with parameters *α* and *β*, *l* = (1/*R*) − 1 (recall that *R* is group relatedness), and

(3.4)

with (which is close to *m* when *m* is small). The condition for *p* to increase is *F*(*p*) > 0, and in particular, by taking the limit *p* → 0, we obtain the condition for types A to proliferate when rare as

(3.5)

Table 1 and figure 1 provide support for the conclusions summarized above, based on numerical simulations. In our simulations, group sizes were chosen in the range from 20 to 320 and offspring distributions were Poisson. This last assumption implies relative variability of group sizes from generation to generation of the order of which could be as large as Such conditions proved to be compatible with the theoretical approximate beta distribution of *x*, required in the derivation of (3.3), provided we used for relatedness in the groups the empirical value of *F*_{st}, so that *l* = (1/*F*_{st}) − 1. This is natural since, when the frequency of cooperators in the population is *p*, the distribution of *x* is approximately beta with parameters *pl* and *ql* (with *q* = 1 − *p*), which has mean *p* and variance *pq*/(1 + *l*), implying that the empirical *F*_{st} should be 1/(1 + *l*). As expected, the agreement with the beta distribution, the value of *Q* and the predicted evolution of *p* improve with increasing *n*_{0}. But agreement is still very good even for the smaller values of *n*_{0}.
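The link between the beta parameters and *F*_{st} can be checked with the standard moment formulas for the beta distribution. The sketch below is illustrative only: the function names are hypothetical, and *F*_{st} is taken to be Var(*x*)/(*pq*), an assumed definition that the text does not spell out.

```python
def beta_moments(p: float, R: float) -> tuple[float, float]:
    """Mean and variance of a beta distribution with parameters p*l and (1-p)*l,
    where l = 1/R - 1 (hypothetical helper using the textbook beta moments)."""
    l = 1.0 / R - 1.0
    alpha, beta = p * l, (1.0 - p) * l
    mean = alpha / (alpha + beta)
    variance = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
    return mean, variance


def implied_fst(p: float, R: float) -> float:
    """F_st implied by the beta distribution, assuming F_st = Var(x) / (p * q)."""
    mean, variance = beta_moments(p, R)
    return variance / (mean * (1.0 - mean))


# The implied F_st does not depend on p and equals 1/(1 + l), that is, R itself.
fst_values = [implied_fst(p, R=0.07) for p in (0.01, 0.3, 0.9)]
```

Under these assumptions the implied *F*_{st} is the same for every *p*, which is why a single empirical *F*_{st} suffices to fix *l* in the simulations.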

The quantity *Q* is an important population parameter that, in the ILGR setting, measures the relative strength of local regulation and migration in keeping group sizes close to *n*_{0}. Its appearance in the formulae above can be explained qualitatively as follows. Observing the type of a focal individual tells us something about the composition of its group, not only at the present time, but also in the recent past, since the lineage of the focal must have been in the group for a time of order 1/*m* generations. This implies that the average pay-off in the recent past correlates with the type of the focal. Hence, the current size of the group also correlates with the focal's type through a term of order And this affects the fitness of the focal individual through a term of the same order, as a result of group regulation. The computation of this effect in the electronic supplementary material leads to the term that appears in (3.3) and (3.5). Thus *Q* is an ecological parameter that tells us how much a small change in average fitness of group members in the past affects their current fitness by affecting group size. Note that *λ*_{g} is absent in (3.3) and (3.4). Global regulation affects all the groups in the same way and therefore does not produce correlations between group composition or focal type and group size. Global regulation does play an important role, though, in restraining average group size variability as *p* changes (see §5 of the electronic supplementary material).

To elaborate further on the intuitive meaning of *Q* in ILGR, we assume now that types A are cooperators that provide some costly benefit to the members of the group, in the sense that increases with *x* and When *λ* ≪ *m*, groups with more cooperators are driven to the typical size *n*_{0} mostly by migration, i.e. by producing more emigrants. The average fitness of members of such groups remains larger than 1 in quasi-equilibrium. The frequency of cooperators may increase or decrease in time, depending on how the average values of and compare, and this is precisely what (3.2) and (3.3) entail in this case, as *Q* ≈ 0. We call this regime the ‘Hamilton regime’. But when *λ* is comparable to *m*, a high level of cooperation in a group increases the size of the group to an equilibrium slightly (meaning order *δ*) above *n*_{0}, in which local regulation reduces the effects of further cooperation. In the extreme case *λ* ≫ *m*, groups with more cooperators are larger, but are in an equilibrium produced primarily by group regulation, in which their members have average fitness 1. Cooperators are then selected against as, in groups with both types, their fitness must be less than 1, while that of non-cooperators must be larger than 1. Again, this is precisely what (3.2) and (3.3) entail, as now *Q* ≈ −1 and hence *F*(*p*) < 0. We call this regime the ‘crowded regime’. (For more on the intuition in this paragraph, see [14].)

We turn now to the population structure 2lFW. We provide a brief description of the model and the results from [13,14] that will be needed and refer the reader to these papers for further details. In 2lFW, generations are again non-overlapping, and the number of groups in each generation is again fixed as *g*. Groups have exactly the same size *n*. In each generation, each group selects independently a group from the previous generation to be its parent group, with probabilities proportional to the average fitness of members of each group in the previous generation. The membership in each group in the new generation is determined then by each member of the group selecting independently a member of the parent group with probabilities proportional to the fitnesses of these individuals. Once reproduction has occurred in this fashion and all the new *g* groups have been created, a fraction *m* of the individuals, chosen at random, is removed from their groups and relocated at random, preserving the size *n* of the groups. We suppose that there are two types, A and N, that the relative fitnesses of the individuals depend on the fraction *x* of types A in the group, being given by and that offspring inherit the type of the parent. (The absolute fitnesses are simply the relative ones divided by the average value of the relative ones in the whole population. This average value has the form and therefore the absolute fitnesses have the form where *w*_{0} is a common value for all individuals in the population in each generation.) In [13], we studied invasion in this setting, under strong or weak selection. In the relevant case for us here, in which groups are large, migration *m* low and selection is weak, we showed that the condition for invasion to occur is precisely (3.5), with *Q* = 0 (see display (3.3), [13]). We also explained in [14] that not only (3.5), but also the more general (3.2), (3.3) holds for 2lFW, with *Q* = 0.
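One generation of the 2lFW life cycle described above can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: the function name and arguments are hypothetical, and the relative fitness 1 + *δ*(*bx* − *c*·1_A) of a linear public goods game is used as a stand-in pay-off, with *δ* assumed small enough that all fitnesses stay positive.

```python
import random


def two_level_fw_generation(groups, m, delta, b, c, rng):
    """One 2lFW generation (hypothetical sketch).

    groups: list of g lists of n booleans (True = type A), all of equal size.
    Relative fitness is ASSUMED to be 1 + delta * (b * x - c * [type is A]).
    """
    g, n = len(groups), len(groups[0])

    def fitnesses(group):
        x = sum(group) / n  # fraction of types A in the group
        return [1.0 + delta * (b * x - c * is_a) for is_a in group]

    fit = [fitnesses(group) for group in groups]
    group_weights = [sum(f) / n for f in fit]  # average fitness in each group

    # Each new group picks a parent group, then each member picks a parent
    # individual within it, both with probabilities proportional to fitness.
    new_groups = []
    for _ in range(g):
        parent = rng.choices(range(g), weights=group_weights)[0]
        new_groups.append(rng.choices(groups[parent], weights=fit[parent], k=n))

    # Migration: a fraction m of individuals is removed and relocated at
    # random; permuting the movers among the vacated slots preserves sizes.
    slots = [(i, j) for i in range(g) for j in range(n)]
    movers = rng.sample(slots, round(m * g * n))
    types = [new_groups[i][j] for i, j in movers]
    rng.shuffle(types)
    for (i, j), t in zip(movers, types):
        new_groups[i][j] = t
    return new_groups


rng = random.Random(1)
start = [[True] * 3 + [False] * 7 for _ in range(5)]
next_gen = two_level_fw_generation(start, m=0.1, delta=0.05, b=3.0, c=1.0, rng=rng)
```

Relocating migrants by permuting them among the vacated positions is one simple way to keep every group at size *n*; other relocation schemes with the same property would serve equally well for illustration.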

Intuition about why the same beta distribution turns out to be relevant in the case of ILGR and 2lFW is as follows [14]. When one tracks the lineages of the members of a group back in time, in these two distinct settings, one obtains the same coalescent process, and that determines the distribution of *x*, which is known to be beta with parameters *lp* and *lq* [24]. Even when groups are smaller and group size fluctuations are relevant, we observe the beta distributions in computer simulations as good approximations (even with *n*_{0} as small as 10). And even in simulations of population structures in which groups split, or groups become extinct at a low rate and recolonized, we have observed the betas. This can be explained by the fact that the relevant time-scale of the coalescent is given by the typical time 1/*m* needed for a lineage to exit the group. When *m* is small, this time-scale is much longer than the time-scale at which groups fluctuate in size and we obtain the same kind of coalescent, with rates of coalescence of lineages given by averages over the time-scales of the fluctuations in size.

Intuition about why *Q* = 0 in the 2lFW case is as follows. Information about the type * of the focal individual affects our knowledge of its current fitness *w* in two ways. One is the first arrow is mediated by relatedness *R*, and this is the basis of Hamilton's work. The other is where is the average fitness in the group in the recent past. In ILGR, *n* is elastic and this channel is important and is mediated by *Q*. In population structures that have a fixed group size, this channel is absent, or equivalently,

*Q* = 0. (3.6)

A number of previous readers of this paper felt tempted to take a limit *λ* → ∞ in ILGR, in order to produce a model with ‘infinite rigidity’ meaning fixed group sizes, and were puzzled by the fact that the right-hand side of (3.4) is then converging to −1 rather than to 0. In fact, if one violates the assumed conditions of ILGR, which require *λ* < 1, and considers the case of very large *λ*, one runs into a situation in which becomes an unstable fixed point, and our analysis of ILGR, including (3.4), does not apply.

The two channels discussed above lead to a decomposition with the first term corresponding to the first channel and the second term to the second channel and including the factor *Q*. The discussion above and in [14], and the computation of in the electronic supplementary material suggest that (3.2) and (3.3) should apply to a broader class of population structures, including the possibilities of groups splitting or becoming extinct and being recolonized. The fashion in which *Q* relates to population parameters (migration rate, regulation rates, rates of group extinction, etc.) must be population-structure-dependent (in the same way that the relatedness *R* is). But we conjecture that (3.2) and (3.3) will apply quite broadly, with *R* and *Q* fully summarizing the role of the population structure in affecting the direction and speed of selection.

## 4. Linear public goods game: comparison with some of the related literature

In the case of a linear public goods game, and (3.2) and (3.3) become (see the electronic supplementary material)

(4.1)

The condition for types A to increase in frequency is then This is deceptively similar to the condition in display (16) of Gardner & West [17], with their parameter *a* in place of our −*Q* (see pages 1711–1713 in that paper for background on that condition and its relation with their display (10)). But the meanings of −*Q* and *a* are completely different, as one can see from the fact that their population structures have a fixed size and therefore the quantity that we call *Q* is 0 in their setting (see (3.6)). Moreover, our *δb* and *δc* are (Hamilton's) fitness costs and benefits (when the population is in quasi-equilibrium, and up to errors of order *δ*^{2}), while theirs are vital rates that relate only indirectly to fitnesses. We explain these two claims next.

In our setting, consider first ILGR with *h*(*s*) = 1. In this case, the absolute fitness (3.1) is given by where the symbol 1_{A} takes the value 1 when * is A and 0 when * is N. This means that the behaviour of each type A increases the absolute fitness of all members of its group (self-included) by *δb*/*n*, and additionally decreases its own fitness by *δc*. In the case of ILGR with arbitrary *h*(*s*), we know from the electronic supplementary material that in quasi-equilibrium and therefore also so that (3.1) becomes justifying again our claim above. Similarly, in the case of 2lFW, we again have absolute fitnesses justifying our claim.

The fact that the parameters *c* and *b* that appear in [17,20], and related models are typically not fitness costs and benefits is well known and is discussed in detail in [18], but since it is a pivotal issue in our discussion, we explain it again in the context of Taylor [20] (similar analysis applies to the more elaborate models of [17], which extend the model of Taylor [20]). The population structure introduced in [20] has fixed-size groups. In each generation, adults produce a very large (ideally infinite) number of offspring, given (in our notation, and with *L* as a large number) by A fraction *m* of the juveniles disperses to randomly chosen groups. Competition among the juveniles in each group eliminates most of them and leaves exactly *n* of them in each group. They grow to adulthood and start the next cycle. A computation of the absolute fitness (e.g. [18]) gives This means that the behaviour of each type A increases the absolute fitness of all members of its group (self-included) by *δc*/*n*, and additionally decreases its own fitness by *δc*. The parameter *b* has no effect on fitness. This is very different from the situation in our setting, as discussed above. In [20], *Q* = 0, since groups have fixed size, but types A will be eliminated by selection whenever *R* < 1, as

In [14], we had referred to the regime of ILGR in which *Q* is close to −1, as the ‘Taylor regime’ because of similarities between the results of Taylor [20] discussed above, and what happens in our setting, in this regime. (In our case, we also have from (4.1), but note that the cancellation of *b* occurs not at the level of the computation of fitness components, but only at the level of the computation of Δ*p*.) We were aware of the important differences stressed above, but considered that name as appropriate if the differences were also kept in mind. Remarks by the referees convinced us that that name would rather produce confusion, and we changed it here to ‘crowded regime’. In this regime, types A find themselves in groups in which the effect of their ancestors in the same group increased the group density (made it crowded), so that local regulation now cancels the fitness effect that they still have at the current time on their group members. The cancellation that occurs here happens through the effects of past types A on the local ecology (crowding), while in [20] the cancellation is in the computation of the effects of the behaviour of the current types A on fitness.

The referees recommended that we compare our approach with that introduced in [21], and further applied in [22]. There are several distinctions to make. First, Rousset & Ronce [21] analyse the evolution in time not of *p*, but of a weighted average of reproductive values of the types A and N. The direction in which this quantity changes is also an indication of the direction in which *p* varies, but its analysis is, in principle, harder. In [21], the quantity denoted by *S* plays a role similar to our *F*(*p*), giving the direction of selection. In displays (23), (24) and (25) of that paper, they provide a partition that is worth comparing with our partition In addition to being partitions of different quantities, their reflects future effects on the fate of the types A and N, due to distinct reproductive values of offspring in different groups in the next generation. In contrast, our reflects the effects from the past actions of types A and N on their current differences in absolute fitnesses. In other words, we are considering different objects, and partitioning them in ways that are conceptually different. Second, in [21,22], the assumption of additive gene action is made, which restricts the pay-off function to that of the linear public goods game. (Technically, this assumption is made as the assumption that fitness functions are differentiable, an important restriction that we discuss in detail in [25].) As our main interest in this paper is in iterated games with behaviour contingent on a threshold number of participants, gene action across individuals is non-additive. For instance, the fitness effects of having 20% of types A in a group are not necessarily twice that of having 10% of types A in the group. Therefore, we needed to develop a methodology that would not require additive gene action. And this flexibility in our methodology ( is arbitrary in (3.3)) is indeed one of its qualities. Third, and perhaps even more important in the context of the comparison with [21,22], Lehmann *et al*. [22, p. 1142] concede that in situations in which group size is variable, they cannot compute explicitly, and rather analyse only its sign. One of the most relevant contributions in the current paper is the explicit computation of both terms in *F*(*p*), with the extra-Hamilton one yielding the factor *Q*, that we explicitly computed as (3.4) in the ILGR population structure that includes group size variability as a fundamental ingredient. The very simple expression that we obtained for *Q* clarifies the competitive effects of local group size regulation and migration in a quantitative and transparent fashion.

## 5. Invasion in iterated games

In this section, we will apply (3.5) to the pay-offs discussed in §2. This means that if and if if and if Applying these to (3.5), the condition for invasion is
(5.1)

where The integrals in the above equation are relatively simple, since However, instead of integrating, we will exploit the fact that when relatedness *R* is low, the exponent is large (the situation to keep in mind is which implies ), so that this density function decreases rapidly with *x*. This means that the integrals put much more weight on the values of when *x* is close to the left end of the integration interval than on the values when *x* is further to the right. To use this observation, care has to be taken with the normalization when we restrict the distribution to an interval. For this purpose, we define as the average value of with respect to the beta distribution conditioned to being in the interval [*a*, *b*]. The conditional probability density, properly normalized, is and the steep decrease of (1 − *x*)^{*l*} as *x* grows implies that (The rigorous statement is .) Motivated by these observations, we rewrite (5.1) as

(5.2)

and then use the approximation just discussed, and the fact that to replace it with the approximate condition

(5.3)

We now assume, as we did in §2, that *v*_{0} < 0. Then (5.3) implies that invasion requires and *T* sufficiently large:

(5.4)

and

(5.5)

When the pay-offs are given, (5.4) becomes a condition on *Q*. It is interesting to observe that (5.4) can be satisfied even with a negative provided that *Q* < 0 and is sufficiently negative. This situation characterizes spiteful behaviour by the cooperators, in which at a cost to themselves the cooperators harm the non-cooperators. We defer a detailed analysis of this case, observing only that in this situation, the more negative *Q* is, the better for the spread of cooperators. We assume from now on, as we did in §2, that If then when *Q* = 0, we have so that (5.4) is satisfied, and when we have and (5.4) is also satisfied. And if then the square bracket in (5.4) is positive and this condition becomes
(5.6)

In summary, assuming and invasion happens when (5.7)

The impossibility of invasion when and extends the crowded regime. And when or the condition on *T* provides intuition on when the underlying game, the threshold *θ* and the relatedness *R* as well as *Q* (those two being the only inputs determined by the population structure) allow for invasion. The effect of *Q* in this inequality is restricted to the presence of rather than there. To understand its effect, observe (see (5.4)) that is a linear function of *Q* in the interval [*Q*_{c}, 0], which takes the value 0 at *Q*_{c} and the value at 0. This means that if *Q* is close to *Q*_{c} (only possible if ), then *T* will have to be very large for cooperation to invade. But if *Q* is far from *Q*_{c}, the effect of *Q* on the order of magnitude of the needed *T* is small. For instance, if then and having in (5.5) instead of amounts at most to a factor of about 1.1 in the needed *T*.

If and are of similar order of magnitude, then the order of magnitude of *T* will be given by the factor which does not depend at all on the underlying game that is iterated. This factor depends on the population structure only through the level of relatedness *R*, which provides and on the threshold *θ*. It is also very sensitive to these two inputs, as the following examples show. To explore the effect that *θ* has on suppose that *R* = 0.07, which yields Then with , we have with we have with we have and with we have And to explore the effect of *R* on suppose that Then with *R* = 0.02, we have and with we have and with we have and and with we have and When additional intuition on the dependence of on *R* and can be obtained from the approximation For arbitrary , this becomes an inequality, The exponential form of the dependence of on explains the strong sensitivity to the values of *R* and *θ* illustrated by the numerical examples above. This is the reason why invasion is relatively easy when *θ* is low and very hard when *θ* is large.
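The exponential sensitivity described above can be illustrated with a short numerical sketch. It assumes, purely for illustration, a factor of the form (1 − *θ*)^{−*l*} with a free exponent `l`; the paper's exact factor and the way the exponent follows from *R* may differ. Because −log(1 − *θ*) ≥ *θ*, such a factor is always at least exp(*lθ*), with near equality when *θ* is small.

```python
import math


def power_factor(theta, l):
    """Hypothetical factor (1 - theta)**(-l); the paper's exact expression may differ."""
    return (1.0 - theta) ** (-l)


def exponential_bound(theta, l):
    """exp(l * theta), a lower bound on power_factor since -log(1 - theta) >= theta."""
    return math.exp(l * theta)


l = 12.0  # assumed exponent, chosen only for illustration
for theta in (0.05, 0.1, 0.3, 0.5, 0.8):
    factor = power_factor(theta, l)
    bound = exponential_bound(theta, l)
    print(f"theta = {theta:.2f}: factor = {factor:.3e}, exp(l * theta) = {bound:.3e}")
```

Under these assumptions the factor grows by many orders of magnitude as the threshold moves from low to high values, which matches the qualitative point that invasion is relatively easy at low *θ* and very hard at high *θ*.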

In the electronic supplementary material, we computed the integrals in (5.1) for two kinds of pay-off functions: step functions and linear public goods games. And we used the resulting detailed formulae to analyse the conditions under which the approximation in (5.3) and (5.7) is good.

We performed numerical simulations covering a range of parameters (table 2). In these, (5.7) provides a value of *T* that is reasonably close to the empirical results. One should keep in mind that when *p* is very low (invasion conditions), drift is a powerful force that competes with selection and adds randomness to the evolution. This noise is reflected in the lower accuracy of the predictions of the critical *T* compared with the predictions in table 1, even when we used the full theory (3.3) based on the beta distribution. Given that drift is a significant source of noise while an invading gene is still at low frequency (e.g. [24], ch. 4), the level of agreement in table 2 supports the value of the theory even in these extreme conditions.

## 6. Discussion

Inequality (5.3) and the more detailed (5.7) give simple approximations for the conditions necessary for contingent cooperative strategies to increase when rare. They do not depend on the form of the underlying fitness function, but do depend on various assumptions. (i) Contingent strategies that support cooperation lead individuals to continue cooperating if the number of cooperators exceeds a threshold. (ii) The assortment necessary for cooperative strategies to increase when rare results from an elastic island model population structure [14] (§4), or from groups competing for the production of new groups [13,17,18], or from other population structures for which (3.5) holds, with a constant *Q* that depends on the population structure. We conjecture that this is the case for many population structures (see §3 and [14]). (iii) The derivation of (5.3) and (5.7) also depends on the assumption that groups are very large, migration rates are low, and relatedness in groups is also low. However, numerical calculations suggest that they also give useful approximations in a wide and reasonable range of parameters.

The simple rule given by (5.7) provides a number of insights. It shows the relative importance of the population structure (through *R* and *Q*), of the pay-offs, of the threshold *θ* and of the number of iterations *T*.

It suggests that the evolution of contingent cooperation is very sensitive to relatedness (*R*) and to the threshold fraction of cooperators necessary for cooperation to persist (*θ*). As long as and the fitness parameters and are of comparable order, the order of magnitude of the threshold number of interactions necessary for contingent cooperation to increase will be mainly determined by which depends only on *θ* and *R*. When *θ* is large, under realistic levels of relatedness, invasion will require unreasonably large numbers of iterations. For instance, if and then On the other hand, when *θ* is small, invasion can occur at very low levels of relatedness. For instance, if relatedness is 0.02, and *θ* = 0.1, then

This sensitive dependence of the required level of relatedness on *θ* suggests that the high levels of cooperation observed in humans are more likely to have evolved by contingent punishment than by contingent cooperation. Costly contingent punishment that persists at a low threshold *θ* can invade much more easily than costly contingent cooperation that persists only at a high threshold *θ*. But even a small fraction of punishers in a group can induce massive group cooperation. In the model presented in [12], individuals punish non-cooperators if enough other individuals in the group are also willing to punish non-cooperators. Because even a modest fraction of punishers can motivate others to cooperate, such contingent punishment strategies can increase when rare at relatively low levels of relatedness and still stabilize cooperation at a high level. Strategies that continue cooperating even when only a small fraction of others cooperate typically reach a stable polymorphic equilibrium in which the population displays a mix of cooperative and non-cooperative strategies [7,8]. Strategies that tolerate more defectors achieve lower frequencies of cooperators at equilibrium. Thus contingent strategies that behave altruistically when a small fraction of the group also behave altruistically can support ongoing cooperation, but will produce equilibria in which most individuals in the group do not contribute. Cooperation of this kind is observed. For example, in the United States, public radio is supported by voluntary contributions by a small fraction of listeners—most free ride. However, such strategies cannot support the widespread cooperation observed in many contexts. For example, virtually all Turkana men participate in warfare, even though the Turkana lack formal coercive institutions [26]. 
Our result is consistent with the idea that such widespread cooperation is supported by coordinated punishment of non-cooperators, by individuals that are willing to persistently punish non-cooperators at a cost to themselves, provided a small threshold number of punishers is achieved in the group.

## Competing interests

We declare we have no competing interests.

## Funding

We received no funding for this study.

## Acknowledgements

We are especially grateful to Renato Vicente for his collaboration in earlier stages of this project, and to Maciek Chudek for computer simulations that helped convince us that beta distributions are expected even when groups fluctuate substantially in size, and even when they split. We are also grateful to them and to Clark Barrett, Marek Biskup, Sam Bowles, Nestor Caticha, Daniel Fessler, Kevin Foster, Willem Frankenhuis, Herb Gintis, Bailey House, Anne Kandler, Laurent Lehmann, Glauco Machado, Sarah Mathew, Diogo Meyer, Cristina Moya, Peter Nonacs, Karthik Panchanathan, Susan Perry, Joan Silk, Jennifer Smith, Jeremy van Cleve and Ming Xue for nice conversations and feedback on various aspects of this and/or related projects. We also thank two referees for their comments and suggestions.

## Footnotes

One contribution of 18 to a theme issue ‘The evolution of cooperation based on direct fitness benefits’.

- Accepted September 28, 2015.

- © 2016 The Author(s)